What is a confusion matrix?

Types of classification outputs

Positive and negative outputs

In a binary classification problem, every output falls into one of two categories: positive or negative.

Positive categories are labels with a particular characteristic that we are interested in. All the other categories are negative.

For example, in a dataset of cancer diagnoses, the positive outputs are the cancerous ones, since those are the cases we want to detect.

True and false predictions

We can identify 4 types of predictions:

  • True positives (TP): correct positive predictions.
  • False positives (FP): incorrect positive predictions.
  • True negatives (TN): correct negative predictions.
  • False negatives (FN): incorrect negative predictions.
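To make the four types concrete, here is a minimal sketch that counts them from two lists of labels. The data and the helper name count_prediction_types are made up for illustration, and 1 is assumed to be the positive label.

```python
# Hypothetical example data: 1 = positive, 0 = negative
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

def count_prediction_types(actual, predicted, positive=1):
    """Return the counts of TP, FP, TN, and FN as a dict."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is positive too
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: incorrect if the actual label is positive
            counts["FN" if a == positive else "TN"] += 1
    return counts

counts = count_prediction_types(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 2, 'TN': 4, 'FN': 1}
```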

What is a confusion matrix?

A confusion matrix is a simple table built from the predictions of a classification model, typically during the validation phase.

The columns represent the positive and negative predicted values, while the rows represent the actual values.

With this structure, we get a table where each cell corresponds to a prediction type we have seen before.
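The layout described above can be sketched in plain Python. This is only an illustration: the data and the helper build_confusion_table are hypothetical, and the order of the rows and columns simply follows the order of the labels argument.

```python
# Hypothetical example data: 1 = positive, 0 = negative
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

def build_confusion_table(actual, predicted, labels):
    """Rows follow the actual labels, columns the predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    table = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        table[index[a]][index[p]] += 1
    return table

# Listing the positive label first puts TP in the top-left cell:
# [[TP, FN],
#  [FP, TN]]
table = build_confusion_table(actual, predicted, labels=[1, 0])
for row in table:
    print(row)
```

With this data the table is [[3, 1], [2, 4]]: 3 true positives, 1 false negative, 2 false positives, and 4 true negatives.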

We can assess the performance of our model by looking at the diagonal, which holds the correct predictions (true positives and true negatives). In that sense, the confusion matrix serves as a validation tool.
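One way to turn the diagonal into a single number is accuracy: the correct predictions divided by all predictions. The sketch below assumes a small made-up matrix with actual values in the rows and predicted values in the columns; accuracy_from_matrix is a hypothetical helper, not a library function.

```python
# Hypothetical 2x2 confusion matrix: rows are actual, columns are predicted
matrix = [[3, 1],
          [2, 4]]

def accuracy_from_matrix(matrix):
    """Share of all predictions that fall on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

accuracy = accuracy_from_matrix(matrix)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.70
```

Because the diagonal holds the correct predictions whichever class is listed first, this calculation does not depend on the label order.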

Confusion matrix in Python

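from sklearn.metrics import confusion_matrix

# feature engineering and model development
# (val_X, val_y, and a fitted model are assumed to exist at this point)

# Store the result under a new name so the imported function is not overwritten
cm = confusion_matrix(val_y, model.predict(val_X))
print(cm)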

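With the labels 0 (negative) and 1 (positive), scikit-learn sorts the labels in ascending order, so the negative class comes first and the output is laid out like this:

[[TN FP]
 [FN TP]]

Passing labels=[1, 0] to confusion_matrix puts the positive class first, which gives [[TP FN], [FP TN]].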
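Since the position of each cell depends on the label order, it can help to check it on a tiny example. The sketch below uses made-up labels instead of a real model, with 1 assumed to be the positive class, and compares the default ordering with labels=[1, 0].

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Default: labels are sorted (0 before 1), so the negative class comes first
default_cm = confusion_matrix(actual, predicted)
tn, fp, fn, tp = default_cm.ravel()

# labels=[1, 0] puts the positive class first instead
positive_first_cm = confusion_matrix(actual, predicted, labels=[1, 0])

print(default_cm)         # [[4 2]
                          #  [1 3]]
print(positive_first_cm)  # [[3 1]
                          #  [2 4]]
```

Unpacking with ravel() only yields tn, fp, fn, tp in that order for a binary problem with the default label order.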