What is a confusion matrix?

Types of classification outputs

Positive and negative outputs

In a binary classification problem, there are two types of labels: positive and negative.

Positive outputs are the labels with the particular characteristic we are interested in. All other labels are negative.
For example, in a dataset of cancer diagnoses, the positive outputs are the cases where cancer is present, since those are the ones we are interested in.

True and false predictions

We can identify 4 types of predictions:

  • True positives (TP): correct positive predictions
  • False positives (FP): incorrect positive predictions
  • True negatives (TN): correct negative predictions
  • False negatives (FN): incorrect negative predictions
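The four prediction types above can be counted by hand. The following is a minimal sketch in plain Python; the toy labels and the helper name count_prediction_types are made up for illustration, and it assumes that 1 marks the positive class and 0 the negative class.

```python
# Toy data, invented for illustration: 1 = positive, 0 = negative.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]


def count_prediction_types(actual, predicted, positive=1):
    """Return the counts (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


tp, fp, tn, fn = count_prediction_types(actual, predicted)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```

The four counts always add up to the number of predictions, which is a quick sanity check.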

What is a confusion matrix?

A confusion matrix is a simple table that counts the predictions of a classification model against the actual labels, typically built during the validation phase.

The columns represent the positive and negative predicted values, while the rows represent the actual values.

With this structure, we get a table where each cell corresponds to a prediction type we have seen before.

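We can assess the performance of our model by looking at the correct predictions (true positives and true negatives) on the diagonal: the larger they are compared to the off-diagonal cells, the better the model. So we can consider the confusion matrix a validation metric.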
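One way to turn the diagonal into a single number is accuracy: the sum of the diagonal divided by the sum of all cells. The sketch below assumes a matrix stored as a nested list, with rows as actual classes and columns as predicted classes; the counts and the helper name accuracy_from_matrix are hypothetical.

```python
# Hypothetical counts: rows = actual classes, columns = predicted classes.
matrix = [
    [50, 10],
    [5, 35],
]


def accuracy_from_matrix(matrix):
    """Share of predictions that fall on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


accuracy = accuracy_from_matrix(matrix)
print(accuracy)
```

Because the diagonal holds the correct predictions whatever order the classes are listed in, this calculation does not depend on which class comes first.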

Confusion matrix in Python

from sklearn.metrics import confusion_matrix


# feature engineering and model development (defines model, val_X, val_y)


# use a new name so the imported function is not overwritten
cm = confusion_matrix(val_y, model.predict(val_X))
print(cm)

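With labels encoded as 0 (negative) and 1 (positive), scikit-learn sorts the labels in ascending order, so the negative class comes first and the output looks something like this:

[[TN FP]
 [FN TP]]

To put the positive class first, pass labels=[1, 0] to confusion_matrix.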
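The order of the cells depends on the order of the labels, so it is worth checking on a small example. The sketch below uses made-up toy labels, assuming 1 is the positive class and 0 the negative class; it shows the default scikit-learn layout and the layout obtained by passing labels=[1, 0].

```python
from sklearn.metrics import confusion_matrix

# Toy data, invented for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Default: labels are sorted ascending (0, then 1),
# so the layout is [[TN, FP], [FN, TP]].
default_cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = default_cm.ravel()

# Listing the positive class first gives [[TP, FN], [FP, TN]].
positive_first_cm = confusion_matrix(y_true, y_pred, labels=[1, 0])

print(default_cm)
print(positive_first_cm)
```

In both layouts the rows are the actual values and the columns are the predicted values; only the order of the classes changes.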