Types of classification outputs
Positive and negative outputs
In a classification problem, we can split the categories into two types: positive and negative.
Positive categories are labels with a particular characteristic that we are interested in. All the other categories are negative.
For example, in a dataset of cancer diagnoses, the positive outputs are the cancerous cases, because those are the ones we are interested in detecting.
True and false predictions
We can identify 4 types of predictions:
- True positives (TP): correct positive predictions.
- False positives (FP): incorrect positive predictions.
- True negatives (TN): correct negative predictions.
- False negatives (FN): incorrect negative predictions.
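To make the four prediction types concrete, here is a minimal sketch in plain Python that counts them for a binary problem. The helper name `count_outcomes`, the label encoding (1 for positive, 0 for negative), and the toy data are assumptions made up for illustration.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for a binary problem (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            # The model predicted positive: correct only if the actual label is positive.
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            # The model predicted negative: correct only if the actual label is negative.
            if a == positive:
                fn += 1
            else:
                tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Toy data, invented for this example: 1 = positive, 0 = negative.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

counts = count_outcomes(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 2, 'TN': 4, 'FN': 1}
```

Note that "true" and "false" describe whether the prediction was right, while "positive" and "negative" describe what the model predicted.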
What is a confusion matrix?
A confusion matrix is a simple table that counts how the predictions of a classification model compare with the actual labels. We usually build it during the validation phase.
The columns represent the positive and negative predicted values, while the rows represent the actual values.
With this structure, we get a table where each cell corresponds to a prediction type we have seen before.
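We can assess the performance of our model by looking at the diagonal, which holds the correct predictions (true positives and true negatives); the off-diagonal cells hold the errors. For this reason, we can use the confusion matrix as a validation tool.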
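As a rough illustration of reading the diagonal, the sketch below computes accuracy as the sum of the diagonal divided by the sum of all cells. The function name `accuracy_from_matrix` and the counts in the table are hypothetical, chosen only to show the arithmetic.

```python
def accuracy_from_matrix(matrix):
    """Share of correct predictions: diagonal total divided by grand total."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical 2x2 confusion matrix: rows are actual values, columns are predictions.
matrix = [
    [50, 10],
    [5, 35],
]

accuracy = accuracy_from_matrix(matrix)
print(accuracy)  # (50 + 35) / 100 = 0.85
```

Accuracy is only one summary of the table. The off-diagonal cells show which kind of mistake the model makes, which a single number hides.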
Confusion matrix in Python
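from sklearn.metrics import confusion_matrix
# ... feature engineering and model development (produces model, val_X, val_y) ...
# Store the result under a new name so the imported function is not shadowed.
cm = confusion_matrix(val_y, model.predict(val_X))
print(cm)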
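scikit-learn sorts the labels, so if the classes are encoded as 0 (negative) and 1 (positive), the negative class comes first and the output looks something like this:
[[TN FP]
 [FN TP]]
To put the positive class first and get [[TP FN] [FP TN]] instead, pass labels=[1, 0] to confusion_matrix.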
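Where each count lands in the table depends on the order of the labels. The sketch below uses plain Python rather than scikit-learn, and builds the table by hand with rows as actual values and columns as predicted values. The function name `build_confusion_matrix`, the 0/1 encoding, and the toy data are assumptions for illustration only.

```python
def build_confusion_matrix(actual, predicted, labels):
    """Build a table with one row per actual label and one column per predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Toy data, invented for this example: 1 = positive, 0 = negative.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Negative class first: [[TN, FP], [FN, TP]]
negative_first = build_confusion_matrix(actual, predicted, labels=[0, 1])
print(negative_first)  # [[4, 2], [1, 3]]

# Positive class first: [[TP, FN], [FP, TN]]
positive_first = build_confusion_matrix(actual, predicted, labels=[1, 0])
print(positive_first)  # [[3, 1], [2, 4]]
```

Both tables hold the same four counts. Only their positions change, so it is worth checking the label order before reading TP or TN off a particular cell.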