Example of Confusion Matrix in Python

In this short tutorial, you’ll see a full example of a Confusion Matrix in Python.

Topics to be reviewed:

  • Creating a Confusion Matrix using Pandas
  • Displaying the Confusion Matrix using Matplotlib and Seaborn
  • Getting a classification report via scikit-learn
  • Working with non-numeric data

Creating a Confusion Matrix in Python using Pandas

To start, here is the dataset to be used for the Confusion Matrix:

y_actual  y_predicted
1         1
0         1
0         0
1         1
0         0
1         1
0         1
0         0
1         1
0         0
1         0
0         0

You can then capture this data in Python by creating a DataFrame:

import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

print(df)

Run the code, and you’ll get the following DataFrame:

    y_actual  y_predicted
0          1            1
1          0            1
2          0            0
3          1            1
4          0            0
5          1            1
6          0            1
7          0            0
8          1            1
9          0            0
10         1            0
11         0            0

To create the Confusion Matrix using Pandas, you’ll need to use pd.crosstab as follows:

import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

print(confusion_matrix)

Run the code, and you’ll get the following matrix:

Predicted  0  1
Actual         
0          5  2
1          1  4
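
The four cells of this matrix are usually read as true negatives, false positives, false negatives, and true positives. The following is a minimal sketch of pulling them out of the crosstab, assuming that 1 is treated as the positive class; the variable names tn, fp, fn, and tp are illustrative and not part of the original example:

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

# .loc[actual_label, predicted_label] selects a single cell.
# Assumption: 1 is the positive class and 0 is the negative class.
tn = confusion_matrix.loc[0, 0]  # actual 0, predicted 0
fp = confusion_matrix.loc[0, 1]  # actual 0, predicted 1
fn = confusion_matrix.loc[1, 0]  # actual 1, predicted 0
tp = confusion_matrix.loc[1, 1]  # actual 1, predicted 1

# Accuracy is the share of predictions on the diagonal.
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tn, fp, fn, tp, accuracy)
```

For this dataset, the sketch gives 5 true negatives, 2 false positives, 1 false negative, and 4 true positives, for an accuracy of 0.75.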

Displaying the Confusion Matrix using Matplotlib and Seaborn

You can use the Matplotlib and Seaborn packages in Python to get a more vivid display of the matrix.

First, install the Matplotlib package:

pip install matplotlib

Then, install the Seaborn package:

pip install seaborn

Finally, run the code below to display the matrix:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)
confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

sn.heatmap(confusion_matrix, annot=True)
plt.show()

Optionally, you can also add the totals at the margins of the confusion matrix by setting margins=True in pd.crosstab. The totals then appear in an extra row and column labeled All.

So your Python code would look like this:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

confusion_matrix = pd.crosstab(
    df["y_actual"],
    df["y_predicted"],
    rownames=["Actual"],
    colnames=["Predicted"],
    margins=True,
)

sn.heatmap(confusion_matrix, annot=True)
plt.show()
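
Raw counts can be hard to compare when the classes differ in size. As a possible variation (not part of the original example), pd.crosstab also accepts a normalize argument; with normalize="index", each row is divided by its own total, so the cells show proportions per actual class. The name confusion_matrix_normalized below is only illustrative:

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

# normalize="index" divides every row by its row total,
# so each row of the result sums to 1.
confusion_matrix_normalized = pd.crosstab(
    df["y_actual"],
    df["y_predicted"],
    rownames=["Actual"],
    colnames=["Predicted"],
    normalize="index",
)

print(confusion_matrix_normalized.round(2))
```

Here, roughly 0.71 of the actual 0 values and 0.80 of the actual 1 values are predicted correctly. The normalized matrix can be passed to sn.heatmap in the same way as the count-based one.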

Getting a classification report using scikit-learn

You can get a classification report (which includes precision, recall, F1-score, and accuracy) using the scikit-learn package. First, install the package:

pip install scikit-learn

Here is the complete code:

import pandas as pd
from sklearn.metrics import classification_report

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

report = classification_report(df["y_actual"], df["y_predicted"])

print(report)

Run the code, and you’ll get the following classification report:

              precision    recall  f1-score   support

           0       0.83      0.71      0.77         7
           1       0.67      0.80      0.73         5

    accuracy                           0.75        12
   macro avg       0.75      0.76      0.75        12
weighted avg       0.76      0.75      0.75        12
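
To see where these figures come from, the row for class 1 can be recomputed by hand from the confusion matrix shown earlier. This is a minimal sketch using the standard definitions of precision, recall, and F1-score, assuming 1 is the positive class; the counts are typed in directly rather than read from the DataFrame:

```python
# Counts taken from the confusion matrix, with 1 as the positive class:
# tp = actual 1 predicted 1, fp = actual 0 predicted 1,
# fn = actual 1 predicted 0, tn = actual 0 predicted 0.
tp, fp, fn, tn = 4, 2, 1, 5

precision = tp / (tp + fp)  # of all predicted 1s, how many were right
recall = tp / (tp + fn)     # of all actual 1s, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)          # share of correct predictions

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

The results (0.67, 0.80, 0.73, and 0.75) match the row for class 1 and the accuracy line in the report.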

Working with non-numeric data

So far you have seen how to create a Confusion Matrix using numeric data. But what if your data contains non-numeric values, such as ‘Yes’ and ‘No’ (rather than 1 and 0)?

In that case, the values correspond as follows:

  • Yes = 1
  • No = 0

So the dataset would look like this:

y_actual  y_predicted
Yes       Yes
No        Yes
No        No
Yes       Yes
No        No
Yes       Yes
No        Yes
No        No
Yes       Yes
No        No
Yes       No
No        No

You can then apply a simple mapping exercise to map ‘Yes’ to 1, and ‘No’ to 0:

df["y_actual"] = df["y_actual"].map({"Yes": 1, "No": 0})
df["y_predicted"] = df["y_predicted"].map({"Yes": 1, "No": 0})

The complete code:

import pandas as pd

data = {
    "y_actual": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "y_predicted": ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
}

df = pd.DataFrame(data)

df["y_actual"] = df["y_actual"].map({"Yes": 1, "No": 0})
df["y_predicted"] = df["y_predicted"].map({"Yes": 1, "No": 0})

confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

print(confusion_matrix)

You’ll then get the same matrix:

Predicted  0  1
Actual         
0          5  2
1          1  4
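
As an alternative sketch, the mapping step can be skipped if you prefer to keep the original labels, since pd.crosstab tabulates whatever values it finds in the two columns. This variation is an assumption about what you might want rather than part of the example above:

```python
import pandas as pd

data = {
    "y_actual": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "y_predicted": ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
}

df = pd.DataFrame(data)

# No mapping to 1/0: the 'Yes' and 'No' strings become the row and column labels.
confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

print(confusion_matrix)
```

The counts are the same as before, but the rows and columns are labeled No and Yes (in sorted order) instead of 0 and 1.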