Example of Confusion Matrix in Python

In this short tutorial, you’ll see a full example of a Confusion Matrix in Python.

Topics to be reviewed:

• Creating a Confusion Matrix using Pandas
• Displaying the Confusion Matrix using Matplotlib and Seaborn
• Getting a classification report via scikit-learn
• Working with non-numeric data

Creating a Confusion Matrix in Python using Pandas

To start, the dataset for the Confusion Matrix consists of 12 observations, each with an actual value (y_actual) and a predicted value (y_predicted).

You can capture this data in Python by creating a DataFrame:

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)
print(df)
```

Run the code, and you’ll get the following DataFrame:

```
    y_actual  y_predicted
0          1            1
1          0            1
2          0            0
3          1            1
4          0            0
5          1            1
6          0            1
7          0            0
8          1            1
9          0            0
10         1            0
11         0            0
```

To create the Confusion Matrix using Pandas, you’ll need to use pd.crosstab as follows:

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

# Cross-tabulate actual values (rows) against predicted values (columns)
confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)
print(confusion_matrix)
```

Run the code, and you’ll get the following matrix:

```
Predicted  0  1
Actual
0          5  2
1          1  4
```
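The four cells of this matrix can be read off by label and combined into summary figures. The following is a minimal sketch, assuming that label 1 is treated as the positive class; the names tn, fp, fn, tp and accuracy are illustrative and do not come from the original example.

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)

confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

# Rows hold the actual labels, columns hold the predicted labels.
# Assumption: label 1 is the positive class.
tn = confusion_matrix.loc[0, 0]  # actual 0, predicted 0 (true negatives)
fp = confusion_matrix.loc[0, 1]  # actual 0, predicted 1 (false positives)
fn = confusion_matrix.loc[1, 0]  # actual 1, predicted 0 (false negatives)
tp = confusion_matrix.loc[1, 1]  # actual 1, predicted 1 (true positives)

# Accuracy is the share of observations on the diagonal of the matrix
accuracy = (tp + tn) / confusion_matrix.to_numpy().sum()

print(tn, fp, fn, tp)
print(accuracy)
```

For the data above, the diagonal cells (5 and 4) account for 9 of the 12 observations, so the accuracy is 0.75.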

Displaying the Confusion Matrix using Matplotlib and Seaborn

You can use the Matplotlib and Seaborn packages in Python to get a more vivid display of the matrix.

First, install the Matplotlib package:

`pip install matplotlib`

Then, install the Seaborn package:

`pip install seaborn`

Finally, run the code below to display the matrix:

```python
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

# Draw the matrix as a heatmap, with the count written in each cell
sn.heatmap(confusion_matrix, annot=True)
plt.show()
```

Optionally, you can also add the totals at the margins of the confusion matrix by setting margins=True.

So your Python code would look like this:

```python
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

# margins=True adds an "All" row and column holding the totals
confusion_matrix = pd.crosstab(
    df["y_actual"],
    df["y_predicted"],
    rownames=["Actual"],
    colnames=["Predicted"],
    margins=True,
)

sn.heatmap(confusion_matrix, annot=True)
plt.show()
```
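With margins=True, pandas appends a row and a column labeled "All" that hold the totals, and those totals are drawn in the heatmap alongside the four cells. The sketch below prints the table so you can inspect the totals directly. Dropping the "All" row and column before plotting is an optional variation rather than part of the example above, and the names with_totals and cells_only are illustrative.

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)

# Same crosstab as before, with row and column totals under the label "All"
with_totals = pd.crosstab(
    df["y_actual"],
    df["y_predicted"],
    rownames=["Actual"],
    colnames=["Predicted"],
    margins=True,
)
print(with_totals)

# Optional: remove the totals again, leaving only the 2x2 matrix,
# so that the large "All" values do not dominate a heatmap's color scale
cells_only = with_totals.drop(index="All", columns="All")
print(cells_only)
```

The bottom-right "All" cell equals the number of observations (12 here), which is a quick check that no rows were lost.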

Getting a classification report using scikit-learn

You can get a classification report (which includes the precision, recall, F1-score, and accuracy) using the scikit-learn package. To install it:

`pip install scikit-learn`

Here is the complete code:

```python
import pandas as pd
from sklearn.metrics import classification_report

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}

df = pd.DataFrame(data)

# The actual values go first, followed by the predicted values
report = classification_report(df["y_actual"], df["y_predicted"])
print(report)
```

Run the code, and you’ll get the following classification report:

```
              precision    recall  f1-score   support

           0       0.83      0.71      0.77         7
           1       0.67      0.80      0.73         5

    accuracy                           0.75        12
   macro avg       0.75      0.76      0.75        12
weighted avg       0.76      0.75      0.75        12
```
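To see where the figures in the row for class 1 come from, they can be recomputed by hand from the same two lists. This is a minimal sketch in plain Python using the standard definitions of precision, recall, and F1-score; the variable names are illustrative and it is not how scikit-learn computes the report internally.

```python
y_actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]

pairs = list(zip(y_actual, y_predicted))

# Counts for class 1, treated here as the positive class
tp = sum(1 for actual, pred in pairs if actual == 1 and pred == 1)
fp = sum(1 for actual, pred in pairs if actual == 0 and pred == 1)
fn = sum(1 for actual, pred in pairs if actual == 1 and pred == 0)

precision = tp / (tp + fp)  # share of predicted 1s that are actually 1
recall = tp / (tp + fn)     # share of actual 1s that were predicted as 1
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))
```

The rounded values match the precision, recall, and f1-score shown for class 1 in the report.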

Working with non-numeric data

So far you have seen how to create a Confusion Matrix using numeric data. But what if your data contains non-numeric values, such as ‘Yes’ and ‘No’ (rather than 1 and 0)?

In that case:

• Yes = 1
• No = 0

The dataset would then hold the same 12 observations, with ‘Yes’ and ‘No’ in place of 1 and 0.

You can apply a simple mapping to convert ‘Yes’ to 1 and ‘No’ to 0:

```python
df["y_actual"] = df["y_actual"].map({"Yes": 1, "No": 0})
df["y_predicted"] = df["y_predicted"].map({"Yes": 1, "No": 0})
```

The complete code:

```python
import pandas as pd

data = {
    "y_actual": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "y_predicted": ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
}

df = pd.DataFrame(data)

# Convert the text labels to numbers before building the matrix
df["y_actual"] = df["y_actual"].map({"Yes": 1, "No": 0})
df["y_predicted"] = df["y_predicted"].map({"Yes": 1, "No": 0})

confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)
print(confusion_matrix)
```

You’ll then get the same matrix:

```
Predicted  0  1
Actual
0          5  2
1          1  4
```
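The mapping is useful when later steps expect numeric values, but pd.crosstab also accepts the text labels as they are. The following is a minimal sketch of that variation, not a replacement for the approach above; the name labeled_matrix is illustrative.

```python
import pandas as pd

data = {
    "y_actual": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "y_predicted": ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
}
df = pd.DataFrame(data)

# Cross-tabulate the string labels directly, without mapping them to numbers.
# The labels are sorted, so "No" comes before "Yes" in both rows and columns.
labeled_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)
print(labeled_matrix)
```

The counts are the same as in the numeric matrix, with the rows and columns labeled ‘No’ and ‘Yes’ instead of 0 and 1.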