Example of Confusion Matrix in Python

In this tutorial, I’ll show you a full example of a Confusion Matrix in Python.

Topics to be reviewed:

  • Creating a Confusion Matrix using pandas
  • Displaying the Confusion Matrix using seaborn 
  • Getting additional stats via pandas_ml
  • Working with non-numeric data

Creating a Confusion Matrix in Python using Pandas

To start, here is the dataset to be used for the Confusion Matrix in Python:

y_Actual    y_Predicted
1           1
0           1
0           0
1           1
0           0
1           1
0           1
0           0
1           1
0           0
1           0
0           0

You can then capture this data in Python by creating a pandas DataFrame using this code:

import pandas as pd

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
print(df)

This is what the data looks like once you run the code:

    y_Actual  y_Predicted
0          1            1
1          0            1
2          0            0
3          1            1
4          0            0
5          1            1
6          0            1
7          0            0
8          1            1
9          0            0
10         1            0
11         0            0

To create the Confusion Matrix using pandas, you’ll need to apply pd.crosstab as follows:

confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
print(confusion_matrix)

And here is the full Python code to create the Confusion Matrix:

import pandas as pd

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])

confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
print(confusion_matrix)

Run the code and you’ll get the following matrix:

Predicted  0  1
Actual
0          5  2
1          1  4
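
To see what pd.crosstab is counting, the same table can be rebuilt by hand. The following is a minimal sketch rather than part of the original code: it groups the rows by each (actual, predicted) pair, counts them, and moves the predicted values into columns. The names pair_counts and manual_matrix are illustrative.

```python
import pandas as pd

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])

# Count how many rows fall into each (actual, predicted) pair
pair_counts = df.groupby(['y_Actual', 'y_Predicted']).size()

# Move the predicted values into columns; a pair that never occurs becomes 0
manual_matrix = pair_counts.unstack(fill_value=0)
print(manual_matrix)

# Compare the counts with the ones that pd.crosstab produces
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
print((manual_matrix.values == confusion_matrix.values).all())
```

The final line prints True when both approaches agree, which is the case for this dataset.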

Displaying the Confusion Matrix using seaborn

The matrix you just created in the previous section was rather basic.

You can use the seaborn package in Python to get a more vivid display of the matrix. To accomplish this task, you’ll need to add the following two components into the code:

  • import seaborn as sn
  • sn.heatmap(confusion_matrix, annot=True)

You’ll also need to use the matplotlib package to plot the results by adding:

  • import matplotlib.pyplot as plt
  • plt.show()

Putting everything together:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])

sn.heatmap(confusion_matrix, annot=True)
plt.show()

And here is the display that you’ll get:

[Figure: heatmap of the Confusion Matrix]

Much better!
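
If you want the counts shown as whole numbers and a copy of the chart saved to disk, a variation along these lines may help. This is a sketch under a few assumptions that are not in the original code: the fmt='d' and cmap='Blues' arguments, the chart title, and the file name confusion_matrix.png are illustrative choices.

```python
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])

# fmt='d' prints each count as an integer; cmap selects the color palette
ax = sn.heatmap(confusion_matrix, annot=True, fmt='d', cmap='Blues')
ax.set_title('Confusion Matrix')

# Write the chart to a file (hypothetical file name) instead of opening a window
plt.savefig('confusion_matrix.png')
```

You can still call plt.show() after saving if you also want the chart displayed on screen.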

Optionally, you can also add the totals at the margins of the confusion matrix by setting margins=True in pd.crosstab.

So your Python code would look like this:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'], margins=True)

sn.heatmap(confusion_matrix, annot=True)
plt.show()

Run the code and you’ll get the following Confusion Matrix with the totals:

[Figure: heatmap of the Confusion Matrix with totals]
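
With margins=True, pd.crosstab adds a row and a column labelled All that hold the totals. The short sketch below prints the table as text and reads a few of those totals by label. The variable names total, actual_pos and predicted_pos are illustrative.

```python
import pandas as pd

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'], margins=True)
print(confusion_matrix)

# The 'All' row and column hold the totals
total = confusion_matrix.loc['All', 'All']      # number of observations
actual_pos = confusion_matrix.loc[1, 'All']     # rows where y_Actual is 1
predicted_pos = confusion_matrix.loc['All', 1]  # rows where y_Predicted is 1
print(total, actual_pos, predicted_pos)
```

Because the totals are larger than the individual counts, they sit at the top end of the color scale when the table is passed to sn.heatmap.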

Getting additional stats using pandas_ml

You may print additional stats (such as the Accuracy) using the pandas_ml package in Python. You can install the pandas_ml package using pip:

pip install pandas_ml

You’ll then need to add the following syntax into the code:

Confusion_Matrix = ConfusionMatrix(df['y_Actual'], df['y_Predicted'])
Confusion_Matrix.print_stats()

Here is the complete code that you can use to get the additional stats:

import pandas as pd
from pandas_ml import ConfusionMatrix

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
Confusion_Matrix = ConfusionMatrix(df['y_Actual'], df['y_Predicted'])
Confusion_Matrix.print_stats()

Note that if you get an error when running the code, you may need to change the version of pandas. For example, you can switch to pandas 0.23.4 using this command: pip install pandas==0.23.4

Once the code runs, you’ll see the measurements below:

[Image: stats printed by pandas_ml]

For our example:

  • TP = True Positives = 4
  • TN = True Negatives = 5
  • FP = False Positives = 2
  • FN = False Negatives = 1

You can also observe the TP, TN, FP and FN directly from the Confusion Matrix:

            Predicted 0    Predicted 1
Actual 0    TN = 5         FP = 2
Actual 1    FN = 1         TP = 4
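
The four counts can also be read from the crosstab in code. A minimal sketch, assuming the matrix is built without margins so that the row labels are the actual values and the column labels are the predicted values:

```python
import pandas as pd

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])

# .loc[row_label, column_label] = .loc[actual, predicted]
TP = confusion_matrix.loc[1, 1]  # actual 1, predicted 1
TN = confusion_matrix.loc[0, 0]  # actual 0, predicted 0
FP = confusion_matrix.loc[0, 1]  # actual 0, predicted 1
FN = confusion_matrix.loc[1, 0]  # actual 1, predicted 0
print(TP, TN, FP, FN)
```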

For a population of 12, the Accuracy is:

Accuracy = (TP+TN)/population = (4+5)/12 = 0.75
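
The same calculation can be done with pandas alone, which may be useful if pandas_ml does not install cleanly in your environment. The sketch below counts the four outcomes with boolean comparisons and then applies the Accuracy formula. Precision and recall are added using their standard definitions; they are not part of the original example.

```python
import pandas as pd

data = {'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
        'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
actual = df['y_Actual']
predicted = df['y_Predicted']

# Count each outcome by combining two boolean conditions
TP = ((actual == 1) & (predicted == 1)).sum()
TN = ((actual == 0) & (predicted == 0)).sum()
FP = ((actual == 0) & (predicted == 1)).sum()
FN = ((actual == 1) & (predicted == 0)).sum()

population = len(df)
accuracy = (TP + TN) / population
print(accuracy)  # (4+5)/12 = 0.75

# Standard definitions (assumption: not shown in the original example)
precision = TP / (TP + FP)  # share of predicted positives that are correct
recall = TP / (TP + FN)     # share of actual positives that were found
print(precision, recall)
```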

Working with non-numeric data

So far you have seen how to create a Confusion Matrix using numeric data. But what if your data is non-numeric?

For example, what if your data contained non-numeric values, such as ‘Yes’ and ‘No’ (rather than ‘1’ and ‘0’)?

In this case:

  • Yes = 1
  • No = 0

So the dataset would look like this:

y_Actual    y_Predicted
Yes         Yes
No          Yes
No          No
Yes         Yes
No          No
Yes         Yes
No          Yes
No          No
Yes         Yes
No          No
Yes         No
No          No

You can then apply a simple mapping exercise to map ‘Yes’ to 1, and ‘No’ to 0.

Specifically, you’ll need to add the following portion to the code:

df['y_Actual'] = df['y_Actual'].map({'Yes': 1, 'No': 0})
df['y_Predicted'] = df['y_Predicted'].map({'Yes': 1, 'No': 0})

And this is what the complete Python code looks like:

import pandas as pd
from pandas_ml import ConfusionMatrix

data = {'y_Actual':    ['Yes', 'No',  'No', 'Yes', 'No', 'Yes', 'No',  'No', 'Yes', 'No', 'Yes', 'No'],
        'y_Predicted': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No',  'No']    
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
df['y_Actual'] = df['y_Actual'].map({'Yes': 1, 'No': 0})
df['y_Predicted'] = df['y_Predicted'].map({'Yes': 1, 'No': 0})

Confusion_Matrix = ConfusionMatrix(df['y_Actual'], df['y_Predicted'])
Confusion_Matrix.print_stats()

You would then get the same stats:

[Image: the same stats printed by pandas_ml]
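
One thing to watch with map: any value that is not a key in the mapping dictionary becomes NaN. Below is a hedged sketch of a check you could run before building the matrix. The data is hypothetical and only serves to trigger the problem (the last label is a lowercase 'yes'), and the names labels, mapping, mapped and cleaned are illustrative.

```python
import pandas as pd

# Hypothetical labels: the last one is lowercase, so the mapping does not cover it
labels = pd.Series(['Yes', 'No', 'No', 'yes'])
mapping = {'Yes': 1, 'No': 0}

mapped = labels.map(mapping)
print(mapped.isna().sum())  # number of labels that could not be mapped

# One possible fix: normalize the text before applying the mapping
cleaned = labels.str.strip().str.capitalize().map(mapping)
print(cleaned.tolist())
```

If the count of unmapped labels is greater than zero, it is worth inspecting the raw values before creating the Confusion Matrix.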