Example of Confusion Matrix in Python

In this tutorial, I’ll show you a full example of a Confusion Matrix in Python.

Topics to be reviewed:

  • Creating a Confusion Matrix using pandas
  • Displaying the Confusion Matrix using seaborn 
  • Getting additional stats via pandas_ml
  • Working with non-numerical data

Creating a Confusion Matrix in Python using Pandas

To start, here is the dataset that we’ll use to create the Confusion Matrix in Python:

 

y_Predicted    y_Actual
1              1
1              0
0              0
1              1
0              0
1              1
1              0
0              0
1              1
0              0
0              1
0              0

 

You can then capture this data in Python by creating a pandas DataFrame using this code:

 

import pandas as pd

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
print(df)

 

This is what the data looks like once you run the code:

 

    y_Actual  y_Predicted
0          1            1
1          0            1
2          0            0
3          1            1
4          0            0
5          1            1
6          0            1
7          0            0
8          1            1
9          0            0
10         1            0
11         0            0

 

To create the Confusion Matrix using pandas, apply pd.crosstab as follows:

 

confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
print(confusion_matrix)

 

And here is the full Python code to create the Confusion Matrix:

 

import pandas as pd

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])

confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
print(confusion_matrix)

 

Run the code and you’ll get the following matrix:

 

Predicted  0  1
Actual
0          5  2
1          1  4
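
If you need the individual counts rather than the whole table, you can look them up by label. The snippet below is a minimal sketch, assuming that 1 is treated as the positive class and 0 as the negative class; the variable names (true_positives and so on) are only illustrative.

```python
import pandas as pd

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual', 'y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'],
                               rownames=['Actual'], colnames=['Predicted'])

# Rows hold the actual labels and columns hold the predicted labels,
# so .loc[actual, predicted] returns the count for that combination.
# Assumption: 1 is the positive class, 0 is the negative class.
true_negatives = confusion_matrix.loc[0, 0]
false_positives = confusion_matrix.loc[0, 1]
false_negatives = confusion_matrix.loc[1, 0]
true_positives = confusion_matrix.loc[1, 1]

print(true_positives, true_negatives, false_positives, false_negatives)
```

Looking cells up by label with .loc (rather than by position) keeps the code readable, because the first argument is always the actual value and the second is always the predicted value.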

Displaying the Confusion Matrix using seaborn

The matrix you created in the previous section was rather basic.

You can use the seaborn package in Python to get a more vivid display of the matrix. To accomplish this task, you’ll need to add the following two components into the code:

  • import seaborn as sn
  • sn.heatmap(confusion_matrix, annot=True)

Putting everything together:

 

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])

sn.heatmap(confusion_matrix, annot=True)
plt.show()  # displays the plot when the code is run as a script (usually not needed in a notebook)

 

And here is the display that you’ll get:

 

[Figure: seaborn heatmap of the Confusion Matrix]

 

Much better!

Optionally, you can also add the totals at the margins of the confusion matrix by setting margins=True in pd.crosstab.

So your Python code would look like this:

 

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'], margins=True)

sn.heatmap(confusion_matrix, annot=True)
plt.show()  # displays the plot when the code is run as a script (usually not needed in a notebook)

 

Run the code and you’ll get the following Confusion Matrix with the totals:

 

[Figure: seaborn heatmap of the Confusion Matrix with totals]
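
The totals added by margins=True can also be read straight from the table, without plotting anything. Here is a small sketch, assuming the default margin label 'All' that pd.crosstab uses when margins_name is not set; the variable names are only illustrative.

```python
import pandas as pd

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual', 'y_Predicted'])
with_totals = pd.crosstab(df['y_Actual'], df['y_Predicted'],
                          rownames=['Actual'], colnames=['Predicted'],
                          margins=True)
print(with_totals)

# The extra row and column are labeled 'All' by default.
total_observations = with_totals.loc['All', 'All']   # size of the whole dataset
actual_negatives = with_totals.loc[0, 'All']         # rows where y_Actual is 0
predicted_positives = with_totals.loc['All', 1]      # rows where y_Predicted is 1

print(total_observations, actual_negatives, predicted_positives)
```

Keep in mind that the totals are always larger than the individual cells, so they tend to dominate the color scale when the table with margins is passed to the heatmap.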

 

Getting additional stats using pandas_ml

You can print additional stats (such as the Accuracy) using the pandas_ml package in Python. You can install the package with pip (pip install pandas_ml).

You’ll then need to import ConfusionMatrix from pandas_ml and add the following syntax to the code:

 

Confusion_Matrix = ConfusionMatrix(df['y_Actual'], df['y_Predicted'])
Confusion_Matrix.print_stats()

 

Here is the complete code that you can use to get the additional stats:

 

import pandas as pd
from pandas_ml import ConfusionMatrix

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
Confusion_Matrix = ConfusionMatrix(df['y_Actual'], df['y_Predicted'])
Confusion_Matrix.print_stats()

 

Run the code, and you’ll see the following measurements:

 

[Figure: output of Confusion_Matrix.print_stats()]

 

For our example:

  • TP = True Positives = 4
  • TN = True Negatives = 5
  • FP = False Positives = 2
  • FN = False Negatives = 1

You can also observe the TP, TN, FP and FN directly from the Confusion Matrix:

 

Predicted   0         1
Actual
0           TN = 5    FP = 2
1           FN = 1    TP = 4

 

For a population of 12, the Accuracy is:

Accuracy = (TP+TN)/population = (4+5)/12 = 0.75
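
The same arithmetic can be checked in plain Python, which may be handy if pandas_ml is not available in your environment. This is only a sketch, assuming 1 is the positive class; precision and recall are added here as two further examples of stats that can be derived from the same four counts, using their standard definitions TP/(TP+FP) and TP/(TP+FN).

```python
y_predicted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
y_actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]

# Pair each actual value with its prediction.
pairs = list(zip(y_actual, y_predicted))

# Count the four outcomes (assumption: 1 = positive, 0 = negative).
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)

population = len(pairs)
accuracy = (tp + tn) / population   # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that were found

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f}")
```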

Working with non-numerical data

So far you have seen how to create a Confusion Matrix using numerical data. But what if your data is non-numerical?

For example, what if the values were ‘Yes’ and ‘No’ (rather than 1 and 0)?

In this case:

  • Yes = 1
  • No = 0

The dataset would then look like this:

 

y_Predicted    y_Actual
Yes            Yes
Yes            No
No             No
Yes            Yes
No             No
Yes            Yes
Yes            No
No             No
Yes            Yes
No             No
No             Yes
No             No

 

You can then apply a simple mapping exercise to map ‘Yes’ to 1, and ‘No’ to 0.

Specifically, you’ll need to add the following portion to the code:

 

df['y_Predicted'] = df['y_Predicted'].map({'Yes': 1, 'No': 0})
df['y_Actual'] = df['y_Actual'].map({'Yes': 1, 'No': 0})

 

And this is what the complete Python code looks like:

 

import pandas as pd
from pandas_ml import ConfusionMatrix

data = {'y_Predicted': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No',  'No'],
        'y_Actual':    ['Yes', 'No',  'No', 'Yes', 'No', 'Yes', 'No',  'No', 'Yes', 'No', 'Yes', 'No']
        }

df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
df['y_Predicted'] = df['y_Predicted'].map({'Yes': 1, 'No': 0})
df['y_Actual'] = df['y_Actual'].map({'Yes': 1, 'No': 0})

Confusion_Matrix = ConfusionMatrix(df['y_Actual'], df['y_Predicted'])
Confusion_Matrix.print_stats()

 

You would then get the same stats:

 

[Figure: print_stats() output, matching the stats from the numerical example]
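
As a side note, pd.crosstab itself does not require numerical values, so the pandas-only matrix from the first section can be built directly on the ‘Yes’ and ‘No’ labels. The sketch below shows that, and also adds a quick check for the mapping approach: any value that is missing from the mapping dictionary becomes NaN, so it is worth counting those. The names labeled_matrix and unmapped_count are only illustrative.

```python
import pandas as pd

data = {'y_Predicted': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No',  'No'],
        'y_Actual':    ['Yes', 'No',  'No', 'Yes', 'No', 'Yes', 'No',  'No', 'Yes', 'No', 'Yes', 'No']
        }

df = pd.DataFrame(data, columns=['y_Actual', 'y_Predicted'])

# crosstab counts label combinations, so string labels work as they are.
labeled_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'],
                             rownames=['Actual'], colnames=['Predicted'])
print(labeled_matrix)

# Sanity check for the mapping approach: labels that are not in the
# dictionary (for example a stray 'yes' or 'N/A') would turn into NaN.
mapping = {'Yes': 1, 'No': 0}
unmapped_count = int(df['y_Actual'].map(mapping).isna().sum()
                     + df['y_Predicted'].map(mapping).isna().sum())
print(unmapped_count)
```

If unmapped_count is greater than zero, some labels were not covered by the mapping and should be cleaned up before computing any stats.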