In this short tutorial, you’ll see a full example of a Confusion Matrix in Python.
Topics to be reviewed:
- Creating a Confusion Matrix using Pandas
- Displaying the Confusion Matrix using Matplotlib and Seaborn
- Getting a classification report via scikit-learn
- Working with non-numeric data
Creating a Confusion Matrix in Python using Pandas
To start, here is the dataset to be used for the Confusion Matrix:
y_actual | y_predicted |
1 | 1 |
0 | 1 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
0 | 1 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 0 |
0 | 0 |
You can then capture this data in Python by creating a DataFrame:
import pandas as pd
data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)
print(df)
Run the code, and you’ll get the following DataFrame:
    y_actual  y_predicted
0          1            1
1          0            1
2          0            0
3          1            1
4          0            0
5          1            1
6          0            1
7          0            0
8          1            1
9          0            0
10         1            0
11         0            0
To create the Confusion Matrix using Pandas, you’ll need to use pd.crosstab as follows:
import pandas as pd
data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)
confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)
print(confusion_matrix)
Run the code, and you’ll get the following matrix:
Predicted  0  1
Actual
0          5  2
1          1  4
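To read the matrix, it can help to pull out the four cells by name. The sketch below assumes that class 1 is the positive class (the tutorial does not say which class is positive). The names cm, tn, fp, fn and tp are illustrative and do not appear in the original code.

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)

cm = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)

# Rows are actual labels, columns are predicted labels.
# Assumption: 1 is treated as the positive class.
tn = cm.loc[0, 0]  # actual 0, predicted 0
fp = cm.loc[0, 1]  # actual 0, predicted 1
fn = cm.loc[1, 0]  # actual 1, predicted 0
tp = cm.loc[1, 1]  # actual 1, predicted 1

# Accuracy is the share of predictions that sit on the diagonal.
accuracy = (tp + tn) / cm.to_numpy().sum()

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Accuracy: {accuracy:.2f}")
```

With this dataset, 9 of the 12 predictions fall on the diagonal, so the accuracy is 0.75.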
Displaying the Confusion Matrix using Matplotlib and Seaborn
You can use the Matplotlib and Seaborn packages in Python to display the matrix as a color-coded heatmap.
First, install the Matplotlib package:
pip install matplotlib
Then, install the Seaborn package:
pip install seaborn
Finally, run the code below to display the matrix:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)
confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)
sn.heatmap(confusion_matrix, annot=True)
plt.show()
Optionally, you can add row and column totals to the confusion matrix by setting margins=True in pd.crosstab. The totals appear in an extra row and column labeled All.
With this option, your Python code would look like this:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)
confusion_matrix = pd.crosstab(
    df["y_actual"],
    df["y_predicted"],
    rownames=["Actual"],
    colnames=["Predicted"],
    margins=True,
)
sn.heatmap(confusion_matrix, annot=True)
plt.show()
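To see what margins=True adds before plotting, you can print the table itself. The following is a minimal sketch using only Pandas. The names cm_totals, grand_total and actual_zero_total are illustrative and not part of the original code.

```python
import pandas as pd

data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)

cm_totals = pd.crosstab(
    df["y_actual"],
    df["y_predicted"],
    rownames=["Actual"],
    colnames=["Predicted"],
    margins=True,
)
print(cm_totals)

# The extra "All" row and column hold the totals.
grand_total = cm_totals.loc["All", "All"]      # number of observations
actual_zero_total = cm_totals.loc[0, "All"]    # rows where y_actual is 0
print(grand_total, actual_zero_total)
```

Note that the heatmap colors the totals along with the four matrix cells, so the larger totals may dominate the color scale.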
Getting a classification report using scikit-learn
To get a classification report (which includes the precision, recall, F1-score and accuracy), first install the scikit-learn package:
pip install scikit-learn
Here is the complete code:
import pandas as pd
from sklearn.metrics import classification_report
data = {
    "y_actual": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)
report = classification_report(df["y_actual"], df["y_predicted"])
print(report)
Run the code, and you’ll get the following classification report:
              precision    recall  f1-score   support

           0       0.83      0.71      0.77         7
           1       0.67      0.80      0.73         5

    accuracy                           0.75        12
   macro avg       0.75      0.76      0.75        12
weighted avg       0.76      0.75      0.75        12
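The per-class figures in the report can be reproduced by hand from the four counts in the confusion matrix, which may help when checking how each number is derived. The sketch below uses plain Python. The helper precision_recall_f1 is a hypothetical name introduced for illustration and is not a scikit-learn function.

```python
# Counts taken from the confusion matrix (rows = actual, columns = predicted).
tn, fp, fn, tp = 5, 2, 1, 4


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for one class from its counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Class 1: the usual reading of TP, FP and FN.
p1, r1, f1_1 = precision_recall_f1(tp, fp, fn)

# Class 0: the roles swap. Correct 0s are the "true positives",
# actual 1s predicted as 0 are the "false positives",
# and actual 0s predicted as 1 are the "false negatives".
p0, r0, f1_0 = precision_recall_f1(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The rounded values match the rows for classes 0 and 1 and the accuracy row in the report.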
Working with non-numeric data
So far you have seen how to create a Confusion Matrix using numeric data. But what if your data is non-numeric?
For example, what if your data contained values such as ‘Yes’ and ‘No’ (rather than 1 and 0)?
In that case, you can assign a number to each label:
- Yes = 1
- No = 0
So the dataset would look like this:
y_actual | y_predicted |
Yes | Yes |
No | Yes |
No | No |
Yes | Yes |
No | No |
Yes | Yes |
No | Yes |
No | No |
Yes | Yes |
No | No |
Yes | No |
No | No |
You can then use the Pandas map method to convert ‘Yes’ to 1 and ‘No’ to 0 in both columns:
df["y_actual"] = df["y_actual"].map({"Yes": 1, "No": 0})
df["y_predicted"] = df["y_predicted"].map({"Yes": 1, "No": 0})
The complete code:
import pandas as pd
data = {
    "y_actual": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "y_predicted": ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
}
df = pd.DataFrame(data)
df["y_actual"] = df["y_actual"].map({"Yes": 1, "No": 0})
df["y_predicted"] = df["y_predicted"].map({"Yes": 1, "No": 0})
confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)
print(confusion_matrix)
You’ll then get the same matrix:
Predicted  0  1
Actual
0          5  2
1          1  4
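As a side note, pd.crosstab also accepts string labels directly, so the mapping step is mainly useful when later steps need numeric values. The sketch below shows the matrix built from the raw labels, plus a quick check for labels that the mapping does not cover, since map returns NaN for any value missing from the dictionary. The names label_matrix, mapping and unmapped_count are illustrative and not part of the original code.

```python
import pandas as pd

data = {
    "y_actual": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "y_predicted": ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
}
df = pd.DataFrame(data)

# Build the matrix from the raw string labels (no mapping needed).
label_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"], rownames=["Actual"], colnames=["Predicted"]
)
print(label_matrix)

# If you do map to numbers, check that no label was left unmapped:
# map() produces NaN for values that are not keys in the dictionary.
mapping = {"Yes": 1, "No": 0}
mapped_actual = df["y_actual"].map(mapping)
mapped_predicted = df["y_predicted"].map(mapping)
unmapped_count = int(mapped_actual.isna().sum() + mapped_predicted.isna().sum())
print("Unmapped values:", unmapped_count)
```

The counts are the same as before, with the rows and columns labeled No and Yes instead of 0 and 1.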