How to Create a Correlation Matrix using Pandas

In this short guide, I’ll show you how to create a correlation matrix using pandas. I’ll also review the steps to display the matrix using seaborn.

To start, here is the template for creating a correlation matrix with pandas, where df is your DataFrame:

df.corr()

Next, I’ll show you an example with the steps to create a correlation matrix for a given dataset.

Steps to Create a Correlation Matrix using Pandas

Step 1: Collect the Data

First, collect the data that will be used for the correlation matrix.

For example, I collected the following data for three variables (A, B and C):

A    B    C
45   38   10
37   31   15
42   26   17
35   28   21
39   33   12

Step 2: Create a DataFrame using Pandas

Next, create a DataFrame in order to capture the above dataset in Python:

from pandas import DataFrame

# Each key is a column name; each list holds that column's values
data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data, columns=['A', 'B', 'C'])
print(df)

Once you run the code, you’ll get the following DataFrame:

    A   B   C
0  45  38  10
1  37  31  15
2  42  26  17
3  35  28  21
4  39  33  12
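Before computing correlations, it can help to confirm that the DataFrame has the expected shape and that every column holds numeric values, since correlation is only defined for numeric data. Here is a minimal sketch using the same example data (the names all_numeric and missing_per_column are illustrative, not part of the original example):

```python
from pandas import DataFrame
from pandas.api.types import is_numeric_dtype

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data)

# Shape is (rows, columns): 5 observations of 3 variables
print(df.shape)

# True only if every column has a numeric dtype
all_numeric = all(is_numeric_dtype(df[col]) for col in df.columns)
print(all_numeric)

# Count of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

If a column turns out to be non-numeric or has missing values, you would likely want to clean it up before moving on to the next step.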

Step 3: Create a Correlation Matrix using Pandas

Now, create a correlation matrix using this template:

df.corr()
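By default, df.corr() computes the Pearson correlation coefficient for every pair of columns. It also accepts a method argument. As a rough illustration, the sketch below compares the default with method='spearman' (a rank-based correlation) on the example data from this guide; the variable names pearson and spearman are just for this sketch:

```python
from pandas import DataFrame

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data)

# Default: Pearson correlation, same as df.corr(method='pearson')
pearson = df.corr()

# Spearman correlation works on the ranks of the values instead
spearman = df.corr(method='spearman')

print(pearson)
print(spearman)
```

Pearson measures linear association, while Spearman only looks at the ordering of the values, so the two matrices will generally differ.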

This is the complete Python code that you can use to create the correlation matrix for our example:

from pandas import DataFrame

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data, columns=['A', 'B', 'C'])

# Pairwise correlation between every pair of columns
corrMatrix = df.corr()
print(corrMatrix)

Run the code in Python, and you’ll get the following matrix:

          A         B         C
A  1.000000  0.518457 -0.701886
B  0.518457  1.000000 -0.860941
C -0.701886 -0.860941  1.000000
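To see where the values in the matrix come from, the sketch below recomputes the correlation between A and B directly from the Pearson formula and compares it with the entry stored in the matrix. It assumes NumPy is available (it is installed alongside pandas); the names r_manual and r_pandas are illustrative:

```python
import numpy as np
from pandas import DataFrame

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data)
corr_matrix = df.corr()

a = df['A'].to_numpy(dtype=float)
b = df['B'].to_numpy(dtype=float)

# Deviations of each value from its column mean
a_dev = a - a.mean()
b_dev = b - b.mean()

# Pearson r: sum of products of deviations, divided by the
# square root of the product of the sums of squared deviations
r_manual = (a_dev * b_dev).sum() / np.sqrt((a_dev ** 2).sum() * (b_dev ** 2).sum())

# Look up a single entry of the matrix by row and column label
r_pandas = corr_matrix.loc['A', 'B']

print(round(r_manual, 6), round(r_pandas, 6))
```

The two values should agree up to floating-point rounding. The matrix is symmetric, so corr_matrix.loc['B', 'A'] returns the same number, and every diagonal entry is 1 because each variable is perfectly correlated with itself.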

Step 4 (optional): Get a Visual Representation of the Correlation Matrix using Seaborn

You can use the seaborn package to get a visual representation of the correlation matrix.

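First, import the seaborn package, along with matplotlib, which is needed to display the plot when the code runs as a script:

import seaborn as sn
import matplotlib.pyplot as plt

Then, add the following syntax to the bottom of the code:

sn.heatmap(corrMatrix, annot=True)
plt.show()

So the complete Python code would look like this:

from pandas import DataFrame
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data, columns=['A', 'B', 'C'])

corrMatrix = df.corr()

# annot=True writes each correlation coefficient inside its cell
sn.heatmap(corrMatrix, annot=True)
plt.show()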

Run the code, and you’ll get the following heatmap of the correlation matrix:

[Figure: seaborn heatmap of the correlation matrix for A, B and C, with each cell annotated with its correlation coefficient]
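Correlation coefficients always fall between -1 and 1, so it can help to pin the color scale to that range and use a diverging palette, which makes positive and negative correlations easy to tell apart. Here is one possible variation of the heatmap, assuming seaborn and matplotlib are installed; the 'coolwarm' palette, the square cells, and the title are illustrative choices rather than part of the original example:

```python
import matplotlib.pyplot as plt
import seaborn as sn
from pandas import DataFrame

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data)
corr_matrix = df.corr()

fig, ax = plt.subplots()

# vmin/vmax fix the color scale to the full correlation range,
# so the midpoint of the diverging palette corresponds to 0
sn.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1,
           cmap='coolwarm', square=True, ax=ax)

ax.set_title('Correlation matrix')  # illustrative title
```

Call plt.show() afterwards to display the figure, or fig.savefig() with a file name of your choice to write it to disk.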

That’s it! As a related topic, you may also want to review the steps to create a confusion matrix using Python.