How to Create a Correlation Matrix using Pandas

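To create a correlation matrix for a Pandas DataFrame (df), call its corr() method, which computes the pairwise Pearson correlation between columns by default: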

df.corr()

Next, you’ll see an example with the full steps to create a correlation matrix for a given dataset.

Steps to Create a Correlation Matrix using Pandas

Step 1: Collect the Data

First, collect the data that will be used for the correlation matrix.

For example, let’s use the following data about 3 variables:

A   B   C
45  38  10
37  31  15
42  26  17
35  28  21
39  33  12

Step 2: Create a DataFrame using Pandas

Next, create a DataFrame to capture the above dataset in Python:

import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

print(df)

Once you run the code, you’ll get the following DataFrame:

    A   B   C
0  45  38  10
1  37  31  15
2  42  26  17
3  35  28  21
4  39  33  12
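
Correlations are computed on numeric columns. Depending on the Pandas version, a text column is either skipped or causes an error when calling df.corr(), so it can help to check the column types first. The sketch below adds a hypothetical 'label' column (not part of the example dataset) purely to illustrate the check:

```python
import pandas as pd

# Same three numeric columns as above, plus a hypothetical text column
# ('label') that exists only to illustrate the check.
df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
    'label': ['v', 'w', 'x', 'y', 'z'],
})

# Inspect the column types before computing correlations.
print(df.dtypes)

# Keep only the numeric columns; the text column is left out.
numeric_df = df.select_dtypes(include='number')
print(numeric_df.columns.tolist())
```

Calling numeric_df.corr() then works on columns A, B and C only.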

Step 3: Create the Correlation Matrix using Pandas

Now, create the correlation matrix using this template:

df.corr()

The complete Python code to create the correlation matrix for our example:

import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

corr_matrix = df.corr()

print(corr_matrix)

Run the code in Python, and you’ll get the following matrix:

          A         B         C
A  1.000000  0.518457 -0.701886
B  0.518457  1.000000 -0.860941
C -0.701886 -0.860941  1.000000
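
Each cell holds the correlation between the row variable and the column variable. Values range from -1 to 1, the diagonal is always 1, and the matrix is symmetric. As a sanity check, here is a minimal sketch that recomputes the A/B entry by hand from the Pearson formula and compares it with the value returned by df.corr(). The variable names (da, db, r_manual, r_pandas) are chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})

a = df['A'].to_numpy(dtype=float)
b = df['B'].to_numpy(dtype=float)

# Deviations of each value from its column mean.
da = a - a.mean()
db = b - b.mean()

# Pearson r: sum of the products of deviations, divided by the square
# root of the product of the sums of squared deviations.
r_manual = float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

# The same entry, read from the matrix by row and column label.
r_pandas = float(df.corr().loc['A', 'B'])

print(round(r_manual, 6))  # 0.518457
print(round(r_pandas, 6))  # 0.518457
```

Any other pair can be checked the same way by swapping the column names.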

Step 4 (optional): Get a Visual Representation of the Correlation Matrix using Seaborn and Matplotlib

You can use the seaborn and matplotlib packages to visualize the correlation matrix as a heatmap.

First, import the seaborn and matplotlib packages:

import seaborn as sn
import matplotlib.pyplot as plt

Then, add the following lines at the bottom of the code:

sn.heatmap(corr_matrix, annot=True)
plt.show()

So the complete Python code would look like this:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

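# Compute the pairwise correlations between the columns
corr_matrix = df.corr()

# Draw the matrix as a heatmap; annot=True prints each coefficient in its cell
sn.heatmap(corr_matrix, annot=True)
plt.show()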
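
Because the matrix is symmetric, the upper triangle repeats the lower one. As an optional variation (a sketch, not part of the steps above), the heatmap below hides the upper triangle with the mask parameter, fixes the color scale to the full -1 to 1 range with vmin and vmax, and uses the diverging 'coolwarm' colormap so that negative and positive correlations get different hues. The plot title is an illustrative choice:

```python
import numpy as np
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})

corr_matrix = df.corr()

# True marks the cells to hide: everything strictly above the diagonal
# (k=1 leaves the diagonal itself visible).
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)

ax = sn.heatmap(
    corr_matrix,
    mask=mask,
    annot=True,        # print each coefficient in its cell
    vmin=-1, vmax=1,   # fix the color scale to the full correlation range
    cmap='coolwarm',   # diverging colormap
)
ax.set_title('Correlation matrix')
plt.show()
```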

For related topics, you may also want to review the guides on creating a Confusion Matrix using Python and a Covariance Matrix in Python.