How to Create a Correlation Matrix using pandas

In this tutorial, you will create a correlation matrix using pandas.

TLDR solution

df.corr()

Step-by-Step Example

Step 1: Install the pandas Package

If you don't already have pandas installed, run the following command in your terminal:

pip install pandas
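
To confirm that the installation worked, one quick check is to import pandas and print its version. This is a minimal sketch, and the version number you see depends on your environment:

```python
import pandas as pd

# Prints the installed pandas version; the exact number depends on your environment
print(pd.__version__)
```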

Step 2: Load the data into a DataFrame

Let's say you have the following data on three variables:

  A    B    C
114   23    5
 93   19    8
 78   21    9
 84   18   11

Load it into a DataFrame:

import pandas as pd

data = {'A': [114, 93, 78, 84],
        'B': [23, 19, 21, 18],
        'C': [5, 8, 9, 11]
        }

df = pd.DataFrame(data)
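
In practice, the data often lives in a file rather than in a dictionary. The sketch below shows the same table being read with pd.read_csv. The csv_text string is a hypothetical stand-in for the contents of a CSV file, wrapped in io.StringIO so the example runs on its own. With a real file, you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a CSV file
csv_text = """A,B,C
114,23,5
93,19,8
78,21,9
84,18,11
"""

# read_csv accepts a file path or any file-like object
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv.shape)   # (4, 3): four rows, three columns
print(df_from_csv.dtypes)  # each column is parsed as an integer type
```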

Step 3: Create the Correlation Matrix

Run the following code:

corr_matrix = df.corr()

print(corr_matrix)

You should see the following output:

          A         B         C
A  1.000000  0.636869 -0.882206
B  0.636869  1.000000 -0.856876
C -0.882206 -0.856876  1.000000
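
The matrix is symmetric, and its diagonal is 1.0 because each variable correlates perfectly with itself. By default, df.corr() computes Pearson coefficients. The sketch below illustrates two options that often matter in practice: method, which selects a rank-based coefficient, and numeric_only, which leaves non-numeric columns out. It assumes pandas 1.5 or later, where corr() accepts numeric_only. The label column is hypothetical and added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [114, 93, 78, 84],
                   'B': [23, 19, 21, 18],
                   'C': [5, 8, 9, 11]})

# Spearman rank correlation instead of the default Pearson
spearman_matrix = df.corr(method='spearman')
print(spearman_matrix)

# Hypothetical text column, added only to illustrate numeric_only
df_mixed = df.assign(label=['w', 'x', 'y', 'z'])

# numeric_only=True leaves the text column out of the calculation
numeric_matrix = df_mixed.corr(numeric_only=True)
print(numeric_matrix)
```

In recent pandas versions, calling corr() on a DataFrame that contains text columns without numeric_only=True may raise an error rather than silently dropping those columns, so it is safer to pass the option explicitly.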

Optional step: Get a Visual Representation of the Correlation Matrix

The following code uses the seaborn and Matplotlib packages to create a visual representation of the correlation matrix. If they are not installed yet, run pip install seaborn matplotlib first:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [114, 93, 78, 84],
        'B': [23, 19, 21, 18],
        'C': [5, 8, 9, 11]
        }

df = pd.DataFrame(data)

corr_matrix = df.corr()

sn.heatmap(corr_matrix, annot=True)
plt.show()

The resulting plot:

[Figure: heatmap of the correlation matrix, with each cell annotated with its correlation coefficient]
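
If seaborn is not available, a similar heatmap can be drawn with Matplotlib alone. This is a minimal sketch rather than part of the steps above. The output file name correlation_matrix.png and the coolwarm color map are arbitrary choices, and the Agg backend is selected so the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; must be set before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd

data = {'A': [114, 93, 78, 84],
        'B': [23, 19, 21, 18],
        'C': [5, 8, 9, 11]
        }

df = pd.DataFrame(data)
corr_matrix = df.corr()

fig, ax = plt.subplots()

# Fix the color scale to [-1, 1] so colors are comparable across datasets
image = ax.imshow(corr_matrix.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)

# Label both axes with the variable names
ax.set_xticks(range(len(corr_matrix.columns)))
ax.set_xticklabels(corr_matrix.columns)
ax.set_yticks(range(len(corr_matrix.index)))
ax.set_yticklabels(corr_matrix.index)

# Write each coefficient into its cell (similar to annot=True in seaborn)
for i in range(corr_matrix.shape[0]):
    for j in range(corr_matrix.shape[1]):
        ax.text(j, i, f"{corr_matrix.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(image, ax=ax)

# Hypothetical output file name
fig.savefig("correlation_matrix.png")
```

Pinning vmin and vmax to -1 and 1 keeps the color scale tied to the full range of possible correlation values instead of the range that happens to occur in the data.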

That's it! You just learned how to generate a correlation matrix using pandas.