How to Create a Correlation Matrix using pandas
In this tutorial, you will create a correlation matrix using pandas.
TL;DR solution
df.corr()
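By default, df.corr() computes the Pearson coefficient. Its method argument also accepts "spearman" and "kendall". The sketch below uses the same sample data as the step-by-step example and compares the default Pearson matrix with the rank-based Spearman matrix.

```python
import pandas as pd

# Same sample data used in the step-by-step example
df = pd.DataFrame({'A': [114, 93, 78, 84],
                   'B': [23, 19, 21, 18],
                   'C': [5, 8, 9, 11]})

pearson = df.corr()                    # default: method='pearson'
spearman = df.corr(method='spearman')  # Pearson correlation of the ranks

print(spearman)
```

Spearman only looks at the ordering of the values. With this data it gives 0.4 for A and B, and -0.8 for both A and C and B and C.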
Step-by-Step Example
Step 1: Install the pandas Package
If you don't already have pandas installed, run the following command in your terminal:
pip install pandas
Step 2: Load the Data into a DataFrame
Suppose you have the following data on three variables:
| A | B | C |
|---|---|---|
| 114 | 23 | 5 |
| 93 | 19 | 8 |
| 78 | 21 | 9 |
| 84 | 18 | 11 |
Load it into a DataFrame:
import pandas as pd
data = {'A': [114, 93, 78, 84],
        'B': [23, 19, 21, 18],
        'C': [5, 8, 9, 11]}
df = pd.DataFrame(data)
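The sample DataFrame contains only numeric columns. If your own data also has text columns, pandas 2.0 and later raise an error from df.corr() instead of silently dropping them. Passing numeric_only=True (available since pandas 1.5) limits the calculation to numeric columns. The 'label' column below is a hypothetical addition for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': [114, 93, 78, 84],
                   'B': [23, 19, 21, 18],
                   'C': [5, 8, 9, 11],
                   'label': ['w', 'x', 'y', 'z']})  # hypothetical text column

print(df.dtypes)  # 'label' has a non-numeric (text) dtype

# Only the numeric columns A, B and C take part in the calculation
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix)
```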
Step 3: Create the Correlation Matrix
Run the following code:
corr_matrix = df.corr()
print(corr_matrix)
You should see the following output:
          A         B         C
A  1.000000  0.636869 -0.882206
B  0.636869  1.000000 -0.856876
C -0.882206 -0.856876  1.000000
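Each entry is a Pearson correlation coefficient. It is the sum of the cross-products of the two columns' deviations from their means, divided by the square root of the product of their sums of squared deviations. As a sanity check, the sketch below recomputes the A/B entry by hand and compares it with the matrix.

```python
import math
import pandas as pd

df = pd.DataFrame({'A': [114, 93, 78, 84],
                   'B': [23, 19, 21, 18],
                   'C': [5, 8, 9, 11]})

# Deviation of each value from its column mean
dev_a = df['A'] - df['A'].mean()
dev_b = df['B'] - df['B'].mean()

# Pearson r = sum of cross-products / sqrt(product of sums of squares)
r_ab = float((dev_a * dev_b).sum()
             / math.sqrt((dev_a ** 2).sum() * (dev_b ** 2).sum()))

print(round(r_ab, 6))                              # 0.636869
print(math.isclose(r_ab, df.corr().loc['A', 'B'])) # True
```

The same calculation applied to any pair of columns reproduces the matching cell. The diagonal is always 1 because every column is perfectly correlated with itself.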
Optional step: Get a Visual Representation of the Correlation Matrix
The following code uses the seaborn and Matplotlib packages to draw the correlation matrix as a heatmap. If they aren't installed, run pip install seaborn matplotlib first:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'A': [114, 93, 78, 84],
        'B': [23, 19, 21, 18],
        'C': [5, 8, 9, 11]}
df = pd.DataFrame(data)
corr_matrix = df.corr()
# Color each cell by its coefficient and print the value inside it (annot=True)
sns.heatmap(corr_matrix, annot=True)
# Open a window with the plot
plt.show()
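The resulting plot is a 3-by-3 heatmap in which each cell is colored by, and labeled with, its correlation coefficient.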
That's it! You just learned how to generate a correlation matrix using pandas.