To create a correlation matrix using Pandas:
df.corr()
Steps to Create a Correlation Matrix
Step 1: Collect the Data
First, collect the data for the correlation matrix.
For example, here is a dataset that contains 3 variables:
A | B | C |
45 | 38 | 10 |
37 | 31 | 15 |
42 | 26 | 17 |
35 | 28 | 21 |
39 | 33 | 12 |
Step 2: Create a DataFrame using Pandas
Next, create a DataFrame to capture the dataset above in Python:
import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

print(df)
Once you run the code, you’ll get the following DataFrame:
    A   B   C
0  45  38  10
1  37  31  15
2  42  26  17
3  35  28  21
4  39  33  12
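Since a correlation matrix is computed from numeric columns, it can help to confirm the column types before moving on. The following is an optional sketch rather than part of the original steps; the variable name numeric_df is only illustrative, and the select_dtypes call changes nothing for this dataset because all three columns are already integers.

```python
import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

# Show the dtype of each column; all three should be numeric
print(df.dtypes)

# Keep only the numeric columns (illustrative name; no columns are dropped here)
numeric_df = df.select_dtypes(include="number")

print(numeric_df.shape)
```

If a dataset also contained text columns, a filter like this would leave them out before the correlations are calculated.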
Step 3: Create the Correlation Matrix using Pandas
Now, create the correlation matrix using this template:
df.corr()
The complete code:
import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

corr_matrix = df.corr()

print(corr_matrix)
Run the code in Python, and you’ll get the following matrix:
          A         B         C
A  1.000000  0.518457 -0.701886
B  0.518457  1.000000 -0.860941
C -0.701886 -0.860941  1.000000
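To see where one of these values comes from, the sketch below recomputes the correlation between A and B by hand, assuming the default Pearson method that df.corr() uses. The names dev_a, dev_b, r_manual and r_pandas are only illustrative.

```python
import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

# Deviations of each column from its own mean
dev_a = df['A'] - df['A'].mean()
dev_b = df['B'] - df['B'].mean()

# Pearson r: sum of cross-products divided by the square root
# of the product of the two sums of squared deviations
r_manual = (dev_a * dev_b).sum() / ((dev_a ** 2).sum() * (dev_b ** 2).sum()) ** 0.5

# Series.corr computes the same coefficient for a single pair of columns
r_pandas = df['A'].corr(df['B'])

print(round(r_manual, 6))
print(round(r_pandas, 6))
```

Both values should agree with the A/B entry of the matrix (0.518457). Each coefficient lies between -1 and 1, so the -0.860941 entry indicates that B and C tend to move in opposite directions in this small sample.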
Step 4 (optional): Get a Visual Representation of the Correlation Matrix using Seaborn and Matplotlib
You may use the seaborn and matplotlib packages to visualize the correlation matrix.
First, import the seaborn and matplotlib packages:
import seaborn as sn
import matplotlib.pyplot as plt
Then, add the following syntax at the bottom of the code:
sn.heatmap(corr_matrix, annot=True)
plt.show()
So the complete Python code would look like this:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

corr_matrix = df.corr()

sn.heatmap(corr_matrix, annot=True)
plt.show()
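As an optional variation, the heatmap can be drawn on a fixed color scale from -1 to 1, so that colors mean the same thing from one matrix to the next. This is a sketch under a few assumptions that are not part of the original code: the Agg backend is selected so the script also runs without a display, and the colormap, title, and file name correlation_matrix.png are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is opened

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

corr_matrix = df.corr()

fig, ax = plt.subplots(figsize=(5, 4))

# vmin/vmax pin the color scale to the full correlation range,
# fmt controls how the annotated values are rounded
sn.heatmap(corr_matrix, annot=True, fmt=".2f",
           vmin=-1, vmax=1, cmap="coolwarm", square=True, ax=ax)

ax.set_title("Correlation matrix")
fig.tight_layout()

# Illustrative file name; the image is written to the current directory
fig.savefig("correlation_matrix.png", dpi=150)
```

To display the figure in a window instead of saving it, remove the matplotlib.use("Agg") line and call plt.show() at the end.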
You may also want to review the guide that explains the steps to create a Confusion Matrix using Python. Alternatively, you may check the guide about creating a Covariance Matrix in Python.