In this short guide, you’ll see how to create a Covariance Matrix using Python.
Steps to Create a Covariance Matrix
Step 1: Gather the Data
To start, you’ll need to gather the data that will be used for the covariance matrix.
For demonstration purposes, let’s use the following data about 3 variables:
A | B | C |
45 | 38 | 10 |
37 | 31 | 15 |
42 | 26 | 17 |
35 | 28 | 21 |
39 | 33 | 12 |
Step 2: Get the Population Covariance Matrix using Python
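To get the population covariance matrix (which divides by N, the number of observations), pass bias=True to np.cov in the code below. By default, np.cov treats each row of its input as one variable, which is why A, B and C are stacked as rows.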
This is the complete Python code to derive the population covariance matrix using the NumPy package:
import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])

cov_matrix = np.cov(data, bias=True)
print(cov_matrix)
Run the code, and you’ll get the following matrix:
[[ 12.64 7.68 -9.6 ]
[ 7.68 17.36 -13.8 ]
[ -9.6 -13.8 14.8 ]]
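To see where these numbers come from, the same matrix can be reproduced by hand. The snippet below is a minimal sketch rather than a replacement for np.cov: it subtracts each variable's mean, multiplies the deviations together, and divides by N. The names centered and manual_cov are introduced here only for illustration.

```python
import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C], dtype=float)  # each row is one variable
n = data.shape[1]                        # number of observations (N = 5)

# Subtract each variable's mean from its own observations
centered = data - data.mean(axis=1, keepdims=True)

# Population covariance: sum of products of deviations, divided by N
manual_cov = centered @ centered.T / n

print(manual_cov)

# Compare against NumPy's built-in result
print(np.allclose(manual_cov, np.cov(data, bias=True)))
```

The diagonal entries are the population variances of A, B and C, and each off-diagonal entry is the covariance of one pair of variables, which is why the matrix is symmetric.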
Step 3: Get a Visual Representation of the Matrix
You can use the seaborn and matplotlib packages to display the covariance matrix as an annotated heatmap.
Here is the complete code that you can apply in Python:
import numpy as np
import seaborn as sn
import matplotlib.pyplot as plt

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])

cov_matrix = np.cov(data, bias=True)

sn.heatmap(cov_matrix, annot=True, fmt='g')
plt.show()
Derive the Sample Covariance Matrix
To get the sample covariance matrix (which divides by N-1), set bias=False in the code below. This is also the default in np.cov, so leaving the argument out gives the same result.
Here is the code based on the NumPy package:
import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])

cov_matrix = np.cov(data, bias=False)
print(cov_matrix)
And this is the matrix that you’ll get:
[[ 15.8 9.6 -12. ]
[ 9.6 21.7 -17.25]
[-12. -17.25 18.5 ]]
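The sample and population matrices differ only by a constant factor of N/(N-1), which for this data is 5/4. As a quick check, the sketch below rescales the population matrix and compares it with the sample matrix. It also uses the ddof argument of np.cov, which overrides bias when it is given. The variable names are illustrative.

```python
import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])
n = data.shape[1]  # number of observations (N = 5)

pop_cov = np.cov(data, bias=True)  # divides by N
sample_cov = np.cov(data)          # bias=False is the default: divides by N-1

# Rescaling the population matrix by N/(N-1) gives the sample matrix
rescaled = pop_cov * n / (n - 1)
print(np.allclose(sample_cov, rescaled))

# ddof sets the divisor to N - ddof directly and takes precedence over bias
print(np.allclose(np.cov(data, ddof=0), pop_cov))
print(np.allclose(np.cov(data, ddof=1), sample_cov))
```

With only 5 observations the two versions differ by 25%. The gap shrinks as N grows.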
You can also use the pandas package to get the sample covariance matrix. The cov method of a DataFrame divides by N-1 by default and treats each column as one variable:
import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

cov_matrix = df.cov()
print(cov_matrix)
You’ll get the same matrix as derived by NumPy:
A B C
A 15.8 9.60 -12.00
B 9.6 21.70 -17.25
C -12.0 -17.25 18.50
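If you need the population covariance matrix from a DataFrame instead, one option is the ddof argument of DataFrame.cov. The sketch below assumes pandas 1.1 or later, where that argument is available, and cross-checks the result against NumPy. Because the DataFrame stores variables as columns, rowvar=False is passed to np.cov.

```python
import numpy as np
import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

# ddof=0 divides by N instead of N-1 (assumes pandas >= 1.1)
pop_cov_pd = df.cov(ddof=0)
print(pop_cov_pd)

# Same calculation in NumPy; rowvar=False because variables are in columns
pop_cov_np = np.cov(df.to_numpy(), rowvar=False, bias=True)
print(np.allclose(pop_cov_pd.to_numpy(), pop_cov_np))
```

The pandas result keeps the A, B and C labels on both axes, which is convenient when passing the matrix to a heatmap.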
Finally, you may visually represent the covariance matrix using the seaborn and matplotlib packages:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

cov_matrix = df.cov()

sn.heatmap(cov_matrix, annot=True, fmt='g')
plt.show()
You may also want to check the following source that explains the full steps to create a Confusion Matrix using Python. Alternatively, you may check this guide for the steps to create a Correlation Matrix in Python.