How to Create a Covariance Matrix using Python

In this short guide, you’ll see how to create a Covariance Matrix using Python.

Steps to Create a Covariance Matrix

Step 1: Gather the Data

To start, you’ll need to gather the data that will be used for the covariance matrix.

For demonstration purposes, let’s use the following data about 3 variables:

 A   B   C
45  38  10
37  31  15
42  26  17
35  28  21
39  33  12

Step 2: Get the Population Covariance Matrix using Python

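To get the population covariance matrix (which divides by N, the number of observations), set bias=True when calling np.cov, as in the code below. By default, np.cov treats each row of its input as one variable, which is why A, B and C are stacked as rows.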

This is the complete Python code to derive the population covariance matrix using the NumPy package:

import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])

cov_matrix = np.cov(data, bias=True)

print(cov_matrix)

Run the code, and you’ll get the following matrix:

[[ 12.64   7.68  -9.6 ]
 [  7.68  17.36 -13.8 ]
 [ -9.6  -13.8   14.8 ]]
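To see where these numbers come from, the sketch below rebuilds the population covariance matrix from its definition: subtract each variable's mean, multiply the deviations together, and divide by N. The names n_obs, centered and manual_cov are illustrative only and are not part of NumPy.

```python
import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

# Each row is one variable, each column is one observation
data = np.array([A, B, C], dtype=float)
n_obs = data.shape[1]

# Subtract each variable's mean from its own observations
centered = data - data.mean(axis=1, keepdims=True)

# Population covariance: products of deviations, summed and divided by N
manual_cov = centered @ centered.T / n_obs

print(manual_cov)

# Compare against NumPy's built-in result
print(np.allclose(manual_cov, np.cov(data, bias=True)))
```

The final check should print True, since both approaches divide by N.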

Step 3: Get a Visual Representation of the Matrix

You can use the seaborn and matplotlib packages to plot the covariance matrix as a heatmap.

Here is the complete code that you can apply in Python:

import numpy as np
import seaborn as sn
import matplotlib.pyplot as plt

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])

cov_matrix = np.cov(data, bias=True)
sn.heatmap(cov_matrix, annot=True, fmt='g')
plt.show()
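When the heatmap is drawn from a plain NumPy array, its axes are labeled 0, 1 and 2 rather than A, B and C. One possible way around this, sketched below, is to wrap the matrix in a pandas DataFrame first, since seaborn takes its tick labels from the DataFrame's index and columns. The names labels and labeled_cov are illustrative.

```python
import numpy as np
import pandas as pd

labels = ['A', 'B', 'C']

data = np.array([
    [45, 37, 42, 35, 39],  # A
    [38, 31, 26, 28, 33],  # B
    [10, 15, 17, 21, 12],  # C
])

cov_matrix = np.cov(data, bias=True)

# Wrap the matrix so that rows and columns carry the variable names
labeled_cov = pd.DataFrame(cov_matrix, index=labels, columns=labels)

print(labeled_cov)
```

You could then pass labeled_cov to sn.heatmap in place of cov_matrix.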

Derive the Sample Covariance Matrix

To get the sample covariance matrix (which divides by N-1), set bias=False in the code below. This is also the default in np.cov, so you may omit the argument altogether.

Here is the code based on the NumPy package:

import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])

cov_matrix = np.cov(data, bias=False)

print(cov_matrix)

And this is the matrix that you’ll get:

[[ 15.8    9.6  -12.  ]
 [  9.6   21.7  -17.25]
 [-12.   -17.25  18.5 ]]
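The sample and population matrices differ only in the divisor, so one can be obtained from the other by rescaling. The sketch below checks this for the example data and also shows the ddof argument of np.cov as another way to request the N-1 divisor. The names n_obs and rescaled are illustrative.

```python
import numpy as np

data = np.array([
    [45, 37, 42, 35, 39],  # A
    [38, 31, 26, 28, 33],  # B
    [10, 15, 17, 21, 12],  # C
])
n_obs = data.shape[1]

population_cov = np.cov(data, bias=True)
sample_cov = np.cov(data, bias=False)

# Rescaling the population matrix by N / (N - 1) gives the sample matrix
rescaled = population_cov * n_obs / (n_obs - 1)
print(np.allclose(rescaled, sample_cov))

# ddof=1 is an equivalent way to request the N - 1 divisor
print(np.allclose(np.cov(data, ddof=1), sample_cov))
```

Both checks should print True.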

You can also use the pandas package to get the sample covariance matrix. The cov method of a DataFrame divides by N-1 by default, so no extra argument is needed:

import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

cov_matrix = df.cov()

print(cov_matrix)

You’ll get the same matrix as derived by NumPy:

      A      B      C
A  15.8   9.60 -12.00
B   9.6  21.70 -17.25
C -12.0 -17.25  18.50
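If you need the population covariance matrix from pandas instead, one option is the ddof argument of the cov method. The sketch below assumes pandas 1.1.0 or later, where that argument is available. The name population_cov is illustrative.

```python
import numpy as np
import pandas as pd

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

# ddof=0 divides by N instead of the default N - 1
# (assumes pandas 1.1.0 or later)
population_cov = df.cov(ddof=0)

print(population_cov)

# In a DataFrame the observations are in rows, so NumPy needs rowvar=False
numpy_cov = np.cov(df.to_numpy(), rowvar=False, bias=True)
print(np.allclose(population_cov, numpy_cov))
```

The values should match the population matrix obtained with NumPy in Step 2.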

Finally, you may visually represent the covariance matrix using the seaborn and matplotlib packages:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = pd.DataFrame(data)

cov_matrix = df.cov()
sn.heatmap(cov_matrix, annot=True, fmt='g')
plt.show()
