How to Create a Covariance Matrix using Python

Looking to create a Covariance Matrix using Python?

If so, I’ll show you how to create such a matrix using both numpy and pandas.

Steps to Create a Covariance Matrix using Python

Step 1: Gather the Data

To start, you’ll need to gather the data that will be used for the covariance matrix.

For example, I gathered the following data about 3 variables:

A    B    C
45   38   10
37   31   15
42   26   17
35   28   21
39   33   12

Step 2: Get the Population Covariance Matrix using Python

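To get the population covariance matrix (based on N), you’ll need to set bias=True when calling np.cov, as in the code below. Note that np.cov treats each row of its input as a variable by default, which is why A, B and C are passed as the rows of the array.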

This is the complete Python code to derive the population covariance matrix using the numpy package:

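import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

# Each row of the array is one variable
data = np.array([A, B, C])

# bias=True divides by N (population covariance)
cov_matrix = np.cov(data, bias=True)
print(cov_matrix)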

Run the code, and you’ll get the following matrix:

[Image: printed population covariance matrix]
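As a rough check on what bias=True computes, the sketch below rebuilds the population covariance matrix by hand: subtract each variable's mean, multiply the centered data by its transpose, and divide by N. The names centered and manual_cov are illustrative only and are not part of numpy.

```python
import numpy as np

# Same three variables as above; each row of `data` is one variable.
A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]
data = np.array([A, B, C], dtype=float)

n = data.shape[1]                                    # number of observations (N = 5)
centered = data - data.mean(axis=1, keepdims=True)   # subtract each variable's mean
manual_cov = centered @ centered.T / n               # population covariance: divide by N

# The hand-rolled result should agree with numpy's built-in calculation.
assert np.allclose(manual_cov, np.cov(data, bias=True))
print(manual_cov)
```

The diagonal entries are the population variances of A, B and C, and the off-diagonal entries are the pairwise covariances, so the matrix is symmetric.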

Step 3: Get a Visual Representation of the Matrix

You can use the seaborn package in order to visually represent the covariance matrix.

Here is the complete code that you can apply in Python:

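import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

# Each row of the array is one variable
data = np.array([A, B, C])

cov_matrix = np.cov(data, bias=True)

# annot=True writes each value in its cell
sns.heatmap(cov_matrix, annot=True, fmt='g')
plt.show()  # display the figure when running as a script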

Once you run the code, you’ll get the following heatmap:

[Image: heatmap of the population covariance matrix]

Derive the Sample Covariance Matrix

To get the sample covariance matrix (based on N-1), you’ll need to set bias=False in the code below. This is also the default for np.cov, so leaving the argument out gives the same result.

Here is the code based on the numpy package:

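import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

# Each row of the array is one variable
data = np.array([A, B, C])

# bias=False divides by N-1 (sample covariance)
cov_matrix = np.cov(data, bias=False)
print(cov_matrix)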

And this is the matrix that you’ll get:

[Image: printed sample covariance matrix]
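To make the difference between the two versions concrete, here is a minimal sketch showing that the sample and population matrices differ only by the constant factor N / (N - 1). It also uses the ddof argument of np.cov as an alternative way of choosing the divisor; the variable names are illustrative.

```python
import numpy as np

data = np.array([
    [45, 37, 42, 35, 39],  # A
    [38, 31, 26, 28, 33],  # B
    [10, 15, 17, 21, 12],  # C
], dtype=float)
n = data.shape[1]  # number of observations per variable

population_cov = np.cov(data, bias=True)  # divides by N
sample_cov = np.cov(data)                 # default bias=False, divides by N - 1

# The two matrices differ only by the factor N / (N - 1).
assert np.allclose(sample_cov, population_cov * n / (n - 1))

# ddof sets the divisor to N - ddof and overrides bias when given.
assert np.allclose(np.cov(data, ddof=1), sample_cov)
assert np.allclose(np.cov(data, ddof=0), population_cov)

print(sample_cov)
```

With only 5 observations the factor is 5 / 4, so every entry of the sample matrix is 25% larger in magnitude than its population counterpart. The gap shrinks as N grows.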

You can also use the pandas package to get the sample covariance matrix. By default, the cov method of a pandas DataFrame divides by N-1, and it treats each column as a variable.

Here is the code using pandas:

from pandas import DataFrame

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data, columns=['A', 'B', 'C'])

# Each column is one variable; divides by N-1 by default
cov_matrix = df.cov()
print(cov_matrix)

You’ll get the same matrix as derived using numpy:

[Image: sample covariance matrix printed by pandas, with A, B and C as row and column labels]
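The sketch below is one way to confirm that the two packages agree. Because a DataFrame stores variables as columns, the numpy call needs rowvar=False. The last step rescales the pandas result to a population matrix; that step is an illustration added here under the assumption that a population version is wanted, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})

pandas_cov = df.cov()                            # sample covariance (N - 1), labeled A/B/C
numpy_cov = np.cov(df.to_numpy(), rowvar=False)  # rowvar=False: columns are the variables

# Same numbers; pandas simply keeps the variable names as labels.
assert np.allclose(pandas_cov.to_numpy(), numpy_cov)

# Rescale by (N - 1) / N to turn the sample matrix into a population matrix.
n = len(df)
population_cov = pandas_cov * (n - 1) / n
print(population_cov)
```

Keeping the result as a DataFrame means individual entries can be looked up by name, for example population_cov.loc['A', 'B'].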

Finally, you can visually represent the covariance matrix using the seaborn package:

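import matplotlib.pyplot as plt
import seaborn as sns
from pandas import DataFrame

data = {'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12]
        }

df = DataFrame(data, columns=['A', 'B', 'C'])

# Sample covariance matrix, labeled with the column names
cov_matrix = df.cov()

# The DataFrame labels (A, B, C) become the axis tick labels
sns.heatmap(cov_matrix, annot=True, fmt='g')
plt.show()  # display the figure when running as a script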

Run the code, and you’ll get the visual representation of the matrix:

[Image: heatmap of the sample covariance matrix, with axes labeled A, B and C]

You may also want to check the following source that explains the full steps to create a Confusion Matrix using Python.