Looking to create a Covariance Matrix using Python?

If so, I’ll show you how to create such a matrix using both numpy and pandas.

## Steps to Create a Covariance Matrix using Python

### Step 1: Gather the Data

To start, you’ll need to gather the data that will be used for the covariance matrix.

For example, I gathered the following data about 3 variables:

| A | B | C |
|---|---|---|
| 45 | 38 | 10 |
| 37 | 31 | 15 |
| 42 | 26 | 17 |
| 35 | 28 | 21 |
| 39 | 33 | 12 |

### Step 2: Get the Population Covariance Matrix using Python

To get the population covariance matrix (where sums are divided by N, the number of observations), you’ll need to set the bias to *True* in the code below.
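For reference, the population covariance between two variables X and Y over N observations is conventionally written as shown here. The symbols are standard textbook notation and do not come from the code in this guide:

```latex
\operatorname{cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})
```

As a quick check with the data above: A has a mean of 39.6 and B has a mean of 31.2, the products of their deviations sum to 38.4, and 38.4 / 5 = 7.68, which is the population covariance between A and B.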

This is the complete Python code to derive the population covariance matrix using the numpy package:

    import numpy as np

    A = [45, 37, 42, 35, 39]
    B = [38, 31, 26, 28, 33]
    C = [10, 15, 17, 21, 12]

    # Each row of data is one variable
    data = np.array([A, B, C])

    # bias=True divides by N (population covariance)
    covMatrix = np.cov(data, bias=True)
    print(covMatrix)

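Run the code, and you’ll get a 3-by-3 matrix with the following values (rows and columns are in the order A, B, C):

|   | A | B | C |
|---|---|---|---|
| A | 12.64 | 7.68 | -9.6 |
| B | 7.68 | 17.36 | -13.8 |
| C | -9.6 | -13.8 | 14.8 |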
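To see what the bias setting actually computes, here is a minimal sketch that rebuilds the population matrix by hand: center each variable on its own mean, multiply the centered data by its transpose, and divide by N. The names `centered` and `manual_cov` are illustrative only and are not part of numpy.

```python
import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

# Each row is one variable, each column is one observation
data = np.array([A, B, C], dtype=float)
n = data.shape[1]  # number of observations (N = 5)

# Subtract each variable's mean from its own row
centered = data - data.mean(axis=1, keepdims=True)

# Sum of cross-products of the deviations, divided by N
manual_cov = centered @ centered.T / n

print(manual_cov)

# Compare against numpy's own population covariance
print(np.allclose(manual_cov, np.cov(data, bias=True)))
```

The last line should print True, since the manual calculation and np.cov with the bias set to True use the same divisor.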

### Step 3: Get a Visual Representation of the Matrix

You can use the seaborn and matplotlib packages to visually represent the covariance matrix as a heatmap.

Here is the complete code that you can apply in Python:

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    A = [45, 37, 42, 35, 39]
    B = [38, 31, 26, 28, 33]
    C = [10, 15, 17, 21, 12]

    data = np.array([A, B, C])
    covMatrix = np.cov(data, bias=True)

    # annot=True writes each value in its cell
    sns.heatmap(covMatrix, annot=True, fmt='g')
    plt.show()

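Once you run the code, you’ll get a heatmap of the population covariance matrix, with each cell annotated with its value. Because the input is a plain numpy array, the axes are labeled 0, 1, and 2 rather than A, B, and C.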

## Derive the Sample Covariance Matrix

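To get the sample covariance matrix (where sums are divided by N-1), you’ll need to set the bias to *False* in the code below. This is also numpy’s default, so leaving out the bias argument gives the same result.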

Here is the code based on the numpy package:

    import numpy as np

    A = [45, 37, 42, 35, 39]
    B = [38, 31, 26, 28, 33]
    C = [10, 15, 17, 21, 12]

    # Each row of data is one variable
    data = np.array([A, B, C])

    # bias=False divides by N-1 (sample covariance)
    covMatrix = np.cov(data, bias=False)
    print(covMatrix)

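And this is the matrix that you’ll get (rows and columns are in the order A, B, C):

|   | A | B | C |
|---|---|---|---|
| A | 15.8 | 9.6 | -12 |
| B | 9.6 | 21.7 | -17.25 |
| C | -12 | -17.25 | 18.5 |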
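The sample and population matrices differ only by a constant factor, which the following sketch checks. It assumes the same three lists as above; the names `pop_cov`, `sample_cov`, and `rescaled` are illustrative.

```python
import numpy as np

A = [45, 37, 42, 35, 39]
B = [38, 31, 26, 28, 33]
C = [10, 15, 17, 21, 12]

data = np.array([A, B, C])
n = data.shape[1]  # N = 5 observations per variable

pop_cov = np.cov(data, bias=True)      # divides by N
sample_cov = np.cov(data, bias=False)  # divides by N - 1

# Rescaling the population matrix by N / (N - 1) gives the sample matrix
rescaled = pop_cov * n / (n - 1)
print(np.allclose(rescaled, sample_cov))

# ddof=1 is another way to ask np.cov for the N - 1 divisor
print(np.allclose(np.cov(data, ddof=1), sample_cov))
```

With only 5 observations the factor is 5 / 4 = 1.25, so the choice between the two versions matters; with large datasets the two matrices become nearly identical.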

You can also use the pandas package to get the sample covariance matrix. In pandas, each variable is a column of the DataFrame, and the cov method divides by N-1 by default:

    import pandas as pd

    data = {
        'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12],
    }
    df = pd.DataFrame(data, columns=['A', 'B', 'C'])

    # Sample covariance (divides by N-1)
    covMatrix = df.cov()
    print(covMatrix)

You’ll get the same values as the sample covariance matrix derived by numpy, this time with the rows and columns labeled A, B, and C.
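One difference between the two packages is the data layout: the numpy examples store each variable as a row, while a DataFrame stores each variable as a column. The sketch below checks that both give the same result once numpy is told about the layout through rowvar. It also shows the population version in pandas through the ddof argument, which is an assumption about your environment: it requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})

# pandas: variables are columns, and cov() divides by N - 1 by default
pandas_cov = df.cov()

# numpy on the same layout: rowvar=False means "each column is a variable"
numpy_cov = np.cov(df.to_numpy(), rowvar=False)
print(np.allclose(pandas_cov.to_numpy(), numpy_cov))

# Assumption: pandas 1.1 or later, where cov() accepts ddof.
# ddof=0 switches to the population (divide by N) version.
pandas_pop_cov = df.cov(ddof=0)
numpy_pop_cov = np.cov(df.to_numpy(), rowvar=False, bias=True)
print(np.allclose(pandas_pop_cov.to_numpy(), numpy_pop_cov))
```

Passing the DataFrame's values to np.cov without rowvar=False would treat each of the 5 observations as a variable and return a 5-by-5 matrix, which is a common source of confusion.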

Finally, you can visually represent the covariance matrix using the seaborn and matplotlib packages:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    data = {
        'A': [45, 37, 42, 35, 39],
        'B': [38, 31, 26, 28, 33],
        'C': [10, 15, 17, 21, 12],
    }
    df = pd.DataFrame(data, columns=['A', 'B', 'C'])
    covMatrix = df.cov()

    # annot=True writes each value in its cell
    sns.heatmap(covMatrix, annot=True, fmt='g')
    plt.show()

Run the code, and you’ll get a heatmap of the sample covariance matrix. Because the input is a DataFrame, the axes are labeled A, B, and C.
