Use Pandas to Calculate Stats from an Imported CSV file

In this short guide, you’ll see how to use Pandas to calculate stats from an imported CSV file.

The Example

To demonstrate how to calculate stats from an imported CSV file, let’s review a simple example with the following dataset:

personsalarycountry
A40000USA
B32000Brazil
C45000Italy
D54000USA
E72000USA
F62000Brazil
G92000Italy
H55000USA
I35000Italy
J48000Brazil

Steps to Calculate Stats from an Imported CSV File

Step 1: Copy the Dataset into a CSV file

To begin, you’ll need to copy the above dataset into a CSV file. Then rename the CSV file as stats.

Step 2: Import the CSV File into Python

Next, you’ll need to import the CSV file into Python using this template:

import pandas as pd

df = pd.read_csv(r'Path where the CSV file is stored\File name.csv')

print(df)

Here is an example of a path where the CSV file is stored:

C:\Users\Ron\Desktop\stats.csv

So the complete code to import the stats CSV file is captured below (note that you’ll need to modify the path to reflect the location where the CSV file is stored on your computer):

import pandas as pd

df = pd.read_csv(r'C:\Users\Ron\Desktop\stats.csv')

print(df)

Once you run the code in Python (adjusted to your path), you’ll get the following DataFrame:

  person  salary  country
0      A   40000      USA
1      B   32000   Brazil
2      C   45000    Italy
3      D   54000      USA
4      E   72000      USA
5      F   62000   Brazil
6      G   92000    Italy
7      H   55000      USA
8      I   35000    Italy
9      J   48000   Brazil

Step 3: Use Pandas to Calculate Stats from an Imported CSV File

For the final step, the goal is to calculate the following statistics using the Pandas package:

  • Mean salary
  • Total sum of salaries
  • Maximum salary
  • Minimum salary
  • Count of salaries
  • Median salary
  • Standard deviation of salaries
  • Variance of of salaries

In addition, we’ll also do some grouping calculations:

  • Sum of salaries, grouped by the Country column
  • Count of salaries, grouped by the Country column

Once you’re ready, run the code (adjusted to your path) to calculate the stats from the imported CSV file using Pandas:

import pandas as pd

df = pd.read_csv(r'C:\Users\Ron\Desktop\stats.csv') 

# Simple stats
mean1 = df['salary'].mean()
sum1 = df['salary'].sum()
max1 = df['salary'].max()
min1 = df['salary'].min()
count1 = df['salary'].count()
median1 = df['salary'].median() 
std1 = df['salary'].std() 
var1 = df['salary'].var()  

# Group by
groupby_sum1 = df.groupby(['country']).sum() 
groupby_count1 = df.groupby(['country']).count()

# Display simple stats
print('mean salary: ' + str(mean1))
print('sum of salaries: ' + str(sum1))
print('max salary: ' + str(max1))
print('min salary: ' + str(min1))
print('count of salaries: ' + str(count1))
print('median salary: ' + str(median1))
print('std of salaries: ' + str(std1))
print('var of salaries: ' + str(var1))

# Display group by
print('sum of values, grouped by the country: ' + str(groupby_sum1))
print('count of values, grouped by the country: ' + str(groupby_count1))

After you run the code in Python, you’ll get the following results:

mean salary: 53500.0
sum of salaries: 535000
max salary: 92000
min salary: 32000
count of salaries: 10
median salary: 51000.0
std of salaries: 18222.391598128816
var of salaries: 332055555.5555556
sum of values, grouped by the country:
country        
Brazil   142000
Italy    172000
USA      221000
count of values, grouped by the country:
country                
Brazil        3       3
Italy         3       3
USA           4       4

You just saw how to calculate simple stats using Pandas. You may also want to check the Pandas Documentation to learn more about this library.