How to Calculate Stats From an Imported CSV File Using pandas

In this short tutorial, you will learn how to use pandas to calculate basic stats from an imported CSV file.

TL;DR Solution

calculate_stats.py
import pandas as pd

df = pd.read_csv("path-to-file/file-name.csv")

# Column-wide statistics (names chosen to avoid shadowing the built-ins sum, max and min)
mean = df['numeric_column'].mean()
total = df['numeric_column'].sum()
maximum = df['numeric_column'].max()
minimum = df['numeric_column'].min()
count = df['numeric_column'].count()
median = df['numeric_column'].median()
std = df['numeric_column'].std()
var = df['numeric_column'].var()

# Per-group statistics
groupby_sum = df.groupby(['dimension']).sum(numeric_only=True)
groupby_count = df.groupby(['dimension']).count()
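
If you prefer to compute several of these statistics in one call, a minimal sketch using agg could look like the following. The inline DataFrame is made-up stand-in data, and numeric_column and dimension are the same placeholder names used above.

```python
import pandas as pd

# Hypothetical stand-in data; replace with pd.read_csv(...) for a real file.
df = pd.DataFrame({
    "dimension": ["a", "a", "b", "b"],
    "numeric_column": [1, 2, 3, 4],
})

# Several statistics for one column in a single call.
summary = df["numeric_column"].agg(
    ["mean", "sum", "max", "min", "count", "median", "std", "var"]
)
print(summary)

# Sum and count per group in a single call.
grouped = df.groupby("dimension")["numeric_column"].agg(["sum", "count"])
print(grouped)
```

The first result is a Series indexed by statistic name, so a single value can be read with summary["mean"].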

Step-by-Step Example

Step 1: Install the pandas Package

If pandas is not already installed, run the following command in your terminal:

pip install pandas
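
To confirm that the installation worked, one option is to import pandas and print its version from Python. This is only a quick check, and the version number you see will depend on your environment.

```python
import pandas as pd

# An ImportError on the line above would mean pandas is not installed
# in the Python environment you are running.
print(pd.__version__)
```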

Step 2: Have a CSV File Ready

Let's say you have a CSV file saved on your desktop with the following content:

Desktop/fish_size.csv
Fish,Species,Size_inch,Size_cm
salmon1,salmon,28,71
salmon2,salmon,30,76
pufferfish1,pufferfish,3,8
pufferfish2,pufferfish,4,10
shark1,shark,60,152
shark2,shark,70,178
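
Before calculating anything, it can help to check how pandas parses this content. The sketch below reads the same rows from an in-memory string (an assumption made so that it runs without the file on disk) and prints the shape and column types.

```python
import io
import pandas as pd

# Same content as fish_size.csv, held in a string so the sketch is self-contained.
csv_text = """Fish,Species,Size_inch,Size_cm
salmon1,salmon,28,71
salmon2,salmon,30,76
pufferfish1,pufferfish,3,8
pufferfish2,pufferfish,4,10
shark1,shark,60,152
shark2,shark,70,178
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # the two size columns should be parsed as integers
```

If a size column showed up as text instead of a number, the numeric statistics would fail or give unexpected results, so this check is worth doing on your own files.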

Step 3: Create a Python Script

Let's create a Python script that imports the CSV file from your desktop, calculates some statistics, and prints them to your terminal. In particular, we are interested in the mean, sum, maximum, minimum, count, median, standard deviation, variance, grouped sum, and grouped count:

calculate_stats.py
import os
import pandas as pd

desktop_path = os.path.expanduser("~/Desktop")

df = pd.read_csv(os.path.join(desktop_path, "fish_size.csv"))

# Column-wide statistics (names chosen to avoid shadowing the built-ins sum, max and min)
mean = df['Size_cm'].mean()
total = df['Size_cm'].sum()
maximum = df['Size_cm'].max()
minimum = df['Size_cm'].min()
count = df['Size_cm'].count()
median = df['Size_cm'].median()
std = df['Size_cm'].std()
var = df['Size_cm'].var()

# Per-species statistics
groupby_sum = df.groupby(['Species']).sum(numeric_only=True)
groupby_count = df.groupby(['Species']).count()

print('mean size: ' + str(mean))
print('sum of size: ' + str(total))
print('max size: ' + str(maximum))
print('min size: ' + str(minimum))
print('count of size: ' + str(count))
print('median size: ' + str(median))
print('std of size: ' + str(std))
print('var of size: ' + str(var))
print('grouped sum: ' + str(groupby_sum))
print('grouped count: ' + str(groupby_count))

Create a new file using a text editor of your choice, copy-paste the above Python code into it, and save it as calculate_stats.py on your desktop.

Verify that it works by changing to your desktop directory in a terminal and running the script:

cd Desktop
python calculate_stats.py

You should see the following output in your terminal:

mean size: 82.5
sum of size: 495
max size: 178
min size: 8
count of size: 6
median size: 73.5
std of size: 70.61373804012928
var of size: 4986.3
grouped sum:             Size_inch  Size_cm
Species                       
pufferfish          7       18
salmon             58      147
shark             130      330
grouped count:             Fish  Size_inch  Size_cm
Species                             
pufferfish     2          2        2
salmon         2          2        2
shark          2          2        2
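
Note that the std and var values are sample statistics: by default pandas divides by n - 1 (ddof=1) rather than by n. As a rough check, the sketch below recomputes the variance of the Size_cm values by hand under both conventions. The values are typed inline instead of being read from the file.

```python
import math
import pandas as pd

# Size_cm values from fish_size.csv, typed inline so the sketch is self-contained.
sizes = pd.Series([71, 76, 8, 10, 152, 178])

n = len(sizes)
mean = sizes.sum() / n
squared_deviations = sum((x - mean) ** 2 for x in sizes)

sample_var = squared_deviations / (n - 1)  # matches sizes.var(), the default
population_var = squared_deviations / n    # matches sizes.var(ddof=0)

print(sample_var, sizes.var())
print(population_var, sizes.var(ddof=0))
print(math.sqrt(sample_var), sizes.std())
```

If your data covers the whole population rather than a sample, passing ddof=0 to var() and std() gives the population versions.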

That's it! You just learned how to calculate basic stats from a CSV file with Python and pandas.