How to Calculate Stats From an Imported CSV File Using pandas
In this short tutorial, you will learn how to use pandas to calculate basic stats from an imported CSV file.
TLDR solution
import pandas as pd
df = pd.read_csv("path-to-file/file-name.csv")
mean = df['numeric_column'].mean()
# total, maximum and minimum avoid shadowing Python's built-in sum(), max() and min()
total = df['numeric_column'].sum()
maximum = df['numeric_column'].max()
minimum = df['numeric_column'].min()
count = df['numeric_column'].count()
median = df['numeric_column'].median()
std = df['numeric_column'].std()
var = df['numeric_column'].var()
groupby_sum = df.groupby(['dimension']).sum(numeric_only=True)
groupby_count = df.groupby(['dimension']).count()
Step-by-Step Example
Step 1: Install the pandas Package
If you don't already have pandas installed, run the following command in your terminal:
pip install pandas
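To confirm that the installation worked, a quick check along these lines can help. This is a minimal sketch: it only assumes that pandas can be imported and that it exposes its version string as pd.__version__. The variable name installed_version is made up for illustration.

```python
# Quick sanity check that pandas can be imported after installation.
import pandas as pd

# __version__ is a plain string; the exact value depends on your installation.
installed_version = pd.__version__
print("pandas version: " + installed_version)
```

If the import fails with ModuleNotFoundError, pip most likely installed pandas into a different Python environment than the one running the check.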
Step 2: Have a CSV File Ready
Let's say you have a CSV file named fish_size.csv saved on your desktop with the following content:
Fish,Species,Size_inch,Size_cm
salmon1,salmon,28,71
salmon2,salmon,30,76
pufferfish1,pufferfish,3,8
pufferfish2,pufferfish,4,10
shark1,shark,60,152
shark2,shark,70,178
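If you would rather create the file from Python than by hand, something like the following sketch would do it. The names CSV_TEXT and write_sample_csv are hypothetical helpers, not part of pandas, and the example writes into a temporary folder (an assumption made so it is safe to run anywhere).

```python
# Sketch: write the sample fish data to fish_size.csv.
# The target folder here is a temporary one; pass os.path.expanduser("~/Desktop")
# instead to place the file on your desktop.
import os
import tempfile

CSV_TEXT = """Fish,Species,Size_inch,Size_cm
salmon1,salmon,28,71
salmon2,salmon,30,76
pufferfish1,pufferfish,3,8
pufferfish2,pufferfish,4,10
shark1,shark,60,152
shark2,shark,70,178
"""


def write_sample_csv(folder):
    """Write the fish data to fish_size.csv inside folder and return the full path."""
    path = os.path.join(folder, "fish_size.csv")
    with open(path, "w", newline="") as f:
        f.write(CSV_TEXT)
    return path


csv_path = write_sample_csv(tempfile.mkdtemp())
print("wrote " + csv_path)
```

Calling write_sample_csv(os.path.expanduser("~/Desktop")) puts the file where the script in the next step expects it.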
Step 3: Create a Python Script
Let's create a Python script that imports the CSV file from your desktop, calculates some statistics, and prints them to your terminal. In particular, we are interested in the mean, sum, maximum, minimum, count, median, standard deviation, variance, grouped sum, and grouped count:
import os
import pandas as pd

# Build the path to the CSV file on the desktop
desktop_path = os.path.expanduser("~/Desktop")
df = pd.read_csv(os.path.join(desktop_path, "fish_size.csv"))

# Statistics for a single numeric column
# (total, maximum and minimum avoid shadowing Python's built-in sum(), max() and min())
mean = df['Size_cm'].mean()
total = df['Size_cm'].sum()
maximum = df['Size_cm'].max()
minimum = df['Size_cm'].min()
count = df['Size_cm'].count()
median = df['Size_cm'].median()
std = df['Size_cm'].std()  # sample standard deviation (pandas default, ddof=1)
var = df['Size_cm'].var()  # sample variance (pandas default, ddof=1)

# Statistics per group
groupby_sum = df.groupby(['Species']).sum(numeric_only=True)
groupby_count = df.groupby(['Species']).count()

# Print the results
print('mean size: ' + str(mean))
print('sum of size: ' + str(total))
print('max size: ' + str(maximum))
print('min size: ' + str(minimum))
print('count of size: ' + str(count))
print('median size: ' + str(median))
print('std of size: ' + str(std))
print('var of size: ' + str(var))
print('grouped sum: ' + str(groupby_sum))
print('grouped count: ' + str(groupby_count))
Create a new file using a text editor of your choice, copy-paste the above Python code into it, and save it as calculate_stats.py on your desktop.
Verify that it works by navigating your terminal to your desktop and running the script:
cd Desktop
python calculate_stats.py
You should see the following output in your terminal:
mean size: 82.5
sum of size: 495
max size: 178
min size: 8
count of size: 6
median size: 73.5
std of size: 70.61373804012928
var of size: 4986.3
grouped sum:             Size_inch  Size_cm
Species
pufferfish          7       18
salmon             58      147
shark             130      330
grouped count:             Fish  Size_inch  Size_cm
Species
pufferfish     2          2        2
salmon         2          2        2
shark          2          2        2
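A note on the std and var values above: by default pandas computes the sample versions, dividing by n - 1 (ddof=1) rather than by n. The sketch below recomputes the variance by hand from the six Size_cm values to show where 4986.3 comes from, and passes ddof=0 to get the population versions instead. All variable names here are illustrative only.

```python
# Sketch: sample vs. population variance for the Size_cm column.
import math
import pandas as pd

sizes = pd.Series([71, 76, 8, 10, 152, 178], name="Size_cm")

n = sizes.count()
mean_size = sizes.mean()

# Sum of squared deviations from the mean
squared_deviations = ((sizes - mean_size) ** 2).sum()

sample_var = squared_deviations / (n - 1)  # same divisor as sizes.var()
population_var = squared_deviations / n    # same divisor as sizes.var(ddof=0)

print("sample var (by hand): " + str(sample_var))
print("sample var (pandas): " + str(sizes.var()))
print("population var (pandas): " + str(sizes.var(ddof=0)))
print("population std (pandas): " + str(sizes.std(ddof=0)))
```

Which version is appropriate depends on whether the rows are a sample from a larger population or the whole population of interest.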
That's it! You just learned how to calculate basic stats from a CSV file with Python and pandas.