# Use Pandas to Calculate Stats from an Imported CSV file

Pandas is a powerful Python package that can be used to perform statistical analysis. In this short post, I’ll show you how to use pandas to calculate stats from an imported CSV file.

## The Example

To demonstrate how to calculate stats from an imported CSV file, I’ll use a simple example with the following data-set:

 Name Salary Country Dan 40000 USA Elizabeth 32000 Brazil Jon 45000 Italy Maria 54000 USA Mark 72000 USA Bill 62000 Brazil Jess 92000 Italy Julia 55000 USA Jeff 35000 Italy Ben 48000 Brazil

## Steps to Calculate Stats from an Imported CSV file

### Step 1: Copy the data-set into a CSV file

To start, you’ll need to copy the above data-set into a CSV file. Then rename the CSV file as stats: ### Step 2: Import the CSV file into Python

Next, you’ll need to import the CSV file into Python using this structure:

```import pandas as pd
df = pd.read_csv (r'Path where the CSV file is stored\File name.csv')
print (df)```

For example, this is the path where I stored the CSV file on my computer:

C:\Users\Ron\Desktop\stats.csv

So in my case, I applied the following code to import the stats CSV file:

```import pandas as pd
print (df)
```

### Step 3: Use Pandas to Calculate Stats from an Imported CSV file

For the final step, the goal is to calculate the following statistics using the pandas package:

• Mean salary
• Total sum of salaries
• Maximum salary
• Minimum salary
• Count of salaries
• Median salary
• Standard deviation of salaries
• Variance of of salaries

In addition, we’ll also do some grouping calculations:

• Sum of salaries, grouped by the Country column
• Count of salaries, grouped by the Country column

Once you’re ready, run the code below in order to calculate the stats from the imported CSV file using pandas. Note that you’ll need to change the path name (2nd row in the code) to reflect the location where the CSV file is stored on your computer.

```import pandas as pd

# block 1 - simple stats
mean1 = df['Salary'].mean()
sum1 = df['Salary'].sum()
max1 = df['Salary'].max()
min1 = df['Salary'].min()
count1 = df['Salary'].count()
median1 = df['Salary'].median()
std1 = df['Salary'].std()
var1 = df['Salary'].var()

# block 2 - group by
groupby_sum1 = df.groupby(['Country']).sum()
groupby_count1 = df.groupby(['Country']).count()

# print block 1
print ('Mean salary: ' + str(mean1))
print ('Sum of salaries: ' + str(sum1))
print ('Max salary: ' + str(max1))
print ('Min salary: ' + str(min1))
print ('Count of salaries: ' + str(count1))
print ('Median salary: ' + str(median1))
print ('Std of salaries: ' + str(std1))
print ('Var of salaries: ' + str(var1))

# print block 2
print ('Sum of values, grouped by the Country: ' + str(groupby_sum1))
print ('Count of values, grouped by the Country: ' + str(groupby_count1))
```

After you run the code in Python, you’ll get the following results: Here is a table that summarizes the operations performed in the code: You just saw how to calculate simple stats using pandas. You may also want to check the pandas documentation to learn more about the power of this great library!