*Pandas *is a powerful Python package that can be used to perform statistical analysis. In this short post, I’ll show you how to use pandas to calculate stats from an imported CSV file.

## The Example

To demonstrate how to calculate stats from an imported CSV file, I’ll use a simple example with the following data-set:

Name | Salary | Country |

Dan | 40000 | USA |

Elizabeth | 32000 | Brazil |

Jon | 45000 | Italy |

Maria | 54000 | USA |

Mark | 72000 | USA |

Bill | 62000 | Brazil |

Jess | 92000 | Italy |

Julia | 55000 | USA |

Jeff | 35000 | Italy |

Ben | 48000 | Brazil |

## Steps to Calculate Stats from an Imported CSV file

### Step 1: Copy the data-set into a CSV file

To start, you’ll need to copy the above data-set into a CSV file. Then rename the CSV file as **stats**:

### Step 2: Import the CSV file into Python

Next, you’ll need to import the CSV file into Python using this structure:

import pandas as pd df = pd.read_csv (r'Path where the CSV file is stored\File name.csv')

For example, this is the path where I stored the CSV file:

*C:\Users\doron\Desktop\stats.csv*

So the code to import the **stats** CSV file in my case is:

import pandas as pd df = pd.read_csv (r'C:\Users\doron\Desktop\stats.csv')

In the Python code below, you’ll need to change the path to the location where the CSV file is stored on *your* computer.

### Step 3: Use Pandas to Calculate Stats from an Imported CSV file

For the final step, the goal is to calculate the following statistics using the pandas package:

- Mean salary
- Total sum of salaries
- Maximum salary
- Minimum salary
- Count of salaries
- Median salary
- Standard deviation of salaries
- Variance of of salaries

In addition, we will also do some grouping calculations:

- Sum of salaries, grouped by the Country column
- Count of salaries, grouped by the Country column

Run the following Python code (after adjusting the path name) in order to calculate the stats from the imported CSV file using *pandas:*

import pandas as pd df = pd.read_csv (r'C:\Users\doron\Desktop\stats.csv') # block 1 - simple stats mean1 = df['Salary'].mean() sum1 = df['Salary'].sum() max1 = df['Salary'].max() min1 = df['Salary'].min() count1 = df['Salary'].count() median1 = df['Salary'].median() std1 = df['Salary'].std() var1 = df['Salary'].var() # block 2 - group by groupby_sum1 = df.groupby(['Country']).sum() groupby_count1 = df.groupby(['Country']).count() # print block 1 print ('Mean salary: ' + str(mean1)) print ('Sum of salaries: ' + str(sum1)) print ('Max salary: ' + str(max1)) print ('Min salary: ' + str(min1)) print ('Count of salaries: ' + str(count1)) print ('Median salary: ' + str(median1)) print ('Std of salaries: ' + str(std1)) print ('Var of salaries: ' + str(var1)) # print block 2 print ('Sum of values, grouped by the Country: ' + str(groupby_sum1)) print ('Count of values, grouped by the Country: ' + str(groupby_count1))

Once you run the code in Python, you’ll get the following results:

Here is a table that summarizes the operations performed in the code:

Variable | Syntax used | Description |

mean1 | df['Salary'].mean() | Average of all values under the Salary column |

sum1 | df['Salary'].sum() | Sum of all values under the Salary column |

max1 | df['Salary'].max() | Max of all values under the Salary column |

min1 | df['Salary'].min() | Min of all values under the Salary column |

count1 | df['Salary'].count() | Count of all values under the Salary column |

median1 | df['Salary'].median() | Median of all values under the Salary column |

std1 | df['Salary'].std() | Standard deviation of all values under the Salary column |

var1 | df['Salary'].var() | Variance of all values under the Salary column |

groupby_sum1 | df.groupby(['Country']).sum() | Sum of salaries, grouped by the Country column |

groupby_count1 | df.groupby(['Country']).count() | Count of salaries, grouped by the Country column |

You just saw how to calculate simple stats using *pandas.* You may also want to check the pandas documentation to learn more about the power of this great library!