*Pandas *is a powerful Python package to perform statistical analysis for a given set of data. In this short post, I’m going to show you how to use *pandas*, in order to calculate stats from an imported CSV file.

To illustrate, I’ll use a simple example, where the following data will be saved in a CSV file called ‘stats’ (and where the salary column is saved as a number format):

Name | Salary | Country |

Dan | 40000 | USA |

Elizabeth | 32000 | Brazil |

Jon | 45000 | Italy |

Maria | 54000 | USA |

Mark | 72000 | USA |

Bill | 62000 | Brazil |

Jess | 92000 | Italy |

Julia | 55000 | USA |

Jeff | 35000 | Italy |

Ben | 48000 | Brazil |

The above data-set is saved on my computer under the following path (you’ll need to change the path, in the Python code below, to the location where your CSV file is stored on your computer):

C:\Users\Doron E\Desktop\pandas\stats.csv

Now that we have the data-set ready, and it is saved at our desired location within a CSV file, we will calculate the following statistics using the *pandas* package:

- Mean (i.e., average) salary
- Total sum of salaries
- Maximum salary
- Minimum salary
- Count of salaries
- Median salary
- Standard deviation of salaries
- Variance of of salaries

In addition, we will also do some grouping calculations:

- Sum of salaries, grouped by the Country column
- Count of salaries, grouped by the Country column

## Use Pandas to Calculate Stats from an Imported CSV file

Now let’s look at the Python code used in order to calculate stats from an import CSV file using pandas. Please note that I included additional comments within the code to clarify the syntax used.

import pandas as pd #read the CSV file df = pd.read_csv (r'C:\Users\Doron E\Desktop\pandas\stats.csv',header='infer') # put 'r' before the path name to address any special characters such as '\'. Also do not forget to put the file name in the path (here it is 'stats') as well as the suffix '.csv' # block 1 - simple stats mean1 = df['Salary'].mean() # calculates the average of all values under the Salary column sum1 = df['Salary'].sum() # calculates the sum of all values under the Salary column max1 = df['Salary'].max() # calculates the max of all values under the Salary column min1 = df['Salary'].min() # calculates the min of all values under the Salary column count1 = df['Salary'].count() # calculates the count of all values under the Salary column median1 = df['Salary'].median () # calculates the median of all values under the Salary column std1 = df['Salary'].std () # calculates the std of all values under the Salary column var1 = df['Salary'].var () # calculates the var of all values under the Salary column # block 2 - group by groupby_sum1 = df.groupby(['Country']).sum() # calculates the sum of salaries, grouped by the Country column groupby_count1 = df.groupby(['Country']).count() # calculates the count of salaries, grouped by the Country column # print block 1 print ('The average of values under the Salary column is: ' + str(mean1)) print ('The sum of values under the Salary column is: ' + str(sum1)) print ('The max of values under the Salary column is: ' + str(max1)) print ('The min of values under the Salary column is: ' + str(min1)) print ('The count of values under the Salary column is: ' + str(count1)) print ('The median of values under the Salary column is: ' + str(median1)) print ('The std of values under the Salary column is: ' + str(std1)) print ('The var of values under the Salary column is: ' + str(var1)) # print block 2 print ('The sum of values, grouped by the Country column, is: ' + str(groupby_sum1)) print ('The count of values, grouped by the Country column, is: ' + str(groupby_count1))

You should get the following results when you run the above code in Python:

We just saw how you can calculate simple stats using *pandas.* You may also want to check the pandas documentation to learn more about the power of this great library!