Get the Descriptive Statistics for Pandas DataFrame

To get the descriptive statistics for a specific column in your DataFrame:

df['dataframe_column'].describe()

To get the descriptive statistics for an entire DataFrame:

df.describe(include='all')

Steps to Get the Descriptive Statistics for Pandas DataFrame

Step 1: Collect the Data

To start, you’ll need to collect the data for your DataFrame.

For example, here is a simple dataset that can be used for our DataFrame:

product price year
A 22000 2014
B 27000 2015
C 25000 2016
C 29000 2017
D 35000 2018

Step 2: Create the DataFrame

Next, create the DataFrame based on the data collected.

Here is the code to create the DataFrame for our example:

import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)
print(df)

Run the code in Python, and you’ll get the following DataFrame:

  product  price  year
0       A  22000  2014
1       B  27000  2015
2       C  25000  2016
3       C  29000  2017
4       D  35000  2018
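
Before computing any statistics, it can help to confirm each column's data type, since describe() reports different statistics for numeric and non-numeric columns. Here is a minimal sketch, assuming the same DataFrame as above (the variable names price_is_numeric and product_is_numeric are illustrative only):

```python
import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

# Show the data type that pandas inferred for each column
print(df.dtypes)

# Check whether specific columns are numeric
price_is_numeric = pd.api.types.is_numeric_dtype(df['price'])
product_is_numeric = pd.api.types.is_numeric_dtype(df['product'])
print(price_is_numeric, product_is_numeric)
```

Here the ‘price’ and ‘year’ columns are numeric, while the ‘product’ column is not.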

Step 3: Get the Descriptive Statistics for Pandas DataFrame

Once you have your DataFrame ready, you’ll be able to get the descriptive statistics using the template that you saw at the beginning of this guide:

df['dataframe_column'].describe()

Let’s say that you want to get the descriptive statistics for the ‘price’ field, which contains numerical data. In that case, the syntax that you’ll need to apply is:

df['price'].describe()

So the complete Python code would look like this:

import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

stats_numeric = df['price'].describe()
print(stats_numeric)

Once you run the code, you’ll get the descriptive statistics for the ‘price’ field:

count        5.000000
mean     27600.000000
std       4878.524367
min      22000.000000
25%      25000.000000
50%      27000.000000
75%      29000.000000
max      35000.000000
Name: price, dtype: float64

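You’ll notice that the output is displayed with 6 decimal places. To get integer values, you can add astype(int) to the code. Note that astype(int) truncates rather than rounds, so the standard deviation of 4878.524367 becomes 4878.

This is what the code would look like: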

import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

stats_numeric = df['price'].describe().astype(int)
print(stats_numeric)

Run the code, and you’ll get only integers (the dtype may be displayed as int32 or int64, depending on your platform and NumPy version):

count        5
mean     27600
std       4878
min      22000
25%      25000
50%      27000
75%      29000
max      35000
Name: price, dtype: int32
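
Because astype(int) drops the fractional part, you may prefer to round the values instead. Here is a minimal sketch, assuming the same DataFrame as above (the variable names stats_rounded and stats_two_decimals are illustrative only):

```python
import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

# Round to the nearest whole number first, then convert to integers
stats_rounded = df['price'].describe().round().astype(int)
print(stats_rounded)

# Alternatively, keep floats but limit them to two decimal places
stats_two_decimals = df['price'].describe().round(2)
print(stats_two_decimals)
```

With this data, the rounded standard deviation is 4879 rather than the truncated 4878.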

Descriptive Statistics for Categorical Data

So far, you have seen how to get the descriptive statistics for numerical data. The ‘price’ field was used for that purpose.

You can also get descriptive statistics for categorical data.

For instance, you can get descriptive statistics for the ‘product’ field using this code:

import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

stats_categorical = df['product'].describe()
print(stats_categorical)

Here are the results:

count     5
unique    4
top       C
freq      2
Name: product, dtype: object
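
The unique, top and freq values can be cross-checked with other pandas methods. The sketch below, assuming the same DataFrame, uses nunique() and value_counts() (the variable names are illustrative only):

```python
import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

# Number of distinct products (corresponds to 'unique')
unique_count = df['product'].nunique()

# Number of occurrences of each product
counts = df['product'].value_counts()

# Most frequent product and how often it appears ('top' and 'freq')
top_product = counts.idxmax()
top_freq = counts.max()

print(unique_count, top_product, top_freq)
```

For this dataset, product C appears twice, which is why it is reported as the top value with a frequency of 2.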

Get the Descriptive Statistics for the Entire DataFrame

Finally, you may apply the following template to get the descriptive statistics for the entire DataFrame:

df.describe(include='all')

So the complete Python code would look like this:

import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

stats = df.describe(include='all')
print(stats)

Run the code, and you’ll get the following result (NaN marks a statistic that does not apply to that column’s data type):

       product         price         year
count        5      5.000000     5.000000
unique       4           NaN          NaN
top          C           NaN          NaN
freq         2           NaN          NaN
mean       NaN  27600.000000  2016.000000
std        NaN   4878.524367     1.581139
min        NaN  22000.000000  2014.000000
25%        NaN  25000.000000  2015.000000
50%        NaN  27000.000000  2016.000000
75%        NaN  29000.000000  2017.000000
max        NaN  35000.000000  2018.000000
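
If you only need one kind of column, describe() also accepts include and exclude filters. Here is a minimal sketch, assuming the same DataFrame and that NumPy is installed alongside pandas (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

# Describe only the numeric columns (price and year)
stats_numeric_only = df.describe(include=[np.number])
print(stats_numeric_only)

# Describe everything except the numeric columns (product)
stats_non_numeric = df.describe(exclude=[np.number])
print(stats_non_numeric)
```

Splitting the output this way avoids the NaN cells that appear when numeric and non-numeric columns share one table.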

Breaking Down the Descriptive Statistics

You can further break down the descriptive statistics into the following:

Count:

df['dataframe_column'].count()

Mean:

df['dataframe_column'].mean()

Standard deviation:

df['dataframe_column'].std()

Minimum:

df['dataframe_column'].min()

0.25 Quantile:

df['dataframe_column'].quantile(q=0.25)

0.50 Quantile (Median):

df['dataframe_column'].quantile(q=0.50)

0.75 Quantile:

df['dataframe_column'].quantile(q=0.75)

Maximum:

df['dataframe_column'].max()

For our example, df['dataframe_column'] is df['price'].

Therefore, the full Python code would look as follows:

import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

count1 = df['price'].count()
print('count: ' + str(count1))

mean1 = df['price'].mean()
print('mean: ' + str(mean1))

std1 = df['price'].std()
print('std: ' + str(std1))

min1 = df['price'].min()
print('min: ' + str(min1))

quantile1 = df['price'].quantile(q=0.25)
print('25%: ' + str(quantile1))

quantile2 = df['price'].quantile(q=0.50)
print('50%: ' + str(quantile2))

quantile3 = df['price'].quantile(q=0.75)
print('75%: ' + str(quantile3))

max1 = df['price'].max()
print('max: ' + str(max1))

Once you run the code in Python, you’ll get the following stats:

count: 5
mean: 27600.0
std: 4878.524367060188
min: 22000
25%: 25000.0
50%: 27000.0
75%: 29000.0
max: 35000
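
The same statistics can also be gathered in fewer calls. The sketch below, assuming the same DataFrame, passes a list of function names to agg() and a list of quantiles to quantile() (the variable names summary and quartiles are illustrative only):

```python
import pandas as pd

data = {'product': ['A', 'B', 'C', 'C', 'D'],
        'price': [22000, 27000, 25000, 29000, 35000],
        'year': [2014, 2015, 2016, 2017, 2018]
        }

df = pd.DataFrame(data)

# Compute several statistics at once; the result is a Series
# indexed by the function names
summary = df['price'].agg(['count', 'mean', 'std', 'min', 'max'])
print(summary)

# Compute the three quartiles at once; the result is a Series
# indexed by the quantile values
quartiles = df['price'].quantile([0.25, 0.50, 0.75])
print(quartiles)
```

This may be convenient when you need only some of the statistics that describe() returns.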