Get the Descriptive Statistics in Pandas DataFrame

To get the descriptive statistics for a specific column in your DataFrame:

df["dataframe_column"].describe()

To get the descriptive statistics for an entire DataFrame:

df.describe(include="all")

Steps

Step 1: Collect the Data

To start, collect the data for your DataFrame.

Here is an example of a dataset:

product  price  year
A        22000  2014
B        27000  2015
C        25000  2016
C        29000  2017
D        35000  2018

Step 2: Create the DataFrame

Next, create the DataFrame based on the data collected:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

print(df)

Run the code in Python, and you’ll get the following DataFrame:

  product  price  year
0       A  22000  2014
1       B  27000  2015
2       C  25000  2016
3       C  29000  2017
4       D  35000  2018

Step 3: Get the Descriptive Statistics

To get the descriptive statistics for the “price” column, which contains numerical data:

df["price"].describe()

The full code:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_numeric = df["price"].describe()

print(stats_numeric)

The resulting descriptive statistics for the “price” column:

count        5.000000
mean     27600.000000
std       4878.524367
min      22000.000000
25%      25000.000000
50%      27000.000000
75%      29000.000000
max      35000.000000
Name: price, dtype: float64

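Notice that the output shows 6 decimal places. You can convert the values to integers using astype(int), which drops the fractional part rather than rounding (so the standard deviation of 4878.524367 becomes 4878):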

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_numeric = df["price"].describe().astype(int)

print(stats_numeric)

Run the code, and you’ll get only integers (the reported dtype may be int64 rather than int32, depending on your platform and NumPy version):

count        5
mean     27600
std       4878
min      22000
25%      25000
50%      27000
75%      29000
max      35000
Name: price, dtype: int32
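If truncation is not what you want, rounding is an alternative. The following is a minimal sketch on the same data that compares astype(int) with round(); the variable names are illustrative only.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats = df["price"].describe()

# astype(int) drops the fractional part
stats_truncated = stats.astype(int)

# round(0) rounds to the nearest whole number; the values remain floats
stats_rounded = stats.round(0)

# round(2) keeps two decimal places instead of six
stats_two_decimals = stats.round(2)

print(stats_truncated["std"])     # 4878
print(stats_rounded["std"])       # 4879.0
print(stats_two_decimals["std"])  # 4878.52
```

Rounding keeps the values as floats, so chain astype(int) after round(0) if you need an integer dtype.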

Descriptive Statistics for Categorical Data

To get the descriptive statistics for the “product” column, which contains categorical data:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_categorical = df["product"].describe()

print(stats_categorical)

Here are the results:

count     5
unique    4
top       C
freq      2
Name: product, dtype: object
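To see where the unique, top, and freq values come from, it can help to count each product directly. The sketch below uses value_counts() and nunique() on the same data; the variable names are illustrative only.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Number of occurrences of each product, most frequent first
counts = df["product"].value_counts()
print(counts)

# The pieces that describe() reports for categorical data
unique = df["product"].nunique()  # number of distinct values
top = counts.idxmax()             # most frequent value
freq = counts.max()               # how often the most frequent value occurs

print(unique, top, freq)  # 4 C 2
```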

Get the Descriptive Statistics for the Entire DataFrame

To get the descriptive statistics for the entire DataFrame:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats = df.describe(include="all")

print(stats)

The result:

       product         price         year
count        5      5.000000     5.000000
unique       4           NaN          NaN
top          C           NaN          NaN
freq         2           NaN          NaN
mean       NaN  27600.000000  2016.000000
std        NaN   4878.524367     1.581139
min        NaN  22000.000000  2014.000000
25%        NaN  25000.000000  2015.000000
50%        NaN  27000.000000  2016.000000
75%        NaN  29000.000000  2017.000000
max        NaN  35000.000000  2018.000000
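The NaN values appear because numeric and categorical columns report different statistics. If you prefer two separate tables, one possible approach is to describe the numeric and non-numeric columns separately. The sketch below assumes NumPy is available (it is installed with pandas) and uses np.number to select the numeric columns; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Without arguments, describe() summarizes only the numeric columns
numeric_stats = df.describe()

# Excluding numeric types leaves only the "product" column
non_numeric_stats = df.describe(exclude=[np.number])

print(numeric_stats)
print(non_numeric_stats)
```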

Breaking Down the Descriptive Statistics

You can further break down the descriptive statistics into the following individual calculations:

Count:

df["dataframe_column"].count()

Mean:

df["dataframe_column"].mean()

Standard deviation (the sample standard deviation, which is the pandas default):

df["dataframe_column"].std()

Minimum:

df["dataframe_column"].min()

0.25 Quantile:

df["dataframe_column"].quantile(q=0.25)

0.50 Quantile (Median):

df["dataframe_column"].quantile(q=0.50)

0.75 Quantile:

df["dataframe_column"].quantile(q=0.75)

Maximum:

df["dataframe_column"].max()
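The three quantile calls above can also be combined into one, since quantile() accepts a list of values. A minimal sketch on the example “price” data, with illustrative variable names:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Passing a list returns a Series indexed by the requested quantiles
quartiles = df["price"].quantile([0.25, 0.50, 0.75])
print(quartiles)

# median() is equivalent to the 0.50 quantile
median = df["price"].median()
print(median)  # 27000.0
```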

Putting everything together:

import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

statistics = {
    "count": df["price"].count(),
    "mean": df["price"].mean(),
    "std": df["price"].std(),
    "min": df["price"].min(),
    "quantile_25": df["price"].quantile(q=0.25),
    "quantile_50": df["price"].quantile(q=0.50),
    "quantile_75": df["price"].quantile(q=0.75),
    "max": df["price"].max(),
}

for stat, value in statistics.items():
    print(f"{stat}: {value}")

Once you run the code in Python, you’ll get the following stats:

count: 5
mean: 27600.0
std: 4878.524367060188
min: 22000
quantile_25: 25000.0
quantile_50: 27000.0
quantile_75: 29000.0
max: 35000
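As an alternative to building the dictionary by hand, several statistics can be requested in a single call with agg(). This is a sketch rather than a replacement for the approach above: it covers only statistics that agg() accepts by name, so the 0.25 and 0.75 quantiles are left out, and the variable name is illustrative only.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

# Each name in the list becomes a label in the resulting Series
summary = df["price"].agg(["count", "mean", "std", "min", "median", "max"])

print(summary)
```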
