How to Get the Descriptive Statistics for Pandas DataFrame

Need to get the descriptive statistics for a pandas DataFrame?

If so, you can use the following template to get the descriptive statistics for a specific column in your DataFrame:

df['DataFrame Column'].describe()

Alternatively, you may use this template to get the descriptive statistics for the entire DataFrame:

df.describe(include='all')

In the next section, I’ll show you the steps to derive the descriptive statistics using an example.

Steps to Get the Descriptive Statistics for Pandas DataFrame

Step 1: Collect the Data

To start, you’ll need to collect the data for your DataFrame. For example, I collected the following data about cars:

Brand             Price   Year
Honda Civic       22000   2014
Ford Focus        27000   2015
Toyota Corolla    25000   2016
Toyota Corolla    29000   2017
Audi A4           35000   2018

Step 2: Create the DataFrame

Next, you’ll need to create the DataFrame based on the data collected.

For our example, the code to create the DataFrame is:

from pandas import DataFrame

# Data collected in Step 1, one list per column
cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])
print(df)

Run the code in Python, and you’ll get a DataFrame with five rows and three columns (Brand, Price, and Year).
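Before computing any statistics, it can help to check the data type of each column, since describe() reports different statistics for numeric and non-numeric columns. The following is a minimal sketch using the same data; the variable name column_types is only illustrative:

```python
from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# dtypes lists the data type that pandas inferred for each column
column_types = df.dtypes
print(column_types)
```

Here ‘Price’ and ‘Year’ should come back as integer columns, while ‘Brand’ is stored as text (the exact name of the text type depends on your pandas version).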

Step 3: Get the Descriptive Statistics for Pandas DataFrame

Once you have your DataFrame ready, you’ll be able to get the descriptive statistics using the template that you saw at the beginning of this guide:

df['DataFrame Column'].describe()

Let’s say that you want to get the descriptive statistics for the ‘Price’ field, which contains numerical data. In that case, the syntax that you’ll need to apply is:

df['Price'].describe()

So the complete Python code would look like this:

from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Summary statistics for the numeric 'Price' column
stats_numeric = df['Price'].describe()
print(stats_numeric)

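Once you run the code, you’ll get the descriptive statistics for the ‘Price’ field: a count of 5, a mean of 27600, a standard deviation of about 4878.52, a minimum of 22000, quartiles (25%, 50%, 75%) of 25000, 27000 and 29000, and a maximum of 35000.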

You’ll notice that the output is displayed with 6 decimal places. To get integer values instead, you can append astype(int) to the code. Note that this truncates the decimal part rather than rounding it.

This is what the code would look like:

from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Convert the float statistics to integers (decimals are truncated)
stats_numeric = df['Price'].describe().astype(int)
print(stats_numeric)

Run the code, and you’ll get only integers. For example, the standard deviation is now displayed as 4878.
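If rounding is preferable to truncation, one alternative is the round() method of a pandas Series. The sketch below assumes that two decimal places are enough; the name stats_rounded is only illustrative:

```python
from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Round each statistic to 2 decimal places instead of converting to int
stats_rounded = df['Price'].describe().round(2)
print(stats_rounded)
```

The values remain floats, so the standard deviation keeps its decimal part (about 4878.52) rather than being cut down to 4878.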

Descriptive Statistics for Categorical Data

So far, you have seen how to get the descriptive statistics for numerical data. The ‘Price’ field was used for that purpose.

Yet, you can also get the descriptive statistics for categorical data.

For instance, you can get some descriptive statistics for the ‘Brand’ field using this code:

from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Summary statistics for the categorical 'Brand' column
stats_categorical = df['Brand'].describe()
print(stats_categorical)

And this is the result that you’ll get: a count of 5, 4 unique values, a top (most frequent) value of Toyota Corolla, and a freq of 2, since Toyota Corolla appears twice.
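The top and freq statistics describe only the most common value. To see how often every value occurs, a sketch using value_counts() on the same column could look like this (brand_counts is an illustrative name):

```python
from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Count the occurrences of each distinct brand, most frequent first
brand_counts = df['Brand'].value_counts()
print(brand_counts)
```

Toyota Corolla is listed first with a count of 2, and each of the other three brands has a count of 1.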

Get the Descriptive Statistics for the Entire Pandas DataFrame

Finally, you may apply the following template to get the descriptive statistics for the entire DataFrame:

df.describe(include='all')

So the complete Python code would look like this:

from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# include='all' covers both numeric and non-numeric columns
stats = df.describe(include='all')
print(stats)

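Run the code, and you’ll get a single table covering all three columns. Statistics that don’t apply to a column are displayed as NaN: for example, mean and std for ‘Brand’, or unique, top and freq for ‘Price’ and ‘Year’.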
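For comparison, calling describe() with no arguments on a mixed DataFrame summarizes only the numeric columns, and the exclude parameter can limit the summary to the non-numeric ones. The following is a minimal sketch, assuming NumPy is installed alongside pandas; the variable names are illustrative:

```python
import numpy as np
from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Default behavior: only the numeric columns (Price and Year) are summarized
stats_numeric_only = df.describe()
print(stats_numeric_only)

# Excluding numeric dtypes leaves only the non-numeric column (Brand)
stats_non_numeric = df.describe(exclude=[np.number])
print(stats_non_numeric)
```

Splitting the summary this way avoids the NaN cells that appear when numeric and categorical statistics share one table.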

Breaking Down the Descriptive Statistics

You can further break down the descriptive statistics into the following:

Count:

df['DataFrame Column'].count()

Mean:

df['DataFrame Column'].mean()

Standard deviation:

df['DataFrame Column'].std()

Minimum:

df['DataFrame Column'].min()

0.25 Quantile:

df['DataFrame Column'].quantile(q=0.25)

0.50 Quantile (Median):

df['DataFrame Column'].quantile(q=0.50)

0.75 Quantile:

df['DataFrame Column'].quantile(q=0.75)

Maximum:

df['DataFrame Column'].max()

For our example, df['DataFrame Column'] is df['Price'].

Therefore, the full Python code for our example would look like this:

from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

count1 = df['Price'].count()
print('count: ' + str(count1))

mean1 = df['Price'].mean()
print('mean: ' + str(mean1))

std1 = df['Price'].std()
print('std: ' + str(std1))

min1 = df['Price'].min()
print('min: ' + str(min1))

quantile1 = df['Price'].quantile(q=0.25)
print('25%: ' + str(quantile1))

quantile2 = df['Price'].quantile(q=0.50)
print('50%: ' + str(quantile2))

quantile3 = df['Price'].quantile(q=0.75)
print('75%: ' + str(quantile3))

max1 = df['Price'].max()
print('max: ' + str(max1))

Once you run the code in Python, you’ll get the same statistics that describe() returned for the ‘Price’ field, each printed on its own line.
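As a more compact alternative to the separate calls, several statistics can be requested at once with agg() by passing the function names as strings, and quantile() accepts a list of quantiles. This is a sketch rather than a replacement for the code above; the names summary and quartiles are illustrative:

```python
from pandas import DataFrame

cars = {'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
        'Price': [22000, 27000, 25000, 29000, 35000],
        'Year': [2014, 2015, 2016, 2017, 2018]
        }

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Compute several statistics in one call; the result is indexed by function name
summary = df['Price'].agg(['count', 'mean', 'std', 'min', 'max'])
print(summary)

# Compute the three quartiles in one call; the result is indexed by quantile
quartiles = df['Price'].quantile([0.25, 0.50, 0.75])
print(quartiles)
```

Individual values can then be looked up by label, for example summary['mean'] or quartiles.loc[0.50].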