How to get the Descriptive Statistics for Pandas DataFrame

Need to get the descriptive statistics for a pandas DataFrame?

If so, you can use the following template to get the descriptive statistics for your DataFrame:

 

df['DataFrame Field'].describe()

 

In the next section, I’ll show you the steps to derive the descriptive statistics using an example.

Steps to get the Descriptive Statistics for Pandas DataFrame

Step 1: Collect the data

To start, you’ll need to collect the data for your DataFrame. For example, I collected the following data about cars:

 

Brand             Price   Year
Honda Civic       22000   2014
Ford Focus        27000   2015
Toyota Corolla    25000   2016
Toyota Corolla    29000   2017
Audi A4           35000   2018

Step 2: Create the DataFrame

Next, you’ll need to create the DataFrame based on the data collected.

For our example, the code to create the DataFrame is:

 

from pandas import DataFrame

# Car data: one list per column
cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])
print(df)

 

Run the code in Python, and you’ll get this DataFrame:

 

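            Brand  Price  Year
0     Honda Civic  22000  2014
1      Ford Focus  27000  2015
2  Toyota Corolla  25000  2016
3  Toyota Corolla  29000  2017
4         Audi A4  35000  2018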
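Before moving on, it can help to check which fields pandas treats as numeric, since the statistics you get depend on the type of data in the field. The snippet below is only a minimal sketch of one way to do that with select_dtypes; the variable names numeric_columns and other_columns are my own and are not part of the original example.

```python
import pandas as pd

cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Columns that pandas stores with a numeric dtype
numeric_columns = df.select_dtypes(include='number').columns.tolist()

# All remaining columns (here, the text field)
other_columns = df.select_dtypes(exclude='number').columns.tolist()

print(numeric_columns)
print(other_columns)
```

In this example, 'Price' and 'Year' should be reported as numeric, while 'Brand' should not.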

Step 3: Get the Descriptive Statistics for Pandas DataFrame

Once you have your DataFrame ready, you’ll be able to get the descriptive statistics using the template that we saw at the beginning of this post:

 

df['DataFrame Field'].describe()

 

Let’s say that you want to get the descriptive statistics for the ‘Price’ field, which contains numerical data. In that case, the syntax that you’ll need to use is:

 

df['Price'].describe()

 

And the complete Python code would look like this:

 

from pandas import DataFrame

# Car data: one list per column
cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Summary statistics for the numeric 'Price' field
stats_numeric = df['Price'].describe()
print(stats_numeric)

 

Once you run the code, you’ll get the descriptive statistics for the ‘Price’ field:

 

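count        5.000000
mean     27600.000000
std       4878.524367
min      22000.000000
25%      25000.000000
50%      27000.000000
75%      29000.000000
max      35000.000000
Name: Price, dtype: float64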

 

You’ll notice that the output contains 6 decimal places. To get integer values instead, you can append astype(int) to the code. Keep in mind that astype(int) drops the fractional part rather than rounding it.

This is what the code would look like:

 

from pandas import DataFrame

# Car data: one list per column
cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Convert the summary statistics to integers (fractional parts are dropped)
stats_numeric = df['Price'].describe().astype(int)
print(stats_numeric)

 

Run the code, and you’ll get only integers. For example, the mean is displayed as 27600 and the standard deviation as 4878 (truncated from 4878.524367).
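If you would rather keep some precision than drop the decimals entirely, rounding is one alternative. The following is a minimal sketch that compares the two approaches using the pandas round method; the names stats_truncated and stats_rounded are my own and are used only for illustration.

```python
import pandas as pd

cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(cars, columns=['Brand', 'Price', 'Year'])

stats = df['Price'].describe()

# astype(int) drops the fractional part of every statistic
stats_truncated = stats.astype(int)

# round(2) keeps two decimal places instead
stats_rounded = stats.round(2)

print(stats_truncated)
print(stats_rounded)
```

Rounding to two decimals is usually the safer choice when the standard deviation or the mean is small, since truncation can then remove most of the information.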

Breaking down the Descriptive Statistics

You can further break down the descriptive statistics into the following measures:

 

Measure                    Python code
Count                      df['DataFrame Field'].count()
Mean                       df['DataFrame Field'].mean()
Standard deviation         df['DataFrame Field'].std()
Minimum                    df['DataFrame Field'].min()
0.25 Quantile              df['DataFrame Field'].quantile(q=0.25)
0.50 Quantile (=Median)    df['DataFrame Field'].quantile(q=0.50)
0.75 Quantile              df['DataFrame Field'].quantile(q=0.75)
Maximum                    df['DataFrame Field'].max()
Median                     df['DataFrame Field'].median()
Variance                   df['DataFrame Field'].var()
Skewness                   df['DataFrame Field'].skew()
Kurtosis                   df['DataFrame Field'].kurt()

 

For our example, df['DataFrame Field'] is df['Price'].

Therefore, the full Python code for our example would look like this:

 

from pandas import DataFrame

# Car data: one list per column
cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

count1 = df['Price'].count()
print('count: ' + str(count1))

mean1 = df['Price'].mean()
print('mean: ' + str(mean1))

std1 = df['Price'].std()
print('std: ' + str(std1))

min1 = df['Price'].min()
print('min: ' + str(min1))

quantile1 = df['Price'].quantile(q=0.25)
print('25%: ' + str(quantile1))

quantile2 = df['Price'].quantile(q=0.50)
print('50%: ' + str(quantile2))

quantile3 = df['Price'].quantile(q=0.75)
print('75%: ' + str(quantile3))

max1 = df['Price'].max()
print('max: ' + str(max1))

# median = 0.5 quantile
median1 = df['Price'].median()
print('median: ' + str(median1))

var1 = df['Price'].var()
print('var: ' + str(var1))

skew1 = df['Price'].skew()
print('skew: ' + str(skew1))

kurt1 = df['Price'].kurt()
print('kurt: ' + str(kurt1))

 

Once you run the code in Python, each statistic is printed on its own line. For example, the count is 5, the mean is 27600.0, the median is 27000.0, and the variance is 23800000.0.
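If calling each method separately feels repetitive, one possible shortcut is to pass a list of method names to agg, which collects the results into a single Series. This is only a sketch under the assumption that you want the same measures as in the table above; the variable name stats is my own.

```python
import pandas as pd

cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Each string names a Series method; agg calls them all
# and returns one Series indexed by those names
stats = df['Price'].agg(
    ['count', 'mean', 'std', 'min', 'median', 'max', 'var', 'skew', 'kurt']
)
print(stats)
```

The quantiles are left out here because quantile needs a q argument; you can still get them with df['Price'].quantile([0.25, 0.5, 0.75]).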

Descriptive Statistics for Categorical data

So far we have seen how to get the descriptive statistics for numerical data. We used the ‘Price’ field for that purpose.

Yet, you can also get the descriptive statistics for categorical data.

For instance, you can get some descriptive statistics for the ‘Brand’ field using this code:

 

from pandas import DataFrame

# Car data: one list per column
cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}

df = DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Summary statistics for the categorical 'Brand' field
stats_categorical = df['Brand'].describe()
print(stats_categorical)

 

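And this is the result that you’ll get: the count (5), the number of unique values (4), the most frequent value (top, which is 'Toyota Corolla'), and how often that value occurs (freq, which is 2).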
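So far each example described a single field. As a rough sketch of how to summarize the whole DataFrame at once, you can call describe on df itself. By default only the numeric fields are included, while include='all' adds the categorical ones as well. The variable names numeric_summary and full_summary are my own.

```python
import pandas as pd

cars = {
    'Brand': ['Honda Civic', 'Ford Focus', 'Toyota Corolla', 'Toyota Corolla', 'Audi A4'],
    'Price': [22000, 27000, 25000, 29000, 35000],
    'Year': [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(cars, columns=['Brand', 'Price', 'Year'])

# Default: numeric fields only ('Price' and 'Year')
numeric_summary = df.describe()
print(numeric_summary)

# include='all': numeric and categorical fields together;
# statistics that do not apply to a field are shown as NaN
full_summary = df.describe(include='all')
print(full_summary)
```

In the combined summary, rows such as unique, top and freq are filled only for 'Brand', and rows such as mean and std are filled only for 'Price' and 'Year'.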