# Get the Descriptive Statistics for a Pandas DataFrame

To get the descriptive statistics for a specific column in your DataFrame:

`df["dataframe_column"].describe()`

To get the descriptive statistics for an entire DataFrame:

`df.describe(include="all")`
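Here is a minimal sketch of what `include="all"` changes. It uses the same sample data as the rest of this guide. By default, `describe()` summarizes only the numeric columns. With `include="all"`, it also covers the text column.

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)

# Default: only the numeric columns ("price" and "year") are summarized
numeric_only = df.describe()
print(numeric_only.columns.tolist())  # ['price', 'year']

# include="all": every column is summarized, including the text column "product"
all_columns = df.describe(include="all")
print(all_columns.columns.tolist())  # ['product', 'price', 'year']
```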

## Steps

### Step 1: Collect the Data

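To start, collect the data for your DataFrame.

For example, this guide uses a small dataset of five products. Each product has a price and a year, and the years run from 2014 to 2018.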

### Step 2: Create the DataFrame

Next, create the DataFrame based on the data collected:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

print(df)
```

Run the code in Python, and you’ll get the following DataFrame:

```
  product  price  year
0       A  22000  2014
1       B  27000  2015
2       C  25000  2016
3       C  29000  2017
4       D  35000  2018
```
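`describe()` treats numeric and text columns differently. Before going further, you may want to check which columns pandas considers numeric. Here is a small sketch using `select_dtypes()`:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)

# Show the dtype pandas inferred for each column
print(df.dtypes)

# Columns with a numeric dtype, i.e. the ones describe() summarizes by default
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['price', 'year']
```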

### Step 3: Get the Descriptive Statistics

To get the descriptive statistics for the “price” column, which contains numerical data:

`df["price"].describe()`

The full code:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_numeric = df["price"].describe()

print(stats_numeric)
```

The resulting descriptive statistics for the “price” column:

```
count        5.000000
mean     27600.000000
std       4878.524367
min      22000.000000
25%      25000.000000
50%      27000.000000
75%      29000.000000
max      35000.000000
Name: price, dtype: float64
```
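By default, `describe()` reports the 25th, 50th and 75th percentiles. You can request other percentiles with the `percentiles` argument. The sketch below asks for the 10th, 50th and 90th percentiles of the same “price” column. The 50th is listed explicitly so it appears regardless of pandas version.

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

# Percentiles are given as fractions between 0 and 1
stats = df["price"].describe(percentiles=[0.1, 0.5, 0.9])
print(stats)
```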

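Notice that the output contains 6 decimal places. You can convert the values to integers using `astype(int)`. Keep in mind that this truncates the decimals rather than rounding them (for example, the standard deviation 4878.524367 becomes 4878):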

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_numeric = df["price"].describe().astype(int)

print(stats_numeric)
```

Run the code, and you’ll get only integers. Depending on your platform and NumPy version, the dtype may appear as `int64` instead of `int32`:

```
count        5
mean     27600
std       4878
min      22000
25%      25000
50%      27000
75%      29000
max      35000
Name: price, dtype: int32
```
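If you would rather round than truncate, one option is `round()`. This minimal sketch keeps two decimal places for every statistic of the “price” column:

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

# Round every statistic to 2 decimal places instead of truncating to integers
stats_rounded = df["price"].describe().round(2)
print(stats_rounded)
```

With this approach, the standard deviation is shown as 4878.52 rather than being cut down to 4878.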

## Descriptive Statistics for Categorical Data

To get the descriptive statistics for the “product” column, which contains categorical data:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats_categorical = df["product"].describe()

print(stats_categorical)
```

Here are the results:

```
count     5
unique    4
top       C
freq      2
Name: product, dtype: object
```
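For categorical data, `describe()` reports only the most frequent value (`top`) and how often it occurs (`freq`). To see the count for every category, `value_counts()` is a common companion. Here is a brief sketch on the “product” column:

```python
import pandas as pd

df = pd.DataFrame({"product": ["A", "B", "C", "C", "D"]})

# Count how often each product appears, most frequent first
counts = df["product"].value_counts()
print(counts)
```

The number of entries in `counts` matches the `unique` value (4) reported by `describe()`.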

## Get the Descriptive Statistics for the Entire DataFrame

To get the descriptive statistics for the entire DataFrame:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

stats = df.describe(include="all")

print(stats)
```

The result:

```
       product         price         year
count        5      5.000000     5.000000
unique       4           NaN          NaN
top          C           NaN          NaN
freq         2           NaN          NaN
mean       NaN  27600.000000  2016.000000
std        NaN   4878.524367     1.581139
min        NaN  22000.000000  2014.000000
25%        NaN  25000.000000  2015.000000
50%        NaN  27000.000000  2016.000000
75%        NaN  29000.000000  2017.000000
max        NaN  35000.000000  2018.000000
```
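The summary returned by `describe()` is itself a DataFrame, so you can work with it like any other DataFrame. This short sketch looks up a single value with `.loc` and transposes the table with `.T`, so that each original column becomes a row:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}
df = pd.DataFrame(data)
stats = df.describe(include="all")

# Look up one statistic for one column: row label first, then column label
print(stats.loc["mean", "price"])  # 27600.0

# Transpose so that "product", "price" and "year" become rows
stats_by_column = stats.T
print(stats_by_column)
```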

## Breaking Down the Descriptive Statistics

You can also compute each of these statistics individually:

Count:

`df["dataframe_column"].count()`

Mean:

`df["dataframe_column"].mean()`

Standard deviation:

`df["dataframe_column"].std()`

Minimum:

`df["dataframe_column"].min()`

0.25 Quantile:

`df["dataframe_column"].quantile(q=0.25)`

0.50 Quantile (Median):

`df["dataframe_column"].quantile(q=0.50)`

0.75 Quantile:

`df["dataframe_column"].quantile(q=0.75)`

Maximum:

`df["dataframe_column"].max()`
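Two shortcuts can be handy here. `median()` returns the same value as `quantile(q=0.50)`, and `quantile()` also accepts a list, so several quantiles can be computed in one call. Here is a sketch on the “price” column:

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

# median() is equivalent to quantile(q=0.50)
print(df["price"].median())  # 27000.0

# Pass a list to get several quantiles at once, returned as a Series
quartiles = df["price"].quantile(q=[0.25, 0.50, 0.75])
print(quartiles)
```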

Putting everything together:

```python
import pandas as pd

data = {
    "product": ["A", "B", "C", "C", "D"],
    "price": [22000, 27000, 25000, 29000, 35000],
    "year": [2014, 2015, 2016, 2017, 2018],
}

df = pd.DataFrame(data)

statistics = {
    "count": df["price"].count(),
    "mean": df["price"].mean(),
    "std": df["price"].std(),
    "min": df["price"].min(),
    "quantile_25": df["price"].quantile(q=0.25),
    "quantile_50": df["price"].quantile(q=0.50),
    "quantile_75": df["price"].quantile(q=0.75),
    "max": df["price"].max(),
}

for stat, value in statistics.items():
    print(f"{stat}: {value}")
```

Once you run the code in Python, you’ll get the following stats:

```
count: 5
mean: 27600.0
std: 4878.524367060188
min: 22000
quantile_25: 25000.0
quantile_50: 27000.0
quantile_75: 29000.0
max: 35000
```
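Instead of building the dictionary by hand, you can pass a list of function names to `Series.agg()`, which returns all the results as a single Series. This is a minimal sketch: the chosen statistics are only an example, and quantiles are left out for brevity.

```python
import pandas as pd

df = pd.DataFrame({"price": [22000, 27000, 25000, 29000, 35000]})

# Compute several statistics in one call; the result is indexed by function name
summary = df["price"].agg(["count", "mean", "std", "min", "max"])
print(summary)
```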