Create Pivot Table in Python using Pandas

In this short guide, you’ll see how to create a pivot table in Python using Pandas.

The Example

Imagine that you have the following DataFrame that contains data about sales across 4 quarters:

import pandas as pd

data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}

df = pd.DataFrame(data)

print(df)

Run the above code in Python, and you’ll get the following DataFrame:

   person  sales  quarter country
0       A   1000        1      US
1       B    300        1   Japan
2       C    400        1  Brazil
3       D    500        1      UK
4       E    800        1      US
5       A   1000        2  Brazil
6       B    500        2   Japan
7       C    700        2  Brazil
8       D     50        2      US
9       E     60        2      US
10      A   1000        3      US
11      B    900        3   Japan
12      C    750        3  Brazil
13      D    200        3      UK
14      E    300        3  Brazil
15      A   1000        4   Japan
16      B    900        4   Japan
17      C    250        4  Brazil
18      D    750        4      UK
19      E     50        4      US

5 Scenarios of Pivot Tables in Python using Pandas

Scenario 1: Total sales per person

To get the total sales (across the 4 quarters) per person using aggfunc=’sum’:

pivot = df.pivot_table(index=['person'], values=['sales'], aggfunc='sum')

The complete code:

import pandas as pd

data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}

df = pd.DataFrame(data)

pivot = df.pivot_table(index=['person'], values=['sales'], aggfunc='sum')

print(pivot)

Here are the total sales per person:

        sales
person       
A        4000
B        2600
C        2100
D        1500
E        1210

Scenario 2: Total sales by country

To group the total sales by county (here, aggregate the results by the ‘country‘ field, rather than the ‘person’ field):

import pandas as pd

data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}

df = pd.DataFrame(data)

pivot = df.pivot_table(index=['country'], values=['sales'], aggfunc='sum')

print(pivot)

You’ll then get the total sales by county:

         sales
country       
Brazil    3400
Japan     3600
UK        1450
US        2960

Scenario 3: Sales by both the person and the country

You may aggregate the results by more than one field, such as the ‘person‘ and the ‘country‘ fields:

import pandas as pd

data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}

df = pd.DataFrame(data)

pivot = df.pivot_table(index=['person', 'country'], values=['sales'], aggfunc='sum')

print(pivot)

Run the code, and you’ll see the sales by both the person and the country:

                sales
person country       
A      Brazil    1000
       Japan     1000
       US        2000
B      Japan     2600
C      Brazil    2100
D      UK        1450
       US          50
E      Brazil     300
       US         910

Scenario 4: Maximum individual sale by country

So far, you used the sum operation (i.e., aggfunc=’sum’) to group the results, but you are not limited to that operation.

In this scenario, you’ll find the maximum individual sale by county using aggfunc=’max’:

import pandas as pd

data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}

df = pd.DataFrame(data)

pivot = df.pivot_table(index=['country'], values=['sales'], aggfunc='max')

print(pivot)

And the result:

         sales
country       
Brazil    1000
Japan     1000
UK         750
US        1000

Scenario 5: Median, mean and minimum sales by country

You can use multiple operations within the aggfunc argument. For example, to find the median, mean and minimum sales by country:

aggfunc=['median', 'mean', 'min']

And here is the complete Python code:

import pandas as pd

data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}

df = pd.DataFrame(data)

pivot = df.pivot_table(index=['country'], values=['sales'], aggfunc=['median', 'mean', 'min'])

print(pivot)

You’ll then get the following results:

        median        mean   min
         sales       sales sales
country                         
Brazil   550.0  566.666667   250
Japan    900.0  720.000000   300
UK       500.0  483.333333   200
US       430.0  493.333333    50

You just saw how to create pivot tables across 5 simple scenarios. But the concepts reviewed here can be applied across large number of different scenarios.

You can find additional information about pivot tables by visiting the Pandas documentation.

Leave a Comment