In this short guide, you’ll see how to create a pivot table in Python using Pandas.
The Example
Imagine that you have the following DataFrame that contains data about sales across 4 quarters:
import pandas as pd
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}
df = pd.DataFrame(data)
print(df)
Run the above code in Python, and you’ll get the following DataFrame:
person sales quarter country
0 A 1000 1 US
1 B 300 1 Japan
2 C 400 1 Brazil
3 D 500 1 UK
4 E 800 1 US
5 A 1000 2 Brazil
6 B 500 2 Japan
7 C 700 2 Brazil
8 D 50 2 US
9 E 60 2 US
10 A 1000 3 US
11 B 900 3 Japan
12 C 750 3 Brazil
13 D 200 3 UK
14 E 300 3 Brazil
15 A 1000 4 Japan
16 B 900 4 Japan
17 C 250 4 Brazil
18 D 750 4 UK
19 E 50 4 US
5 Scenarios of Pivot Tables in Python using Pandas
Scenario 1: Total sales per person
To get the total sales (across the 4 quarters) per person using aggfunc=’sum’:
pivot = df.pivot_table(index=['person'], values=['sales'], aggfunc='sum')
The complete code:
import pandas as pd
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}
df = pd.DataFrame(data)
pivot = df.pivot_table(index=['person'], values=['sales'], aggfunc='sum')
print(pivot)
Here are the total sales per person:
sales
person
A 4000
B 2600
C 2100
D 1500
E 1210
Scenario 2: Total sales by country
To group the total sales by county (here, aggregate the results by the ‘country‘ field, rather than the ‘person’ field):
import pandas as pd
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}
df = pd.DataFrame(data)
pivot = df.pivot_table(index=['country'], values=['sales'], aggfunc='sum')
print(pivot)
You’ll then get the total sales by county:
sales
country
Brazil 3400
Japan 3600
UK 1450
US 2960
Scenario 3: Sales by both the person and the country
You may aggregate the results by more than one field, such as the ‘person‘ and the ‘country‘ fields:
import pandas as pd
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}
df = pd.DataFrame(data)
pivot = df.pivot_table(index=['person', 'country'], values=['sales'], aggfunc='sum')
print(pivot)
Run the code, and you’ll see the sales by both the person and the country:
sales
person country
A Brazil 1000
Japan 1000
US 2000
B Japan 2600
C Brazil 2100
D UK 1450
US 50
E Brazil 300
US 910
Scenario 4: Maximum individual sale by country
So far, you used the sum operation (i.e., aggfunc=’sum’) to group the results, but you are not limited to that operation.
In this scenario, you’ll find the maximum individual sale by county using aggfunc=’max’:
import pandas as pd
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}
df = pd.DataFrame(data)
pivot = df.pivot_table(index=['country'], values=['sales'], aggfunc='max')
print(pivot)
And the result:
sales
country
Brazil 1000
Japan 1000
UK 750
US 1000
Scenario 5: Median, mean and minimum sales by country
You can use multiple operations within the aggfunc argument. For example, to find the median, mean and minimum sales by country:
aggfunc=['median', 'mean', 'min']
And here is the complete Python code:
import pandas as pd
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60, 1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US', 'US', 'Japan',
'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']
}
df = pd.DataFrame(data)
pivot = df.pivot_table(index=['country'], values=['sales'], aggfunc=['median', 'mean', 'min'])
print(pivot)
You’ll then get the following results:
median mean min
sales sales sales
country
Brazil 550.0 566.666667 250
Japan 900.0 720.000000 300
UK 500.0 483.333333 200
US 430.0 493.333333 50
You just saw how to create pivot tables across 5 simple scenarios. But the concepts reviewed here can be applied across large number of different scenarios.
You can find additional information about pivot tables by visiting the Pandas documentation.