How to Create a Pivot Table in Python using Pandas

In this guide, I’ll show you how to create a pivot table in Python using pandas. In particular, I’ll demonstrate how to create a pivot table across 5 simple scenarios.

Create a Pivot Table in Python using Pandas

To start, here is the dataset to be used to create the pivot table in Python:

How to Create a Pivot Table in Python using Pandas

Firstly, you’ll need to capture the above data in Python.

But how would you do that?

You can accomplish this task by using pandas DataFrame:

import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

print (df)

Run the above code in Python, and you’ll get this DataFrame:

df pandas

Once you have your DataFrame ready, you’ll be able to pivot your data.

Let’s say that your goal is to determine the:

  1. Total sales per employee
  2. Total sales by country
  3. Sales by both employee and country
  4. Maximum individual sale by country
  5. Mean, median and minimum sales by country

Next, you’ll see how to pivot the data based on those 5 scenarios.

5 Scenarios of Pivot Tables in Python using Pandas

Scenario 1: Total sales per employee

To get the total sales per employee, you’ll need to add the following syntax to the Python code:

pivot = df.pivot_table(index=['Name of Employee'], values=['Sales'], aggfunc='sum')

This will allow you to sum the sales (across the 4 quarters) per employee by using the aggfunc=’sum’ operation.

Your complete Python code would look like this:

import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

pivot = df.pivot_table(index=['Name of Employee'], values=['Sales'], aggfunc='sum')

print (pivot)

Once you run the code, you’ll get the total sales by employee:

How to Create a Pivot Table in Python using Pandas

Scenario 2: Total sales by country

Now, you’ll see how to group the total sales by the county.

Here, you’ll need to aggregate the results by the ‘Country‘ field, rather than the ‘Name of Employee’ as you saw in the first scenario.

You may then run the following code in Python:

import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc='sum')

print (pivot)

You’ll then get the total sales by county:

Table in Python

But what if you want to plot these results?

In order to do so, you’ll need to add the following 3 components into the code:

  • import matplotlib.pyplot as plt at the top of the code
  • plot() at the end of the ‘pivot’ variable
  • plt.show() at the bottom of the code

Before you can run the code below, make sure that the matplotlib package is installed in Python.

import matplotlib.pyplot as plt
import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc='sum').plot()

plt.show()

You’ll then get this graph when you run the code:

How to plot a Pivot Table in Python using Pandas

Scenario 3: Sales by both employee and country

You may aggregate the results by more than one field (unlike the previous two scenarios where you aggregated the results based on a single field).

For example, you may use the following two fields to get the sales by both the:

  • Name of Employee; and
  • Country
import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

pivot = df.pivot_table(index=['Name of Employee','Country'], values=['Sales'], aggfunc='sum')

print (pivot)

Run the code, and you’ll see the sales by both the employee and country:

Create a Pivot Table in Python using Pandas

Scenario 4: Maximum individual sale by country

So far, you used the sum operation (i.e., aggfunc=’sum’) to group the results, but you are not limited to that operation.

In this scenario, you’ll find the maximum individual sale by county using the aggfunc=’max’

import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc='max')

print (pivot)

And the result:

Pivot Table Pandas

Scenario 5: Mean, median and minimum sales by country

You can use multiple operations within the aggfunc argument. For example, to find the mean, median and minimum sales by country, you may use:

aggfunc={'median','mean','min'}

And here is the complete Python code:

import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc={'median','mean','min'})

print (pivot)

You’ll then get the following results:

Dataframe

Wanna plot the results?

No problem, just apply the following code:

import matplotlib.pyplot as plt
import pandas as pd

employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc={'median','mean','min'}).plot()

plt.show()

You’ll then get this plot:

Plot Python

Conclusion – Pivot Table in Python using Pandas

Pivot tables are traditionally associated with MS Excel. However, you can easily create a pivot table in Python using pandas.

You just saw how to create pivot tables across 5 simple scenarios. But the concepts reviewed here can be applied across large number of different scenarios.

You can find additional information about pivot tables by visiting the pandas documentation.