How to Create a Pivot Table in Python using Pandas

Often times, pivot tables are associated with MS Excel. But did you know that you can also create a pivot table in Python using pandas?

You bet.

And so, in this tutorial, I’ll show you the steps to create a pivot table in Python using pandas. I’ll use a simple data-set about employees and sales to demonstrate this concept.

Create a Pivot Table in Python using Pandas

To start, here is the data-set that will be used to create a pivot table in Python:

 

Name of EmployeeSalesQuarterCountry
Jon10001US
Mark3001Japan
Tina4001Brazil
Maria5001UK
Bill8001US
Jon10002Brazil
Mark5002Japan
Tina7002Brazil
Maria502US
Bill602US
Jon10003US
Mark9003Japan
Tina7503Brazil
Maria2003UK
Bill3003Brazil
Jon10004Japan
Mark9004Japan
Tina2504Brazil
Maria7504UK
Bill504US

 

Firstly, you’ll need to reflect the above data in Python.

But how would you do that?

You can accomplish this task by capturing the data in Python using pandas DataFrame:

 

from pandas import DataFrame

Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = DataFrame(Employees, columns= ['Name of Employee', 'Sales','Quarter','Country'])

print (df)

 

Press F5 to run the above code in Python, and you should get:

 

df pandas

 

Once you have your DataFrame ready, you’ll be able to pivot your data.

Let’s say that your goal is to determine the:

  • Total sales per employee
  • Total sales by country
  • Sales by both employee and country
  • Max individual sale by country
  • Mean, median and min sales by country

Next, I’ll show you how to pivot the data based on those 5 scenarios.

Scenario 1: Total sales per employee

To get the total sales per employee, you’ll need to add the following syntax to the Python code:

 

pivot = df.pivot_table(index=['Name of Employee'], values=['Sales'], aggfunc='sum')

 

This will allow you to sum the sales (across the 4 quarters) per employee by using the aggfunc=’sum’ operation.

Your full Python code, would look like this:

 

# Total sales per employee
from pandas import DataFrame

Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = DataFrame(Employees, columns= ['Name of Employee', 'Sales','Quarter','Country'])


pivot = df.pivot_table(index=['Name of Employee'], values=['Sales'], aggfunc='sum')

print (pivot)

 

And once you run the code, you’ll get the total sales by employee:

How to Create a Pivot Table in Python using Pandas

Scenario 2: Total sales by country

Now, we’ll see how to group the total sales by the county.

Here the only difference is that you’ll need to aggregate the results by the ‘Country’ field, rather than the ‘Name of Employee’ as we saw in the first scenario.

And this is the complete Python code that you may use:

 

#Total sales by country
from pandas import DataFrame


Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = DataFrame(Employees, columns= ['Name of Employee', 'Sales','Quarter','Country'])


pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc='sum')

print (pivot)

 

Once you run the code, you’ll get:

Table in Python

But what if you want to plot these results?

You can easily do that by adding  ‘.plot()’ at the end of the ‘pivot’ variable:

 

#plot total sales by country
from pandas import DataFrame


Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = DataFrame(Employees, columns= ['Name of Employee', 'Sales','Quarter','Country'])


pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc='sum').plot()

print (pivot)

 

You’ll get this graph when you run the code:

How to plot a Pivot Table in Python using Pandas

Scenario 3: Sales by both employee and country

You may aggregate the results by more than one field (unlike the previous two scenarios where we aggregated the results based on a single field).

For example, you may use the following two fields to get the sales by both:

  • Name of Employee; and
  • Country

 

#Sales by both Employee and Country
from pandas import DataFrame


Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = DataFrame(Employees, columns= ['Name of Employee', 'Sales','Quarter','Country'])


pivot = df.pivot_table(index=['Name of Employee','Country'], values=['Sales'], aggfunc='sum')

print (pivot)

 

Run the code, and you’ll see the sales by both the employee and country:

Create a Pivot Table in Python using Pandas

Scenario 4: Max individual sale by country

So far we used the sum operation (i.e., aggfunc=’sum’) to group our results, but you are not limited to that operation.

In this scenario, we’ll find the maximum individual sale by county by using the aggfunc=’max’

 

#max individual sale by country
from pandas import DataFrame


Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = DataFrame(Employees, columns= ['Name of Employee', 'Sales','Quarter','Country'])


pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc='max')

print (pivot)

 

And the result:

Pivot Table Pandas

Scenario 5: Mean, median and min sales by country

You can use multiple operations within the aggfunc argument. For example, to find the mean, median and min sales by country, you may use:

 

aggfunc={'median','mean','min'}

 

And here is the complete Python code:

 

#multiple measurements
from pandas import DataFrame


Employees = {'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
             'Sales': [1000,300,400,500,800,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
             'Quarter': [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
             'Country': ['US','Japan','Brazil','UK','US','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
            }

df = DataFrame(Employees, columns= ['Name of Employee', 'Sales','Quarter','Country'])


pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc={'median','mean','min'})

print (pivot)

 

And here are the results:

How to Create a Pivot Table in Python using Pandas

Wanna plot the results?

No problem, just add ‘.plot()’ at the end of the ‘pivot’ variable:

 

pivot = df.pivot_table(index=['Country'], values=['Sales'], aggfunc={'median','mean','min'}).plot()

 

And you’ll get the plot:

Plot a Pivot Table in Python using Pandas

Conclusion

Pivot tables are traditionally associated with MS Excel. However, you can easily create a pivot table in Python using pandas.

We just saw how to create pivot tables across 5 simple scenarios. But the concepts reviewed can be applied to lots of different scenarios.

You can find additional information about pivot tables by visiting the pandas documentation.