Set Column as Index in Pandas DataFrame

Here are two approaches to set a column as the index in a Pandas DataFrame:

(1) Set a single column as Index:

df.set_index('column', inplace=True)

(2) Set multiple columns as MultiIndex:

df.set_index(['column_1', 'column_2', ...], inplace=True)

Steps to Set Column as Index in Pandas DataFrame

Step 1: Create a DataFrame

To start with a simple example, let’s say that you’d like to create a DataFrame given the following data:

Product   Brand   Price
AAA       A       200
BBB       B       700
CCC       C       400
DDD       D       1200
EEE       E       900

You may then run the code below to create the DataFrame:

import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

print(df)

You’ll now get the following DataFrame:

  Product   Brand   Price
0     AAA       A     200
1     BBB       B     700
2     CCC       C     400
3     DDD       D    1200
4     EEE       E     900

As you can see, the current index (the unlabeled leftmost column) contains sequential integers starting from zero. In the next step, you’ll see how to change that default index.
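To confirm what the default index looks like, you can inspect df.index directly. The snippet below is a minimal sketch that reuses the same data as above:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

# When no index is specified, Pandas assigns a RangeIndex from 0 to len(df) - 1
print(type(df.index).__name__)
print(list(df.index))
```

The first line printed is the index type (RangeIndex), and the second is the list of row labels from 0 to 4.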

Step 2: Set a single column as Index in Pandas DataFrame

You may use the following approach in order to set a single column as the index in the DataFrame:

df.set_index('column', inplace=True)

For example, let’s say that you’d like to set the ‘Product’ column as the index.

In that case, you may apply the code below to accomplish this goal:

import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

df.set_index('Product', inplace=True)

print(df)

As you can see, the ‘Product’ column is now the index:

        Brand  Price
Product             
AAA         A    200
BBB         B    700
CCC         C    400
DDD         D   1200
EEE         E    900
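If you would rather keep the original DataFrame unchanged, you can omit inplace=True, in which case set_index returns a new DataFrame. The sketch below illustrates this, along with a label-based lookup using loc. The names df_indexed and price_ccc are chosen here for illustration only:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

# Without inplace=True, set_index returns a new DataFrame and leaves df as it was
df_indexed = df.set_index('Product')

print(list(df.columns))          # 'Product' is still a regular column in df
print(list(df_indexed.columns))  # 'Product' has moved into the index of df_indexed

# Once 'Product' is the index, rows can be selected by product label
price_ccc = df_indexed.loc['CCC', 'Price']
print(price_ccc)
```

Here, df still has its three original columns, while df_indexed has only ‘Brand’ and ‘Price’ as columns, with ‘Product’ serving as the row labels.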

Step 3 (optional): Set multiple columns as MultiIndex

Alternatively, you may use this approach to set multiple columns as a MultiIndex:

df.set_index(['column_1', 'column_2', ...], inplace=True)

For instance, let’s say that you’d like to set both the ‘Product’ and the ‘Brand’ columns as the MultiIndex.

In that case, you may run this code:

import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

df.set_index(['Product', 'Brand'], inplace=True)

print(df)

As you may observe, both the ‘Product’ and the ‘Brand’ columns became the new MultiIndex:

               Price
Product Brand       
AAA     A        200
BBB     B        700
CCC     C        400
DDD     D       1200
EEE     E        900
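With a MultiIndex in place, each row is identified by a (Product, Brand) pair. The following sketch shows one way to look up a row by that pair, and how reset_index can turn the index levels back into regular columns. The names price_ccc and df_restored are illustrative only:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

df.set_index(['Product', 'Brand'], inplace=True)

# The index now has two named levels
print(list(df.index.names))

# Select a single value by passing a tuple with one label per index level
price_ccc = df.loc[('CCC', 'C'), 'Price']
print(price_ccc)

# reset_index moves the index levels back into regular columns
df_restored = df.reset_index()
print(list(df_restored.columns))
```

After reset_index, df_restored has the default numeric index again, and ‘Product’ and ‘Brand’ are back among the columns.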

You may also want to check the Pandas Documentation for further information about df.set_index.
