Two approaches to set a column as the index in a Pandas DataFrame:
(1) Set a single column as Index:
df.set_index('column', inplace=True)
(2) Set multiple columns as MultiIndex:
df.set_index(['column_1', 'column_2', ...], inplace=True)
Steps to Set Column as Index in Pandas DataFrame
Step 1: Create a DataFrame
To start with a simple example, let’s say that you’d like to create a DataFrame given the following data:
Product | Brand | Price |
AAA | A | 200 |
BBB | B | 700 |
CCC | C | 400 |
DDD | D | 1200 |
EEE | E | 900 |
You may then run the code below to create the DataFrame:
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

print(df)
You’ll now get the following DataFrame:
  Product Brand  Price
0     AAA     A    200
1     BBB     B    700
2     CCC     C    400
3     DDD     D   1200
4     EEE     E    900
As you can see in the leftmost column, the current index contains sequential numeric values (starting from zero). In the next step, you’ll see how to change that default index.
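To confirm what the default index looks like before changing it, you can inspect df.index directly. The snippet below is a minimal sketch that reuses the same data; the variable name default_labels is only an illustrative helper and is not part of the original example:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

# When no index is specified, pandas assigns a RangeIndex counting up from zero
print(df.index)

# default_labels is a hypothetical helper name used only for this illustration
default_labels = list(df.index)
print(default_labels)  # [0, 1, 2, 3, 4]
```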
Step 2: Set a single column as Index in Pandas DataFrame
You may use the following approach in order to set a single column as the index in the DataFrame:
df.set_index('column', inplace=True)
For example, let’s say that you’d like to set the ‘Product’ column as the index.
In that case, you may apply the code below to accomplish this goal:
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

# Replace the default numeric index with the 'Product' column
df.set_index('Product', inplace=True)

print(df)
As you can see, the ‘Product’ column has now become the new index:
        Brand  Price
Product
AAA         A    200
BBB         B    700
CCC         C    400
DDD         D   1200
EEE         E    900
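If you would rather not modify the DataFrame in place, set_index also returns a new DataFrame when inplace=True is omitted. The sketch below shows that variant, a label-based lookup with .loc, and the drop=False option, which keeps the column in the data as well. The names indexed, ccc_price, and kept are illustrative only:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

# Without inplace=True, set_index returns a new DataFrame and leaves df unchanged
indexed = df.set_index('Product')

# Rows can now be looked up by their 'Product' label
ccc_price = indexed.loc['CCC', 'Price']
print(ccc_price)  # 400

# drop=False keeps 'Product' as a regular column in addition to the index
kept = df.set_index('Product', drop=False)
print(list(kept.columns))  # ['Product', 'Brand', 'Price']
```

Assigning the result to a new name, as above, keeps the original df available with its default numeric index.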
Step 3 (optional): Set multiple columns as MultiIndex
Alternatively, you may use this approach to set multiple columns as the MultiIndex:
df.set_index(['column_1', 'column_2', ...], inplace=True)
For instance, let’s say that you’d like to set both the ‘Product’ and the ‘Brand’ columns as the MultiIndex.
In that case, you may run this code:
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

# Passing a list of columns creates a MultiIndex with one level per column
df.set_index(['Product', 'Brand'], inplace=True)

print(df)
As you may observe, both the ‘Product’ and the ‘Brand’ columns became the new MultiIndex:
               Price
Product Brand
AAA     A        200
BBB     B        700
CCC     C        400
DDD     D       1200
EEE     E        900
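Once a MultiIndex is in place, a row can be selected by passing one label per level, and reset_index can move the index levels back into regular columns. The following is a minimal sketch using the same data; the names multi, ddd_price, and restored are illustrative and do not appear in the example above:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
        'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Price': [200, 700, 400, 1200, 900]
        }

df = pd.DataFrame(data)

# Build the MultiIndex without modifying df
multi = df.set_index(['Product', 'Brand'])

# A tuple with one label per index level selects a single row
ddd_price = multi.loc[('DDD', 'D'), 'Price']
print(ddd_price)  # 1200

# reset_index turns the index levels back into ordinary columns
restored = multi.reset_index()
print(list(restored.columns))  # ['Product', 'Brand', 'Price']
```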
You may also want to check the Pandas Documentation for further information about df.set_index.