Filter Pandas DataFrame Based on Index

To filter Pandas DataFrame based on the index:

df_filter = df.filter(items=[index to keep], axis=0)

Example

Suppose that you created the DataFrame below:

import pandas as pd

data = {
"Product": ["Computer", "Printer", "Monitor", "Desk", "Phone", "Tablet", "Scanner"],
"Price": [900, 200, 300, 450, 150, 250, 150],
}

df = pd.DataFrame(data)

print(df)

As you can see in yellow, the index values are located on the left, starting from 0 and ending at 6:

    Product  Price
0  Computer    900
1   Printer    200
2   Monitor    300
3      Desk    450
4     Phone    150
5    Tablet    250
6   Scanner    150

Example 1: Filter Pandas DataFrame Based on the Index

To select the row with the index of 2 (for the “Monitor’” product) while filtering out all the other rows:

df_filter = df.filter(items=[2], axis=0)

So the complete Python code is:

import pandas as pd

data = {
"Product": ["Computer", "Printer", "Monitor", "Desk", "Phone", "Tablet", "Scanner"],
"Price": [900, 200, 300, 450, 150, 250, 150],
}

df = pd.DataFrame(data)

df_filter = df.filter(items=[2], axis=0)

print(df_filter)

Notice that only the row with the index of 2 is kept, while all the other rows are filtered out:

   Product  Price
2  Monitor    300

What if you’d like to keep multiple rows based on the index values?

For example, to keep the “Monitor” (with an index of 2) and the “Tablet” (with an index of 5), set items = [2, 5]:

import pandas as pd

data = {
"Product": ["Computer", "Printer", "Monitor", "Desk", "Phone", "Tablet", "Scanner"],
"Price": [900, 200, 300, 450, 150, 250, 150],
}

df = pd.DataFrame(data)

df_filter = df.filter(items=[2, 5], axis=0)

print(df_filter)

You’ll now get the two items:

   Product  Price
2  Monitor    300
5   Tablet    250

Example 2: Filter Pandas DataFrame for a Non-numeric Index

Next, assign a non-numeric index to the DataFrame:

index=["Item_A", "Item_B", "Item_C", "Item_D", "Item_E", "Item_F", "Item_G"]

The new DataFrame would look as follows:

import pandas as pd

data = {
"Product": ["Computer", "Printer", "Monitor", "Desk", "Phone", "Tablet", "Scanner"],
"Price": [900, 200, 300, 450, 150, 250, 150],
}

df = pd.DataFrame(
data, index=["Item_A", "Item_B", "Item_C", "Item_D", "Item_E", "Item_F", "Item_G"]
)

print(df)

You’ll now see the non-numeric index on the left side:

         Product  Price
Item_A  Computer    900
Item_B   Printer    200
Item_C   Monitor    300
Item_D      Desk    450
Item_E     Phone    150
Item_F    Tablet    250
Item_G   Scanner    150

To keep the row with the index of “Item_C” while filtering out all the other items:

import pandas as pd

data = {
"Product": ["Computer", "Printer", "Monitor", "Desk", "Phone", "Tablet", "Scanner"],
"Price": [900, 200, 300, 450, 150, 250, 150],
}

df = pd.DataFrame(
data, index=["Item_A", "Item_B", "Item_C", "Item_D", "Item_E", "Item_F", "Item_G"]
)

df_filter = df.filter(items=["Item_C"], axis=0)

print(df_filter)

As you can see, only the row with the index of “Item_C” is kept:

        Product  Price
Item_C  Monitor    300

Alternatively, you can select the same row with the index of “Item_C” by setting like = C” (as the letter “C” doesn’t appear in any of the other index values):

import pandas as pd

data = {
"Product": ["Computer", "Printer", "Monitor", "Desk", "Phone", "Tablet", "Scanner"],
"Price": [900, 200, 300, 450, 150, 250, 150],
}

df = pd.DataFrame(
data, index=["Item_A", "Item_B", "Item_C", "Item_D", "Item_E", "Item_F", "Item_G"]
)

df_filter = df.filter(like="C", axis=0)

print(df_filter)

The result is the same row with the index of “Item_C”:

        Product  Price
Item_C  Monitor    300

You can learn more about df.filter by checking the Pandas Documentation.

Leave a Comment