How to Filter Pandas DataFrame Based on Index

Here is the syntax that you can use to filter Pandas DataFrame based on the index:

df = df.filter(items = [index to keep], axis=0)

Let’s review an example to see how to apply the above syntax in practice.

The Example

Suppose that you created the DataFrame below:

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'])

print(df)

As you can see in yellow, the index values are located on the left, starting from 0 and ending at 6:

    Product  Price
0  Computer    900
1   Printer    200
2   Monitor    300
3      Desk    450
4     Phone    150
5    Tablet    250
6   Scanner    150

Filter Pandas DataFrame Based on the Index

Let’s say that you want to select the row with the index of 2 (for the ‘Monitor’ product) while filtering out all the other rows. In that case, simply add the following syntax to the original code:

df = df.filter(items = [2], axis=0)

So the complete Python code to keep the row with the index of 2 is:

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'])

df = df.filter(items = [2], axis=0)

print(df)

Run the code, and you’ll notice that only the row with the index of 2 is kept, while all the other rows are filtered out:

   Product  Price
2  Monitor    300

What if you’d like to keep multiple rows based on the index values?

For example, let’s keep the ‘Monitor’ (with an index of 2) and the ‘Tablet’ (with an index of 5). In that case, you’ll need to set items = [2,5] as captured below:

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'])

df = df.filter(items = [2,5], axis=0)

print(df)

You’ll now get the two items:

   Product  Price
2  Monitor    300
5   Tablet    250

Filter Pandas DataFrame for a Non-numeric Index

Say that you assigned a non-numeric index to the DataFrame:

index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']

The new DataFrame would look as follows:

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G'])

print(df)

You’ll now see the non-numeric index on the left side:

         Product  Price
Item_A  Computer    900
Item_B   Printer    200
Item_C   Monitor    300
Item_D      Desk    450
Item_E     Phone    150
Item_F    Tablet    250
Item_G   Scanner    150

For illustration purposes, let’s keep the row with the index of ‘Item_C‘ while filtering out all the other items:

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G'])

df = df.filter(items = ['Item_C'], axis=0)

print(df)

You’ll now get only the row with the index of Item_C:

        Product  Price
Item_C  Monitor    300

Alternatively, you could select the same row with the index of ‘Item_C’ by setting like = ‘C’ (as the letter ‘C’ doesn’t appear in any of the other index values):

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G'])

df = df.filter(like = 'C', axis=0)

print(df)

You’ll get the same row with the index of Item_C:

        Product  Price
Item_C  Monitor    300

Select Multiple Index Values that Contain Specific String

What if you’d like to select multiple index values that contain specific string?

For example, let’s modify the index values to the following:

index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']

So the new DataFrame would look like this:

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB'])

print(df)

Run the code, and you’ll see the new index on the left side of the DataFrame:

           Product  Price
Item_AA   Computer    900
Item_BB    Printer    200
Item_CC    Monitor    300
Item_CD       Desk    450
Item_CCC     Phone    150
Item_CA     Tablet    250
Item_CB    Scanner    150

If, for example, you set like = ‘Item_C’ then you’ll get multiple items (as the string of ‘Item_C’ is included in 5 locations):

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB'])

df = df.filter(like = 'Item_C', axis=0)

print(df)

As you can see, there are 5 index values where the string of ‘Item_C’ can be found:

          Product  Price
Item_CC   Monitor    300
Item_CD      Desk    450
Item_CCC    Phone    150
Item_CA    Tablet    250
Item_CB   Scanner    150

Optionally, let’s set like = ‘CC’ as follows:

import pandas as pd

data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'],
          'Price': [900,200,300,450,150,250,150]
        }

df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB'])

df = df.filter(like = 'CC', axis=0)

print(df)

In that case, only two items (which contain ‘CC’) will be selected:

          Product  Price
Item_CC   Monitor    300
Item_CCC    Phone    150

You can learn more about df.filter by checking the Pandas Documentation.