Here is the syntax that you can use to filter Pandas DataFrame based on the index:
df = df.filter(like = 'index to keep', axis=0)
Let’s review an example to see how to apply the above syntax in practice.
The Example
Suppose that you created the DataFrame below:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price']) print(df)
As you can see in red, the index values are located on the left, starting from 0 and ending at 6:
Filter Pandas DataFrame Based on the Index
Let’s say that you want to select the row with the index of 2 (for the ‘Monitor’ product) while filtering out all the other rows. In that case, simply add the following syntax to the original code:
df = df.filter(like = '2', axis=0)
So the complete Python code to keep the row with the index of 2 is:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price']) df = df.filter(like = '2', axis=0) print(df)
Run the code, and you’ll notice that only the row with the index of 2 is kept, while all the other rows are filtered out:
Filter Pandas DataFrame for a Non-numeric Index
Say that you assigned a non-numeric index to the DataFrame:
index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']
The new DataFrame would look as follows:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']) print(df)
You’ll now see the non-numeric index on the left side:
For illustration purposes, let’s keep the row with the index of ‘Item_C‘ while filtering out all the other items:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']) df = df.filter(like = 'Item_C', axis=0) print(df)
You’ll now get only the row with the index of Item_C:
Alternatively, you could select the same row with the index of ‘Item_C’ by setting like = ‘C’ (as the letter ‘C’ doesn’t appear in any of the other index values):
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']) df = df.filter(like = 'C', axis=0) print(df)
You’ll get the same row with the index of Item_C:
Select Multiple Index values
What if you’d like to select multiple index values?
For example, let’s modify the index values to the following:
index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']
So the new DataFrame would look like this:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']) print(df)
Run the code, and you’ll see the new index on the left side of the DataFrame:
If, for example, you set like = ‘Item_C’ then you’ll get multiple items (as the string of Item_C appears in 5 locations):
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']) df = df.filter(like = 'Item_C', axis=0) print(df)
As you can see, there are 5 index values where the string of ‘Item_C’ can be found:
Optionally, let’s set like = ‘CC’ as follows:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']) df = df.filter(like = 'CC', axis=0) print(df)
In that case, only two items (which contain ‘CC’) will be selected:
You can learn more about df.filter by checking the Pandas Documentation.