Here is the syntax that you can use to filter Pandas DataFrame based on the index:
df = df.filter(items = [index to keep], axis=0)
Let’s review an example to see how to apply the above syntax in practice.
The Example
Suppose that you created the DataFrame below:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price']) print(df)
As you can see in yellow, the index values are located on the left, starting from 0 and ending at 6:
Product Price
0 Computer 900
1 Printer 200
2 Monitor 300
3 Desk 450
4 Phone 150
5 Tablet 250
6 Scanner 150
Filter Pandas DataFrame Based on the Index
Let’s say that you want to select the row with the index of 2 (for the ‘Monitor’ product) while filtering out all the other rows. In that case, simply add the following syntax to the original code:
df = df.filter(items = [2], axis=0)
So the complete Python code to keep the row with the index of 2 is:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price']) df = df.filter(items = [2], axis=0) print(df)
Run the code, and you’ll notice that only the row with the index of 2 is kept, while all the other rows are filtered out:
Product Price
2 Monitor 300
What if you’d like to keep multiple rows based on the index values?
For example, let’s keep the ‘Monitor’ (with an index of 2) and the ‘Tablet’ (with an index of 5). In that case, you’ll need to set items = [2,5] as captured below:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price']) df = df.filter(items = [2,5], axis=0) print(df)
You’ll now get the two items:
Product Price
2 Monitor 300
5 Tablet 250
Filter Pandas DataFrame for a Non-numeric Index
Say that you assigned a non-numeric index to the DataFrame:
index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']
The new DataFrame would look as follows:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']) print(df)
You’ll now see the non-numeric index on the left side:
Product Price
Item_A Computer 900
Item_B Printer 200
Item_C Monitor 300
Item_D Desk 450
Item_E Phone 150
Item_F Tablet 250
Item_G Scanner 150
For illustration purposes, let’s keep the row with the index of ‘Item_C‘ while filtering out all the other items:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']) df = df.filter(items = ['Item_C'], axis=0) print(df)
You’ll now get only the row with the index of Item_C:
Product Price
Item_C Monitor 300
Alternatively, you could select the same row with the index of ‘Item_C’ by setting like = ‘C’ (as the letter ‘C’ doesn’t appear in any of the other index values):
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_A','Item_B','Item_C','Item_D','Item_E','Item_F','Item_G']) df = df.filter(like = 'C', axis=0) print(df)
You’ll get the same row with the index of Item_C:
Product Price
Item_C Monitor 300
Select Multiple Index Values that Contain Specific String
What if you’d like to select multiple index values that contain specific string?
For example, let’s modify the index values to the following:
index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']
So the new DataFrame would look like this:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']) print(df)
Run the code, and you’ll see the new index on the left side of the DataFrame:
Product Price
Item_AA Computer 900
Item_BB Printer 200
Item_CC Monitor 300
Item_CD Desk 450
Item_CCC Phone 150
Item_CA Tablet 250
Item_CB Scanner 150
If, for example, you set like = ‘Item_C’ then you’ll get multiple items (as the string of ‘Item_C’ is included in 5 locations):
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']) df = df.filter(like = 'Item_C', axis=0) print(df)
As you can see, there are 5 index values where the string of ‘Item_C’ can be found:
Product Price
Item_CC Monitor 300
Item_CD Desk 450
Item_CCC Phone 150
Item_CA Tablet 250
Item_CB Scanner 150
Optionally, let’s set like = ‘CC’ as follows:
import pandas as pd data = {'Product': ['Computer','Printer','Monitor','Desk','Phone','Tablet','Scanner'], 'Price': [900,200,300,450,150,250,150] } df = pd.DataFrame(data, columns = ['Product','Price'], index = ['Item_AA','Item_BB','Item_CC','Item_CD','Item_CCC','Item_CA','Item_CB']) df = df.filter(like = 'CC', axis=0) print(df)
In that case, only two items (which contain ‘CC’) will be selected:
Product Price
Item_CC Monitor 300
Item_CCC Phone 150
You can learn more about df.filter by checking the Pandas Documentation.