Here are different ways to filter rows in Pandas DataFrame using RegEx:
(1) Get all the rows where the name starts with ‘B’:
import pandas as pd data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'], 'age': [28, 42, 33, 57, 25], 'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India'] } df = pd.DataFrame(data) pattern = r'^B\w*' df_filtered = df[df['name'].str.contains(pattern)] print(df_filtered)
Only the rows where the name starts with ‘B’ are obtained:
name age country
0 Bill 28 Brazil
4 Bob 25 India
For the pattern of ^B\w*
- ^ represents the start of a string
- B represents the ‘B’ letter
- \w* represents zero or more characters
(2) Get all the rows where the name ends with ‘l’:
import pandas as pd data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'], 'age': [28, 42, 33, 57, 25], 'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India'] } df = pd.DataFrame(data) pattern = r'\w*l$' df_filtered = df[df['name'].str.contains(pattern)] print(df_filtered)
Only the rows where the name ends with ‘l’ are retrieved:
name age country
0 Bill 28 Brazil
3 April 57 Bolivia
For the pattern of \w*l$
- \w* represents zero or more characters
- l represents the ‘l’ letter
- $ signifies the end of a string
(3) Get all the rows where the name starts with ‘B’ or the country starts with ‘C’:
import pandas as pd data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'], 'age': [28, 42, 33, 57, 25], 'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India'] } df = pd.DataFrame(data) pattern_name = r'^B\w*' pattern_city = r'^C\w*' df_filtered = df[df['name'].str.contains(pattern_name) | df['country'].str.contains(pattern_city)] print(df_filtered)
Here the name starts with ‘B’ or the country starts with ‘C’:
name age country
0 Bill 28 Brazil
2 David 33 Cambodia
4 Bob 25 India
(4) Get all the rows where the name starts with ‘A’ or the country ends with ‘o’:
import pandas as pd data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'], 'age': [28, 42, 33, 57, 25], 'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India'] } df = pd.DataFrame(data) pattern_name = r'^A\w*' pattern_city = r'\w*o$' df_filtered = df[df['name'].str.contains(pattern_name) | df['country'].str.contains(pattern_city)] print(df_filtered)
Now the name starts with ‘A’ or the country ends with ‘o’:
name age country
1 Maria 42 Mexico
3 April 57 Bolivia
(5) Get all the rows where the age starts with ‘2’:
import pandas as pd data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'], 'age': [28, 42, 33, 57, 25], 'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India'] } df = pd.DataFrame(data) pattern = r'^2' df_filtered = df[df['age'].astype(str).str.contains(pattern)] print(df_filtered)
Only the rows where the age starts with ‘2’ are obtained:
name age country
0 Bill 28 Brazil
4 Bob 25 India
Or by:
import pandas as pd data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'], 'age': [28, 42, 33, 57, 25], 'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India'] } df = pd.DataFrame(data) df_filtered = df[df['age'].astype(str).str.startswith('2')] print(df_filtered)
The result:
name age country
0 Bill 28 Brazil
4 Bob 25 India