Filter Rows in Pandas DataFrame using RegEx

Here are different ways to filter rows in Pandas DataFrame using RegEx:

(1) Get all the rows where the name starts with ‘B’:

import pandas as pd

data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'],
        'age': [28, 42, 33, 57, 25],
        'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India']
        }

df = pd.DataFrame(data)

pattern = r'^B\w*'

df_filtered = df[df['name'].str.contains(pattern)]

print(df_filtered)

Only the rows where the name starts with ‘B’ are obtained:

   name  age country
0  Bill   28  Brazil
4   Bob   25   India

For the pattern of ^B\w*

  • ^ represents the start of a string
  • B represents the ‘B’ letter
  • \w* represents zero or more characters

(2) Get all the rows where the name ends with ‘l’:

import pandas as pd

data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'],
        'age': [28, 42, 33, 57, 25],
        'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India']
        }

df = pd.DataFrame(data)

pattern = r'\w*l$'

df_filtered = df[df['name'].str.contains(pattern)]

print(df_filtered)

Only the rows where the name ends with ‘l’ are retrieved:

    name  age  country
0   Bill   28   Brazil
3  April   57  Bolivia

For the pattern of \w*l$

  • \w* represents zero or more characters
  • l represents the ‘l’ letter
  • $ signifies the end of a string

(3) Get all the rows where the name starts with ‘B’ or the country starts with ‘C’:

import pandas as pd

data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'],
        'age': [28, 42, 33, 57, 25],
        'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India']
        }

df = pd.DataFrame(data)

pattern_name = r'^B\w*'
pattern_city = r'^C\w*'

df_filtered = df[df['name'].str.contains(pattern_name) | df['country'].str.contains(pattern_city)]

print(df_filtered)

Here the name starts with ‘B’ or the country starts with ‘C’:

    name  age   country
0   Bill   28    Brazil
2  David   33  Cambodia
4    Bob   25     India

(4) Get all the rows where the name starts with ‘A’ or the country ends with ‘o’:

import pandas as pd

data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'],
        'age': [28, 42, 33, 57, 25],
        'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India']
        }

df = pd.DataFrame(data)

pattern_name = r'^A\w*'
pattern_city = r'\w*o$'

df_filtered = df[df['name'].str.contains(pattern_name) | df['country'].str.contains(pattern_city)]

print(df_filtered)

Now the name starts with ‘A’ or the country ends with ‘o’:

    name  age  country
1  Maria   42   Mexico
3  April   57  Bolivia

(5) Get all the rows where the age starts with ‘2’:

import pandas as pd

data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'],
        'age': [28, 42, 33, 57, 25],
        'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India']
        }

df = pd.DataFrame(data)

pattern = r'^2'

df_filtered = df[df['age'].astype(str).str.contains(pattern)]

print(df_filtered)

Only the rows where the age starts with ‘2’ are obtained:

   name  age country
0  Bill   28  Brazil
4   Bob   25   India

Or by:

import pandas as pd

data = {'name': ['Bill', 'Maria', 'David', 'April', 'Bob'],
        'age': [28, 42, 33, 57, 25],
        'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India']
        }

df = pd.DataFrame(data)

df_filtered = df[df['age'].astype(str).str.startswith('2')]

print(df_filtered)

The result:

   name  age country
0  Bill   28  Brazil
4   Bob   25   India

Leave a Comment