Select Rows Containing a Substring in Pandas DataFrame

In this short guide, you’ll see how to select rows that contain a specific substring in Pandas DataFrame.

In particular, you’ll observe 5 cases to get all rows that:

  1. Contain a specific substring
  2. Contain one substring OR another substring
  3. Do NOT contain given substrings
  4. Contain a specific substring in the middle of a string
  5. Contain a specific numeric value

The Example

To start, create a DataFrame in Python with the following data:

import pandas as pd

data = {
"month": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October",
"November", "December"],
"days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}

df = pd.DataFrame(data)

print(df)

Run the code, and you’ll get this DataFrame:

        month  days_in_month
0     January             31
1    February             28
2       March             31
3       April             30
4         May             31
5        June             30
6        July             31
7      August             31
8   September             30
9     October             31
10   November             30
11   December             31

5 Cases

Case 1: Get all rows that contain a specific substring

To get all the months that contain the substring of “Ju” (for the months of “June” and “July”):

import pandas as pd

data = {
"month": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October",
"November", "December"],
"days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}

df = pd.DataFrame(data)

contain_values = df[df["month"].str.contains("Ju")]

print(contain_values)

As you can see, the only two months that contain the substring of “Ju” are June and July:

  month  days_in_month
5  June             30
6  July             31

Note that str.contains() is case sensitive. Meaning that if you specified “ju” (all in lowercase), then you won’t get any selection:

import pandas as pd

data = {
"month": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October",
"November", "December"],
"days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}

df = pd.DataFrame(data)

contain_values = df[df["month"].str.contains("ju")]

print(contain_values)

In that case, you’ll get an empty DataFrame:

Empty DataFrame
Columns: [month, days_in_month]
Index: []

Case 2: Get all rows that contain one substring OR another substring

To get all the months that contain EITHER “Ju” OR “Ma” using the pipe symbol (“|”):

import pandas as pd

data = {
"month": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October",
"November", "December"],
"days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}

df = pd.DataFrame(data)

contain_values = df[df["month"].str.contains("Ju|Ma")]

print(contain_values)

Here are the months that you’ll get:

   month  days_in_month
2  March             31
4    May             31
5   June             30
6   July             31

Case 3: Get all rows that do NOT contain given substrings

Now select all the months that neither contain “Ju” nor “Ma” by applying the “~” symbol before the df[“month”]:

import pandas as pd

data = {
"month": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October",
"November", "December"],
"days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}

df = pd.DataFrame(data)

contain_values = df[~df["month"].str.contains("Ju|Ma")]

print(contain_values)

Run the code, and you’ll get the following months:

        month  days_in_month
0     January             31
1    February             28
3       April             30
7      August             31
8   September             30
9     October             31
10   November             30
11   December             31

Case 4: Get all rows that contain a specific substring in the middle of a string

To get all the months that contain “uar” (for January and February):

import pandas as pd

data = {
"month": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October",
"November", "December"],
"days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}

df = pd.DataFrame(data)

contain_values = df[df["month"].str.contains("uar")]

print(contain_values)

You’ll now see the months of January and February:

      month  days_in_month
0   January             31
1  February             28

Case 5: Get all rows that contain a specific numeric value

In order to select all the rows which contain the numeric value of 0 under the “days_in_month” column, you’ll first need to convert that column from integers to strings:

import pandas as pd

data = {
"month": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October",
"November", "December"],
"days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}

df = pd.DataFrame(data)

contain_values = df[df["days_in_month"].astype(str).str.contains("0")]

print(contain_values)

As you can see, only the months that contain the numeric value of 0 were selected:

        month  days_in_month
3       April             30
5        June             30
8   September             30
10   November             30

You can read more about str.contains by visiting the Pandas Documentation.

Leave a Comment