Replace Characters in Strings in Pandas DataFrame

Here are two ways to replace characters in strings in a Pandas DataFrame:

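(1) Replace characters under a single DataFrame column:

df["column name"] = df["column name"].str.replace("old character", "new character", regex=False)

Passing regex=False makes pandas treat "old character" as literal text. The default for this parameter changed from True to False in pandas 2.0, so stating it explicitly gives the same behavior in every version.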

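(2) Replace characters under an entire DataFrame:

df = df.replace("old character", "new character", regex=True)

Here regex=True is what allows partial matches: without it, DataFrame.replace only replaces cells whose entire value equals "old character". Since the pattern is then read as a regular expression, characters with a special meaning (such as ".", "*", "?" or "|") must be escaped, for example with Python's re.escape().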
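To see why regex=True matters in the second approach, here is a small sketch on a hypothetical one-column DataFrame (the column name col and its two values are made up for illustration). It compares DataFrame.replace with and without regex=True:

```python
import pandas as pd

# Hypothetical data: one cell contains "_" inside a longer string,
# the other cell is exactly "_"
df = pd.DataFrame({"col": ["aa_bb", "_"]})

# Without regex=True, only cells whose whole value equals "_" are replaced
exact_only = df.replace("_", "-")

# With regex=True, "_" is matched anywhere inside each string
substring = df.replace("_", "-", regex=True)

print(exact_only["col"].tolist())  # ['aa_bb', '-']
print(substring["col"].tolist())   # ['aa-bb', '-']
```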

Examples

Example 1: Replace a Specific Character under a Single DataFrame Column

To start, create a simple DataFrame with two columns that contain strings:

import pandas as pd

data = {
    "first_set": ["aa_bb", "cc_dd", "ee_ff", "gg_hh"],
    "second_set": ["ii_jj", "kk_ll", "mm_nn", "oo_pp"],
}

df = pd.DataFrame(data)

print(df)

The DataFrame would look like:

  first_set second_set
0     aa_bb      ii_jj
1     cc_dd      kk_ll
2     ee_ff      mm_nn
3     gg_hh      oo_pp

The goal is to replace the underscore (“_”) character with the hyphen (“-”) character under the “first_set” column.

To achieve this goal, add the following syntax to the code:

df["first_set"] = df["first_set"].str.replace("_", "-", regex=False)

So the complete Python code to perform the replacement is as follows:

import pandas as pd

data = {
    "first_set": ["aa_bb", "cc_dd", "ee_ff", "gg_hh"],
    "second_set": ["ii_jj", "kk_ll", "mm_nn", "oo_pp"],
}

df = pd.DataFrame(data)

df["first_set"] = df["first_set"].str.replace("_", "-", regex=False)

print(df)

As you can see, the underscore character was replaced with the hyphen character under the “first_set” column:

  first_set second_set
0     aa-bb      ii_jj
1     cc-dd      kk_ll
2     ee-ff      mm_nn
3     gg-hh      oo_pp
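Passing regex explicitly to str.replace is a cautious habit, since its default changed from True to False in pandas 2.0. For “_” the setting makes no difference, but it does once the pattern contains regex metacharacters. A minimal sketch with a hypothetical Series of version strings, where “.” is literal in one call and a wildcard in the other:

```python
import pandas as pd

# Hypothetical Series of version strings
s = pd.Series(["v1.0", "v100"])

# regex=False: "1.0" is matched as literal text, so only "v1.0" changes
literal = s.str.replace("1.0", "2.0", regex=False)

# regex=True: "." matches any character, so "100" in "v100" matches too
wildcard = s.str.replace("1.0", "2.0", regex=True)

print(literal.tolist())   # ['v2.0', 'v100']
print(wildcard.tolist())  # ['v2.0', 'v2.0']
```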

Example 2: Replace a Specific Character under the Entire DataFrame

To replace the underscore character with the hyphen character under the entire DataFrame (both columns), use df.replace with regex=True:

import pandas as pd

data = {
    "first_set": ["aa_bb", "cc_dd", "ee_ff", "gg_hh"],
    "second_set": ["ii_jj", "kk_ll", "mm_nn", "oo_pp"],
}

df = pd.DataFrame(data)

df = df.replace("_", "-", regex=True)

print(df)

The result:

  first_set second_set
0     aa-bb      ii-jj
1     cc-dd      kk-ll
2     ee-ff      mm-nn
3     gg-hh      oo-pp
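If you would rather not have the pattern interpreted as a regular expression at all, one alternative (a sketch, not the only way) is to run Series.str.replace with regex=False on every column through DataFrame.apply. This assumes every column holds strings, because the .str accessor raises an error on numeric columns:

```python
import pandas as pd

data = {
    "first_set": ["aa_bb", "cc_dd", "ee_ff", "gg_hh"],
    "second_set": ["ii_jj", "kk_ll", "mm_nn", "oo_pp"],
}

df = pd.DataFrame(data)

# Apply a literal (non-regex) replacement to each column in turn.
# Assumption: all columns contain strings; .str fails on numeric columns.
df = df.apply(lambda col: col.str.replace("_", "-", regex=False))

print(df)
```

This should print the same DataFrame as the df.replace version above.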

Example 3: Replace a Sequence of Characters

Now assume that you have the following DataFrame:

import pandas as pd

data = {
    "first_set": ["aa_xyz_bb", "cc_xyz_dd", "ee_xyz_ff", "gg_xyz_hh"],
    "second_set": ["ii_xyz_jj", "kk_xyz_ll", "mm_xyz_nn", "oo_xyz_pp"],
}

df = pd.DataFrame(data)

print(df)

The DataFrame:

   first_set second_set
0  aa_xyz_bb  ii_xyz_jj
1  cc_xyz_dd  kk_xyz_ll
2  ee_xyz_ff  mm_xyz_nn
3  gg_xyz_hh  oo_xyz_pp

To replace the sequence “_xyz_” with two hyphens under the “first_set” column:

import pandas as pd

data = {
    "first_set": ["aa_xyz_bb", "cc_xyz_dd", "ee_xyz_ff", "gg_xyz_hh"],
    "second_set": ["ii_xyz_jj", "kk_xyz_ll", "mm_xyz_nn", "oo_xyz_pp"],
}

df = pd.DataFrame(data)

df["first_set"] = df["first_set"].str.replace("_xyz_", "--", regex=False)

print(df)

You’ll now see the newly replaced characters under the “first_set” column:

  first_set second_set
0    aa--bb  ii_xyz_jj
1    cc--dd  kk_xyz_ll
2    ee--ff  mm_xyz_nn
3    gg--hh  oo_xyz_pp
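Because str.replace also accepts regular expressions, a single pattern can cover sequences that vary from row to row. The sketch below uses hypothetical data in which the middle letters differ, together with the pattern r"_[a-z]+_" (an assumption that the sequences are lowercase letters between two underscores) and regex=True:

```python
import pandas as pd

# Hypothetical data: the middle sequence differs from row to row
data = {
    "first_set": ["aa_xyz_bb", "cc_abc_dd", "ee_q_ff", "gg_hh"],
}

df = pd.DataFrame(data)

# r"_[a-z]+_" matches an underscore, one or more lowercase letters and
# another underscore, so it covers "_xyz_", "_abc_" and "_q_".
# "gg_hh" has only one underscore, so it is left unchanged.
df["first_set"] = df["first_set"].str.replace(r"_[a-z]+_", "--", regex=True)

print(df["first_set"].tolist())  # ['aa--bb', 'cc--dd', 'ee--ff', 'gg_hh']
```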

Alternatively, you can apply the code below to make the changes under the entire DataFrame:

import pandas as pd

data = {
    "first_set": ["aa_xyz_bb", "cc_xyz_dd", "ee_xyz_ff", "gg_xyz_hh"],
    "second_set": ["ii_xyz_jj", "kk_xyz_ll", "mm_xyz_nn", "oo_xyz_pp"],
}

df = pd.DataFrame(data)

df = df.replace("_xyz_", "--", regex=True)

print(df)

The result:

  first_set second_set
0    aa--bb     ii--jj
1    cc--dd     kk--ll
2    ee--ff     mm--nn
3    gg--hh     oo--pp
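With regex=True, df.replace reads “_xyz_” as a regular expression. That is harmless here, but if the sequence contained a metacharacter such as “.”, it would match more than intended. A sketch with hypothetical data, escaping the sequence with Python's re.escape so it is matched literally:

```python
import re

import pandas as pd

# Hypothetical data: the sequence to replace contains a period, "_x.z_"
df = pd.DataFrame({
    "first_set": ["aa_x.z_bb", "cc_xyz_dd"],
    "second_set": ["ii_x.z_jj", "kk_xyz_ll"],
})

# Unescaped, "." matches any character, so "_xyz_" is replaced as well
loose = df.replace("_x.z_", "--", regex=True)

# re.escape turns "_x.z_" into a pattern that matches only the literal text
exact = df.replace(re.escape("_x.z_"), "--", regex=True)

print(loose["first_set"].tolist())  # ['aa--bb', 'cc--dd']
print(exact["first_set"].tolist())  # ['aa--bb', 'cc_xyz_dd']
```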

You can learn more about DataFrame.replace and Series.str.replace in the Pandas documentation.
