Here are two ways to replace characters in strings in a Pandas DataFrame:
(1) Replace characters under a single DataFrame column:
df["column name"] = df["column name"].str.replace("old character", "new character")
(2) Replace characters under an entire DataFrame:
df = df.replace("old character", "new character", regex=True)
Examples
Example 1: Replace a Specific Character under a Single DataFrame Column
To start, create a simple DataFrame with two columns that contain strings:
import pandas as pd
data = {
    "first_set": ["aa_bb", "cc_dd", "ee_ff", "gg_hh"],
    "second_set": ["ii_jj", "kk_ll", "mm_nn", "oo_pp"],
}
df = pd.DataFrame(data)
print(df)
The DataFrame would look like:
  first_set second_set
0     aa_bb      ii_jj
1     cc_dd      kk_ll
2     ee_ff      mm_nn
3     gg_hh      oo_pp
The goal is to replace the underscore (“_”) character with the hyphen (“-”) character under the “first_set” column.
To achieve this goal, add the following line to the code:
df["first_set"] = df["first_set"].str.replace("_", "-")
So the complete Python code to perform the replacement is as follows:
import pandas as pd
data = {
    "first_set": ["aa_bb", "cc_dd", "ee_ff", "gg_hh"],
    "second_set": ["ii_jj", "kk_ll", "mm_nn", "oo_pp"],
}
df = pd.DataFrame(data)
df["first_set"] = df["first_set"].str.replace("_", "-")
print(df)
As you can see, the underscore character was replaced with the hyphen character under the “first_set” column:
  first_set second_set
0     aa-bb      ii_jj
1     cc-dd      kk_ll
2     ee-ff      mm_nn
3     gg-hh      oo_pp
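One detail worth knowing: the default of the regex parameter in str.replace changed from True to False in pandas 2.0. For a plain underscore the result is the same either way, but passing regex=False explicitly keeps the intent clear across versions. The sketch below is a minimal illustration under that assumption; the missing value in the sample data is added here only to show that it is left as missing.

```python
import pandas as pd

# Sample data with one missing value (added for illustration only)
df = pd.DataFrame({"first_set": ["aa_bb", "cc_dd", None]})

# regex=False requests a literal match, whatever the pandas version's default is
df["first_set"] = df["first_set"].str.replace("_", "-", regex=False)

print(df)
```

The two strings have their underscore replaced, while the missing entry stays missing rather than raising an error.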
Example 2: Replace a Specific Character under the Entire DataFrame
To replace the underscore character with the hyphen character under the entire DataFrame:
import pandas as pd
data = {
    "first_set": ["aa_bb", "cc_dd", "ee_ff", "gg_hh"],
    "second_set": ["ii_jj", "kk_ll", "mm_nn", "oo_pp"],
}
df = pd.DataFrame(data)
df = df.replace("_", "-", regex=True)
print(df)
The result:
  first_set second_set
0     aa-bb      ii-jj
1     cc-dd      kk-ll
2     ee-ff      mm-nn
3     gg-hh      oo-pp
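The regex=True argument matters here. Without it, df.replace only swaps cells whose entire value equals the search string; with it, the search string is treated as a pattern and matched inside each string. The following sketch compares the two calls on a small, made-up DataFrame in which one cell consists of a lone underscore:

```python
import pandas as pd

# Illustrative data: the second cell of "first_set" is exactly "_"
df = pd.DataFrame({
    "first_set": ["aa_bb", "_"],
    "second_set": ["ii_jj", "kk_ll"],
})

# Without regex=True, only cells whose whole value is "_" are replaced
whole_cell = df.replace("_", "-")

# With regex=True, every underscore inside each string is replaced
substring = df.replace("_", "-", regex=True)

print(whole_cell)
print(substring)
```

In the first result only the lone "_" cell changes; in the second, every underscore in both columns becomes a hyphen.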
Example 3: Replace a Sequence of Characters
Now assume that you have the following DataFrame:
import pandas as pd
data = {
    "first_set": ["aa_xyz_bb", "cc_xyz_dd", "ee_xyz_ff", "gg_xyz_hh"],
    "second_set": ["ii_xyz_jj", "kk_xyz_ll", "mm_xyz_nn", "oo_xyz_pp"],
}
df = pd.DataFrame(data)
print(df)
The DataFrame:
   first_set second_set
0  aa_xyz_bb  ii_xyz_jj
1  cc_xyz_dd  kk_xyz_ll
2  ee_xyz_ff  mm_xyz_nn
3  gg_xyz_hh  oo_xyz_pp
To replace the sequence of “_xyz_” with two hyphens under the “first_set” column:
import pandas as pd
data = {
    "first_set": ["aa_xyz_bb", "cc_xyz_dd", "ee_xyz_ff", "gg_xyz_hh"],
    "second_set": ["ii_xyz_jj", "kk_xyz_ll", "mm_xyz_nn", "oo_xyz_pp"],
}
df = pd.DataFrame(data)
df["first_set"] = df["first_set"].str.replace("_xyz_", "--")
print(df)
You’ll now see the newly replaced characters under the “first_set” column:
  first_set second_set
0    aa--bb  ii_xyz_jj
1    cc--dd  kk_xyz_ll
2    ee--ff  mm_xyz_nn
3    gg--hh  oo_xyz_pp
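If the sequence to replace contains characters that have a special meaning in regular expressions (such as a period), the outcome depends on whether the match is literal or regex-based. The sketch below uses a made-up sequence “_._” to show the difference, and uses re.escape from the Python standard library as one way to get a literal match when regex=True is set:

```python
import re

import pandas as pd

# Illustrative data: only the first value contains the literal sequence "_._"
s = pd.Series(["aa_._bb", "cc_x_dd"])
seq = "_._"

# Literal match: only the exact sequence "_._" is replaced
literal = s.str.replace(seq, "--", regex=False)

# Regex match: the period matches any character, so "_x_" is replaced too
as_regex = s.str.replace(seq, "--", regex=True)

# Escaping the sequence makes the regex match behave like the literal one
escaped = s.str.replace(re.escape(seq), "--", regex=True)

print(literal.tolist())   # ['aa--bb', 'cc_x_dd']
print(as_regex.tolist())  # ['aa--bb', 'cc--dd']
print(escaped.tolist())   # ['aa--bb', 'cc_x_dd']
```

The same escaping approach should apply to df.replace with regex=True, since the search string is interpreted as a pattern there as well.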
Alternatively, you can apply the code below to make the changes under the entire DataFrame:
import pandas as pd
data = {
    "first_set": ["aa_xyz_bb", "cc_xyz_dd", "ee_xyz_ff", "gg_xyz_hh"],
    "second_set": ["ii_xyz_jj", "kk_xyz_ll", "mm_xyz_nn", "oo_xyz_pp"],
}
df = pd.DataFrame(data)
df = df.replace("_xyz_", "--", regex=True)
print(df)
The result:
  first_set second_set
0    aa--bb     ii--jj
1    cc--dd     kk--ll
2    ee--ff     mm--nn
3    gg--hh     oo--pp
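There is also a middle ground between a single column and the entire DataFrame: applying the replacement to a chosen subset of columns. The sketch below assumes a hypothetical third column, “third_set”, that should be left untouched; the column list name is likewise made up for illustration.

```python
import pandas as pd

# Illustrative data with an extra column that should not be modified
df = pd.DataFrame({
    "first_set": ["aa_xyz_bb", "cc_xyz_dd"],
    "second_set": ["ii_xyz_jj", "kk_xyz_ll"],
    "third_set": ["mm_xyz_nn", "oo_xyz_pp"],
})

# Hypothetical selection of columns to clean
columns_to_clean = ["first_set", "second_set"]

# Replace the sequence only within the selected columns
df[columns_to_clean] = df[columns_to_clean].replace("_xyz_", "--", regex=True)

print(df)
```

Only the two selected columns change, while “third_set” keeps its original values.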
You can learn more about df.replace by visiting the Pandas Documentation.