Here are two ways to replace characters in strings in a Pandas DataFrame:
(1) Replace characters under a single DataFrame column:
df['column name'] = df['column name'].str.replace('old character','new character')
(2) Replace characters under the entire DataFrame:
df = df.replace('old character','new character', regex=True)
In this short guide, you’ll see how to replace:
- Specific character under a single DataFrame column
- Specific character under the entire DataFrame
- Sequence of Characters
Replace a Specific Character under a Single DataFrame Column
Let’s create a simple DataFrame with two columns that contain strings:
import pandas as pd

colors = {'first_set': ['aa_bb', 'cc_dd', 'ee_ff', 'gg_hh'],
          'second_set': ['ii_jj', 'kk_ll', 'mm_nn', 'oo_pp']}

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

print(df)
This is what the DataFrame looks like:
  first_set second_set
0     aa_bb      ii_jj
1     cc_dd      kk_ll
2     ee_ff      mm_nn
3     gg_hh      oo_pp
The goal is to replace the underscore (“_”) character with a pipe (“|”) character under the ‘first_set’ column.
To achieve this goal, you’ll need to add the following syntax to the code:
df['first_set'] = df['first_set'].str.replace('_','|')
So the complete Python code to perform the replacement is as follows:
import pandas as pd

colors = {'first_set': ['aa_bb', 'cc_dd', 'ee_ff', 'gg_hh'],
          'second_set': ['ii_jj', 'kk_ll', 'mm_nn', 'oo_pp']}

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# Replace every underscore with a pipe in the 'first_set' column only
df['first_set'] = df['first_set'].str.replace('_', '|')

print(df)
As you can see, the underscore character was replaced with a pipe character under the ‘first_set’ column:
  first_set second_set
0     aa|bb      ii_jj
1     cc|dd      kk_ll
2     ee|ff      mm_nn
3     gg|hh      oo_pp
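One caveat with str.replace: whether the search string is read as a regular expression or as literal text depends on the regex parameter, and its default changed from True to False in pandas 2.0. The underscore example above gives the same result either way, but other patterns may not. Here is a minimal sketch, using a made-up Series called codes (not part of this guide's data), that passes the parameter explicitly so the result doesn't depend on the installed version:

```python
import pandas as pd

# Hypothetical data, used only to illustrate the regex parameter
codes = pd.Series(['1.5', '145'])

# regex=False: '1.5' is matched literally, so only the first value changes
literal = codes.str.replace('1.5', 'X', regex=False)

# regex=True: '.' matches any character, so '145' is replaced as well
as_pattern = codes.str.replace('1.5', 'X', regex=True)

print(literal.tolist())     # ['X', '145']
print(as_pattern.tolist())  # ['X', 'X']
```

If you only ever want a plain text replacement, passing regex=False is likely the safer choice.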
Replace a Specific Character under the Entire DataFrame
What if you’d like to replace a specific character under the entire DataFrame?
For example, let’s replace the underscore character with a pipe character under the entire DataFrame.
In that case, you’ll need to apply the following syntax:
import pandas as pd

colors = {'first_set': ['aa_bb', 'cc_dd', 'ee_ff', 'gg_hh'],
          'second_set': ['ii_jj', 'kk_ll', 'mm_nn', 'oo_pp']}

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# regex=True makes replace() match inside each string, not just whole cells
df = df.replace('_', '|', regex=True)

print(df)
You’ll now see that the underscore character was replaced with a pipe character under the entire DataFrame (under both the ‘first_set’ and the ‘second_set’ columns):
  first_set second_set
0     aa|bb      ii|jj
1     cc|dd      kk|ll
2     ee|ff      mm|nn
3     gg|hh      oo|pp
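It may help to see why regex=True is needed here. Without it, df.replace only swaps cells whose entire value equals the search string. With it, the search string is treated as a pattern and matched inside each string. The sketch below uses a small made-up DataFrame (one cell is exactly '_') to show the difference:

```python
import pandas as pd

# Hypothetical data: the second cell of 'first_set' is exactly '_'
df = pd.DataFrame({'first_set': ['aa_bb', '_'],
                   'second_set': ['ii_jj', 'kk_ll']})

# Without regex=True, only cells equal to '_' are replaced
whole_cell = df.replace('_', '|')

# With regex=True, every '_' inside every string is replaced
substring = df.replace('_', '|', regex=True)

print(whole_cell['first_set'].tolist())  # ['aa_bb', '|']
print(substring['first_set'].tolist())   # ['aa|bb', '|']
```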
Replace a Sequence of Characters
Let’s say that you want to replace a sequence of characters in a Pandas DataFrame.
For instance, suppose that you created a new DataFrame where you’d like to replace the sequence of “_xyz_” with two pipes “||”.
Here is the syntax to create the new DataFrame:
import pandas as pd

colors = {'first_set': ['aa_xyz_bb', 'cc_xyz_dd', 'ee_xyz_ff', 'gg_xyz_hh'],
          'second_set': ['ii_xyz_jj', 'kk_xyz_ll', 'mm_xyz_nn', 'oo_xyz_pp']}

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

print(df)
And this is what the new DataFrame looks like:
  first_set second_set
0 aa_xyz_bb  ii_xyz_jj
1 cc_xyz_dd  kk_xyz_ll
2 ee_xyz_ff  mm_xyz_nn
3 gg_xyz_hh  oo_xyz_pp
You can then use the following code to replace the sequence of “_xyz_” with “||” under the ‘first_set’ column:
import pandas as pd

colors = {'first_set': ['aa_xyz_bb', 'cc_xyz_dd', 'ee_xyz_ff', 'gg_xyz_hh'],
          'second_set': ['ii_xyz_jj', 'kk_xyz_ll', 'mm_xyz_nn', 'oo_xyz_pp']}

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# Replace the sequence '_xyz_' with '||' in the 'first_set' column only
df['first_set'] = df['first_set'].str.replace('_xyz_', '||')

print(df)
You’ll now see the newly replaced characters under the ‘first_set’ column:
  first_set second_set
0    aa||bb  ii_xyz_jj
1    cc||dd  kk_xyz_ll
2    ee||ff  mm_xyz_nn
3    gg||hh  oo_xyz_pp
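If a column contains more than one sequence that should be replaced by the same text, one option is to join the sequences with “|” and pass regex=True, so that the search string is read as a regular expression meaning “either one”. The sketch below assumes a second, made-up sequence “_abc_” that does not appear in this guide's data:

```python
import pandas as pd

# Hypothetical data: '_abc_' is an extra sequence added for illustration
df = pd.DataFrame({'first_set': ['aa_xyz_bb', 'cc_abc_dd'],
                   'second_set': ['ii_xyz_jj', 'kk_abc_ll']})

# '_xyz_|_abc_' is a regular expression that matches either sequence
df['first_set'] = df['first_set'].str.replace('_xyz_|_abc_', '||', regex=True)

print(df['first_set'].tolist())   # ['aa||bb', 'cc||dd']
print(df['second_set'].tolist())  # ['ii_xyz_jj', 'kk_abc_ll']
```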
Alternatively, you could apply the code below to make the changes under the entire DataFrame:
import pandas as pd

colors = {'first_set': ['aa_xyz_bb', 'cc_xyz_dd', 'ee_xyz_ff', 'gg_xyz_hh'],
          'second_set': ['ii_xyz_jj', 'kk_xyz_ll', 'mm_xyz_nn', 'oo_xyz_pp']}

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# Replace the sequence '_xyz_' with '||' in every column
df = df.replace('_xyz_', '||', regex=True)

print(df)
Here is the result:
  first_set second_set
0    aa||bb     ii||jj
1    cc||dd     kk||ll
2    ee||ff     mm||nn
3    gg||hh     oo||pp
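Note that with regex=True the search string is a regular expression, so characters such as “|”, “.” or “(” have a special meaning when they appear in it (the replacement string “||” above is not affected). As a sketch of one way to handle this, the example below goes in the opposite direction and turns “||” back into “_”, using re.escape from Python's standard library to make the pipes literal. The variable name restored is just for illustration:

```python
import re

import pandas as pd

df = pd.DataFrame({'first_set': ['aa||bb', 'cc||dd'],
                   'second_set': ['ii||jj', 'kk||ll']})

# re.escape('||') produces a pattern that matches two literal pipe characters
restored = df.replace(re.escape('||'), '_', regex=True)

print(restored['first_set'].tolist())   # ['aa_bb', 'cc_dd']
print(restored['second_set'].tolist())  # ['ii_jj', 'kk_ll']
```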
You can learn more about df.replace by visiting the Pandas Documentation.