Replace Characters in Strings in Pandas DataFrame

Here are two ways to replace characters in strings in a Pandas DataFrame:

(1) Replace one or more characters under a single DataFrame column:

df['column name'] = df['column name'].str.replace('old character', 'new character', regex=False)

(2) Replace one or more characters under the entire DataFrame:

df = df.replace('old character', 'new character', regex=True)
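The regex=True flag in the second one-liner matters. Here is a minimal sketch, using a small made-up DataFrame (the column name 'col' is only for illustration), of how df.replace behaves with and without it, next to the single-column str.replace approach:

```python
import pandas as pd

# Made-up data: the first cell is exactly "_", the second only contains "_"
df = pd.DataFrame({'col': ['_', 'aa_bb']})

# Without regex=True, DataFrame.replace matches whole cell values only
whole_cell = df.replace('_', '|')
print(whole_cell['col'].tolist())  # ['|', 'aa_bb']

# With regex=True, "_" is used as a pattern and replaced inside each string
substring = df.replace('_', '|', regex=True)
print(substring['col'].tolist())  # ['|', 'aa|bb']

# For a single column, Series.str.replace also replaces substrings
single_col = df['col'].str.replace('_', '|', regex=False)
print(single_col.tolist())  # ['|', 'aa|bb']
```

In other words, without regex=True only the cell that is exactly “_” changes; the underscore inside “aa_bb” is left alone.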

In this short guide, you’ll see how to replace:

  • A specific character under a single DataFrame column
  • A specific character under the entire DataFrame
  • A sequence of characters

Replace a Specific Character under a Single DataFrame Column

Let’s create a simple DataFrame with two columns that contain strings:

import pandas as pd

colors = {'first_set':  ['aa_bb','cc_dd','ee_ff','gg_hh'],
          'second_set': ['ii_jj','kk_ll','mm_nn','oo_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

print(df)

This is how the DataFrame would look:

  first_set   second_set
0     aa_bb        ii_jj
1     cc_dd        kk_ll
2     ee_ff        mm_nn
3     gg_hh        oo_pp

The goal is to replace the underscore (“_”) character with a pipe (“|”) character under the ‘first_set’ column.

To achieve this goal, you’ll need to add the following syntax to the code:

df['first_set'] = df['first_set'].str.replace('_', '|', regex=False)

So the complete Python code to perform the replacement is as follows:

import pandas as pd

colors = {'first_set':  ['aa_bb','cc_dd','ee_ff','gg_hh'],
          'second_set': ['ii_jj','kk_ll','mm_nn','oo_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

df['first_set'] = df['first_set'].str.replace('_', '|', regex=False)

print(df)

As you can see, the underscore character was replaced with a pipe character under the ‘first_set’ column:

  first_set   second_set
0     aa|bb        ii_jj
1     cc|dd        kk_ll
2     ee|ff        mm_nn
3     gg|hh        oo_pp
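One caveat with str.replace: its regex parameter decides whether the first argument is treated as a literal string or as a regular expression, and its default has changed between pandas versions (it became False in pandas 2.0). Passing regex explicitly keeps the behavior predictable for characters such as “.”, which means “any character” in a regular expression. A small sketch with made-up values:

```python
import re

import pandas as pd

# Made-up values: only the first one contains a literal "a.b"
s = pd.Series(['aa.bb', 'aaxbb'])

# regex=False: "a.b" is matched literally
literal = s.str.replace('a.b', '|', regex=False)
print(literal.tolist())  # ['a|b', 'aaxbb']

# regex=True: "." matches any character, so "axb" is replaced as well
as_regex = s.str.replace('a.b', '|', regex=True)
print(as_regex.tolist())  # ['a|b', 'a|b']

# re.escape turns literal text into a safe pattern when regex=True is needed
escaped = s.str.replace(re.escape('a.b'), '|', regex=True)
print(escaped.tolist())  # ['a|b', 'aaxbb']
```

The pipe used throughout this guide is also a regex metacharacter. It is harmless as the replacement text, but if you ever need to replace “|” itself, use regex=False or an escaped pattern.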

Replace a Specific Character under the Entire DataFrame

What if you’d like to replace a specific character under the entire DataFrame?

For example, let’s replace the underscore character with a pipe character under the entire DataFrame.

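In that case, you’ll need to use df.replace with regex=True. Without regex=True, df.replace only replaces cells whose entire value matches the string you pass (for example, a cell that contains nothing but “_”), so underscores inside longer strings would be left unchanged: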

import pandas as pd

colors = {'first_set':  ['aa_bb','cc_dd','ee_ff','gg_hh'],
          'second_set': ['ii_jj','kk_ll','mm_nn','oo_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

df = df.replace('_', '|', regex=True)

print(df)

You’ll now see that the underscore character was replaced with a pipe character under the entire DataFrame (under both the ‘first_set’ and the ‘second_set’ columns):

  first_set   second_set
0     aa|bb        ii|jj
1     cc|dd        kk|ll
2     ee|ff        mm|nn
3     gg|hh        oo|pp
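If you only want to change some of the columns rather than the entire DataFrame, one option is to apply str.replace to just those columns. Here is a sketch, assuming a made-up third column (third_set) that should stay untouched:

```python
import pandas as pd

df = pd.DataFrame({'first_set': ['aa_bb', 'cc_dd'],
                   'second_set': ['ii_jj', 'kk_ll'],
                   'third_set': ['qq_rr', 'ss_tt']})  # hypothetical extra column

# Columns to modify; third_set is left as-is
cols = ['first_set', 'second_set']

# apply() passes each selected column (a Series) to the lambda
df[cols] = df[cols].apply(lambda col: col.str.replace('_', '|', regex=False))
print(df)
```

Only first_set and second_set end up with pipes; third_set keeps its underscores.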

Replace a Sequence of Characters

Let’s say that you want to replace a sequence of characters in a Pandas DataFrame.

For instance, suppose that you created a new DataFrame where you’d like to replace the sequence “_xyz_” with two pipes (“||”).

Here is the syntax to create the new DataFrame:

import pandas as pd

colors = {'first_set':  ['aa_xyz_bb','cc_xyz_dd','ee_xyz_ff','gg_xyz_hh'],
          'second_set': ['ii_xyz_jj','kk_xyz_ll','mm_xyz_nn','oo_xyz_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

print(df)

And this is how the new DataFrame would look:

   first_set   second_set
0  aa_xyz_bb    ii_xyz_jj
1  cc_xyz_dd    kk_xyz_ll
2  ee_xyz_ff    mm_xyz_nn
3  gg_xyz_hh    oo_xyz_pp

You can then use the following code to replace the sequence of “_xyz_” with “||” under the ‘first_set’ column:

import pandas as pd

colors = {'first_set':  ['aa_xyz_bb','cc_xyz_dd','ee_xyz_ff','gg_xyz_hh'],
          'second_set': ['ii_xyz_jj','kk_xyz_ll','mm_xyz_nn','oo_xyz_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

df['first_set'] = df['first_set'].str.replace('_xyz_', '||', regex=False)

print(df)

You’ll now see the newly replaced characters under the ‘first_set’ column:

  first_set   second_set
0    aa||bb    ii_xyz_jj
1    cc||dd    kk_xyz_ll
2    ee||ff    mm_xyz_nn
3    gg||hh    oo_xyz_pp

Alternatively, you could apply the code below to make the changes under the entire DataFrame:

import pandas as pd

colors = {'first_set':  ['aa_xyz_bb','cc_xyz_dd','ee_xyz_ff','gg_xyz_hh'],
          'second_set': ['ii_xyz_jj','kk_xyz_ll','mm_xyz_nn','oo_xyz_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

df = df.replace('_xyz_', '||', regex=True)

print(df)

Here is the result:

  first_set   second_set
0    aa||bb       ii||jj
1    cc||dd       kk||ll
2    ee||ff       mm||nn
3    gg||hh       oo||pp
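Because regex=True treats “_xyz_” as a regular expression, the same call can also handle sequences that vary from row to row. Here is a sketch with made-up values, assuming the part between the underscores is always one or more lowercase letters:

```python
import pandas as pd

# Made-up data: the middle sequence differs from cell to cell
df = pd.DataFrame({'first_set': ['aa_xyz_bb', 'cc_abc_dd'],
                   'second_set': ['ii_xyz_jj', 'kk_q_ll']})

# r'_[a-z]+_' matches "_", one or more lowercase letters, then "_"
df = df.replace(r'_[a-z]+_', '||', regex=True)
print(df)
```

Every cell ends up with “||” in the middle, even though only some of them originally contained “_xyz_”.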

You can learn more about df.replace by visiting the Pandas Documentation.
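Among the options described there, the regex parameter of df.replace can also take a dict that maps patterns to replacements, which lets you run several replacements in one call. Here is a sketch with made-up values (the hyphen-separated second_set column is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'first_set': ['aa_xyz_bb', 'cc_xyz_dd'],
                   'second_set': ['ii-jj', 'kk-ll']})

# Keys are regular expressions, values are their replacements
df = df.replace(regex={'_xyz_': '||', '-': '|'})
print(df)
```

The patterns here don't overlap, so each cell is affected by at most one of them.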