Replace Characters in Strings in Pandas DataFrame

Here are two ways to replace characters in strings in a Pandas DataFrame:

(1) Replace characters in a single DataFrame column:

df['column name'] = df['column name'].str.replace('old character','new character')

(2) Replace characters across the entire DataFrame:

df = df.replace('old character','new character', regex=True)

In this short guide, you’ll see how to replace:

  • Specific character under a single DataFrame column
  • Specific character under the entire DataFrame
  • Sequence of characters

Replace a Specific Character under a Single DataFrame Column

Let’s create a simple DataFrame with two columns that contain strings:

import pandas as pd

colors = {'first_set':  ['aa_bb','cc_dd','ee_ff','gg_hh'],
          'second_set': ['ii_jj','kk_ll','mm_nn','oo_pp']
         }

# Build the DataFrame from the dictionary, keeping the column order explicit
df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

print(df)

This is what the DataFrame looks like:

  first_set second_set
0     aa_bb      ii_jj
1     cc_dd      kk_ll
2     ee_ff      mm_nn
3     gg_hh      oo_pp

The goal is to replace the underscore (“_”) character with a pipe (“|”) character in the ‘first_set’ column.

To achieve this goal, add the following line to the code:

df['first_set'] = df['first_set'].str.replace('_','|')

So the complete Python code to perform the replacement is as follows:

import pandas as pd

colors = {'first_set':  ['aa_bb','cc_dd','ee_ff','gg_hh'],
          'second_set': ['ii_jj','kk_ll','mm_nn','oo_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# Replace every underscore in the 'first_set' column with a pipe
df['first_set'] = df['first_set'].str.replace('_', '|')

print(df)

As you can see, the underscore character was replaced with a pipe character in the ‘first_set’ column, while the ‘second_set’ column is unchanged:

  first_set second_set
0     aa|bb      ii_jj
1     cc|dd      kk_ll
2     ee|ff      mm_nn
3     gg|hh      oo_pp
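One detail worth keeping in mind: str.replace accepts a regex parameter, and in pandas 2.0 and later it defaults to False, so the pattern is treated as literal text. For an underscore the result is the same either way, but a character with a special meaning in regular expressions (such as a dot) can behave differently. The sketch below uses made-up values and made-up column names ('literal' and 'as_regex') that are not part of the example above, purely to show the difference:

```python
import pandas as pd

# Hypothetical data for illustration: one value contains a dot, the other an 'x'
df = pd.DataFrame({'first_set': ['aa.bb', 'aaxbb']})

# regex=False: 'a.b' is matched as the literal text a, dot, b
df['literal'] = df['first_set'].str.replace('a.b', 'a|b', regex=False)

# regex=True: the dot matches any single character, so 'axb' is replaced too
df['as_regex'] = df['first_set'].str.replace('a.b', 'a|b', regex=True)

print(df)
```

Passing regex explicitly, rather than relying on the default, keeps the behavior the same across pandas versions.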

Replace a Specific Character under the Entire DataFrame

What if you’d like to replace a specific character under the entire DataFrame?

For example, let’s replace the underscore character with a pipe character under the entire DataFrame.

In that case, you’ll need to apply the following syntax:

import pandas as pd

colors = {'first_set':  ['aa_bb','cc_dd','ee_ff','gg_hh'],
          'second_set': ['ii_jj','kk_ll','mm_nn','oo_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# regex=True makes replace match the underscore inside each string
df = df.replace('_', '|', regex=True)

print(df)

You’ll now see that the underscore character was replaced with a pipe character across the entire DataFrame (in both the ‘first_set’ and ‘second_set’ columns):

  first_set second_set
0     aa|bb      ii|jj
1     cc|dd      kk|ll
2     ee|ff      mm|nn
3     gg|hh      oo|pp
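The regex=True argument matters here. Without it, df.replace only swaps cells whose entire value equals the search string and leaves underscores inside longer strings alone. A minimal sketch of the difference, using a small made-up DataFrame in which one cell is just an underscore (the variable names whole_cell and substring are illustrative):

```python
import pandas as pd

# Hypothetical data: the second cell of 'first_set' consists of a lone underscore
df = pd.DataFrame({'first_set': ['aa_bb', '_'],
                   'second_set': ['ii_jj', 'kk_ll']})

# Without regex=True, only a cell that equals '_' exactly is replaced
whole_cell = df.replace('_', '|')

# With regex=True, every underscore inside every string is replaced
substring = df.replace('_', '|', regex=True)

print(whole_cell)
print(substring)
```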

Replace a Sequence of Characters

Let’s say that you want to replace a sequence of characters in a Pandas DataFrame.

For instance, suppose that you created a new DataFrame where you’d like to replace the sequence “_xyz_” with two pipes “||”.

Here is the code to create the new DataFrame:

import pandas as pd

colors = {'first_set':  ['aa_xyz_bb','cc_xyz_dd','ee_xyz_ff','gg_xyz_hh'],
          'second_set': ['ii_xyz_jj','kk_xyz_ll','mm_xyz_nn','oo_xyz_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

print(df)

And this is what the new DataFrame looks like:

   first_set second_set
0  aa_xyz_bb  ii_xyz_jj
1  cc_xyz_dd  kk_xyz_ll
2  ee_xyz_ff  mm_xyz_nn
3  gg_xyz_hh  oo_xyz_pp

You can then use the following code to replace the sequence of “_xyz_” with “||” under the ‘first_set’ column:

import pandas as pd

colors = {'first_set':  ['aa_xyz_bb','cc_xyz_dd','ee_xyz_ff','gg_xyz_hh'],
          'second_set': ['ii_xyz_jj','kk_xyz_ll','mm_xyz_nn','oo_xyz_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# Replace the whole '_xyz_' sequence in the 'first_set' column with two pipes
df['first_set'] = df['first_set'].str.replace('_xyz_', '||')

print(df)

You’ll now see the newly replaced characters in the ‘first_set’ column:

  first_set second_set
0    aa||bb  ii_xyz_jj
1    cc||dd  kk_xyz_ll
2    ee||ff  mm_xyz_nn
3    gg||hh  oo_xyz_pp

Alternatively, you could apply the code below to make the changes under the entire DataFrame:

import pandas as pd

colors = {'first_set':  ['aa_xyz_bb','cc_xyz_dd','ee_xyz_ff','gg_xyz_hh'],
          'second_set': ['ii_xyz_jj','kk_xyz_ll','mm_xyz_nn','oo_xyz_pp']
         }

df = pd.DataFrame(colors, columns=['first_set', 'second_set'])

# Replace the '_xyz_' sequence in every column of the DataFrame
df = df.replace('_xyz_', '||', regex=True)

print(df)

Here is the result:

  first_set second_set
0    aa||bb     ii||jj
1    cc||dd     kk||ll
2    ee||ff     mm||nn
3    gg||hh     oo||pp
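Because regex=True treats the search string as a regular expression, a sequence that contains special characters needs escaping when it is the text being searched for. The pipe is one such character (it means “or” in a regular expression), so turning “||” back into “_xyz_” would not work with the raw string as the pattern. One possible approach, sketched here with Python’s standard re.escape and an illustrative variable name (restored), is:

```python
import re
import pandas as pd

# A shortened version of the DataFrame produced by the replacement above
df = pd.DataFrame({'first_set': ['aa||bb', 'cc||dd'],
                   'second_set': ['ii||jj', 'kk||ll']})

# re.escape builds a pattern that matches two literal pipe characters
pattern = re.escape('||')

# Search for the escaped pattern across the whole DataFrame
restored = df.replace(pattern, '_xyz_', regex=True)

print(restored)
```

For a single column, str.replace with regex=False sidesteps the issue, since the search string is then taken literally.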

You can learn more about df.replace by visiting the Pandas Documentation.