How to Replace Values in a DataFrame in R

Here is the syntax to replace values in a DataFrame in R:

(1) Replace a value across the entire DataFrame:

df[df == "Old Value"] <- "New Value"

(2) Replace a value under a single DataFrame column:

df["Column Name"][df["Column Name"] == "Old Value"] <- "New Value"

Next, you’ll see 4 scenarios that describe how to:

  1. Replace a value across the entire DataFrame
  2. Replace multiple values
  3. Replace a value under a single DataFrame column
  4. Deal with factors to avoid the “invalid factor level” warning

Scenario 1: Replace a value across the entire DataFrame in R

To start with a simple example, let’s create a DataFrame in R that contains 4 columns:

df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
                 group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
                 group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
                 )

print(df)

Run the code, and you’ll get the following DataFrame:

  group_a  group_b  group_c  group_d
1      11      444     Blue   Yellow
2      11      444     Blue   Yellow
3      11       55     Blue   Yellow
4     222       55    Green    White
5     222       55    Green    White
6     222       55      Red     Blue
7      33       11      Red     Blue
8      33       11      Red     Blue

Suppose that you’d like to replace ‘11’ with ‘77’ across the entire DataFrame.

In that case, you’ll need to add the following syntax to the code:

df[df == 11] <- 77

So the complete code to perform the replacement is:

df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
                 group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
                 group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
                 )

df[df == 11] <- 77

print(df)

As you can see, ‘11’ was replaced with ‘77’ across the entire DataFrame:

  group_a  group_b  group_c  group_d
1      77      444     Blue   Yellow
2      77      444     Blue   Yellow
3      77       55     Blue   Yellow
4     222       55    Green    White
5     222       55    Green    White
6     222       55      Red     Blue
7      33       77      Red     Blue
8      33       77      Red     Blue
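To see why this syntax works, it can help to look at the comparison on its own. The sketch below is only an illustration: it uses just the two numeric columns to keep the output short, and the names mask and matches_per_column are chosen for this example. The expression df == 11 returns a logical matrix with the same shape as the DataFrame, and the assignment writes 77 into every cell where that matrix is TRUE.

```r
# Numeric columns only, to keep the example small
df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11))

# Comparing a DataFrame to a single value returns a logical matrix
# of the same dimensions (TRUE where the cell equals 11)
mask <- df == 11

# Count the matching cells in each column before changing anything
matches_per_column <- colSums(mask)
print(matches_per_column)

# Use the logical matrix as the index for the replacement
df[mask] <- 77

print(df)
```

Checking the counts first is a cheap way to confirm that the replacement will touch only the cells you expect.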

Scenario 2: Replace multiple values in a DataFrame in R

At times, you may need to replace multiple values in your DataFrame.

For example, let’s replace:

  • ‘11’ with ‘77’
  • ‘33’ with ‘77’

In that case, you can use the OR operator (“|”) to combine the two conditions:

df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
                 group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
                 group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
                 )

df[df == 11 | df == 33] <- 77

print(df)

Here is the result:

  group_a  group_b  group_c  group_d
1      77      444     Blue   Yellow
2      77      444     Blue   Yellow
3      77       55     Blue   Yellow
4     222       55    Green    White
5     222       55    Green    White
6     222       55      Red     Blue
7      77       77      Red     Blue
8      77       77      Red     Blue
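When the list of values to replace grows, chaining “|” conditions gets long. One possible alternative, sketched here under the assumption that all of the columns involved are numeric, is to collect the old values in a vector (old_values is a name chosen for illustration) and test each column with the %in% operator:

```r
# Numeric columns only (this sketch assumes no factor columns)
df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11))

# All values that should be replaced, kept in one place
old_values <- c(11, 33)

# Apply the replacement column by column;
# df[] <- ... keeps the result as a DataFrame with the same column names
df[] <- lapply(df, function(col) {
  col[col %in% old_values] <- 77
  col
})

print(df)
```

Adding another value to replace then only requires extending old_values, rather than adding another condition.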

Scenario 3: Replace a value under a single DataFrame column

What if you’d like to replace a value under a single DataFrame column?

For instance, let’s replace ‘11’ with ‘77’ under the ‘group_b’ column.

To accomplish this goal, you’ll need to apply the following syntax:

df["group_b"][df["group_b"] == 11] <- 77

Therefore, the complete code to execute the replacement is:

df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
                 group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
                 group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
                 )

df["group_b"][df["group_b"] == 11] <- 77

print(df)

The ‘11’ value will be replaced with ‘77’ only under the ‘group_b’ column:

  group_a  group_b  group_c  group_d
1      11      444     Blue   Yellow
2      11      444     Blue   Yellow
3      11       55     Blue   Yellow
4     222       55    Green    White
5     222       55    Green    White
6     222       55      Red     Blue
7      33       77      Red     Blue
8      33       77      Red     Blue
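The same single-column replacement is often written with the $ operator, which extracts the column as a plain vector. The following is a minimal sketch of that equivalent form, using only the two numeric columns for brevity:

```r
df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11))

# Select the column as a vector with $, then index it with a logical vector
df$group_b[df$group_b == 11] <- 77

print(df)
```

Both forms leave the ‘11’ values in ‘group_a’ untouched; which one to use is largely a matter of preference.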

Scenario 4: Dealing with factors

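So far, you have seen how to replace numeric values.

The last two columns, however, contain text. Depending on your R version, data.frame() stores such columns either as factors (the default before R 4.0.0) or as character vectors (chr, the default from R 4.0.0 onward).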

Let’s say that you’d like to replace the ‘Blue’ color with the ‘Green’ color using the code below:

df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
                 group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
                 group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
                 )

df[df == "Blue"] <- "Green"

print(df)

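If the text columns are stored as factors (the default in R versions before 4.0.0), running the above code produces the following warning message, because ‘Green’ is not an existing level of the ‘group_d’ column and the affected cells become NA: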

Warning message in `[<-.factor`(`*tmp*`, thisvar, value = "Green"):
"invalid factor level, NA generated"

To avoid this warning, you may pass “stringsAsFactors = FALSE” as the last argument of data.frame(), so that the text columns are stored as character vectors rather than factors:

df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
                 group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
                 group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
                 stringsAsFactors = FALSE
                 )

df[df == "Blue"] <- "Green"

print(df)

You’ll now be able to replace all the ‘Blue’ values with the ‘Green’ values without getting the previous warning message:

  group_a  group_b  group_c  group_d
1      11      444    Green   Yellow
2      11      444    Green   Yellow
3      11       55    Green   Yellow
4     222       55    Green    White
5     222       55    Green    White
6     222       55      Red    Green
7      33       11      Red    Green
8      33       11      Red    Green

Similarly, you can replace the ‘Blue’ values with the ‘Green’ values under a single DataFrame column, such as the ‘group_d’ column:

df <- data.frame(group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
                 group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
                 group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
                 group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
                 stringsAsFactors = FALSE
                 )

df["group_d"][df["group_d"] == "Blue"] <- "Green"

print(df)

Here is the result:

  group_a  group_b  group_c  group_d
1      11      444     Blue   Yellow
2      11      444     Blue   Yellow
3      11       55     Blue   Yellow
4     222       55    Green    White
5     222       55    Green    White
6     222       55      Red    Green
7      33       11      Red    Green
8      33       11      Red    Green
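Sometimes a column is already a factor and the DataFrame cannot simply be re-created with different settings. The sketch below shows two possible ways to handle that case. It is an illustration rather than the only approach: it builds a one-column DataFrame with stringsAsFactors = TRUE so that the column is a factor in any R version, and df_a and df_b are names chosen for this example.

```r
# stringsAsFactors = TRUE forces a factor column regardless of the R version
df <- data.frame(group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
                 stringsAsFactors = TRUE)

# Option A: rename the level, which updates every cell that uses it
df_a <- df
levels(df_a$group_d)[levels(df_a$group_d) == "Blue"] <- "Green"

# Option B: convert the column to character, then replace as usual
df_b <- df
df_b$group_d <- as.character(df_b$group_d)
df_b$group_d[df_b$group_d == "Blue"] <- "Green"

print(df_a)
print(df_b)
```

Option A keeps the column as a factor, while Option B leaves it as a character vector. Neither triggers the “invalid factor level” warning, because no value outside the existing levels is ever assigned to a factor.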
