Here is the syntax to replace values in a DataFrame in R:
(1) Replace a value across the entire DataFrame:
df[df == "Old Value"] <- "New Value"
(2) Replace a value under a single DataFrame column:
df["Column Name"][df["Column Name"] == "Old Value"] <- "New Value"
Next, you’ll see 4 scenarios that describe how to:
- Replace a value across the entire DataFrame
- Replace multiple values
- Replace a value under a single DataFrame column
- Deal with factors to avoid the “invalid factor level” warning
Scenario 1: Replace a value across the entire DataFrame in R
To start with a simple example, let’s create a DataFrame in R that contains 4 columns:
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

print(df)
Run the code, and you’ll get the following DataFrame:
  group_a group_b group_c group_d
1      11     444    Blue  Yellow
2      11     444    Blue  Yellow
3      11      55    Blue  Yellow
4     222      55   Green   White
5     222      55   Green   White
6     222      55     Red    Blue
7      33      11     Red    Blue
8      33      11     Red    Blue
Suppose that you’d like to replace ‘11’ with ‘77’ across the entire DataFrame.
In that case, you’ll need to add the following syntax to the code:
df[df == 11] <- 77
So the complete code to perform the replacement is:
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

df[df == 11] <- 77

print(df)
As you can see, ‘11’ was replaced with ‘77’ across the entire DataFrame:
  group_a group_b group_c group_d
1      77     444    Blue  Yellow
2      77     444    Blue  Yellow
3      77      55    Blue  Yellow
4     222      55   Green   White
5     222      55   Green   White
6     222      55     Red    Blue
7      33      77     Red    Blue
8      33      77     Red    Blue
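To see why this one-liner works, it can help to look at the comparison on its own: df == 11 returns a logical matrix with the same shape as the DataFrame, and the assignment only touches the cells flagged TRUE. The sketch below assumes the same example data; the names mask and n_matches are illustrative only.

```r
# Same example data as in Scenario 1
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

# Comparing the whole DataFrame to a value yields a logical matrix
mask <- df == 11

# Count the cells that are about to change (3 in group_a, 2 in group_b)
n_matches <- sum(mask)
print(n_matches)

# Assign the new value only where the mask is TRUE
df[mask] <- 77
print(df)
```

Checking the number of matches first is a cheap way to confirm that the replacement will hit the cells you expect before you overwrite anything.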
Scenario 2: Replace multiple values in a DataFrame in R
At times, you may need to replace multiple values in your DataFrame.
For example, let’s replace:
- ‘11’ with ‘77’
- ‘33’ with ‘77’
In that case, you can combine the two conditions with the element-wise OR operator (“|”):
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

df[df == 11 | df == 33] <- 77

print(df)
Here is the result:
  group_a group_b group_c group_d
1      77     444    Blue  Yellow
2      77     444    Blue  Yellow
3      77      55    Blue  Yellow
4     222      55   Green   White
5     222      55   Green   White
6     222      55     Red    Blue
7      77      77     Red    Blue
8      77      77     Red    Blue
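When the list of values to replace grows, chaining conditions with | becomes unwieldy. One possible alternative, sketched below, builds the same kind of logical matrix with %in% applied column by column. The names old_values and mask are illustrative and not part of the original example.

```r
# Same example data as in Scenario 2
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

# All values that should be replaced
old_values <- c(11, 33)

# Test every column against the set; sapply() returns a logical matrix
# with one column per DataFrame column
mask <- sapply(df, function(column) column %in% old_values)

# Replace every flagged cell with the new value
df[mask] <- 77

print(df)
```

Adding a third or fourth value then only means extending old_values, rather than adding another condition to the subsetting expression.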
Scenario 3: Replace a value under a single DataFrame column
What if you’d like to replace a value under a single DataFrame column?
For instance, let’s replace ‘11’ with ‘77’ under the ‘group_b’ column.
To accomplish this goal, you’ll need to apply the following syntax:
df["group_b"][df["group_b"] == 11] <- 77
Therefore, the complete code to execute the replacement is:
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

df["group_b"][df["group_b"] == 11] <- 77

print(df)
The ‘11’ value will be replaced with ‘77’ only under the ‘group_b’ column:
  group_a group_b group_c group_d
1      11     444    Blue  Yellow
2      11     444    Blue  Yellow
3      11      55    Blue  Yellow
4     222      55   Green   White
5     222      55   Green   White
6     222      55     Red    Blue
7      33      77     Red    Blue
8      33      77     Red    Blue
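The same single-column replacement is often written with the $ operator, which extracts the column as a plain vector. The sketch below shows that form next to the bracket form used above, on two copies of the example data, and compares the results. The names df_dollar and df_bracket are illustrative only.

```r
# Same example data as in Scenario 3
df_dollar <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

# Second copy for the bracket form
df_bracket <- df_dollar

# $ form: subset the column vector and assign into the matching positions
df_dollar$group_b[df_dollar$group_b == 11] <- 77

# Bracket form from the scenario above
df_bracket["group_b"][df_bracket["group_b"] == 11] <- 77

# Both forms should leave group_b with the same values
print(identical(df_dollar$group_b, df_bracket$group_b))
print(df_dollar)
```

The $ form is shorter when the column name is a valid R identifier, while the bracket form also works for names that contain spaces, such as "Column Name".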
Scenario 4: Dealing with factors
So far, you have seen how to replace numeric values.
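The last two columns hold text rather than numbers. Their data type is factor in R versions earlier than 4.0.0, where data.frame() converts strings to factors by default, and chr (character) in R 4.0.0 and later.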
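To check which type your own columns have, you can inspect the class of each column. The sketch below builds the example data twice, setting stringsAsFactors explicitly so that the result does not depend on the installed R version. The names df_factor and df_chr are illustrative only.

```r
# Text columns stored as factors (the default before R 4.0.0)
df_factor <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
  stringsAsFactors = TRUE
)

# Text columns stored as character vectors (the default from R 4.0.0)
df_chr <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
  stringsAsFactors = FALSE
)

# Report the class of every column in each DataFrame
print(sapply(df_factor, class))
print(sapply(df_chr, class))
```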
Let’s say that you’d like to replace the ‘Blue’ color with the ‘Green’ color using the code below:
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue")
)

df[df == "Blue"] <- "Green"

print(df)
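If you’re using an R version earlier than 4.0.0, where the text columns are stored as factors by default, running the above code produces the following warning message, because ‘Green’ is not an existing level of the ‘group_d’ factor and the affected cells are set to NA:
Warning message in `[<-.factor`(`*tmp*`, thisvar, value = "Green"):
"invalid factor level, NA generated"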
To avoid this warning, you may pass stringsAsFactors = FALSE as the last argument of data.frame(), so that the text columns are stored as character vectors:
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
  stringsAsFactors = FALSE
)

df[df == "Blue"] <- "Green"

print(df)
You’ll now be able to replace all the ‘Blue’ values with the ‘Green’ values without getting the previous warning message:
  group_a group_b group_c group_d
1      11     444   Green  Yellow
2      11     444   Green  Yellow
3      11      55   Green  Yellow
4     222      55   Green   White
5     222      55   Green   White
6     222      55     Red   Green
7      33      11     Red   Green
8      33      11     Red   Green
Similarly, you can replace the ‘Blue’ values with the ‘Green’ values under a single DataFrame column, such as the ‘group_d’ column:
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
  stringsAsFactors = FALSE
)

df["group_d"][df["group_d"] == "Blue"] <- "Green"

print(df)
Here is the result:
  group_a group_b group_c group_d
1      11     444    Blue  Yellow
2      11     444    Blue  Yellow
3      11      55    Blue  Yellow
4     222      55   Green   White
5     222      55   Green   White
6     222      55     Red   Green
7      33      11     Red   Green
8      33      11     Red   Green
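If you would rather keep ‘group_d’ as a factor, one option is to rename the level itself instead of assigning into the cells. The sketch below forces factors with stringsAsFactors = TRUE so that it behaves the same on any R version, and changes the ‘group_d’ column only. It is an illustrative alternative, not the approach used in the code above.

```r
# Example data with the text columns stored as factors
df <- data.frame(
  group_a = c(11, 11, 11, 222, 222, 222, 33, 33),
  group_b = c(444, 444, 55, 55, 55, 55, 11, 11),
  group_c = c("Blue", "Blue", "Blue", "Green", "Green", "Red", "Red", "Red"),
  group_d = c("Yellow", "Yellow", "Yellow", "White", "White", "Blue", "Blue", "Blue"),
  stringsAsFactors = TRUE
)

# group_d has the levels "Blue", "White" and "Yellow";
# renaming the "Blue" level updates every row that uses it
levels(df$group_d)[levels(df$group_d) == "Blue"] <- "Green"

print(levels(df$group_d))
print(df)
```

Because the level is renamed rather than a new value being assigned into the cells, no NA values are generated and the column remains a factor.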