Replace NA Values with Zeros in a DataFrame in R

Here are 2 ways to replace NA values with zeros in a DataFrame in R:

(1) Replace NA values with zeros across the entire DataFrame:

df[is.na(df)] <- 0

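Note that if your DataFrame contains factors (the default for text columns in versions of R before 4.0.0), the NA values under those columns may not be replaced. In that case, you may consider adding “stringsAsFactors = FALSE” as the last argument of data.frame() (later you’ll see an example that tackles this scenario).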

(2) Replace NA values with zeros under a single DataFrame column:

df["column_name"][is.na(df["column_name"])] <- 0

In the following section, you’ll see how to apply the above syntax in practice.

Steps to Replace NA Values with Zeros in a DataFrame in R

Step 1: Create a DataFrame in R with NA values

Let’s start by creating a DataFrame with 4 columns. Additionally, let’s add several NA values across the entire DataFrame:

df <- data.frame(group_a = c(11,11,NA,NA,22,33,33,NA),
                 group_b = c(99,99,NA,77,77,NA,55,55),
                 group_c = c("Green","Green",NA,"Blue","Blue",NA,"Red","Red"),
                 group_d = c(NA,NA,"Blue","Yellow","Yellow","Purple",NA,NA)
                 )

print(df)

As you can see, there are currently 11 NA values across the DataFrame (under the text columns, R prints them as <NA>):

  group_a  group_b  group_c  group_d
1      11       99    Green     <NA>
2      11       99    Green     <NA>
3      NA       NA     <NA>     Blue
4      NA       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33       NA     <NA>   Purple
7      33       55      Red     <NA>
8      NA       55      Red     <NA>
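
To double-check the count of 11 rather than tallying it by eye, one option is to sum the logical matrix returned by is.na(). The snippet below is a minimal sketch that rebuilds the same DataFrame; the variable names total_na and na_per_column are only illustrative:

```r
# Rebuild the example DataFrame from Step 1
df <- data.frame(group_a = c(11,11,NA,NA,22,33,33,NA),
                 group_b = c(99,99,NA,77,77,NA,55,55),
                 group_c = c("Green","Green",NA,"Blue","Blue",NA,"Red","Red"),
                 group_d = c(NA,NA,"Blue","Yellow","Yellow","Purple",NA,NA),
                 stringsAsFactors = FALSE)

# is.na(df) returns a TRUE/FALSE matrix; TRUE counts as 1 when summed
total_na <- sum(is.na(df))

# Number of NA values under each column
na_per_column <- colSums(is.na(df))

print(total_na)
print(na_per_column)
```

Running the same two lines again after the replacement in Step 2 is a quick way to confirm how many NA values are left.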

Step 2: Replace the NA values with zeros

You can use the following syntax to replace the NA values with zeros across the entire DataFrame:

df[is.na(df)] <- 0

Here is the full code for our example:

df <- data.frame(group_a = c(11,11,NA,NA,22,33,33,NA),
                 group_b = c(99,99,NA,77,77,NA,55,55),
                 group_c = c("Green","Green",NA,"Blue","Blue",NA,"Red","Red"),
                 group_d = c(NA,NA,"Blue","Yellow","Yellow","Purple",NA,NA)
                 )

df[is.na(df)] <- 0

print(df)

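Notice that the result depends on the version of R that you’re using. In versions of R before 4.0.0, data.frame() converts text columns to factors by default. Since 0 is not an existing level of those factors, R issues an “invalid factor level, NA generated” warning and leaves the NA values under the last two columns unchanged. In R 4.0.0 and later, text columns are stored as character by default, so their NA values are replaced as well. The output below is what you would get when the last two columns are factors (only the NA values under the numeric columns were replaced):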

  group_a  group_b  group_c  group_d
1      11       99    Green     <NA>
2      11       99    Green     <NA>
3       0        0     <NA>     Blue
4       0       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33        0     <NA>   Purple
7      33       55      Red     <NA>
8       0       55      Red     <NA>
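
If you are on a recent version of R and would like to reproduce the factor behavior described above, a possible sketch is to force the conversion with stringsAsFactors = TRUE. The smaller 3-row DataFrame here is an assumption made for brevity and is not part of the original example:

```r
# Force text columns to be stored as factors, as older versions of R did by default
df <- data.frame(group_a = c(11, NA, 22),
                 group_c = c("Green", NA, "Blue"),
                 stringsAsFactors = TRUE)

# 0 is not a level of group_c, so R warns and keeps the NA there.
# suppressWarnings() only hides that warning for this demonstration.
suppressWarnings(df[is.na(df)] <- 0)

print(df)
print(sapply(df, class))
```

The numeric column should now hold a zero in place of the NA, while the factor column should still hold its NA value.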

To deal with factors, you can add “stringsAsFactors = FALSE” as the last argument of data.frame(), so that the text columns are stored as character rather than factor, as captured below:

df <- data.frame(group_a = c(11,11,NA,NA,22,33,33,NA),
                 group_b = c(99,99,NA,77,77,NA,55,55),
                 group_c = c("Green","Green",NA,"Blue","Blue",NA,"Red","Red"),
                 group_d = c(NA,NA,"Blue","Yellow","Yellow","Purple",NA,NA),
                 stringsAsFactors = FALSE
                 )

df[is.na(df)] <- 0

print(df)

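As you can see, all the NA values are now replaced with zeros across the entire DataFrame (under the text columns, the replacement is stored as the character string "0" rather than the number 0):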

  group_a  group_b  group_c  group_d
1      11       99    Green        0
2      11       99    Green        0
3       0        0        0     Blue
4       0       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33        0        0   Purple
7      33       55      Red        0
8       0       55      Red        0
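
If the DataFrame already exists with factor columns (for example, it was created elsewhere and cannot easily be rebuilt), one possible alternative is to convert those columns to character before the replacement. This is only a sketch under that assumption; the helper variable is_fac and the smaller 3-row DataFrame are illustrative:

```r
# A DataFrame that already contains a factor column
df <- data.frame(group_a = c(11, NA, 22),
                 group_c = c("Green", NA, "Blue"),
                 stringsAsFactors = TRUE)

# Flag the factor columns and convert only those to character
is_fac <- sapply(df, is.factor)
df[is_fac] <- lapply(df[is_fac], as.character)

# The replacement now reaches every column;
# in the character column the zero is stored as the string "0"
df[is.na(df)] <- 0

print(df)
```

Keep in mind that this changes the column type from factor to character, which may or may not be what you want for later analysis.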

Step 3 (optional): Replace NA values under a single DataFrame column

Optionally, you can use the following syntax to replace the NA values with zeros under a single DataFrame column:

df["column_name"][is.na(df["column_name"])] <- 0

For example, let’s replace the NA values under the ‘group_d’ column:

df <- data.frame(group_a = c(11,11,NA,NA,22,33,33,NA),
                 group_b = c(99,99,NA,77,77,NA,55,55),
                 group_c = c("Green","Green",NA,"Blue","Blue",NA,"Red","Red"),
                 group_d = c(NA,NA,"Blue","Yellow","Yellow","Purple",NA,NA),
                 stringsAsFactors = FALSE
                 )

df["group_d"][is.na(df["group_d"])] <- 0

print(df)

As you can see, only the NA values under the ‘group_d’ column were replaced:

  group_a  group_b  group_c  group_d
1      11       99    Green        0
2      11       99    Green        0
3      NA       NA     <NA>     Blue
4      NA       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33       NA     <NA>   Purple
7      33       55      Red        0
8      NA       55      Red        0
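
The same pattern can be extended from one column to several. As a hedged sketch, the code below replaces the NA values only under the numeric columns and leaves the text columns untouched, which may be preferable when a "0" string would not make sense as a color. The variable name num_cols and the smaller 3-row DataFrame are assumptions for illustration:

```r
df <- data.frame(group_a = c(11, NA, 22),
                 group_b = c(99, NA, 77),
                 group_d = c(NA, "Blue", "Yellow"),
                 stringsAsFactors = FALSE)

# Names of the columns that hold numeric data
num_cols <- names(df)[sapply(df, is.numeric)]

# Same syntax as the single-column case, applied to a vector of column names
df[num_cols][is.na(df[num_cols])] <- 0

print(df)
```

Here the NA under ‘group_d’ should remain in place, while the NA values under ‘group_a’ and ‘group_b’ become zeros.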