Replace NA Values with Zeros in a DataFrame in R

Here are 2 ways to replace NA values with zeros in a DataFrame in R:

(1) Replace NA values with zeros across the entire DataFrame:

df[is.na(df)] <- 0

(2) Replace NA values with zeros under a single DataFrame column:

df["column_name"][is.na(df["column_name"])] <- 0

In the following section, you’ll see how to apply the above syntax in practice.

Steps to Replace NA Values with Zeros in a DataFrame in R

Step 1: Create a DataFrame in R with NA values

Let’s start by creating a DataFrame with 4 columns. Additionally, let’s add several NA values across the entire DataFrame:

df <- data.frame(group_a = c(11, 11, NA, NA, 22, 33, 33, NA),
                 group_b = c(99, 99, NA, 77, 77, NA, 55, 55),
                 group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                 group_d = c(NA, NA, "Blue", "Yellow", "Yellow", "Purple", NA, NA)
                 )

print(df)

As you can see, there are currently 11 NA values across the DataFrame:

  group_a  group_b  group_c  group_d
1      11       99    Green     <NA>
2      11       99    Green     <NA>
3      NA       NA     <NA>     Blue
4      NA       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33       NA     <NA>   Purple
7      33       55      Red     <NA>
8      NA       55      Red     <NA>
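
To confirm the NA count before replacing anything, you can count the NA values programmatically. The snippet below is a minimal sketch using base R only. The names total_na and na_per_column are introduced here for illustration:

```r
# Same example data as above
df <- data.frame(group_a = c(11, 11, NA, NA, 22, 33, 33, NA),
                 group_b = c(99, 99, NA, 77, 77, NA, 55, 55),
                 group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                 group_d = c(NA, NA, "Blue", "Yellow", "Yellow", "Purple", NA, NA)
                 )

# is.na(df) returns a logical matrix, so sum() counts every NA in the DataFrame
total_na <- sum(is.na(df))
print(total_na)

# colSums() counts the NA values under each column separately
na_per_column <- colSums(is.na(df))
print(na_per_column)
```

For this example, the total should be 11, with 3, 2, 2 and 4 NA values under group_a, group_b, group_c and group_d respectively.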

Step 2: Replace the NA values with zeros

You can use the following syntax to replace the NA values with zeros across the entire DataFrame:

df[is.na(df)] <- 0

Here is the full code for our example:

df <- data.frame(group_a = c(11, 11, NA, NA, 22, 33, 33, NA),
                 group_b = c(99, 99, NA, 77, 77, NA, 55, 55),
                 group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                 group_d = c(NA, NA, "Blue", "Yellow", "Yellow", "Purple", NA, NA)
                 )

df[is.na(df)] <- 0

print(df)

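Note that the result depends on the R version you’re using. Before R 4.0.0, data.frame() converts strings to factors by default, so only the NA values under the numeric columns are replaced with zeros. For the last two columns, where the data type is factor, 0 is not an existing factor level, so R issues a warning and the NA values are not replaced: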

  group_a  group_b  group_c  group_d
1      11       99    Green     <NA>
2      11       99    Green     <NA>
3       0        0     <NA>     Blue
4       0       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33        0     <NA>   Purple
7      33       55      Red     <NA>
8       0       55      Red     <NA>
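
If a column has to remain a factor, one possible workaround is to add "0" as a level before replacing the NA values. The snippet below is a minimal sketch, not the only way to do it. The df_factor name is introduced here for illustration, and stringsAsFactors = TRUE is set explicitly so that the column is a factor regardless of the R version:

```r
# Force the string column to be a factor in every R version
df_factor <- data.frame(group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                        stringsAsFactors = TRUE)

# "0" is not an existing level yet, so append it to the factor's levels first
levels(df_factor$group_c) <- c(levels(df_factor$group_c), "0")

# Now the NA values can be replaced with the new "0" level
df_factor$group_c[is.na(df_factor$group_c)] <- "0"

print(df_factor)
```

The column stays a factor, and the two NA values under it become the "0" level.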

To avoid factors, you can add “stringsAsFactors = FALSE” as the last argument of data.frame(), as captured below (this is already the default from R 4.0.0 onward):

df <- data.frame(group_a = c(11, 11, NA, NA, 22, 33, 33, NA),
                 group_b = c(99, 99, NA, 77, 77, NA, 55, 55),
                 group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                 group_d = c(NA, NA, "Blue", "Yellow", "Yellow", "Purple", NA, NA),
                 stringsAsFactors = FALSE
                 )

df[is.na(df)] <- 0

print(df)

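As you can see, all the NA values are now replaced with zeros across the entire DataFrame. Under the character columns (group_c and group_d), the replacement is stored as the string "0" rather than the number 0: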

  group_a  group_b  group_c  group_d
1      11       99    Green        0
2      11       99    Green        0
3       0        0        0     Blue
4       0       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33        0        0   Purple
7      33       55      Red        0
8       0       55      Red        0
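
Since a zero under a text column is rarely meaningful, you may prefer to limit the replacement to the numeric columns and leave the remaining NA values as they are. The snippet below is a minimal sketch under that assumption. The numeric_cols name is introduced here for illustration:

```r
df <- data.frame(group_a = c(11, 11, NA, NA, 22, 33, 33, NA),
                 group_b = c(99, 99, NA, 77, 77, NA, 55, 55),
                 group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                 group_d = c(NA, NA, "Blue", "Yellow", "Yellow", "Purple", NA, NA),
                 stringsAsFactors = FALSE
                 )

# Logical vector marking which columns are numeric
numeric_cols <- sapply(df, is.numeric)

# Replace the NA values with zeros under the numeric columns only
df[numeric_cols][is.na(df[numeric_cols])] <- 0

print(df)

# Check the data type of each column after the replacement
print(sapply(df, class))
```

Here group_a and group_b should no longer contain NA values, while group_c and group_d keep theirs and remain character columns.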

Step 3 (optional): Replace NA values under a single DataFrame column

You can use the following syntax to replace the NA values with zeros under a single DataFrame column:

df["column_name"][is.na(df["column_name"])] <- 0

For example, let’s replace the NA values under the ‘group_d’ column:

df <- data.frame(group_a = c(11, 11, NA, NA, 22, 33, 33, NA),
                 group_b = c(99, 99, NA, 77, 77, NA, 55, 55),
                 group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                 group_d = c(NA, NA, "Blue", "Yellow", "Yellow", "Purple", NA, NA),
                 stringsAsFactors = FALSE
                 )

df["group_d"][is.na(df["group_d"])] <- 0

print(df)

As you can see, only the NA values under the ‘group_d’ column were replaced:

  group_a  group_b  group_c  group_d
1      11       99    Green        0
2      11       99    Green        0
3      NA       NA     <NA>     Blue
4      NA       77     Blue   Yellow
5      22       77     Blue   Yellow
6      33       NA     <NA>   Purple
7      33       55      Red        0
8      NA       55      Red        0
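
The same single-column replacement can also be written with the $ operator, which some readers find easier to scan. The snippet below is a minimal sketch that applies it to the numeric ‘group_a’ column of the same example data:

```r
df <- data.frame(group_a = c(11, 11, NA, NA, 22, 33, 33, NA),
                 group_b = c(99, 99, NA, 77, 77, NA, 55, 55),
                 group_c = c("Green", "Green", NA, "Blue", "Blue", NA, "Red", "Red"),
                 group_d = c(NA, NA, "Blue", "Yellow", "Yellow", "Purple", NA, NA),
                 stringsAsFactors = FALSE
                 )

# df$group_a extracts the column as a vector, so is.na() returns a logical vector
df$group_a[is.na(df$group_a)] <- 0

print(df)
```

Only the three NA values under ‘group_a’ are replaced, and the column stays numeric.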