Remove Duplicates from a Column in R DataFrame

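Use the duplicated() function to remove duplicates from a column in an R DataFrame. It flags every value that has already appeared earlier in the column, so negating it keeps only the first row for each distinct value: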

df_unique <- df[!duplicated(df$column_name), ]
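To see why the negation works, it can help to look at what duplicated() returns on its own. The short sketch below uses a hypothetical vector x (not part of the examples in this guide) purely for illustration:

```r
# Hypothetical vector used only for illustration
x <- c("a", "b", "a", "c", "b")

# duplicated() returns TRUE for each value that already appeared earlier
is_dup <- duplicated(x)
print(is_dup)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Negating the flags keeps the first occurrence of each value
first_only <- x[!is_dup]
print(first_only)
# [1] "a" "b" "c"
```

When the same logical vector is used as the row index of a DataFrame, every column of the matching rows is kept.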

Examples

First, create a DataFrame in R with 2 columns that contain duplicates:

df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)

print(df)

Here is the DataFrame with the duplicates:

  colors numbers
1    Red       1
2    Red       1
3    Red       2
4  Green       3
5  Green       3
6  Green       3
7   Blue       4
8   Blue       5

Next, remove the duplicates under the “colors” column as follows:

df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)

# Keep only the first row for each distinct value in the colors column
df_unique <- df[!duplicated(df$colors), ]

print(df_unique)

The result has no duplicates under the “colors” column. Only the first row for each color is kept, and the original row numbers are preserved:

  colors numbers
1    Red       1
4  Green       3
7   Blue       4
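The result above keeps the first row for each color. If the last row for each color is wanted instead, one option is the fromLast argument of duplicated(). This is a minimal sketch on the same data; the name df_last is chosen here only for illustration:

```r
df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)

# fromLast = TRUE scans from the bottom, so the last occurrence
# of each color is the one that is not flagged as a duplicate
df_last <- df[!duplicated(df$colors, fromLast = TRUE), ]

print(df_last)
#   colors numbers
# 3    Red       2
# 6  Green       3
# 8   Blue       5
```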

Alternatively, to remove the duplicates under the “numbers” column:

df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)

# Keep only the first row for each distinct value in the numbers column
df_unique <- df[!duplicated(df$numbers), ]

print(df_unique)

Now there are no duplicates under the “numbers” column. Again, the first row for each number is kept:

  colors numbers
1    Red       1
3    Red       2
4  Green       3
7   Blue       4
8   Blue       5
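Both examples above check a single column. As a related sketch, duplicated() can also be given the DataFrame itself (or a subset of its columns), in which case a row is dropped only when the whole combination of values repeats. The name df_rows is an assumption made for this illustration:

```r
df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)

# A row counts as a duplicate only if both colors and numbers
# match an earlier row
df_rows <- df[!duplicated(df[c("colors", "numbers")]), ]

print(df_rows)
#   colors numbers
# 1    Red       1
# 3    Red       2
# 4  Green       3
# 7   Blue       4
# 8   Blue       5
```

With this particular data the result happens to match the “numbers” example, because no number appears under more than one color.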
