Use the duplicated() function to remove rows with duplicate values in a column of an R DataFrame, keeping the first occurrence of each value:
df_unique <- df[!duplicated(df$column_name), ]
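To see why the negation (!) is needed, it may help to look at what duplicated() returns on its own. The following is a minimal sketch using a hypothetical vector x, which is not part of the examples in this guide:

```r
# Hypothetical vector, used only for illustration
x <- c("a", "b", "a", "c", "b")

# duplicated() flags each element that already appeared earlier in the vector
dup <- duplicated(x)
print(dup)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Negating the flags keeps only the first occurrence of each value
first_only <- x[!dup]
print(first_only)
# [1] "a" "b" "c"
```

When the same logical vector is used as the row index of a DataFrame, only the rows marked FALSE by duplicated() are kept.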
Examples
First, create a DataFrame in R with two columns, each of which contains duplicate values:
df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)
print(df)
Here is the DataFrame with the duplicates:
  colors numbers
1    Red       1
2    Red       1
3    Red       2
4  Green       3
5  Green       3
6  Green       3
7   Blue       4
8   Blue       5
Next, remove the duplicates under the “colors” column as follows:
df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)
df_unique <- df[!duplicated(df$colors), ]
print(df_unique)
Each color now appears only once under the “colors” column, and the first row for each color is kept (original rows 1, 4 and 7):
  colors numbers
1    Red       1
4  Green       3
7   Blue       4
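By default, the first row of each group of duplicates is kept. If the last row is preferred instead, one option is the fromLast argument of duplicated(). The sketch below assumes the same DataFrame as above; the name df_last is only illustrative:

```r
df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)

# fromLast = TRUE scans from the end, so the last occurrence of each color is kept
df_last <- df[!duplicated(df$colors, fromLast = TRUE), ]

print(df_last)
#   colors numbers
# 3    Red       2
# 6  Green       3
# 8   Blue       5
```

The row numbers on the left (3, 6 and 8) show that the last row for each color was retained rather than the first.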
Alternatively, to remove the duplicates under the “numbers” column:
df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)
df_unique <- df[!duplicated(df$numbers), ]
print(df_unique)
Each number now appears only once under the “numbers” column, while the “colors” column may still contain repeated values:
  colors numbers
1    Red       1
3    Red       2
4  Green       3
7   Blue       4
8   Blue       5
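The examples above check a single column at a time. As a possible extension, duplicated() can also be applied to the whole DataFrame, in which case a row is dropped only if all of its values match an earlier row. This is a minimal sketch using the same DataFrame; the name df_unique_rows is only illustrative:

```r
df <- data.frame(
  colors = c("Red", "Red", "Red", "Green", "Green", "Green", "Blue", "Blue"),
  numbers = c(1, 1, 2, 3, 3, 3, 4, 5)
)

# Passing the whole DataFrame compares complete rows rather than one column
df_unique_rows <- df[!duplicated(df), ]

print(df_unique_rows)
#   colors numbers
# 1    Red       1
# 3    Red       2
# 4  Green       3
# 7   Blue       4
# 8   Blue       5
```

For this particular data the result happens to match the “numbers” example, because every repeated number belongs to a fully repeated row. With other data the two approaches can give different results.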