To remove duplicates across the entire DataFrame:
df.drop_duplicates()
To remove rows that have a duplicate value in a single DataFrame column:
df.drop_duplicates(subset=["column_name"])
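Note that drop_duplicates() returns a new DataFrame and, by default, leaves the original untouched, so the result usually needs to be assigned to a variable. A minimal sketch of that behavior, using a small hypothetical three-row DataFrame that is not part of the example below:

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
df = pd.DataFrame({"column_name": ["a", "a", "b"]})

# drop_duplicates() returns a new DataFrame; df itself is not modified
deduped = df.drop_duplicates()

print(len(df))       # the original still has all of its rows
print(len(deduped))  # the returned DataFrame has the repeat removed
```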
Steps to Remove Duplicates in Pandas DataFrame
Step 1: Gather the data that contains the duplicates
First, gather the data that contains the duplicates.
For example, the following data contains two repeated rows (Green/Rectangle and Red/Square):
Color | Shape
Green | Rectangle
Green | Rectangle
Green | Square
Blue | Rectangle
Blue | Square
Red | Square
Red | Square
Red | Rectangle
Step 2: Create the Pandas DataFrame
Next, create the Pandas DataFrame using this code:
import pandas as pd

data = {
    "Color": ["Green", "Green", "Green", "Blue", "Blue", "Red", "Red", "Red"],
    "Shape": ["Rectangle", "Rectangle", "Square", "Rectangle", "Square", "Square", "Square", "Rectangle"],
}

df = pd.DataFrame(data)

print(df)
The resulting DataFrame:
   Color      Shape
0  Green  Rectangle
1  Green  Rectangle
2  Green     Square
3   Blue  Rectangle
4   Blue     Square
5    Red     Square
6    Red     Square
7    Red  Rectangle
Step 3: Remove duplicates in Pandas DataFrame
To remove the duplicates across the entire DataFrame, use df.drop_duplicates(). By default, the first occurrence of each repeated row is kept:
import pandas as pd

data = {
    "Color": ["Green", "Green", "Green", "Blue", "Blue", "Red", "Red", "Red"],
    "Shape": ["Rectangle", "Rectangle", "Square", "Rectangle", "Square", "Square", "Square", "Rectangle"],
}

df = pd.DataFrame(data)

df_no_duplicates = df.drop_duplicates()

print(df_no_duplicates)
The result after removing the duplicates (rows 1 and 6 are dropped, and the remaining rows keep their original index labels):
   Color      Shape
0  Green  Rectangle
2  Green     Square
3   Blue  Rectangle
4   Blue     Square
5    Red     Square
7    Red  Rectangle
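Before dropping anything, it can help to see which rows pandas treats as duplicates. The sketch below uses df.duplicated(), which returns a boolean Series marking each row that repeats an earlier one. The variable name dup_mask is only illustrative:

```python
import pandas as pd

data = {
    "Color": ["Green", "Green", "Green", "Blue", "Blue", "Red", "Red", "Red"],
    "Shape": ["Rectangle", "Rectangle", "Square", "Rectangle", "Square", "Square", "Square", "Rectangle"],
}
df = pd.DataFrame(data)

# True for every row that repeats an earlier row (the first occurrence stays False)
dup_mask = df.duplicated()

print(dup_mask.sum())  # how many rows drop_duplicates() would remove
print(df[dup_mask])    # the repeated rows themselves
```

Filtering with df[~dup_mask] should give the same rows as df.drop_duplicates().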
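Remove Duplicates Based on a Specific Column
To remove rows with a repeated value in the Color column, use df.drop_duplicates(subset=["Color"]). Only the Color values are compared, so rows with different shapes are still treated as duplicates: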
import pandas as pd

data = {
    "Color": ["Green", "Green", "Green", "Blue", "Blue", "Red", "Red", "Red"],
    "Shape": ["Rectangle", "Rectangle", "Square", "Rectangle", "Square", "Square", "Square", "Rectangle"],
}

df = pd.DataFrame(data)

df_no_duplicates = df.drop_duplicates(subset=["Color"])

print(df_no_duplicates)
The result keeps only the first row for each color:
   Color      Shape
0  Green  Rectangle
3   Blue  Rectangle
5    Red     Square
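Which row survives depends on the keep parameter, which defaults to "first". The sketch below shows two other settings: keep="last" retains the last row for each color, and keep=False drops every row that has a repeat. It also passes ignore_index=True to renumber the result from 0, which assumes pandas 1.0 or later. The variable names df_last and df_unique_only are only illustrative:

```python
import pandas as pd

data = {
    "Color": ["Green", "Green", "Green", "Blue", "Blue", "Red", "Red", "Red"],
    "Shape": ["Rectangle", "Rectangle", "Square", "Rectangle", "Square", "Square", "Square", "Rectangle"],
}
df = pd.DataFrame(data)

# Keep the last row for each color and renumber the index from 0
# (ignore_index assumes pandas 1.0 or later)
df_last = df.drop_duplicates(subset=["Color"], keep="last", ignore_index=True)
print(df_last)

# keep=False removes all copies of any row that appears more than once
df_unique_only = df.drop_duplicates(keep=False)
print(df_unique_only)
```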
You may want to check the Pandas Documentation to learn more about removing duplicates from a DataFrame.