In this short guide, you’ll see how to drop rows with NaN values in a Pandas DataFrame.
To start, here is the syntax that you may apply in order to drop rows with NaN values in your DataFrame:
df.dropna()
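As a quick illustration, here is a minimal sketch on a tiny, made-up DataFrame (the column names "a" and "b" and their values are hypothetical). By default, dropna() removes every row that contains at least one NaN and returns a new DataFrame, leaving the original untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: the row with index 1 has a NaN in column "a"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# dropna() returns a new DataFrame; df itself is not modified
cleaned = df.dropna()

print(cleaned)   # only the rows with index 0 and 2 remain
print(len(df))   # the original still has 3 rows
```

Because the result is a new object, the examples below assign it back (df = df.dropna()) to keep the cleaned data.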
In the next section, you’ll observe the steps to apply the above syntax in practice.
Steps to Drop Rows with NaN Values in Pandas DataFrame
Step 1: Create a DataFrame with NaN Values
Let’s say that you have the following dataset:
values_1 | values_2
700 | DDD
ABC | 150
500 | 350
XYZ | 400
1200 | 5000
You can then capture the above data in Python by creating a DataFrame:
import pandas as pd
df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
print(df)
Once you run the code, you’ll get this DataFrame:
values_1 values_2
0 700 DDD
1 ABC 150
2 500 350
3 XYZ 400
4 1200 5000
Notice that every value is stored as a string, and that the DataFrame contains both:
- Numeric values: 700, 500, 1200, 150, 350, 400, 5000
- Non-numeric values: ABC, XYZ, DDD
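At this stage, pandas does not treat ABC, XYZ or DDD as missing, because they are ordinary strings rather than NaN. The sketch below rebuilds the same DataFrame so it runs on its own, and checks that calling dropna() before any conversion removes nothing:

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})

# Both columns hold Python strings (shown as dtype object in pandas 2.x)
print(df.dtypes)

# No cell is NaN yet, so dropna() keeps all 5 rows
print(df.isna().sum().sum())  # 0
print(len(df.dropna()))       # 5
```

This is why the values have to be converted to numbers first: only then do the non-numeric entries turn into NaN that dropna() can detect.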
You can then use pd.to_numeric with errors='coerce' to convert the values in the dataset into a numeric (float) format. Since 3 of those values are non-numeric, they’ll be converted to NaN.
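To see what errors='coerce' does in isolation, here is a small sketch that applies pd.to_numeric to a single Series holding the values_1 data, and compares it with the default behavior (errors='raise'):

```python
import pandas as pd

s = pd.Series(['700', 'ABC', '500', 'XYZ', '1200'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN
converted = pd.to_numeric(s, errors='coerce')
print(converted)

# With the default errors='raise', the first unparseable value raises ValueError
try:
    pd.to_numeric(s)
except ValueError as exc:
    print("ValueError:", exc)
```

Because the converted column contains NaN, pandas stores it as float64, which is why the numbers below are shown as 700.0, 500.0 and so on.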
Here is the code that you may use to get the NaN values:
import pandas as pd
df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
# Convert every column to numbers; non-numeric strings become NaN
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
As you can see, the first, second and fourth rows (index 0, 1 and 3) now contain NaN values:
values_1 values_2
0 700.0 NaN
1 NaN 150.0
2 500.0 350.0
3 NaN 400.0
4 1200.0 5000.0
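Before dropping anything, it can help to confirm which rows will be affected. One way to do this, sketched below, is to combine isna() with any(axis=1), which flags each row that has at least one NaN:

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
df = df.apply(pd.to_numeric, errors='coerce')

# True for every row that contains at least one NaN
has_nan = df.isna().any(axis=1)

print(df[has_nan])                      # the rows that dropna() will remove
print(has_nan[has_nan].index.tolist())  # [0, 1, 3]
```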
Step 2: Drop the Rows with NaN Values in Pandas DataFrame
To drop all the rows with the NaN values, you may use df.dropna().
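By default, df.dropna() uses how='any', so a single NaN is enough to drop a row. The method also accepts subset, how and thresh parameters that change which rows count as missing. A sketch on the same converted data:

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
df = df.apply(pd.to_numeric, errors='coerce')

# Default (how='any'): drop a row if any column is NaN -> rows 2 and 4 remain
any_nan = df.dropna()

# Only check values_1 when deciding -> rows 0, 2 and 4 remain
subset_nan = df.dropna(subset=['values_1'])

# how='all': drop a row only if every column is NaN -> no row is dropped here
all_nan = df.dropna(how='all')

# thresh=2: keep rows with at least 2 non-NaN values -> rows 2 and 4 remain
thresh_2 = df.dropna(thresh=2)

for name, result in [('any', any_nan), ('subset', subset_nan),
                     ('all', all_nan), ('thresh=2', thresh_2)]:
    print(name, result.index.tolist())
```

With only two columns, thresh=2 happens to give the same result as the default; the two options diverge on wider DataFrames.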
Here is the complete Python code to drop those rows with the NaN values:
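import pandas as pd
df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
# Convert every column to numbers; non-numeric strings become NaN
df = df.apply(pd.to_numeric, errors='coerce')
# Drop every row that contains at least one NaN
df = df.dropna()
print(df)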
Run the code, and you’ll see that only the two rows without any NaN values remain:
values_1 values_2
2 500.0 350.0
4 1200.0 5000.0
You may have noticed that the remaining rows no longer have a sequential index: their labels are still 2 and 4. You can reset the index so that it starts from 0 again.
Step 3 (Optional): Reset the Index
You can apply the following syntax to reset the index of a Pandas DataFrame:
df.reset_index(drop=True)
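The drop=True argument matters here. The sketch below rebuilds the two surviving rows (with their leftover labels 2 and 4) and compares drop=True with the default drop=False, which keeps the old labels as a new column named "index":

```python
import pandas as pd

# The two rows left after dropna(), still carrying index labels 2 and 4
df = pd.DataFrame({'values_1': [500.0, 1200.0], 'values_2': [350.0, 5000.0]},
                  index=[2, 4])

# drop=True discards the old labels and numbers the rows 0, 1, ...
print(df.reset_index(drop=True))

# drop=False (the default) moves the old labels into a column called "index"
print(df.reset_index())
```

Since the old labels carry no meaning after the NaN rows are removed, drop=True is usually what you want in this workflow.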
So this is the full Python code to drop the rows with the NaN values, and then reset the index:
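import pandas as pd
df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
# Convert every column to numbers; non-numeric strings become NaN
df = df.apply(pd.to_numeric, errors='coerce')
# Drop every row that contains at least one NaN
df = df.dropna()
# Renumber the remaining rows from 0, discarding the old labels
df = df.reset_index(drop=True)
print(df)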
You’ll now notice that the index starts from 0:
values_1 values_2
0 500.0 350.0
1 1200.0 5000.0
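If you prefer, the three steps can also be written as a single method chain. This is only a stylistic alternative to the reassignment pattern used above, and it produces the same result. (In pandas 2.0 and later, dropna(ignore_index=True) is another way to get a 0-based index; this sketch sticks with reset_index so it also works on older versions.)

```python
import pandas as pd

df = (
    pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                  'values_2': ['DDD', '150', '350', '400', '5000']})
    .apply(pd.to_numeric, errors='coerce')  # non-numeric strings -> NaN
    .dropna()                               # drop rows containing any NaN
    .reset_index(drop=True)                 # renumber rows from 0
)
print(df)
```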