How to Drop Rows with NaN Values in Pandas DataFrame

In this short guide, you’ll see how to drop rows with NaN values in a Pandas DataFrame.

To start, here is the syntax that you may apply in order to drop rows with NaN values in your DataFrame:

df.dropna()

In the next section, you’ll observe the steps to apply the above syntax in practice.

Steps to Drop Rows with NaN Values in Pandas DataFrame

Step 1: Create a DataFrame with NaN Values

Let’s say that you have the following dataset:

values_1 values_2
700 DDD
ABC 150
500 350
XYZ 400
1200 5000

You can then capture the above data in Python by creating a DataFrame:

import pandas as pd

# All values are stored as strings, including the numeric-looking ones
df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']
                   })

print(df)

Once you run the code, you’ll get this DataFrame:

  values_1 values_2
0      700      DDD
1      ABC      150
2      500      350
3      XYZ      400
4     1200     5000

Notice that the DataFrame contains both:

  • Numeric data: 700, 500, 1200, 150, 350, 400, 5000
  • Non-numeric values: ABC, XYZ, DDD

You can then use to_numeric to convert the values in the dataset into a float format. Since 3 of those values are non-numeric, setting errors='coerce' turns those 3 values into NaN instead of raising an error.

Here is the code that you may use to get the NaN values:

import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']
                   })

# Convert each column to numeric; values that cannot be parsed become NaN
df = df.apply(pd.to_numeric, errors='coerce')

print(df)

As you may observe, the first, second and fourth rows (index 0, 1 and 3) now have NaN values:

   values_1  values_2
0     700.0       NaN
1       NaN     150.0
2     500.0     350.0
3       NaN     400.0
4    1200.0    5000.0
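
Before dropping anything, it can help to check where the NaN values are. The following is a minimal sketch that uses isna() for that purpose; the variable names nan_per_column and rows_with_nan are illustrative and are not part of the original steps:

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']
                   })

df = df.apply(pd.to_numeric, errors='coerce')

# Count the NaN values in each column
nan_per_column = df.isna().sum()
print(nan_per_column)

# Keep only the rows that contain at least one NaN value
rows_with_nan = df[df.isna().any(axis=1)]
print(rows_with_nan)
```

With this dataset, the second print should list the rows at index 0, 1 and 3, which are the rows that dropna() removes in the next step.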

Step 2: Drop the Rows with NaN Values in Pandas DataFrame

To drop all the rows with the NaN values, you may use df.dropna().

Here is the complete Python code to drop those rows with the NaN values:

import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']
                   })

# Non-numeric values become NaN
df = df.apply(pd.to_numeric, errors='coerce')
# Drop every row that contains at least one NaN value
df = df.dropna()

print(df)

Run the code, and you’ll see that only the two rows without any NaN values remain:

   values_1  values_2
2     500.0     350.0
4    1200.0    5000.0

You may have noticed that those two rows no longer have a sequential index: the remaining index labels are 2 and 4. You can then reset the index to start from 0.
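
Before resetting the index, note that dropna() does not have to drop every row that contains a NaN. As a hedged sketch that goes slightly beyond the steps in this guide, the subset and how parameters of dropna() narrow down which rows are removed; the variable names by_subset and all_nan_dropped are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']
                   })

df = df.apply(pd.to_numeric, errors='coerce')

# Drop a row only when the values_1 column is NaN
by_subset = df.dropna(subset=['values_1'])
print(by_subset)

# Drop a row only when all of its columns are NaN (no such row in this dataset)
all_nan_dropped = df.dropna(how='all')
print(all_nan_dropped)
```

In both cases dropna() returns a new DataFrame and leaves df unchanged, which is why the results are assigned to variables.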

Step 3 (Optional): Reset the Index

You can apply the following syntax to reset the index of a Pandas DataFrame (drop=True discards the old index instead of keeping it as a new column):

df.reset_index(drop=True)

So this is the full Python code to drop the rows with the NaN values, and then reset the index:

import pandas as pd

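df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']
                   })

# Non-numeric values become NaN
df = df.apply(pd.to_numeric, errors='coerce')
# Drop every row that contains at least one NaN value
df = df.dropna()
# Renumber the remaining rows from 0 and discard the old index
df = df.reset_index(drop=True)

print(df)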

You’ll now notice that the index starts from 0:

   values_1  values_2
0     500.0     350.0
1    1200.0    5000.0
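
The values are still displayed as floats (500.0, 1200.0) because the columns were converted to a float format in order to hold NaN. As an optional follow-up, here is a minimal sketch that chains the three steps and then converts the cleaned columns to integers with astype; the name clean is illustrative, and the conversion assumes that every remaining value is a whole number, as is the case in this dataset:

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']
                   })

clean = (
    df.apply(pd.to_numeric, errors='coerce')  # non-numeric values become NaN
      .dropna()                               # drop rows with any NaN
      .reset_index(drop=True)                 # renumber the rows from 0
      .astype('int64')                        # safe only once no NaN remains
)

print(clean)
```

The astype step comes last because an int64 column cannot hold NaN; applying it before dropna() would raise an error.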