Here are 4 ways to check for NaN in a Pandas DataFrame:
(1) Check for NaN under a single DataFrame column:
df['column name'].isnull().values.any()
(2) Count the NaN under a single DataFrame column:
df['column name'].isnull().sum()
(3) Check for NaN under an entire DataFrame:
df.isnull().values.any()
(4) Count the NaN under an entire DataFrame:
df.isnull().sum().sum()
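As a quick illustration, the four templates above can be run side by side. This is only a minimal sketch: the DataFrame and its column names ('a' and 'b') are made up for the illustration. Note that isna() is an alias of isnull() in pandas, so either spelling gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column DataFrame, used only for this illustration
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [np.nan, np.nan, 6.0],
})

has_nan_in_column = df['a'].isnull().values.any()  # (1) any NaN in column 'a'?
nan_count_in_column = df['a'].isnull().sum()       # (2) how many NaN in column 'a'?
has_nan_in_df = df.isnull().values.any()           # (3) any NaN in the DataFrame?
nan_count_in_df = df.isnull().sum().sum()          # (4) how many NaN in the DataFrame?

print(has_nan_in_column, nan_count_in_column, has_nan_in_df, nan_count_in_df)

# isna() is an alias of isnull(), so both masks are identical
print(df.isna().equals(df.isnull()))
```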
Examples of checking for NaN in Pandas DataFrame
(1) Check for NaN under a single DataFrame column
In the following example, we’ll create a DataFrame with a set of numbers and 3 NaN values:
import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

print(df)
You’ll now see the DataFrame with the 3 NaN values:
set_of_numbers
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 NaN
6 6.0
7 7.0
8 NaN
9 8.0
10 9.0
11 10.0
12 NaN
You can then use the following template in order to check for NaN under a single DataFrame column:
df['column name'].isnull().values.any()
For our example (where the desired column name is ‘set_of_numbers’):
import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

check_for_nan = df['set_of_numbers'].isnull().values.any()
print(check_for_nan)
Run the code, and you’ll get ‘True’, which confirms the existence of NaN values under the DataFrame column:
True
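If you prefer a shorter spelling, the same check can likely be written without going through .values. The sketch below assumes a reasonably recent pandas version and uses two existing pandas features, Series.any() and the Series.hasnans property. The variable names are just for illustration.

```python
import numpy as np
import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# Series.any() works on the boolean mask directly, without .values
check_with_any = df['set_of_numbers'].isnull().any()

# Series.hasnans is a property that reports whether the column holds any NaN
check_with_hasnans = df['set_of_numbers'].hasnans

print(check_with_any, check_with_hasnans)
```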
To see exactly which rows hold the NaN values, remove .values.any() from the code:
import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

check_for_nan = df['set_of_numbers'].isnull()
print(check_for_nan)
You’ll now see the 3 instances of the NaN values:
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 True
9 False
10 False
11 False
12 True
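If you only need the positions of the NaN values rather than the full True/False breakdown, one option is to use the boolean mask to filter the index. This is a small sketch built on the same example; the names nan_mask and nan_positions are made up for the illustration.

```python
import numpy as np
import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# Boolean mask: True wherever the value is NaN
nan_mask = df['set_of_numbers'].isnull()

# Keep only the index labels where the mask is True
nan_positions = df.index[nan_mask].tolist()
print(nan_positions)  # [5, 8, 12]
```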
Here is another approach where you can get all the instances where a NaN value exists:
import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

df.loc[df['set_of_numbers'].isnull(), 'value_is_NaN'] = 'Yes'
df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'

print(df)
You’ll now see a new column (called ‘value_is_NaN’), which indicates all the instances where a NaN value exists:
    set_of_numbers  value_is_NaN
0              1.0            No
1              2.0            No
2              3.0            No
3              4.0            No
4              5.0            No
5              NaN           Yes
6              6.0            No
7              7.0            No
8              NaN           Yes
9              8.0            No
10             9.0            No
11            10.0            No
12             NaN           Yes
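The same ‘Yes’/‘No’ column can probably be built in a single step with numpy.where, which picks one value where the condition is True and another where it is False. This is an alternative sketch, not the approach used above:

```python
import numpy as np
import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# 'Yes' where the value is NaN, 'No' everywhere else
df['value_is_NaN'] = np.where(df['set_of_numbers'].isnull(), 'Yes', 'No')

print(df)
```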
(2) Count the NaN under a single DataFrame column
You can apply this syntax in order to count the NaN values under a single DataFrame column:
df['column name'].isnull().sum()
For our example:
import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

count_nan = df['set_of_numbers'].isnull().sum()
print('Count of NaN: ' + str(count_nan))
You’ll then get the count of 3 NaN values:
Count of NaN: 3
And here is another approach to get the count:
import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

df.loc[df['set_of_numbers'].isnull(), 'value_is_NaN'] = 'Yes'
df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'

count_nan = df.loc[df['value_is_NaN'] == 'Yes'].count()
print(count_nan)
As before, the ‘value_is_NaN’ row shows the 3 instances of NaN values (the count for ‘set_of_numbers’ is 0 because count() skips NaN values):
set_of_numbers 0
value_is_NaN 3
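As a cross-check on the count, you can rely on the fact that Series.count() only counts the non-NaN values. Subtracting it from the total number of rows should therefore give the number of NaN values. A minimal sketch, with variable names chosen just for this illustration:

```python
import numpy as np
import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

total_rows = len(df)                           # 13 rows in total
non_nan_rows = df['set_of_numbers'].count()    # count() skips NaN, so 10
count_nan = total_rows - non_nan_rows          # 13 - 10 = 3

print('Count of NaN: ' + str(count_nan))
```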
(3) Check for NaN under an entire DataFrame
Now let’s add a second column to the original DataFrame (the first column is renamed to ‘first_set_of_numbers’). The new column includes another set of numbers with NaN values:
import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

print(df)
Run the code, and you’ll get 8 instances of NaN values across the entire DataFrame:
    first_set_of_numbers  second_set_of_numbers
0                    1.0                   11.0
1                    2.0                   12.0
2                    3.0                    NaN
3                    4.0                   13.0
4                    5.0                   14.0
5                    NaN                    NaN
6                    6.0                   15.0
7                    7.0                   16.0
8                    NaN                    NaN
9                    8.0                    NaN
10                   9.0                   17.0
11                  10.0                    NaN
12                   NaN                   19.0
You can then apply this syntax in order to verify the existence of NaN values under the entire DataFrame:
df.isnull().values.any()
For our example:
import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

check_nan_in_df = df.isnull().values.any()
print(check_nan_in_df)
Once you run the code, you’ll get ‘True’, which confirms the existence of NaN values in the DataFrame:
True
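A single True does not tell you which columns are affected. One way to narrow it down is to call any() on the mask without .values, which checks each column separately. In the sketch below, the third column (‘complete_set’) is a hypothetical addition with no NaN values, included only so that the result differs between columns.

```python
import numpy as np
import pandas as pd

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19],
        'complete_set': list(range(13))}  # hypothetical column without NaN
df = pd.DataFrame(data)

# One True/False per column: does the column contain at least one NaN?
nan_by_column = df.isnull().any()
print(nan_by_column)

# Names of the columns that contain NaN values
columns_with_nan = df.columns[nan_by_column].tolist()
print(columns_with_nan)
```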
You can get a further breakdown by removing .values.any() from the code:
import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

check_nan_in_df = df.isnull()
print(check_nan_in_df)
Here is the result of the breakdown:
    first_set_of_numbers  second_set_of_numbers
0                  False                  False
1                  False                  False
2                  False                   True
3                  False                  False
4                  False                  False
5                   True                   True
6                  False                  False
7                  False                  False
8                   True                   True
9                  False                   True
10                 False                  False
11                 False                   True
12                  True                  False
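The True/False breakdown can also be used as a filter. As a rough sketch, passing axis=1 to any() checks each row instead of each column, so you can keep only the rows that contain at least one NaN. The name rows_with_nan is made up for this illustration.

```python
import numpy as np
import pandas as pd

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# any(axis=1) is True for a row if at least one of its columns is NaN
rows_with_nan = df[df.isnull().any(axis=1)]

print(rows_with_nan)
print(rows_with_nan.index.tolist())  # [2, 5, 8, 9, 11, 12]
```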
(4) Count the NaN under an entire DataFrame
You may now use this template to count the NaN values under the entire DataFrame:
df.isnull().sum().sum()
Here is the code for our example:
import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

count_nan_in_df = df.isnull().sum().sum()
print('Count of NaN: ' + str(count_nan_in_df))
You’ll then get the total count of 8:
Count of NaN: 8
And if you want to get the count of NaN by column, then you may use the following code:
import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

count_nan_in_df = df.isnull().sum()
print(count_nan_in_df)
And here is the result:
first_set_of_numbers 3
second_set_of_numbers 5
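If the share of NaN values is more useful to you than the raw count, one possible variation is to take the mean of the boolean mask instead of the sum. Since True counts as 1 and False as 0, the mean is the fraction of NaN values in each column. The names share_nan and percent_nan are chosen only for this sketch.

```python
import numpy as np
import pandas as pd

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# Fraction of NaN per column: 3 out of 13 and 5 out of 13
share_nan = df.isnull().mean()

# The same figures as percentages, rounded to one decimal place
percent_nan = (share_nan * 100).round(1)
print(percent_nan)
```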
You just saw how to check for NaN in Pandas DataFrame. Alternatively you may: