Here are 2 ways to drop columns with NaN values in Pandas DataFrame:
(1) Drop any column that contains at least one NaN:
df.dropna(axis='columns', inplace=True)
(2) Drop column/s where ALL the values are NaN:
df.dropna(axis='columns', how='all', inplace=True)
The Example
For demonstration purposes, let’s create a DataFrame with 5 columns, where:
- 3 columns will contain NaN values
- 2 columns will not contain any NaN values
Here is the syntax to create the DataFrame:
import pandas as pd import numpy as np data = {'Column_A': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan], 'Column_B': [11, 22, 33, 44, 55, 66, 77, 88, 99], 'Column_C': ['a', 'b', np.nan, np.nan, 'c', 'd', 'e', np.nan, 'f'], 'Column_D': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii'], 'Column_E': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan] } df = pd.DataFrame(data) print(df)
As you can see, 3 columns (‘Column_A’, ‘Column_C’ and ‘Column_E’) contain NaN values:
Column_A Column_B Column_C Column_D Column_E
0 1.0 11 a aa NaN
1 2.0 22 b bb NaN
2 3.0 33 NaN cc NaN
3 4.0 44 NaN dd NaN
4 5.0 55 c ee NaN
5 NaN 66 d ff NaN
6 6.0 77 e gg NaN
7 7.0 88 NaN hh NaN
8 NaN 99 f ii NaN
The ultimate goal is to drop the columns with the NaN values in the above DataFrame.
(1) Drop any column that contains at least one NaN
You can use the following template to drop any column that contains at least one NaN:
df.dropna(axis='columns', inplace=True)
For our example:
import pandas as pd import numpy as np data = {'Column_A': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan], 'Column_B': [11, 22, 33, 44, 55, 66, 77, 88, 99], 'Column_C': ['a', 'b', np.nan, np.nan, 'c', 'd', 'e', np.nan, 'f'], 'Column_D': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii'], 'Column_E': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan] } df = pd.DataFrame(data) df.dropna(axis='columns', inplace=True) print(df)
Once you run the code, you’ll notice that the 3 columns, which originally contained the NaN values, were dropped. Only the other 2 columns (without the NaN values) were maintained:
Column_B Column_D
0 11 aa
1 22 bb
2 33 cc
3 44 dd
4 55 ee
5 66 ff
6 77 gg
7 88 hh
8 99 ii
(2) Drop column/s where ALL the values are NaN
What if you’d like to drop only the column/s where ALL the values are NaN?
In that case, you can use the template below to accomplish this goal:
df.dropna(axis='columns', how='all', inplace=True)
Here is the code that you can use:
import pandas as pd import numpy as np data = {'Column_A': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan], 'Column_B': [11, 22, 33, 44, 55, 66, 77, 88, 99], 'Column_C': ['a', 'b', np.nan, np.nan, 'c', 'd', 'e', np.nan, 'f'], 'Column_D': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii'], 'Column_E': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan] } df = pd.DataFrame(data) df.dropna(axis='columns', how='all', inplace=True) print(df)
As you may notice, ‘Column_E‘ (that contained only NaN values) was dropped:
Column_A Column_B Column_C Column_D
0 1.0 11 a aa
1 2.0 22 b bb
2 3.0 33 NaN cc
3 4.0 44 NaN dd
4 5.0 55 c ee
5 NaN 66 d ff
6 6.0 77 e gg
7 7.0 88 NaN hh
8 NaN 99 f ii
Additional Resources
You can check the Pandas Documentation to learn more about dropna.
Optionally, you can check the following guide to learn how to drop rows with NaN values in Pandas DataFrame.