Depending on the scenario, you may use any of the four methods below to replace NaN values with zeros in a Pandas DataFrame:
(1) For a single column using Pandas:
df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
(2) For a single column using NumPy:
df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)
(3) For an entire DataFrame using Pandas:
df = df.fillna(0)
(4) For an entire DataFrame using NumPy:
df = df.replace(np.nan, 0)
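As a quick sanity check, the four methods can be compared side by side. The sketch below uses a small made-up DataFrame (the column names a and b are placeholders, not part of the examples in this guide) and assumes float columns. It also illustrates that fillna and replace return new objects, so the result has to be assigned back if you want to keep it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one NaN in each column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# Methods (1) and (2): a single column
col_fillna = df["a"].fillna(0)
col_replace = df["a"].replace(np.nan, 0)

# Methods (3) and (4): the entire DataFrame
df_fillna = df.fillna(0)
df_replace = df.replace(np.nan, 0)

# Compare the results of the two approaches
print(col_fillna.equals(col_replace))
print(df_fillna.equals(df_replace))

# The original df was never reassigned, so it still contains its NaN values
print(df.isna().sum().sum())
```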
Let’s now review how to apply each of the 4 methods using simple examples.
4 cases to replace NaN values with zeros in Pandas DataFrame
Case 1: replace NaN values with zeros for a column using Pandas
Suppose that you have a single column with the following data:
values |
700 |
ABC300 |
500 |
900XYZ |
You can then create a DataFrame in Python to capture that data:
import pandas as pd

df = pd.DataFrame({'values': ['700', 'ABC300', '500', '900XYZ']})
print(df)
Once you run the above code in Python, the DataFrame has a single column named values that holds the four strings 700, ABC300, 500, and 900XYZ.
Notice that some of the values in the dataset contain text (i.e., ABC300 and 900XYZ), while other values are purely numeric (i.e., 700 and 500).
You can then use to_numeric to convert the values in the dataset to floats. Because two of those values contain text, they cannot be parsed as numbers. With errors='coerce', to_numeric turns them into NaN instead of raising an error.
Later, you’ll see how to replace the NaN values with zeros. In the meantime, you can use the code below to convert the strings into floats while generating the NaN values:
import pandas as pd

df = pd.DataFrame({'values': ['700', 'ABC300', '500', '900XYZ']})
# Non-numeric strings become NaN because of errors='coerce'
df['values'] = pd.to_numeric(df['values'], errors='coerce')
print(df)
And this is the result that you’ll get with the NaN values: the column now holds 700.0, NaN, 500.0, and NaN.
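Before replacing anything, it can help to confirm how many values were actually coerced to NaN. The following is a minimal sketch that reuses the example column and counts the NaN values with isna().sum(). The variable name nan_count is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"values": ["700", "ABC300", "500", "900XYZ"]})
df["values"] = pd.to_numeric(df["values"], errors="coerce")

# isna() marks each NaN as True; sum() counts the True entries
nan_count = df["values"].isna().sum()
print(nan_count)

# The column is now a float type, since NaN cannot be stored in an integer column
print(pd.api.types.is_float_dtype(df["values"]))
```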
Finally, in order to replace the NaN values with zeros for a column using Pandas, you may use the first method introduced at the top of this guide:
df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
In the context of our example, here is the complete Python code to replace the NaN values with 0’s:
import pandas as pd

df = pd.DataFrame({'values': ['700', 'ABC300', '500', '900XYZ']})
df['values'] = pd.to_numeric(df['values'], errors='coerce')
# Replace the NaN values in the column with 0
df['values'] = df['values'].fillna(0)
print(df)
Run the code, and you’ll see that the previous two NaN values became 0’s: the column now holds 700.0, 0.0, 500.0, and 0.0.
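The filled column still contains floats. If whole numbers are preferred, one optional follow-up (an addition for illustration, not one of the four methods in this guide) is to cast the column with astype(int) once no NaN values remain:

```python
import pandas as pd

df = pd.DataFrame({"values": ["700", "ABC300", "500", "900XYZ"]})
df["values"] = pd.to_numeric(df["values"], errors="coerce").fillna(0)

# Optional step: cast to integers now that the column has no NaN values
df["values"] = df["values"].astype(int)
print(df["values"].tolist())
```

The cast only works after the replacement, because NaN has no integer representation.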
Case 2: replace NaN values with zeros for a column using NumPy
You can accomplish the same task of replacing the NaN values with zeros by passing NumPy’s np.nan to the replace method:
df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)
For our example, you can use the following code to perform the replacement:
import pandas as pd
import numpy as np

df = pd.DataFrame({'values': ['700', 'ABC300', '500', '900XYZ']})
df['values'] = pd.to_numeric(df['values'], errors='coerce')
# Replace every np.nan entry in the column with 0
df['values'] = df['values'].replace(np.nan, 0)
print(df)
As before, the two NaN values became 0’s, giving 700.0, 0.0, 500.0, and 0.0.
Case 3: replace NaN values with zeros for an entire DataFrame using Pandas
For the first two cases, you only had a single column in the dataset. But what if your DataFrame contains multiple columns?
For simplicity, let’s assume that you have the following dataset with 2 columns:
values_1 | values_2 |
700 | DDD200 |
ABC300 | 150 |
500 | 350ZZZ |
900XYZ | 400 |
You can then create the DataFrame as follows:
import pandas as pd

df = pd.DataFrame({
    'values_1': ['700', 'ABC300', '500', '900XYZ'],
    'values_2': ['DDD200', '150', '350ZZZ', '400'],
})
print(df)
Run the code, and you’ll get the DataFrame with the two columns, values_1 and values_2, each holding four strings.
Notice that both columns contain numeric and text values. You can then apply to_numeric with errors='coerce' to every column in order to convert the entire DataFrame to floats. While doing so, you’ll get NaN values for all the entries that contained text:
import pandas as pd

df = pd.DataFrame({
    'values_1': ['700', 'ABC300', '500', '900XYZ'],
    'values_2': ['DDD200', '150', '350ZZZ', '400'],
})
# Apply to_numeric to each column; non-numeric strings become NaN
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
Run the code, and you’ll see that the 4 non-numeric values became NaN: values_1 holds 700.0, NaN, 500.0, NaN, and values_2 holds NaN, 150.0, NaN, 400.0.
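With more than one column, a per-column count shows where the NaN values ended up. The sketch below reuses the two-column example and assumes isna().sum() is an acceptable way to count them. The names nan_per_column and total_nan are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "values_1": ["700", "ABC300", "500", "900XYZ"],
    "values_2": ["DDD200", "150", "350ZZZ", "400"],
})
df = df.apply(pd.to_numeric, errors="coerce")

# Count the NaN values in each column, then across the whole DataFrame
nan_per_column = df.isna().sum()
total_nan = int(nan_per_column.sum())
print(nan_per_column.to_dict())
print(total_nan)
```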
Finally, in order to replace the NaN values with zeros for an entire DataFrame using Pandas, you may use the third method:
df = df.fillna(0)
Applying this method for our example:
import pandas as pd

df = pd.DataFrame({
    'values_1': ['700', 'ABC300', '500', '900XYZ'],
    'values_2': ['DDD200', '150', '350ZZZ', '400'],
})
df = df.apply(pd.to_numeric, errors='coerce')
# Replace the NaN values in every column with 0
df = df.fillna(0)
print(df)
You’ll now get 0’s, instead of all the NaNs, across the entire DataFrame: values_1 holds 700.0, 0.0, 500.0, 0.0, and values_2 holds 0.0, 150.0, 0.0, 400.0.
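If only some columns of a larger DataFrame should be filled, fillna also accepts a dictionary that maps column names to fill values. This is a hedged sketch of that variation, not one of the four methods covered in this guide. Here only values_1 is filled and values_2 is deliberately left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "values_1": ["700", "ABC300", "500", "900XYZ"],
    "values_2": ["DDD200", "150", "350ZZZ", "400"],
})
df = df.apply(pd.to_numeric, errors="coerce")

# Fill NaN only in values_1; values_2 keeps its NaN values
partly_filled = df.fillna({"values_1": 0})
print(partly_filled["values_1"].tolist())
print(int(partly_filled["values_2"].isna().sum()))
```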
Case 4: replace NaN values with zeros for an entire DataFrame using NumPy
You can achieve the same goal for an entire DataFrame by passing NumPy’s np.nan to the replace method:
df = df.replace(np.nan, 0)
And for our example, you can apply the code below to replace the NaN values with zeros:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'values_1': ['700', 'ABC300', '500', '900XYZ'],
    'values_2': ['DDD200', '150', '350ZZZ', '400'],
})
df = df.apply(pd.to_numeric, errors='coerce')
# Replace every np.nan entry in the DataFrame with 0
df = df.replace(np.nan, 0)
print(df)
Run the code, and you’ll get the same results as in the previous case.
You can find additional information about replacing values in Pandas by visiting the Pandas documentation.
Alternatively, you may check this guide for the steps to drop rows with NaN values in Pandas DataFrame.