Need to convert strings to floats in a pandas DataFrame?
Depending on the scenario, you can use either of the following two methods:
(1) astype(float) method
df['DataFrame Column'] = df['DataFrame Column'].astype(float)
(2) to_numeric method
df['DataFrame Column'] = pd.to_numeric(df['DataFrame Column'], errors='coerce')
Want to see how to apply those two methods in practice?
If so, in this tutorial, I’ll review 2 scenarios to demonstrate how to convert strings to floats:
(1) For a column that contains numeric values stored as strings; and
(2) For a column that contains both numeric and non-numeric values
Scenarios to Convert Strings to Floats in Pandas DataFrame
Scenario 1: Numeric values stored as strings
To keep things simple, let’s create a DataFrame with only two columns:
Product | Price |
ABC | 250 |
XYZ | 270 |
Below is the code to create the DataFrame in Python. The values under the ‘Price’ column are stored as strings because they are wrapped in single quotes (double quotes would work the same way):
import pandas as pd
Data = {'Product': ['ABC','XYZ'], 'Price': ['250','270']}
df = pd.DataFrame(Data)
print(df)
print(df.dtypes)
Run the code in Python and you will see that the data type for the ‘Price’ column is reported as object.
The goal is to convert the values under the ‘Price’ column into a float.
You can then use the astype(float) method to perform the conversion into a float:
df['DataFrame Column'] = df['DataFrame Column'].astype(float)
In the context of our example, the ‘DataFrame Column’ is the ‘Price’ column. And so, the full code to convert the values into a float would be:
import pandas as pd
Data = {'Product': ['ABC','XYZ'], 'Price': ['250','270']}
df = pd.DataFrame(Data)
df['Price'] = df['Price'].astype(float)
print(df)
print(df.dtypes)
You’ll now see that the Price column has been converted into a float: its data type is float64, and the values are displayed as 250.0 and 270.0.
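If you would rather confirm the conversion in code than read it off the printed output, a minimal sketch along these lines may help. It reuses the example data from Scenario 1; the variable name is_float is just an illustrative choice, and pd.api.types.is_float_dtype is used here as one convenient way to test the column type.

```python
import pandas as pd

# Same example data as in Scenario 1: prices stored as strings
df = pd.DataFrame({'Product': ['ABC', 'XYZ'], 'Price': ['250', '270']})

# Convert the strings to floats
df['Price'] = df['Price'].astype(float)

# Check the result programmatically instead of reading the printout
is_float = pd.api.types.is_float_dtype(df['Price'])
print(is_float)
print(df['Price'].tolist())
```

Once the column holds floats, arithmetic such as df['Price'].sum() operates on numbers rather than concatenating strings.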
Scenario 2: Numeric and non-numeric values
Let’s create a new DataFrame with two columns (the Product and Price columns). Only this time, the values under the Price column would contain a combination of both numeric and non-numeric data:
Product | Price |
AAA | 250 |
BBB | ABC260 |
CCC | 270 |
DDD | 280XYZ |
This is what the DataFrame looks like in Python:
import pandas as pd
Data = {'Product': ['AAA','BBB','CCC','DDD'], 'Price': ['250','ABC260','270','280XYZ']}
df = pd.DataFrame(Data)
print(df)
print(df.dtypes)
As before, the data type for the Price column is reported as object.
You can then use the to_numeric method in order to convert the values under the Price column into a float:
df['DataFrame Column'] = pd.to_numeric(df['DataFrame Column'], errors='coerce')
By setting errors='coerce', you’ll transform the non-numeric values into NaN.
Here is the complete code that you can use:
import pandas as pd
Data = {'Product': ['AAA','BBB','CCC','DDD'], 'Price': ['250','ABC260','270','280XYZ']}
df = pd.DataFrame(Data)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df)
print(df.dtypes)
Run the code and you’ll see that the Price column is now a float, with NaN in place of the two non-numeric values (ABC260 and 280XYZ).
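To see why this scenario calls for to_numeric rather than astype(float), it can help to try both on the same mixed data. The sketch below is only an illustration: the names prices, astype_failed, converted and n_invalid are made up for this example, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# The mixed Price values from Scenario 2
prices = pd.Series(['250', 'ABC260', '270', '280XYZ'])

# astype(float) refuses to convert when any value is not a valid number
try:
    prices.astype(float)
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print('astype(float) failed:', exc)

# to_numeric with errors='coerce' converts what it can and marks the rest as NaN
converted = pd.to_numeric(prices, errors='coerce')
print(converted)

# Count how many values could not be converted
n_invalid = int(converted.isna().sum())
print(n_invalid)
```

Counting the NaN values right after the conversion is a quick way to find out how much of the column was not numeric before deciding what to do with those rows.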
To take things further, you can even replace the ‘NaN’ values with ‘0’ values by using df.replace:
import pandas as pd
import numpy as np
Data = {'Product': ['AAA','BBB','CCC','DDD'], 'Price': ['250','ABC260','270','280XYZ']}
df = pd.DataFrame(Data)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
df = df.replace(np.nan, 0)
print(df)
print(df.dtypes)
Run the code and you’ll get the same DataFrame with 0.0 in place of each NaN, while the Price column remains a float.
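As an alternative to df.replace, pandas also provides fillna, which can be applied to the Price column alone so that the other columns are left untouched. The following is a minimal sketch using the same example data, not the only way to do it:

```python
import pandas as pd

# Same example data as in Scenario 2
df = pd.DataFrame({
    'Product': ['AAA', 'BBB', 'CCC', 'DDD'],
    'Price': ['250', 'ABC260', '270', '280XYZ'],
})

# Convert to numbers, then fill the resulting NaN values with 0 in one chain
df['Price'] = pd.to_numeric(df['Price'], errors='coerce').fillna(0)

print(df)
print(df.dtypes)
```

Keep in mind that a 0 is then indistinguishable from a genuine price of zero, so this substitution is best reserved for cases where that ambiguity does not matter.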