Need to convert strings to floats in a Pandas DataFrame?
Depending on the scenario, you can use either of the following two approaches to convert strings to floats in a Pandas DataFrame:
(1) astype(float)
df['DataFrame Column'] = df['DataFrame Column'].astype(float)
(2) to_numeric
df['DataFrame Column'] = pd.to_numeric(df['DataFrame Column'], errors='coerce')
In this short guide, you’ll see 3 scenarios with the steps to convert strings to floats:
- For a column that contains numeric values stored as strings
- For a column that contains both numeric and non-numeric values
- For an entire DataFrame
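As a rough illustration of when each approach fits, the sketch below runs both on a clean column and on a column that contains a stray non-numeric value. The Series names `clean` and `dirty` and their contents are made up for this example:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
clean = pd.Series(['250', '270'])
dirty = pd.Series(['250', 'ABC260'])

# astype(float) works when every string is a valid number
clean_floats = clean.astype(float)

# astype(float) raises a ValueError as soon as one value cannot be parsed
try:
    dirty.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns unparseable values into NaN instead
dirty_floats = pd.to_numeric(dirty, errors='coerce')

print(clean_floats.tolist())
print(astype_failed)
print(dirty_floats.tolist())
```

In short, astype(float) is the stricter option, while to_numeric with errors='coerce' tolerates bad values at the cost of introducing NaN.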
Scenarios to Convert Strings to Floats in Pandas DataFrame
Scenario 1: Numeric values stored as strings
To keep things simple, let’s create a DataFrame with only two columns:
Product | Price |
ABC | 250 |
XYZ | 270 |
Below is the code to create the DataFrame in Python, where the values under the ‘Price’ column are stored as strings. The values are wrapped in single quotes here, but double quotes would work the same way:
import pandas as pd

# The 'Price' values are quoted, so they are stored as strings
data = {'Product': ['ABC', 'XYZ'],
        'Price': ['250', '270']
        }

df = pd.DataFrame(data)

print(df)
print(df.dtypes)
Run the code in Python, and you’ll see that the data type for the ‘Price’ column is object:
  Product Price
0     ABC   250
1     XYZ   270
Product    object
Price      object
dtype: object
The goal is to convert the values under the ‘Price’ column into floats.
You can then use the astype(float) approach to perform the conversion into floats:
df['DataFrame Column'] = df['DataFrame Column'].astype(float)
In the context of our example, the ‘DataFrame Column’ is the ‘Price’ column. And so, the full code to convert the values to floats would be:
import pandas as pd

data = {'Product': ['ABC', 'XYZ'],
        'Price': ['250', '270']
        }

df = pd.DataFrame(data)

# Convert the 'Price' column from strings to floats
df['Price'] = df['Price'].astype(float)

print(df)
print(df.dtypes)
You’ll now see that the ‘Price’ column has been converted into a float:
  Product  Price
0     ABC  250.0
1     XYZ  270.0
Product     object
Price      float64
dtype: object
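One detail worth knowing when choosing between the two approaches for clean data: when every string is a whole number and nothing is coerced to NaN, pd.to_numeric returns integers rather than floats. The minimal sketch below, which reuses the ‘Price’ values from this scenario in a standalone Series, shows one way to force floats in that case:

```python
import pandas as pd

# Same values as the 'Price' column in this scenario
prices = pd.Series(['250', '270'])

# Every value is a whole number, so to_numeric yields an integer dtype
as_numeric = pd.to_numeric(prices)

# Chain astype(float) when floats are specifically required
as_float = pd.to_numeric(prices).astype(float)

print(as_numeric.dtype)
print(as_float.dtype)
```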
Scenario 2: Numeric and non-numeric values
Let’s create a new DataFrame with two columns (the ‘Product’ and the ‘Price’ columns). Only this time, the values under the ‘Price’ column would contain a combination of both numeric and non-numeric data:
Product | Price |
AAA | 250 |
BBB | ABC260 |
CCC | 270 |
DDD | 280XYZ |
This is what the DataFrame looks like in Python:
import pandas as pd

# 'ABC260' and '280XYZ' cannot be parsed as numbers
data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD'],
        'Price': ['250', 'ABC260', '270', '280XYZ']
        }

df = pd.DataFrame(data)

print(df)
print(df.dtypes)
As before, the data type for the ‘Price’ column is object:
  Product   Price
0     AAA     250
1     BBB  ABC260
2     CCC     270
3     DDD  280XYZ
Product    object
Price      object
dtype: object
You can then use the to_numeric approach in order to convert the values under the ‘Price’ column into floats:
df['DataFrame Column'] = pd.to_numeric(df['DataFrame Column'], errors='coerce')
By setting errors='coerce', any value that cannot be parsed as a number is replaced with NaN.
Here is the complete code that you can use:
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD'],
        'Price': ['250', 'ABC260', '270', '280XYZ']
        }

df = pd.DataFrame(data)

# Non-numeric values become NaN instead of raising an error
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

print(df)
print(df.dtypes)
Run the code, and you’ll see that the ‘Price’ column is now a float:
  Product  Price
0     AAA  250.0
1     BBB    NaN
2     CCC  270.0
3     DDD    NaN
Product     object
Price      float64
dtype: object
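Because errors='coerce' silently discards the original text, it can help to check which rows failed to convert before overwriting the column. The following is a minimal sketch of one way to do that; the variable names `converted`, `failed` and `bad_rows` are chosen for this example:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD'],
        'Price': ['250', 'ABC260', '270', '280XYZ']
        }

df = pd.DataFrame(data)

# Convert into a separate Series so the original strings stay available
converted = pd.to_numeric(df['Price'], errors='coerce')

# Rows that had a value before the conversion but are NaN afterwards
failed = converted.isna() & df['Price'].notna()
bad_rows = df.loc[failed]

print(bad_rows)
```

Here `bad_rows` holds the ‘BBB’ and ‘DDD’ rows with their original ‘Price’ strings, which can then be reviewed or cleaned before converting.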
To take things further, you can even replace the ‘NaN’ values with ‘0’ values by using df.replace:
import pandas as pd
import numpy as np

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD'],
        'Price': ['250', 'ABC260', '270', '280XYZ']
        }

df = pd.DataFrame(data)

df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

# Replace every NaN in the DataFrame with 0
df = df.replace(np.nan, 0)

print(df)
print(df.dtypes)
And here is what you’ll get:
  Product  Price
0     AAA  250.0
1     BBB    0.0
2     CCC  270.0
3     DDD    0.0
Product     object
Price      float64
dtype: object
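As an alternative to df.replace, which acts on the whole DataFrame, a minimal sketch using fillna on just the ‘Price’ column might look like this. It leaves any NaN values in other columns untouched:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC', 'DDD'],
        'Price': ['250', 'ABC260', '270', '280XYZ']
        }

df = pd.DataFrame(data)

# Coerce bad values to NaN, then fill only this column's NaN values with 0
df['Price'] = pd.to_numeric(df['Price'], errors='coerce').fillna(0)

print(df)
print(df.dtypes)
```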
Scenario 3: Convert Strings to Floats under the Entire DataFrame
For the final scenario, let’s create a DataFrame with 3 columns, where all the values will be stored as strings (using single quotes):
import pandas as pd

# All three columns hold numeric values stored as strings
data = {'Price_1': ['300', '750', '600', '770', '920'],
        'Price_2': ['250', '270', '950', '580', '410'],
        'Price_3': ['530', '480', '420', '290', '830']
        }

df = pd.DataFrame(data)

print(df)
print(df.dtypes)
As you can see, the data type of all the columns across the DataFrame is object:
  Price_1 Price_2 Price_3
0     300     250     530
1     750     270     480
2     600     950     420
3     770     580     290
4     920     410     830
Price_1    object
Price_2    object
Price_3    object
dtype: object
You can then add the following syntax to convert all the values into floats under the entire DataFrame:
df = df.astype(float)
So the complete Python code to perform the conversion would be:
import pandas as pd

data = {'Price_1': ['300', '750', '600', '770', '920'],
        'Price_2': ['250', '270', '950', '580', '410'],
        'Price_3': ['530', '480', '420', '290', '830']
        }

df = pd.DataFrame(data)

# Convert every column in the DataFrame to floats
df = df.astype(float)

print(df)
print(df.dtypes)
All the columns under the entire DataFrame are now floats:
   Price_1  Price_2  Price_3
0    300.0    250.0    530.0
1    750.0    270.0    480.0
2    600.0    950.0    420.0
3    770.0    580.0    290.0
4    920.0    410.0    830.0
Price_1    float64
Price_2    float64
Price_3    float64
dtype: object
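Note that df.astype(float) only works when every column can be converted. For a DataFrame that also has a text column, or that contains stray non-numeric values, one possible approach is to apply to_numeric to a chosen list of columns. The sketch below uses hypothetical data (a ‘Product’ column and an ‘N/A’ entry) that is not part of the example above:

```python
import pandas as pd

# Hypothetical data: one text column and one stray non-numeric value
data = {'Product': ['AAA', 'BBB', 'CCC'],
        'Price_1': ['300', '750', 'N/A'],
        'Price_2': ['250', '270', '950']
        }

df = pd.DataFrame(data)

# Only these columns are converted; 'Product' is left as text
price_cols = ['Price_1', 'Price_2']

# Apply to_numeric column by column so bad values become NaN,
# then force float64 so all-integer columns also end up as floats
df[price_cols] = df[price_cols].apply(pd.to_numeric, errors='coerce').astype(float)

print(df)
print(df.dtypes)
```

The trailing astype(float) matters for ‘Price_2’, which would otherwise come back as integers because none of its values are coerced to NaN.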