In this guide, you’ll see two approaches to converting strings into integers in a Pandas DataFrame:
(1) The astype(int) approach:
df['DataFrame Column'] = df['DataFrame Column'].astype(int)
(2) The to_numeric approach:
df['DataFrame Column'] = pd.to_numeric(df['DataFrame Column'])
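One practical difference between the two approaches shows up when a string has a decimal part. The sketch below uses a hypothetical two-value Series, not data from this guide. to_numeric infers the result type and returns a float column, while astype(int) expects every string to be a whole-number literal and raises an error:

```python
import pandas as pd

# Hypothetical sample: the second string has a decimal part
s = pd.Series(['210', '250.5'])

# to_numeric infers the type; one decimal value makes the whole result float
converted = pd.to_numeric(s)
print(converted)

# astype(int) cannot parse '250.5' as an integer, so the call raises
try:
    s.astype(int)
    astype_failed = False
except (ValueError, TypeError) as exc:
    astype_failed = True
    print('astype(int) failed:', exc)
```

So astype(int) suits columns you already know contain clean whole numbers, while to_numeric is more forgiving about what it accepts.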
Let’s now review a few examples with the steps to convert strings into integers.
Steps to Convert Strings to Integers in Pandas DataFrame
Step 1: Create a DataFrame
To start, let’s say that you want to create a DataFrame for the following data:
Product | Price |
AAA | 210 |
BBB | 250 |
You can capture the values under the Price column as strings by placing those values within quotes.
This is how the DataFrame would look in Python:
import pandas as pd
# The Price values are quoted, so pandas stores them as strings
data = {'Product': ['AAA', 'BBB'],
        'Price': ['210', '250']}
df = pd.DataFrame(data)
print(df)
print(df.dtypes)
When you run the code, you’ll see that the values under the Price column are strings (their data type is object):
Product Price
0 AAA 210
1 BBB 250
Product object
Price object
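The object dtype only says that the column holds generic Python objects. As an optional sanity check (a small sketch, not one of the original steps), you can confirm that each Price value really is a str, and see that adding two of them concatenates the text instead of summing numbers:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB'], 'Price': ['210', '250']}
df = pd.DataFrame(data)

# Collect the distinct Python types stored in the Price column
value_types = df['Price'].map(type).unique()
print(value_types)

# '+' on two strings joins them rather than adding them
print(df['Price'].iloc[0] + df['Price'].iloc[1])  # prints 210250
```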
Step 2: Convert the Strings to Integers in Pandas DataFrame
So how do you convert those string values into integers?
You may use the first approach of astype(int) to perform the conversion:
df['DataFrame Column'] = df['DataFrame Column'].astype(int)
Since in our example the ‘DataFrame Column’ is the Price column (which contains the string values), you’ll need to use the following syntax:
df['Price'] = df['Price'].astype(int)
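If your DataFrame has several string columns to convert, astype also accepts a dictionary that maps column names to target types. In this sketch the Quantity column is hypothetical, added only to show more than one column being converted in a single call:

```python
import pandas as pd

# 'Quantity' is a hypothetical extra column used only for illustration
data = {'Product': ['AAA', 'BBB'],
        'Price': ['210', '250'],
        'Quantity': ['3', '7']}
df = pd.DataFrame(data)

# Only the columns named in the mapping are converted; Product is left as-is
df = df.astype({'Price': 'int64', 'Quantity': 'int64'})
print(df.dtypes)
```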
So this is the complete Python code that you may apply to convert the strings into integers in Pandas DataFrame:
import pandas as pd
data = {'Product': ['AAA', 'BBB'],
        'Price': ['210', '250']}
df = pd.DataFrame(data)
# Convert the Price strings to integers
df['Price'] = df['Price'].astype(int)
print(df)
print(df.dtypes)
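As you can see, the values under the Price column are now integers. The exact integer type depends on your platform: astype(int) uses NumPy’s default integer, which is int32 on Windows with NumPy versions before 2.0 and int64 on most other systems: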
Product Price
0 AAA 210
1 BBB 250
Product object
Price int32
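If you need the same result on every machine, a simple option is to name the integer width explicitly instead of relying on the platform default behind int. This sketch uses 'int64' as an example target; any fixed-width integer type that fits your values would work the same way:

```python
import numpy as np
import pandas as pd

# The platform default that astype(int) maps to
print(np.dtype(int))

data = {'Product': ['AAA', 'BBB'], 'Price': ['210', '250']}
df = pd.DataFrame(data)

# An explicit width gives the same dtype regardless of platform
df['Price'] = df['Price'].astype('int64')
print(df.dtypes)
```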
Step 3 (optional): Convert the Strings to Integers using to_numeric
For this optional step, you may use the second approach of to_numeric to convert the strings to integers:
df['DataFrame Column'] = pd.to_numeric(df['DataFrame Column'])
And this is the complete Python code to perform the conversion:
import pandas as pd
data = {'Product': ['AAA', 'BBB'],
        'Price': ['210', '250']}
df = pd.DataFrame(data)
# to_numeric parses the strings and infers an integer type
df['Price'] = pd.to_numeric(df['Price'])
print(df)
print(df.dtypes)
You’ll now see that the values under the Price column are indeed integers:
Product Price
0 AAA 210
1 BBB 250
Product object
Price int64
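to_numeric also has a downcast parameter. With downcast='integer', it picks the smallest signed integer type that can hold every value, which can save memory on large columns. A minimal sketch on the same data: 210 and 250 are larger than int8’s maximum of 127, so the result should be int16:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB'], 'Price': ['210', '250']}
df = pd.DataFrame(data)

# Ask for the smallest signed integer type that holds all values
df['Price'] = pd.to_numeric(df['Price'], downcast='integer')
print(df.dtypes)
```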
What if your column contains a combination of numeric and non-numeric values?
For example, in the DataFrame below, there are both numeric and non-numeric values under the Price column:
Product | Price |
AAA | 210 |
BBB | 250 |
CCC | 22XYZ |
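With a column like this, the astype(int) approach from Step 2 cannot be used directly: it has no way to skip bad entries, so a single unparseable string such as 22XYZ makes the whole conversion fail. A short sketch that demonstrates the failure:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC'], 'Price': ['210', '250', '22XYZ']}
df = pd.DataFrame(data)

try:
    df['Price'] = df['Price'].astype(int)
    conversion_failed = False
except (ValueError, TypeError) as exc:
    conversion_failed = True
    print('Conversion failed:', exc)

# The assignment never ran, so Price still holds the original strings
print(df['Price'].tolist())
```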
In that case, you can still use to_numeric in order to convert the strings:
df['DataFrame Column'] = pd.to_numeric(df['DataFrame Column'], errors='coerce')
By setting errors='coerce', any value that cannot be parsed as a number is converted to NaN instead of raising an error.
Here is the Python code:
import pandas as pd
data = {'Product': ['AAA', 'BBB', 'CCC'],
        'Price': ['210', '250', '22XYZ']}
df = pd.DataFrame(data)
# Strings that cannot be parsed (such as '22XYZ') become NaN
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df)
print(df.dtypes)
You’ll now notice the NaN value. Because NaN is a floating-point value, the whole Price column becomes float64:
Product Price
0 AAA 210.0
1 BBB 250.0
2 CCC NaN
Product object
Price float64
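Before replacing the NaN values, it can be useful to see which original strings failed to parse, for example to fix them at the source. This is one possible sketch, assuming the raw column has no genuinely missing values you want to treat differently: a value that is NaN after coercion but was present before is one that could not be converted:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC'], 'Price': ['210', '250', '22XYZ']}
df = pd.DataFrame(data)

converted = pd.to_numeric(df['Price'], errors='coerce')

# True where the raw value existed but could not be parsed as a number
failed = converted.isna() & df['Price'].notna()
print(df[failed])
```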
You can take things further by replacing the NaN values with 0 using df.replace, and then converting the column back to integers:
import pandas as pd
import numpy as np
data = {'Product': ['AAA', 'BBB', 'CCC'],
        'Price': ['210', '250', '22XYZ']}
df = pd.DataFrame(data)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
# Replace NaN with 0 so the column can be cast to integers
df = df.replace(np.nan, 0)
df['Price'] = df['Price'].astype(int)
print(df)
print(df.dtypes)
When you run the code, you’ll get 0 instead of NaN, and the Price column is an integer type again:
Product Price
0 AAA 210
1 BBB 250
2 CCC 0
Product object
Price int32
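Two alternatives are worth knowing, shown here as a sketch with hypothetical column names Price_filled and Price_nullable. fillna(0) is the more direct way to fill missing numbers. If 0 would be misleading (a price of 0 is not the same as an unknown price), pandas’ nullable integer type 'Int64' (capital I) keeps the gap as <NA> while still storing the other values as integers:

```python
import pandas as pd

data = {'Product': ['AAA', 'BBB', 'CCC'], 'Price': ['210', '250', '22XYZ']}
df = pd.DataFrame(data)
prices = pd.to_numeric(df['Price'], errors='coerce')

# Option A: fill the missing value with 0, then cast with an explicit width
df['Price_filled'] = prices.fillna(0).astype('int64')

# Option B: keep the missing value as <NA> using the nullable 'Int64' dtype;
# this works because the remaining values are whole numbers
df['Price_nullable'] = prices.astype('Int64')

print(df)
print(df.dtypes)
```

Which option fits depends on whether downstream code should see a real 0 or an explicit missing value.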