Here are 3 approaches to convert strings to integers in a Pandas DataFrame:
(1) The astype(int) approach:
df["dataframe_column"] = df["dataframe_column"].astype(int)
(2) The apply(int) approach:
df["dataframe_column"] = df["dataframe_column"].apply(int)
(3) The map(int) approach:
df["dataframe_column"] = df["dataframe_column"].map(int)
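As a quick check, the sketch below applies all three approaches to the same small Series and confirms that they yield the same integer values. The name prices and its contents are illustrative only.

```python
import pandas as pd

# Illustrative data: numeric values stored as strings
prices = pd.Series(["210", "250"])

via_astype = prices.astype(int)  # vectorized cast of the whole Series
via_apply = prices.apply(int)    # calls int() on each element
via_map = prices.map(int)        # also calls int() on each element

# Each list should contain the same integer values
print(via_astype.tolist())
print(via_apply.tolist())
print(via_map.tolist())
```

All three approaches raise a ValueError if any value cannot be parsed as an integer (for example "abc"), so they assume the column is already clean.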
Let’s review an example with the steps to convert strings to integers.
Steps to Convert Strings to Integers in Pandas DataFrame
Step 1: Create a DataFrame
To start, let’s say that you want to create a DataFrame based on the following data:
product | price |
aaa | 210 |
bbb | 250 |
You can capture the values under the price column as strings by enclosing those values within quotes.
This is what the DataFrame would look like in Python:
import pandas as pd

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)

print(df)
print(df.dtypes)
When you run the code, you’ll notice that indeed the values under the price column are strings (where the data type is object):
  product price
0     aaa   210
1     bbb   250
product    object
price      object
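The object dtype can hold any Python object, so on its own it does not prove that the values are strings. A minimal sketch for inspecting the actual Python type of each value, reusing the same example data (the variable value_types is illustrative):

```python
import pandas as pd

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)

# Look up the Python type of each value in the price column
value_types = df["price"].map(type)
print(value_types.tolist())
```

If every entry is the str class, the column holds strings and is a candidate for the conversion in the next step.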
Step 2: Convert the Strings to Integers in Pandas DataFrame
You may use astype(int) to convert the strings to integers:
df["dataframe_column"] = df["dataframe_column"].astype(int)
For our example:
import pandas as pd

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)

df["price"] = df["price"].astype(int)

print(df)
print(df.dtypes)
As you can see, the values under the price column are now integers (the exact integer type, int32 or int64, depends on your platform and NumPy version):
  product  price
0     aaa    210
1     bbb    250
product    object
price       int32
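Because astype(int) uses the platform's default integer size, the resulting dtype can differ between machines. If a fixed width matters, one option is to name it explicitly. The sketch below assumes 64-bit integers are wanted and uses pd.api.types.is_integer_dtype to confirm the result:

```python
import pandas as pd

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)

# Request 64-bit integers explicitly instead of the platform default
df["price"] = df["price"].astype("int64")

print(df.dtypes)
# Check programmatically that the column now has an integer dtype
print(pd.api.types.is_integer_dtype(df["price"]))
```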
Step 3 (optional): Check the Execution Time of each Method
You can use the timeit module to check the execution time of each method:
import pandas as pd
import timeit

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)


def method_astype():
    df["price"] = df["price"].astype(int)


def method_apply():
    df["price"] = df["price"].apply(int)


def method_map():
    df["price"] = df["price"].map(int)


methods_used = [method_astype, method_apply, method_map]

# Run each method 10,000 times and print the total elapsed time
for method in methods_used:
    print(f"{method.__name__}: {timeit.timeit(method, number=10000):.6f} seconds")
Here are the results (exact timings will vary by machine and pandas version):
method_astype: 0.516682 seconds
method_apply: 0.702825 seconds
method_map: 0.638091 seconds
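One caveat with the timing code above: each function assigns its result back to df["price"], so after the first call the column already holds integers and the remaining calls no longer measure a string-to-integer conversion. A possible variation, sketched below, keeps the string data untouched by returning the converted Series instead of assigning it. The Series name prices, its size of 1,000 values, and the repeat count of 100 are arbitrary choices for illustration.

```python
import timeit

import pandas as pd

# Illustrative column of numeric strings; never overwritten below
prices = pd.Series([str(n) for n in range(1000)])


def method_astype():
    return prices.astype(int)


def method_apply():
    return prices.apply(int)


def method_map():
    return prices.map(int)


# Every call converts the original strings, so all runs do comparable work
for method in (method_astype, method_apply, method_map):
    seconds = timeit.timeit(method, number=100)
    print(f"{method.__name__}: {seconds:.6f} seconds")
```

No timings are shown for this variation since they depend on the machine and the library versions in use.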