Convert Strings to Integers in Pandas DataFrame

Here are 3 approaches to convert strings to integers in a Pandas DataFrame:

(1) The astype(int) approach:

df["dataframe_column"] = df["dataframe_column"].astype(int)

(2) The apply(int) approach:

df["dataframe_column"] = df["dataframe_column"].apply(int)

(3) The map(int) approach:

df["dataframe_column"] = df["dataframe_column"].map(int)
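As a quick check that the three approaches are interchangeable for clean numeric strings, the sketch below applies each one to the same column and prints the resulting values. The sample column and the variable names are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample column holding numeric strings
df = pd.DataFrame({"price": ["210", "250"]})

# Apply each approach to the same string column without modifying df
via_astype = df["price"].astype(int)
via_apply = df["price"].apply(int)
via_map = df["price"].map(int)

# Each list should contain the same integer values
print(via_astype.tolist())
print(via_apply.tolist())
print(via_map.tolist())
```

Because none of the results are assigned back to df, the original column still contains strings after this runs.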

Let’s review an example with the steps to convert strings to integers.

Steps to Convert Strings to Integers in Pandas DataFrame

Step 1: Create a DataFrame

To start, let’s say that you want to create a DataFrame based on the following data:

product  price
aaa      210
bbb      250

You can capture the values under the price column as strings by enclosing those values within quotes.

Here is the code to create the DataFrame in Python:

import pandas as pd

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}

df = pd.DataFrame(data)

print(df)
print(df.dtypes)

When you run the code, you’ll notice that indeed the values under the price column are strings (where the data type is object):

  product price
0     aaa   210
1     bbb   250
product    object
price      object
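If you want to confirm that the individual values really are Python strings, a minimal sketch such as the following may help. The variable names all_strings and joined are illustrative assumptions.

```python
import pandas as pd

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)

# Check that every value in the price column is a Python str
all_strings = all(isinstance(value, str) for value in df["price"])
print(all_strings)

# Adding two string values concatenates them instead of summing them
joined = df["price"].iloc[0] + df["price"].iloc[1]
print(joined)
```

The concatenated result illustrates why the column needs to be converted before you perform any arithmetic on it.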

Step 2: Convert the Strings to Integers in Pandas DataFrame

You may use astype(int) to convert the strings to integers:

df["dataframe_column"] = df["dataframe_column"].astype(int)

For our example:

import pandas as pd

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}

df = pd.DataFrame(data)
df["price"] = df["price"].astype(int)

print(df)
print(df.dtypes)

As you can see, the values under the price column are now integers (the exact integer type, such as int32 or int64, depends on your platform and NumPy version):

  product  price
0     aaa    210
1     bbb    250
product    object
price       int32
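Once the column holds integers, numeric operations behave as expected. The sketch below checks the dtype in a platform-independent way and sums the prices. The use of is_integer_dtype and the variable name total are choices made for this illustration.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)
df["price"] = df["price"].astype(int)

# True for any integer dtype, whether int32 or int64
print(is_integer_dtype(df["price"]))

# The values are now summed numerically rather than concatenated
total = df["price"].sum()
print(total)
```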

Step 3 (optional): Check the Execution Time of Each Method

You can use the timeit module to check the execution time of each method:

import pandas as pd
import timeit

data = {"product": ["aaa", "bbb"], "price": ["210", "250"]}
df = pd.DataFrame(data)


# Each function converts the original string column without assigning the
# result back to df, so every timed run starts from strings.
def method_astype():
    return df["price"].astype(int)


def method_apply():
    return df["price"].apply(int)


def method_map():
    return df["price"].map(int)


methods_used = [method_astype, method_apply, method_map]

for method in methods_used:
    print(f"{method.__name__}: {timeit.timeit(method, number=10000):.6f} seconds")

Here are the results from one run (exact timings vary by machine, library versions, and data size):

method_astype: 0.516682 seconds
method_apply: 0.702825 seconds
method_map: 0.638091 seconds
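With only two rows, these measurements largely reflect fixed per-call overhead rather than the cost of converting the values. To get a rough sense of how the methods compare on more data, you could time them on a larger column. The sketch below assumes a hypothetical Series of 100,000 numeric strings and a small number of repetitions.

```python
import timeit

import pandas as pd

# Hypothetical larger column: 100,000 numeric strings
prices = pd.Series([str(n) for n in range(100_000)])

# Each candidate converts the same string Series and leaves it unchanged
candidates = {
    "astype": lambda: prices.astype(int),
    "apply": lambda: prices.apply(int),
    "map": lambda: prices.map(int),
}

timings = {}
for name, func in candidates.items():
    timings[name] = timeit.timeit(func, number=10)
    print(f"{name}: {timings[name]:.6f} seconds")
```

No output is shown here because the timings depend on your environment. Run the sketch on your own machine before drawing conclusions.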
