Fastest way to Convert Integers to Strings in Pandas DataFrame

In this article, we’ll examine the fastest way to convert integers to strings in a Pandas DataFrame.

The approaches that will be measured are:

(1) map(str)

df["DataFrame Column"] = df["DataFrame Column"].map(str)

(2) apply(str)

df["DataFrame Column"] = df["DataFrame Column"].apply(str)

(3) astype(str)

df["DataFrame Column"] = df["DataFrame Column"].astype(str)

(4) values.astype(str)

df["DataFrame Column"] = df["DataFrame Column"].values.astype(str)
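To confirm that the four approaches produce the same string values, here is a minimal sketch on a small DataFrame. The column name "numbers", the sample values, and the helper names (source, approaches, results) are illustrative assumptions and not part of the benchmark.

```python
import pandas as pd

# Hypothetical sample data; the column name "numbers" is illustrative only
source = pd.DataFrame({"numbers": [10, 25, 99]})

# Each entry maps an approach label to a function that converts a column
approaches = {
    "map(str)": lambda col: col.map(str),
    "apply(str)": lambda col: col.apply(str),
    "astype(str)": lambda col: col.astype(str),
    "values.astype(str)": lambda col: col.values.astype(str),
}

results = {}
for label, convert in approaches.items():
    df = source.copy()  # start from integers for every approach
    df["numbers"] = convert(df["numbers"])
    results[label] = df["numbers"].tolist()
    print(label, results[label])
```

Each approach should print the same list of strings, so the choice between them comes down to speed rather than output.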

The Experiment

For the experiment, we’ll use NumPy to generate the test data:

  • 5 million random integers will be created
  • Each integer will fall within the range of 10 to 98, since the upper bound of 99 passed to np.random.randint is exclusive (this keeps each integer to two digits)
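As a quick check of the data described above, the following sketch generates the array with the same np.random.randint call used later in the experiment and inspects its shape and bounds. The variable name values is an assumption made for illustration.

```python
import numpy as np

# Same call shape as in the experiment: low=10 is inclusive, high=99 is exclusive
values = np.random.randint(10, 99, size=(5_000_000, 1))

print(values.shape)                # one column with 5 million rows
print(values.min(), values.max())  # both bounds stay within two digits
```

Because the upper bound is exclusive, the largest value that can appear is 98.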

We will also use the timeit module, which runs a given statement a specified number of times and returns the total elapsed time. Dividing that total by the number of runs gives the average time per run.
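A minimal sketch of that averaging step, using a trivial statement (str(42)) chosen only for illustration. The names runs, total_time, and average_time are assumptions:

```python
import timeit

runs = 5  # number of times the statement is executed

# timeit.timeit returns the total time in seconds for all runs combined
total_time = timeit.timeit(stmt="str(42)", number=runs)

# Divide by the number of runs to get the average time per run
average_time = total_time / runs
print(f"Total: {total_time} seconds, average per run: {average_time} seconds")
```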

Note that the results may vary depending on the versions of Python, Pandas, and NumPy that you’re using, as well as your hardware. For this experiment, we’ll use:

  • Python version: 3.12.0
  • Pandas version: 2.1.2
  • NumPy version: 1.26.1

To check the versions on your computer:

import sys

import numpy as np
import pandas as pd

print("Python Version: " + sys.version)
print("Pandas Version: " + pd.__version__)
print("Numpy Version: " + np.version.version)

For our example:

Python Version: 3.12.0
Pandas Version: 2.1.2
Numpy Version: 1.26.1

Which approach is the Fastest way to Convert Integers to Strings?

So which approach is really the fastest?

Let’s find out by running the code below:

import timeit
import pandas as pd
import numpy as np

# Setup code that creates a DataFrame containing random integers
setup_code = """
df = pd.DataFrame(np.random.randint(10, 99, size=(5000000, 1)), columns=['random_numbers'])
"""

# List of statements to measure the execution time
statements = [
"df['random_numbers'] = df['random_numbers'].map(str)",
"df['random_numbers'] = df['random_numbers'].apply(str)",
"df['random_numbers'] = df['random_numbers'].astype(str)",
"df['random_numbers'] = df['random_numbers'].values.astype(str)"
]

# Measure the execution time for each statement and average over five runs
for s in statements:
    exec_time = timeit.timeit(stmt=s, setup=setup_code, globals=globals(), number=5) / 5
    print(f"Execution time for {s}: {exec_time} seconds")

Based on our experiment (and considering the versions used), the fastest way to convert integers to strings in a Pandas DataFrame is astype(str), with map(str) and apply(str) close behind and values.astype(str) roughly twice as slow:

Execution time for df['random_numbers'] = df['random_numbers'].map(str): 0.341262839990668 seconds
Execution time for df['random_numbers'] = df['random_numbers'].apply(str): 0.35109735999722036 seconds
Execution time for df['random_numbers'] = df['random_numbers'].astype(str): 0.32315470001194624 seconds
Execution time for df['random_numbers'] = df['random_numbers'].values.astype(str): 0.6795214000158012 seconds
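One caveat when reading these numbers: timeit executes the setup code once per call, so with number=5 the DataFrame is built once and the last four runs of each statement operate on a column that already holds strings. A possible variation, sketched below, uses timeit.repeat with number=1 so that the setup rebuilds the integer column before every timed run. The row count is reduced to 500,000 to keep the sketch quick, and the names n_rows and timings are illustrative assumptions. This variation was not used to produce the results above.

```python
import timeit

import numpy as np
import pandas as pd

n_rows = 500_000  # assumption: smaller than the 5 million rows used in the article

# Setup code that rebuilds the integer column; it runs before each timed execution
setup_code = f"""
df = pd.DataFrame(np.random.randint(10, 99, size=({n_rows}, 1)), columns=['random_numbers'])
"""

statements = [
    "df['random_numbers'] = df['random_numbers'].map(str)",
    "df['random_numbers'] = df['random_numbers'].apply(str)",
    "df['random_numbers'] = df['random_numbers'].astype(str)",
    "df['random_numbers'] = df['random_numbers'].values.astype(str)",
]

timings = {}
for s in statements:
    # repeat=5 with number=1: five separate measurements, each preceded by the setup
    runs = timeit.repeat(
        stmt=s,
        setup=setup_code,
        globals={"pd": pd, "np": np},
        repeat=5,
        number=1,
    )
    timings[s] = sum(runs) / len(runs)
    print(f"Average time for {s}: {timings[s]} seconds")
```

With this setup every measurement starts from integers, so the averages may differ from the figures reported above.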
