Fastest way to Convert Integers to Strings in Pandas DataFrame

In this article, we’ll look at the fastest way to convert integers to strings in a Pandas DataFrame.

The approaches that will be measured are:

(1) map(str)

df['DataFrame Column'] = df['DataFrame Column'].map(str)

(2) apply(str)

df['DataFrame Column'] = df['DataFrame Column'].apply(str)

(3) astype(str)

df['DataFrame Column'] = df['DataFrame Column'].astype(str)

(4) values.astype(str)

df['DataFrame Column'] = df['DataFrame Column'].values.astype(str)
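To make the comparison concrete, here is a minimal sketch that applies all four approaches to a tiny column and prints the converted values. The three-row DataFrame and the results dictionary are made up for illustration only. The values.astype(str) result is wrapped in pd.Series purely so that it can be printed in the same way as the others.

```python
import pandas as pd

# Hypothetical three-row column, used only for illustration
df = pd.DataFrame({'DataFrame Column': [10, 25, 99]})

# Apply each of the four approaches to the same integer column
results = {
    'map(str)': df['DataFrame Column'].map(str),
    'apply(str)': df['DataFrame Column'].apply(str),
    'astype(str)': df['DataFrame Column'].astype(str),
    # .values.astype(str) returns a NumPy array, so wrap it in a Series
    'values.astype(str)': pd.Series(df['DataFrame Column'].values.astype(str)),
}

for name, converted in results.items():
    print(name, converted.tolist())
```

All four approaches should yield the same string values, so the choice between them comes down to speed.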

The Experiment

For the experiment, we’ll use Numpy to generate the test data, where:

  • 5 million random integers will be created
  • Each integer will fall within the range of 10 to 99 (to keep each integer to two digits)
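As a rough sketch of how such data could be generated, the snippet below builds a much smaller sample and checks its range. The sample_size of 1000 is an assumption chosen to keep the check quick. Note that np.random.randint treats the upper bound as exclusive, so an upper bound of 100 is what allows the value 99 to appear.

```python
import numpy as np
import pandas as pd

# Smaller sample than the 5 million rows in the experiment (an assumption, to keep this quick)
sample_size = 1000

# The upper bound of np.random.randint is exclusive, so 100 allows values up to 99
sample = pd.DataFrame(
    np.random.randint(10, 100, size=(sample_size, 1)),
    columns=['Random numbers'],
)

# Print the smallest and largest values to confirm they are two-digit integers
print(sample['Random numbers'].min(), sample['Random numbers'].max())
```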

We will then use the time module to measure which approach converts the integers to strings in a Pandas DataFrame the fastest.

Note that the results may vary depending on the versions of Python, Pandas, and Numpy that you’re using, as well as your computer. For this experiment, we’ll use:

  • Python version: 3.9.0
  • Pandas version: 1.2.4
  • Numpy version: 1.19.3

You can run the following code to check the versions on your computer:

import platform

import numpy as np
import pandas as pd

# platform.python_version() returns only the version number (for example, 3.9.0)
print('Python Version: ' + platform.python_version())
print('Pandas Version: ' + pd.__version__)
print('Numpy Version: ' + np.__version__)

For our example:

Python Version: 3.9.0
Pandas Version: 1.2.4
Numpy Version: 1.19.3

Which approach is the Fastest way to Convert Integers to Strings?

So which approach is really the fastest?

Let’s find out by running the code below:

import time

import numpy as np
import pandas as pd

# Create one DataFrame per approach, each with 5 million random two-digit integers.
# np.random.randint excludes the upper bound, so 100 is needed to include 99.
df_1 = pd.DataFrame(np.random.randint(10, 100, size=(5000000, 1)), columns=['Random numbers'])
df_2 = pd.DataFrame(np.random.randint(10, 100, size=(5000000, 1)), columns=['Random numbers'])
df_3 = pd.DataFrame(np.random.randint(10, 100, size=(5000000, 1)), columns=['Random numbers'])
df_4 = pd.DataFrame(np.random.randint(10, 100, size=(5000000, 1)), columns=['Random numbers'])

# Approach 1: map(str)
# time.perf_counter() is a high-resolution clock intended for measuring elapsed time
start_time_1 = time.perf_counter()
df_1['Random numbers'] = df_1['Random numbers'].map(str)
execution_time_1 = time.perf_counter() - start_time_1
print('Execution time in seconds using map(str): ' + str(execution_time_1))

# Approach 2: apply(str)
start_time_2 = time.perf_counter()
df_2['Random numbers'] = df_2['Random numbers'].apply(str)
execution_time_2 = time.perf_counter() - start_time_2
print('Execution time in seconds using apply(str): ' + str(execution_time_2))

# Approach 3: astype(str)
start_time_3 = time.perf_counter()
df_3['Random numbers'] = df_3['Random numbers'].astype(str)
execution_time_3 = time.perf_counter() - start_time_3
print('Execution time in seconds using astype(str): ' + str(execution_time_3))

# Approach 4: values.astype(str)
start_time_4 = time.perf_counter()
df_4['Random numbers'] = df_4['Random numbers'].values.astype(str)
execution_time_4 = time.perf_counter() - start_time_4
print('Execution time in seconds using values.astype(str): ' + str(execution_time_4))

Based on our experiment (and considering the versions used), the fastest way to convert integers to strings in a Pandas DataFrame is apply(str), with map(str) a close second. Both were roughly two to three times faster than astype(str) and values.astype(str):

Execution time in seconds using map(str): 0.9216582775115967
Execution time in seconds using apply(str): 0.8591742515563965
Execution time in seconds using astype(str): 2.666469097137451
Execution time in seconds using values.astype(str): 2.2509682178497314

You can rerun the code to check that the results are consistent. A second run gave similar results:

Execution time in seconds using map(str): 0.827937126159668
Execution time in seconds using apply(str): 0.8123118877410889
Execution time in seconds using astype(str): 2.5619332790374756
Execution time in seconds using values.astype(str): 2.1869211196899414
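Rather than rerunning the script by hand, the repetition can be automated. The sketch below is one possible way to do that using timeit.repeat from the standard library, which is a substitute for the manual timing used in this article and not the method that produced the results above. The names approaches and best_times are hypothetical, and n_rows is reduced to 100,000 (an assumption) so that the sketch finishes quickly. It times only the conversion itself, not the assignment back to the DataFrame column.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller column than the 5 million rows used in the experiment (an assumption)
n_rows = 100_000
series = pd.Series(np.random.randint(10, 100, size=n_rows), name='Random numbers')

# Each entry wraps one conversion in a callable that timeit can invoke
approaches = {
    'map(str)': lambda: series.map(str),
    'apply(str)': lambda: series.apply(str),
    'astype(str)': lambda: series.astype(str),
    'values.astype(str)': lambda: series.values.astype(str),
}

best_times = {}
for name, convert in approaches.items():
    # Run each conversion 5 separate times and keep the fastest run
    timings = timeit.repeat(convert, number=1, repeat=5)
    best_times[name] = min(timings)
    print(f'Best of 5 using {name}: {best_times[name]:.4f} seconds')
```

Taking the minimum of several runs reduces the influence of other processes on the machine, although the ranking may still differ between versions and computers.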

Conclusion

So which approach should you use?

If speed is your priority, consider either apply(str) or map(str).

Keep in mind that the results can differ depending on the installed versions of Python, Pandas, and Numpy, as well as the computer used.
