Fastest way to Convert Integers to Strings in Pandas DataFrame

In this article, we’ll look at the fastest way to convert integers to strings in a Pandas DataFrame.

The approaches that will be measured are:

(1) map(str)

df['DataFrame Column'] = df['DataFrame Column'].map(str)

(2) apply(str)

df['DataFrame Column'] = df['DataFrame Column'].apply(str)

(3) astype(str)

df['DataFrame Column'] = df['DataFrame Column'].astype(str)

(4) values.astype(str)

df['DataFrame Column'] = df['DataFrame Column'].values.astype(str)
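Before timing anything, it may help to confirm that the four approaches give the same string values. The snippet below is a minimal sketch on a tiny DataFrame; the three sample integers are an assumption made for illustration, and the values.astype(str) result is wrapped in a Series only so that all four results can be compared the same way.

```python
import pandas as pd

# Tiny illustrative DataFrame; the sample values are hypothetical.
df = pd.DataFrame({'DataFrame Column': [10, 25, 99]})
col = df['DataFrame Column']

# Apply each of the four approaches to the same column.
converted = {
    'map(str)': col.map(str),
    'apply(str)': col.apply(str),
    'astype(str)': col.astype(str),
    'values.astype(str)': pd.Series(col.values.astype(str), index=col.index),
}

# Print the resulting values for each approach.
for name, result in converted.items():
    print(name, result.tolist())
```

If the lists printed for each approach match, the only thing left to compare is speed.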

The Experiment

For the experiment, we’ll use Numpy to generate the test data:

  • 5 million random integers will be created
  • Each integer will fall within the range of 10 to 99 (to keep each integer to two digits)
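The data described above could be generated roughly as follows. This is a sketch rather than a definitive setup: it relies on the fact that np.random.randint excludes its upper bound, so an upper bound of 100 is used for 99 to be a possible value.

```python
import numpy as np
import pandas as pd

# np.random.randint draws from [low, high), so high=100 keeps 99 in range.
values = np.random.randint(10, 100, size=(5_000_000, 1))
df = pd.DataFrame(values, columns=['Random numbers'])

# Check the number of rows and the smallest and largest values drawn.
print(len(df))
print(df['Random numbers'].min(), df['Random numbers'].max())
```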

We will then use Python’s time module to measure which approach converts the integers to strings the fastest in a Pandas DataFrame.
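The timing pattern itself is simple: record a start time, run the conversion, and subtract. The sketch below wraps that pattern in a small helper. Note the assumptions: time_conversion is a hypothetical helper name, time.perf_counter() is swapped in here as a clock intended for measuring short intervals, and the sample is much smaller than 5 million rows to keep the sketch quick.

```python
import time

import numpy as np
import pandas as pd


def time_conversion(convert, series):
    """Run convert(series) once and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = convert(series)
    elapsed = time.perf_counter() - start
    return elapsed, result


# Smaller sample than the experiment's 5 million rows (an assumption for speed).
sample = pd.Series(np.random.randint(10, 100, size=100_000))

elapsed, result = time_conversion(lambda s: s.astype(str), sample)
print('Execution time in seconds using astype(str): ' + str(elapsed))
```

Any of the four approaches can be passed in as the convert argument in the same way.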

Note that the results may vary depending on the versions of Python, Pandas, and Numpy that you’re using, as well as your computer. For this experiment, we’ll use:

  • Python version: 3.7.2
  • Pandas version: 0.24.1
  • Numpy version: 1.16.2

Run the following code to check the versions installed on your computer:

import pandas as pd
import numpy as np
import sys

print('Python Version: ' + sys.version)
print('Pandas Version: ' + pd.__version__)
print('Numpy Version: ' + np.version.version)

In my case, I got the following versions:

[Image: output of the version check]

Which approach is the Fastest way to Convert Integers to Strings in Pandas DataFrame?

So which approach is really the fastest?

Let’s find out by running the code below:

import pandas as pd
import numpy as np
import time

# One DataFrame of 5 million random two-digit integers per approach.
# randint excludes the upper bound, so 100 is used to include 99.
df_1 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])
df_2 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])
df_3 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])
df_4 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])

# (1) map(str)
start_time_1 = time.time()
df_1['Random numbers'] = df_1['Random numbers'].map(str)
execution_time_1 = (time.time() - start_time_1)
print('Execution time in seconds using map(str): ' + str(execution_time_1))

# (2) apply(str)
start_time_2 = time.time()
df_2['Random numbers'] = df_2['Random numbers'].apply(str)
execution_time_2 = (time.time() - start_time_2)
print('Execution time in seconds using apply(str): ' + str(execution_time_2))

# (3) astype(str)
start_time_3 = time.time()
df_3['Random numbers'] = df_3['Random numbers'].astype(str)
execution_time_3 = (time.time() - start_time_3)
print('Execution time in seconds using astype(str): ' + str(execution_time_3))

# (4) values.astype(str)
start_time_4 = time.time()
df_4['Random numbers'] = df_4['Random numbers'].values.astype(str)
execution_time_4 = (time.time() - start_time_4)
print('Execution time in seconds using values.astype(str): ' + str(execution_time_4))

Based on our experiment (and considering the versions used), the fastest way to convert integers to strings in a Pandas DataFrame is apply(str), with map(str) a close second:

[Image: execution times for the four approaches]

I then ran the code using more recent versions of Python, Pandas and Numpy and got similar results:

[Image: execution times with more recent versions of Python, Pandas and Numpy]

To take things further, I ran the code below in Anaconda Spyder (where the versions are different):

import pandas as pd
import numpy as np
import sys
import time

print('Python Version: ' + sys.version)
print('Pandas Version: ' + pd.__version__)
print('Numpy Version: ' + np.version.version)

# randint excludes the upper bound, so 100 is used to include 99.
df_1 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])
df_2 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])
df_3 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])
df_4 = pd.DataFrame(np.random.randint(10,100,size=(5000000, 1)), columns=['Random numbers'])

start_time_1 = time.time()
df_1['Random numbers'] = df_1['Random numbers'].map(str)
execution_time_1 = (time.time() - start_time_1)  
print('Execution time in seconds using map(str): ' + str(execution_time_1))

start_time_2 = time.time()
df_2['Random numbers'] = df_2['Random numbers'].apply(str)
execution_time_2 = (time.time() - start_time_2)  
print('Execution time in seconds using apply(str): ' + str(execution_time_2))

start_time_3 = time.time()
df_3['Random numbers'] = df_3['Random numbers'].astype(str)
execution_time_3 = (time.time() - start_time_3)  
print('Execution time in seconds using astype(str): ' + str(execution_time_3))

start_time_4 = time.time()
df_4['Random numbers'] = df_4['Random numbers'].values.astype(str)
execution_time_4 = (time.time() - start_time_4)  
print('Execution time in seconds using values.astype(str): ' + str(execution_time_4))

As you can see, the results in Anaconda are consistent: apply(str) is again slightly faster than map(str):

[Image: execution times in Anaconda Spyder]

Conclusion

So which approach should you apply?

If speed is what you need, then you may consider either apply(str) or map(str).

Keep in mind that the results depend on additional factors, such as the versions of Python, Pandas and Numpy installed, as well as the computer used.
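Since the ranking may differ on your setup, one option is to re-run a quick comparison locally. The sketch below is one possible way to do that, not the method used for the results above: it uses the standard timeit module, takes the best of three runs per approach to dampen one-off noise, and uses a smaller sample (200,000 rows, an assumption) so that it finishes quickly.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller sample than the experiment's 5 million rows (an assumption for speed).
sample = pd.Series(np.random.randint(10, 100, size=200_000))

# The four approaches, each as a function of the input Series.
approaches = {
    'map(str)': lambda s: s.map(str),
    'apply(str)': lambda s: s.apply(str),
    'astype(str)': lambda s: s.astype(str),
    'values.astype(str)': lambda s: s.values.astype(str),
}

# Best of three single runs per approach.
best_times = {
    name: min(timeit.repeat(lambda: convert(sample), number=1, repeat=3))
    for name, convert in approaches.items()
}

# Print the approaches from fastest to slowest on this machine.
for name, seconds in sorted(best_times.items(), key=lambda item: item[1]):
    print(f'{name}: {seconds:.4f} seconds')
```

Increasing the sample size toward 5 million rows should bring the comparison closer to the experiment above, at the cost of a longer run.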

You may also want to check the following guide for the complete steps to convert integers to strings in your DataFrame.