How to Convert Strings to Datetime in Pandas DataFrame

You may use this template in order to convert strings to datetime in Pandas DataFrame:

df['DataFrame Column'] = pd.to_datetime(df['DataFrame Column'], format=specify your format)

Note that the strings must match the format specified. Later, you’ll see several scenarios for different formats.

Steps to Convert Strings to Datetime in Pandas DataFrame

Step 1: Collect the Data to be Converted

To begin, collect the data that you’d like to convert to datetime.

For example, here is a simple dataset about 3 different dates (with a format of yyyymmdd), when a store might be opened or closed:

Dates Status
20210305 Opened
20210316 Opened
20210328 Closed

Step 2: Create a DataFrame

Next, create a DataFrame to capture the above data in Python. You can capture the dates as strings by placing quotes around the values under the ‘dates’ column:

import pandas as pd

values = {'dates':  ['20210305','20210316','20210328'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame(values)

print (df)
print (df.dtypes)

Run the code in Python, and you’ll get this DataFrame:

      dates  status
0  20210305  Opened
1  20210316  Opened
2  20210328  Closed
dates     object
status    object
dtype: object

Notice that the ‘dates’ were indeed stored as strings (represented by object).

Step 3: Convert the Strings to Datetime in the DataFrame

You may then use the template below in order to convert the strings to datetime in Pandas DataFrame:

df['DataFrame Column'] = pd.to_datetime(df['DataFrame Column'], format=specify your format)

Recall that for our example, the date format is yyyymmdd.

This date format can be represented as:

format='%Y%m%d'

Note that the strings data (yyyymmdd) must match the format specified (%Y%m%d). You may refer to the following source for the different formats that you may apply.

For our example, the complete Python code to convert the strings to datetime would be:

import pandas as pd

values = {'dates': ['20210305','20210316','20210328'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame(values)

df['dates'] = pd.to_datetime(df['dates'], format='%Y%m%d')

print (df)
print (df.dtypes)

You’ll see that the data type for the ‘dates’ column is now datetime:

       dates  status
0 2021-03-05  Opened
1 2021-03-16  Opened
2 2021-03-28  Closed
dates     datetime64[ns]
status            object
dtype: object

Note that when applying pd.to_datetime, the default format is yyyymmdd. So in the above particular example, you could remove the format =’%Y%m%d’ from the code. However, in other scenarios, as you’ll see below, you must specify the correct format to match with the strings data.

Converting Additional Formats

Let’s say that the dates are now formatted as ddmmyyyy:

Dates Status
05032021 Opened
16032021 Opened
28032021 Closed

In that case, you’ll need to apply the format below (for reference, check the following table to identify the correct format that you should apply):

format='%d%m%Y'

Here is the complete Python code:

import pandas as pd

values = {'dates':  ['05032021','16032021','28032021'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame(values)

df['dates'] = pd.to_datetime(df['dates'], format='%d%m%Y')

print (df)
print (df.dtypes)

As before, your strings will now get converted to datetime:

       dates  status
0 2021-03-05  Opened
1 2021-03-16  Opened
2 2021-03-28  Closed
dates     datetime64[ns]
status            object
dtype: object

What if your dates have a ddmmmyyyy format (e.g., 05Mar2021)?

You’ll then need to apply the format below (by changing the ‘m’ to ‘b’):

format='%d%b%Y' 

So your complete Python code would look like this:

import pandas as pd

values = {'dates':  ['05Mar2021','16Mar2021','28Mar2021'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame(values)

df['dates'] = pd.to_datetime(df['dates'], format='%d%b%Y')

print (df)
print (df.dtypes)

You’ll now get the datetime format:

       dates  status
0 2021-03-05  Opened
1 2021-03-16  Opened
2 2021-03-28  Closed
dates     datetime64[ns]
status            object
dtype: object

Let’s say that your dates now contain dashes (e.g., ’05-Mar-2021′) .

In that case, simply add those dashes as follows:

format='%d-%b-%Y'

Here is the full Python code:

import pandas as pd

values = {'dates':  ['05-Mar-2021','16-Mar-2021','28-Mar-2021'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame(values)

df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y')

print (df)
print (df.dtypes)

And the result:

       dates  status
0 2021-03-05  Opened
1 2021-03-16  Opened
2 2021-03-28  Closed
dates     datetime64[ns]
status            object
dtype: object

Formats with Timestamps

Suppose that your strings contain both the dates and times:

Dates Status
20210305093000 Opened
20210316093000 Opened
20210328200000 Closed

In that case, the format that should be specified is:

format='%Y%m%d%H%M%S'

So the full Python code would be:

import pandas as pd

values = {'dates':  ['20210305093000','20210316093000','20210328200000'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame(values)

df['dates'] = pd.to_datetime(df['dates'], format='%Y%m%d%H%M%S')

print (df)
print (df.dtypes)

You’ll now see the datetime format:

                dates  status
0 2021-03-05 09:30:00  Opened
1 2021-03-16 09:30:00  Opened
2 2021-03-28 20:00:00  Closed
dates     datetime64[ns]
status            object
dtype: object

Now let’s say that the strings contain characters, such as the dash character (“-“) to separate between the date and the time:

Dates Status
20210305-093000 Opened
20210316-093000 Opened
20210328-200000 Closed

In that scenario, the format should include the dash as well:

format='%Y%m%d-%H%M%S'

Here is the full Python code:

import pandas as pd

values = {'dates':  ['20210305-093000','20210316-093000','20210328-200000'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame(values)

df['dates'] = pd.to_datetime(df['dates'], format='%Y%m%d-%H%M%S')

print (df)
print (df.dtypes)

And the result:

                dates  status
0 2021-03-05 09:30:00  Opened
1 2021-03-16 09:30:00  Opened
2 2021-03-28 20:00:00  Closed
dates     datetime64[ns]
status            object
dtype: object