Get a List of all Column Names in Pandas DataFrame

Here are two approaches to get a list of all the column names in a Pandas DataFrame:

First approach:

my_list = list(df)

Second approach:

my_list = df.columns.values.tolist()

Later you’ll also see which of the two approaches is faster.

The Example

To start with a simple example, let’s create a DataFrame with 3 columns:

import pandas as pd

data = {'First_Name': ['Bill','Maria','David ','James','Mary'],
        'Last_Name': ['Anderson','Smith','Green','Miller','Carter'],
        'Age': [32,45,27,59,37]
        }

df = pd.DataFrame(data, columns = ['First_Name', 'Last_Name', 'Age'])

print (df)

Once you run the above code, you’ll see the following DataFrame with the 3 columns:

  First_Name Last_Name  Age
0       Bill  Anderson   32
1      Maria     Smith   45
2     David      Green   27
3      James    Miller   59
4       Mary    Carter   37

Using list(df) to Get the List of all Column Names in Pandas DataFrame

You may use the first approach by adding my_list = list(df) to the code:

import pandas as pd

data = {'First_Name': ['Bill','Maria','David ','James','Mary'],
        'Last_Name': ['Anderson','Smith','Green','Miller','Carter'],
        'Age': [32,45,27,59,37]
        }

df = pd.DataFrame(data, columns = ['First_Name', 'Last_Name', 'Age'])

my_list = list(df)

print (my_list)

You’ll now see the list that contains the 3 column names:

['First_Name', 'Last_Name', 'Age']

Optionally, you can quickly verify that you got a list by adding print (type(my_list)) to the bottom of the code:

import pandas as pd

data = {'First_Name': ['Bill','Maria','David ','James','Mary'],
        'Last_Name': ['Anderson','Smith','Green','Miller','Carter'],
        'Age': [32,45,27,59,37]
        }

df = pd.DataFrame(data, columns = ['First_Name', 'Last_Name', 'Age'])

my_list = list(df)

print (my_list)
print (type(my_list))

You’ll then be able to confirm that you got a list:

['First_Name', 'Last_Name', 'Age']
<class 'list'>
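It may help to see why list(df) returns column names rather than rows. Iterating over a DataFrame yields its column labels, and list() simply collects whatever the iteration produces. The snippet below is a minimal sketch of that behavior; the shortened DataFrame and the name labels_from_loop are only illustrative.

```python
import pandas as pd

# Shortened version of the example DataFrame (illustrative data only)
df = pd.DataFrame({'First_Name': ['Bill', 'Maria'],
                   'Last_Name': ['Anderson', 'Smith'],
                   'Age': [32, 45]})

# Iterating over a DataFrame yields its column labels, not its rows
labels_from_loop = [label for label in df]

# list(df) consumes the same iteration, so it produces the same labels
my_list = list(df)

print(labels_from_loop)
print(my_list == labels_from_loop)
```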

Using df.columns.values.tolist() to Get the List of all Column Names in Pandas DataFrame

Alternatively, you may apply the second approach by adding my_list = df.columns.values.tolist() to the code:

import pandas as pd

data = {'First_Name': ['Bill','Maria','David ','James','Mary'],
        'Last_Name': ['Anderson','Smith','Green','Miller','Carter'],
        'Age': [32,45,27,59,37]
        }

df = pd.DataFrame(data, columns = ['First_Name', 'Last_Name', 'Age'])

my_list = df.columns.values.tolist()

print (my_list)
print (type(my_list))

As before, you’ll now get the list with the column names:

['First_Name', 'Last_Name', 'Age']
<class 'list'>
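The second approach chains three steps, and it can be useful to look at each one separately. The sketch below splits df.columns.values.tolist() into its parts and prints the type at each stage; the intermediate names columns_index and columns_array are hypothetical and used only for illustration.

```python
import pandas as pd

# Shortened version of the example DataFrame (illustrative data only)
df = pd.DataFrame({'First_Name': ['Bill', 'Maria'],
                   'Last_Name': ['Anderson', 'Smith'],
                   'Age': [32, 45]})

columns_index = df.columns            # a pandas Index holding the column labels
columns_array = columns_index.values  # the same labels as a NumPy array
my_list = columns_array.tolist()      # the same labels as a plain Python list

# Print the type name at each stage of the chain
print(type(columns_index).__name__)
print(type(columns_array).__name__)
print(type(my_list).__name__)
```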

Which approach should you choose?

Depending on your needs, you may want to use the faster approach.

So which approach is faster?

Let’s check the execution time for each of the options using default_timer from the timeit module:

(1) Measuring the time under the first approach of my_list = list(df):

from timeit import default_timer
import pandas as pd

data = {'First_Name': ['Bill','Maria','David ','James','Mary'],
        'Last_Name': ['Anderson','Smith','Green','Miller','Carter'],
        'Age': [32,45,27,59,37]
        }

df = pd.DataFrame(data, columns = ['First_Name', 'Last_Name', 'Age'])

beginning = default_timer()
my_list = list(df)
ending = default_timer()

print((ending - beginning)*1000)

When I ran the code in Python, I got the following execution time:

[Image: measured execution time for the first approach, in milliseconds]

You may wish to run the code a few times to get a better sense of the execution time.

(2) Now let’s measure the time under the second approach of my_list = df.columns.values.tolist():

from timeit import default_timer
import pandas as pd

data = {'First_Name': ['Bill','Maria','David ','James','Mary'],
        'Last_Name': ['Anderson','Smith','Green','Miller','Carter'],
        'Age': [32,45,27,59,37]
        }

df = pd.DataFrame(data, columns = ['First_Name', 'Last_Name', 'Age'])

beginning = default_timer()
my_list = df.columns.values.tolist()
ending = default_timer()

print((ending - beginning)*1000)

In this run, the second approach was faster than the first approach:

[Image: measured execution time for the second approach, in milliseconds]

Note that the execution time may vary depending on your Pandas/Python version and/or your machine.
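A single timed call can be noisy, so one way to get a steadier comparison is to let timeit repeat each statement many times and report the average. The following is a minimal sketch under that assumption; the repetition count of 10,000 and the variable names are arbitrary choices, and the resulting numbers will differ between machines and versions.

```python
import timeit
import pandas as pd

data = {'First_Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Last_Name': ['Anderson', 'Smith', 'Green', 'Miller', 'Carter'],
        'Age': [32, 45, 27, 59, 37]
        }

df = pd.DataFrame(data, columns=['First_Name', 'Last_Name', 'Age'])

repeats = 10_000  # assumed repetition count; adjust as needed

# timeit.timeit returns the total time in seconds for all repetitions
time_first = timeit.timeit(lambda: list(df), number=repeats)
time_second = timeit.timeit(lambda: df.columns.values.tolist(), number=repeats)

# Convert to the average time per call, in milliseconds
print(f"list(df): {time_first / repeats * 1000:.6f} ms per call")
print(f"df.columns.values.tolist(): {time_second / repeats * 1000:.6f} ms per call")
```

Averaging over many calls smooths out one-off delays, which makes the comparison between the two approaches easier to trust than a single measurement.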