Get a List of all Column Names in Pandas DataFrame

Here are two approaches to get a list of all the column names in a Pandas DataFrame:

First approach:

my_list = list(df)

Second approach:

my_list = df.columns.values.tolist()

Later you’ll also see which of the two approaches is faster.

The Example

To start with a simple example, let’s create a DataFrame with 3 columns:

import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

print(df)

Once you run the above code, you’ll see the following DataFrame with the 3 columns:

    Name  Age  Country
0   Bill   32    Spain
1  Maria   45   Canada
2  David   27   Brazil
3  James   59       UK
4   Mary   37   France

Using list(df) to Get the List of all Column Names in Pandas DataFrame

You may use the first approach by adding my_list = list(df) to the code:

import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

my_list = list(df)

print(my_list)

You’ll now see the list that contains the 3 column names:

['Name', 'Age', 'Country']
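
The reason list(df) returns the column names is that iterating over a DataFrame yields its column labels rather than its rows. The short sketch below illustrates this with an explicit loop; the variable name labels is only an illustrative choice and is not part of the original example:

```python
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

# Iterating over a DataFrame yields its column labels (not its rows)
labels = []
for label in df:
    labels.append(label)

print(labels)

# The explicit loop builds the same list as list(df)
print(labels == list(df))
```

In other words, list(df) is simply a compact way of collecting the labels that the loop visits one by one.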

Optionally, you can quickly verify that you got a list by adding print(type(my_list)) to the bottom of the code:

import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

my_list = list(df)

print(my_list)
print(type(my_list))

You’ll then be able to confirm that you got a list:

['Name', 'Age', 'Country']
<class 'list'>

Using df.columns.values.tolist() to Get the List of all Column Names in Pandas DataFrame

Alternatively, you may apply the second approach by adding my_list = df.columns.values.tolist() to the code:

import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

my_list = df.columns.values.tolist()

print(my_list)
print(type(my_list))

As before, you’ll now get the list with the column names:

['Name', 'Age', 'Country']
<class 'list'>
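
As a quick sanity check, you can confirm that both approaches produce the same list. The sketch below also includes df.columns.tolist(), a shorter spelling that is not covered in this guide but should give the same result; the variable names first, second and third are only illustrative:

```python
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

first = list(df)                      # first approach
second = df.columns.values.tolist()   # second approach
third = df.columns.tolist()           # shorter variant (not covered above)

# All three should hold the same column names in the same order
print(first == second == third)
```

If the comparison prints True, the approaches are interchangeable for this DataFrame, and the choice comes down to readability and speed.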

Which approach should you choose?

If performance matters for your use case, you may want to use the faster approach.

So which approach is faster?

Let’s check the execution time of each option using default_timer from the timeit module:

(1) Measuring the time under the first approach of my_list = list(df):

from timeit import default_timer
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

beginning = default_timer()
my_list = list(df)
ending = default_timer()

print((ending - beginning)*1000)

Here is an example of the execution time, in milliseconds:

0.011199999999988997

You may wish to run the code a few times to get a better sense of the execution time, since a single measurement this short can vary noticeably from run to run.

(2) Now let’s measure the time under the second approach of my_list = df.columns.values.tolist():

from timeit import default_timer
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

beginning = default_timer()
my_list = df.columns.values.tolist()
ending = default_timer()

print((ending - beginning)*1000)

In this run, the second approach was faster than the first (again in milliseconds):

0.005499999999991623

Note that the execution time may vary depending on your Pandas/Python version and/or your machine.
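
Because a single timing of such a short operation is noisy, a more stable comparison is to repeat each call many times and keep the best result. Below is a minimal sketch using timeit.repeat from the standard library. The helper name best_time_us and the values number=10000 and repeat=5 are arbitrary choices made for illustration, and the timings you get will depend on your setup:

```python
import timeit
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

def best_time_us(func, number=10000, repeat=5):
    # Hypothetical helper: run func `number` times per trial, for `repeat` trials,
    # then return the best per-call time in microseconds
    trials = timeit.repeat(func, number=number, repeat=repeat)
    return min(trials) / number * 1_000_000

t_first = best_time_us(lambda: list(df))
t_second = best_time_us(lambda: df.columns.values.tolist())

print(f"list(df): {t_first:.3f} microseconds per call")
print(f"df.columns.values.tolist(): {t_second:.3f} microseconds per call")
```

Taking the minimum across trials reduces the influence of background activity on your machine, so the two numbers are easier to compare than two one-off measurements.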