Here are two approaches to get a list of all the column names in a Pandas DataFrame:
First approach:
my_list = list(df)
Second approach:
my_list = df.columns.values.tolist()
Later you’ll also see which of the two approaches is faster.
The Example
To start with a simple example, let’s create a DataFrame with 3 columns:
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

print(df)
Once you run the above code, you’ll see the following DataFrame with the 3 columns:
    Name  Age Country
0   Bill   32   Spain
1  Maria   45  Canada
2  David   27  Brazil
3  James   59      UK
4   Mary   37  France
Using list(df) to Get the List of all Column Names in Pandas DataFrame
You may use the first approach by adding my_list = list(df) to the code:
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

my_list = list(df)

print(my_list)
You’ll now see the list that contains the 3 column names:
['Name', 'Age', 'Country']
Optionally, you can quickly verify that you got a list by adding print(type(my_list)) to the bottom of the code:
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

my_list = list(df)

print(my_list)
print(type(my_list))
You’ll then be able to confirm that you got a list:
['Name', 'Age', 'Country']
<class 'list'>
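The reason list(df) returns column names is that iterating over a DataFrame yields its column labels rather than its rows. A minimal sketch of that behavior, using a shortened version of the example data (the variable names labels_from_loop and labels_from_list are only illustrative):

```python
import pandas as pd

# Shortened version of the example DataFrame
data = {'Name': ['Bill', 'Maria'],
        'Age': [32, 45],
        'Country': ['Spain', 'Canada']
        }

df = pd.DataFrame(data)

# Iterating over a DataFrame yields its column labels, not its rows
labels_from_loop = [label for label in df]

# list(df) performs that same iteration
labels_from_list = list(df)

print(labels_from_loop)
print(labels_from_list == labels_from_loop)
```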
Using df.columns.values.tolist() to Get the List of all Column Names in Pandas DataFrame
Alternatively, you may apply the second approach by adding my_list = df.columns.values.tolist() to the code:
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

my_list = df.columns.values.tolist()

print(my_list)
print(type(my_list))
As before, you’ll now get the list with the column names:
['Name', 'Age', 'Country']
<class 'list'>
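To see what each step of df.columns.values.tolist() contributes, the chain can be split into its parts. The sketch below is only illustrative; the intermediate names columns_index and columns_array are not part of the original example:

```python
import numpy as np
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

columns_index = df.columns            # a pandas Index holding the column labels
columns_array = columns_index.values  # the same labels as a NumPy array
my_list = columns_array.tolist()      # the labels as a plain Python list

print(isinstance(columns_index, pd.Index))
print(isinstance(columns_array, np.ndarray))
print(my_list)
```

Each step moves closer to a plain Python object: Index, then NumPy array, then list.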
Which approach should you choose?
If performance matters for your use case, you may want to use the faster approach.
So which approach is faster?
Let’s check the execution time of each option using default_timer from the timeit module:
(1) Measuring the time under the first approach of my_list = list(df):
from timeit import default_timer
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

beginning = default_timer()
my_list = list(df)
ending = default_timer()

print((ending - beginning)*1000)
Here is an example of the execution time, in milliseconds:
0.011199999999988997
You may wish to run the code a few times to get a better sense of the execution time.
(2) Now let’s measure the time under the second approach of my_list = df.columns.values.tolist():
from timeit import default_timer
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

beginning = default_timer()
my_list = df.columns.values.tolist()
ending = default_timer()

print((ending - beginning)*1000)
In this run, the second approach was faster than the first (time again in milliseconds):
0.005499999999991623
Note that the execution time may vary depending on your Pandas/Python version and/or your machine.
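A single measurement of a call this short can be noisy. As a rough sketch, the comparison could also be averaged over many calls with timeit.timeit. The run count of 1,000 and the variable names below are arbitrary choices for illustration, and no particular result is implied:

```python
import timeit
import pandas as pd

data = {'Name': ['Bill','Maria','David','James','Mary'],
        'Age': [32,45,27,59,37],
        'Country': ['Spain','Canada','Brazil','UK','France']
        }

df = pd.DataFrame(data)

runs = 1000  # arbitrary number of repetitions, chosen for illustration

# Total time in seconds for `runs` calls of each approach
time_list = timeit.timeit(lambda: list(df), number=runs)
time_tolist = timeit.timeit(lambda: df.columns.values.tolist(), number=runs)

# Average time per call, converted to milliseconds
print(time_list / runs * 1000)
print(time_tolist / runs * 1000)
```

Averaging over many calls smooths out one-off delays, so the two numbers are easier to compare than two single measurements.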