How to Create a Pandas DataFrame in Python

Need to create a Pandas DataFrame in Python?

If so, I’ll show you two different methods to create a Pandas DataFrame:

  • By typing the values in Python itself to create the DataFrame
  • By importing the values from a file (such as an Excel file), and then creating the DataFrame in Python from the imported values

Method 1: typing values in Python to create a Pandas DataFrame

To create a Pandas DataFrame in Python, you can follow this generic template:

import pandas as pd

data = {'First Column Name':  ['First value', 'Second value',...],
        'Second Column Name': ['First value', 'Second value',...],
         ....
        }

df = pd.DataFrame(data, columns = ['First Column Name','Second Column Name',...])

print(df)

Note that you don’t need to use quotes around numeric values (unless you wish to capture those values as strings).
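As a quick illustration of the note above, here is a minimal sketch that stores the same numbers once without quotes and once with quotes, and then checks the resulting column types. The column names 'As Numbers' and 'As Strings' are made up for this illustration only.

```python
import pandas as pd

# Hypothetical column names, used only to illustrate the point
data = {'As Numbers': [22000, 25000],       # no quotes: stored as integers
        'As Strings': ['22000', '25000']}   # quotes: stored as text

df = pd.DataFrame(data)

# Show the type that Pandas assigned to each column
print(df.dtypes)

# Numeric columns support arithmetic such as sum()
total = df['As Numbers'].sum()
print(total)
```

The unquoted column comes back with an integer type, so calculations work on it directly, while the quoted column is treated as text.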

Now let’s see how to apply the above template using a simple example.

To start, let’s say that you have the following data about cars, and that you want to capture that data in Python using a Pandas DataFrame:

Brand            Price
Honda Civic      22000
Toyota Corolla   25000
Ford Focus       27000
Audi A4          35000

This is what the Python code would look like for our example:

import pandas as pd

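# Each key is a column name; each list holds that column's values
cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]
        }

# columns sets the order of the columns in the DataFrame
df = pd.DataFrame(cars, columns=['Brand', 'Price'])

print(df)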

Run the Python code, and you’ll get the following DataFrame:

            Brand  Price
0     Honda Civic  22000
1  Toyota Corolla  25000
2      Ford Focus  27000
3         Audi A4  35000

You may have noticed that each row is represented by a number (also known as the index) starting from 0. Alternatively, you may assign another value/name to represent each row.

For example, in the code below, index=['Car_1','Car_2','Car_3','Car_4'] was added:

import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]
        }

# index assigns one label per row, replacing the default 0, 1, 2, 3
df = pd.DataFrame(cars, columns=['Brand', 'Price'],
                  index=['Car_1', 'Car_2', 'Car_3', 'Car_4'])

print(df)

You’ll now see the newly assigned index:

                Brand  Price
Car_1     Honda Civic  22000
Car_2  Toyota Corolla  25000
Car_3      Ford Focus  27000
Car_4         Audi A4  35000
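One practical use of a custom index is looking up rows by their label. The sketch below, which reuses the cars data from this example, shows one way to do that with df.loc. It is an illustration rather than a required step.

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]
        }

df = pd.DataFrame(cars, index=['Car_1', 'Car_2', 'Car_3', 'Car_4'])

# Select one full row by its index label
row = df.loc['Car_2']
print(row)

# Select a single value by row label and column name
price = df.loc['Car_3', 'Price']
print(price)
```

Here row holds the Brand and Price of the Toyota Corolla, and price holds the price of the Ford Focus.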

Let’s now review the second method of importing the values into Python to create the DataFrame.

Method 2: importing values from an Excel file to create a Pandas DataFrame

You can use the following template to import an Excel file into Python in order to create your DataFrame:

import pandas as pd

# read_excel already returns a DataFrame
data = pd.read_excel(r'Path where the Excel file is stored\File name.xlsx')  # for an earlier version of Excel use 'xls'
# Keep only the listed columns, in the listed order
df = pd.DataFrame(data, columns = ['First Column Name','Second Column Name',...])

print(df)

Make sure that the column names specified in the code exactly match the column names in the Excel file, including upper and lower case.
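To see why the exact match matters, consider the sketch below. It uses a small DataFrame as a stand-in for the data that read_excel would return (no real file is read), and the misspelled name 'price' is a deliberate mistake added for the illustration.

```python
import pandas as pd

# Stand-in for the result of pd.read_excel (no real file is read here)
data = pd.DataFrame({'Brand': ['Honda Civic', 'Toyota Corolla'],
                     'Price': [22000, 25000]})

wanted = ['Brand', 'price']   # 'price' is deliberately misspelled (lowercase p)

# List any requested names that are not present in the data
missing = [name for name in wanted if name not in data.columns]
print(missing)

# Building the DataFrame anyway raises no error;
# the unmatched column is simply filled with NaN
df = pd.DataFrame(data, columns=wanted)
print(df)
```

Because no error is raised, a mismatched name is easy to overlook, so a quick check like the missing list above can help catch it early.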

Let’s say that you have the following table stored in an Excel file (where the Excel file name is ‘Cars’):

Brand            Price
Honda Civic      22000
Toyota Corolla   25000
Ford Focus       27000
Audi A4          35000

In the Python code below, you’ll need to change the path name to reflect the location where the Excel file is stored on your computer.

In my case, the Excel file is saved on my desktop, under the following path:

 ‘C:\Users\Ron\Desktop\Cars.xlsx’

Once you have imported the data into Python, you’ll be able to assign it to the DataFrame. Here is the full Python code for our example:

import pandas as pd

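# Read the Excel file; change the path to match your own computer
cars = pd.read_excel(r'C:\Users\Ron\Desktop\Cars.xlsx')
# Keep the Brand and Price columns, in that order
df = pd.DataFrame(cars, columns=['Brand', 'Price'])

print(df)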

As before, you’ll get the same Pandas DataFrame in Python:

            Brand  Price
0     Honda Civic  22000
1  Toyota Corolla  25000
2      Ford Focus  27000
3         Audi A4  35000

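Note: reading Excel files requires an extra package. Recent versions of Pandas read .xlsx files with openpyxl, while xlrd is used for the older .xls format. Depending on your Pandas version, you may see an error such as:

ImportError: Install xlrd >= 1.0.0 for Excel support

You may then use pip to install the missing package. For .xlsx files:

pip install openpyxl

For .xls files:

pip install xlrd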

You can also create the same DataFrame if you need to import a CSV file into Python, rather than using an Excel file.
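As a rough sketch of the CSV route, the code below uses pd.read_csv. To keep the example self-contained, the CSV content is supplied as in-memory text through io.StringIO, which is an assumption made for this illustration. With a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file; with a real file, pass its path instead,
# e.g. pd.read_csv(r'Path where the CSV file is stored\File name.csv')
csv_text = io.StringIO(
    "Brand,Price\n"
    "Honda Civic,22000\n"
    "Toyota Corolla,25000\n"
    "Ford Focus,27000\n"
    "Audi A4,35000\n"
)

# The first line of the CSV is used as the column names
df = pd.read_csv(csv_text)

print(df)
```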

Get the maximum value from the DataFrame

Once you have your values in the DataFrame, you can perform a large variety of operations. For example, you may calculate stats using Pandas.

For instance, let’s say that you want to find the maximum price among all the Cars within the DataFrame.

Obviously, you can derive this value just by looking at the dataset, but the method presented below would work for much larger datasets.

To get the maximum price for our Cars example, you’ll need to add the following portion to the Python code (and then print the results):

max1 = df['Price'].max()

Here is the complete Python code:

import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]
        }

df = pd.DataFrame(cars, columns=['Brand', 'Price'])

# max() returns the largest value in the Price column
max1 = df['Price'].max()
print(max1)

Once you run the code, you’ll get the value of 35000, which is indeed the maximum price!
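The same pattern extends to other statistics. The sketch below is one possible continuation of the cars example: it computes the minimum and the mean price, and looks up which brand has the maximum price. The variable names min_price, mean_price and top_brand are chosen for this illustration.

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]
        }

df = pd.DataFrame(cars, columns=['Brand', 'Price'])

# Smallest and average value in the Price column
min_price = df['Price'].min()
mean_price = df['Price'].mean()

# idxmax() returns the index label of the row with the highest price;
# loc then reads the Brand value from that row
top_brand = df.loc[df['Price'].idxmax(), 'Brand']

print(min_price)
print(mean_price)
print(top_brand)
```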

You can check the Pandas documentation to learn more about creating a Pandas DataFrame.