How to Create a Pandas DataFrame in Python

In this short guide, you’ll see two different methods to create a Pandas DataFrame:

  • By typing the values in Python itself to create the DataFrame
  • By importing the values from a file (such as a CSV file), and then creating the DataFrame in Python based on the values imported

Method 1: typing values in Python to create a Pandas DataFrame

To create a Pandas DataFrame in Python, you can follow this generic template:

import pandas as pd

data = {'first_column':  ['first_value', 'second_value', ...],
        'second_column': ['first_value', 'second_value', ...],
        # ... additional columns go here
        }

df = pd.DataFrame(data)

print(df)

Note that you don’t need to use quotes around numeric values (unless you wish to capture those values as strings).
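To illustrate the note above, here is a minimal sketch that stores the same prices once as numbers and once as quoted strings. The column names price_as_number, price_as_text and price_converted are hypothetical and used only for this illustration:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical column names, chosen only for this illustration
data = {'price_as_number': [1200, 150, 300],
        'price_as_text': ['1200', '150', '300']
        }

df = pd.DataFrame(data)

# Unquoted values are stored as numbers; quoted values are stored as text
print(is_numeric_dtype(df['price_as_number']))
print(is_numeric_dtype(df['price_as_text']))

# Convert the text column to numbers when arithmetic is needed
df['price_converted'] = pd.to_numeric(df['price_as_text'])
print(df['price_converted'].sum())
```

If values were captured as strings by accident, pd.to_numeric is one way to convert them back before calculating stats.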

Now let’s see how to apply the above template using a simple example.

To start, let’s say that you have the following data about products, and that you want to capture that data in Python using a Pandas DataFrame:

product_name price
laptop 1200
printer 150
tablet 300
desk 450
chair 200

You may then use the code below in order to create the DataFrame for our example:

import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data)

print(df)

Run the code in Python, and you’ll get the following DataFrame:

  product_name  price
0       laptop   1200
1      printer    150
2       tablet    300
3         desk    450
4        chair    200

You may have noticed that each row is represented by a number (also known as the index) starting from 0. Alternatively, you may assign another value/name to represent each row.

For example, in the code below, index=['product_1','product_2','product_3','product_4','product_5'] was added:

import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data, index=['product_1','product_2','product_3','product_4','product_5'])

print(df)

You’ll now see the newly assigned index in the leftmost column:

          product_name  price
product_1       laptop   1200
product_2      printer    150
product_3       tablet    300
product_4         desk    450
product_5        chair    200
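One practical benefit of a named index is that rows can be looked up by label. The sketch below reuses the example data and selects rows with df.loc; the variable names row and tablet_price are only illustrative:

```python
import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data, index=['product_1', 'product_2', 'product_3', 'product_4', 'product_5'])

# Select a whole row by its index label
row = df.loc['product_3']
print(row)

# Select a single value by row label and column name
tablet_price = df.loc['product_3', 'price']
print(tablet_price)
```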

Let’s now review the second method of importing the values into Python to create the DataFrame.

Method 2: importing values from a CSV file to create a Pandas DataFrame

You may use the following template to import a CSV file into Python in order to create your DataFrame:

import pandas as pd

# read_csv returns a DataFrame directly, so no extra pd.DataFrame() call is needed
df = pd.read_csv(r'Path where the CSV file is stored\File name.csv')

print(df)

Let’s say that you have the following data stored in a CSV file (where the CSV file name is ‘products’):

product_name price
laptop 1200
printer 150
tablet 300
desk 450
chair 200

In the Python code below, you’ll need to change the path name to reflect the location where the CSV file is stored on your computer.

For example, let’s suppose that the CSV file is stored under the following path:

 ‘C:\Users\Ron\Desktop\products.csv’

Here is the full Python code for our example:

import pandas as pd

# read_csv returns a DataFrame directly, so no extra pd.DataFrame() call is needed
df = pd.read_csv(r'C:\Users\Ron\Desktop\products.csv')

print(df)

As before, you’ll get the same Pandas DataFrame in Python:

  product_name  price
0       laptop   1200
1      printer    150
2       tablet    300
3         desk    450
4        chair    200
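If you want to try the CSV method without creating a file first, the sketch below feeds the same values to pd.read_csv through an in-memory text buffer. It assumes the file is comma-separated with a header row; the csv_text variable is only a stand-in for the contents of products.csv:

```python
import io
import pandas as pd

# Stand-in for the contents of products.csv (assumed to be comma-separated)
csv_text = """product_name,price
laptop,1200
printer,150
tablet,300
desk,450
chair,200
"""

# read_csv accepts a file path or a file-like object and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df)
```

With a real file, you would pass the path to pd.read_csv instead of the io.StringIO object.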

You can also create the same DataFrame by importing an Excel file into Python using Pandas.

Find the maximum value in the DataFrame

Once you have your values in the DataFrame, you can perform a large variety of operations. For example, you may calculate stats using Pandas.

For instance, let’s say that you want to find the maximum price among all the products within the DataFrame.

Obviously, you can derive this value just by looking at the dataset, but the method presented below would work for much larger datasets.

To get the maximum price for our example, you’ll need to add the following portion to the Python code (and then print the results):

max_price = df['price'].max()

Here is the complete Python code:

import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data)

max_price = df['price'].max()
print(max_price)

Once you run the code, you’ll get the value of 1200, which is indeed the maximum price:

1200
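The same pattern extends to other stats. As a brief sketch, the code below gets the minimum and mean price, and uses idxmax to find which product has the maximum price; the variable names are only illustrative:

```python
import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data)

min_price = df['price'].min()
mean_price = df['price'].mean()

# idxmax returns the index label of the row holding the maximum price
max_row_label = df['price'].idxmax()
most_expensive = df.loc[max_row_label, 'product_name']

print(min_price)
print(mean_price)
print(most_expensive)
```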

You may check the Pandas Documentation to learn more about creating a DataFrame.