How to Import an Excel File into Python using Pandas

To import an Excel file into Python using Pandas:

import pandas as pd

df = pd.read_excel(r"Path where the Excel file is stored\File name.xlsx")

print(df)

And if you have a specific Excel sheet that you’d like to import, you may then apply:

import pandas as pd

df = pd.read_excel(r"Path of Excel file\File name.xlsx", sheet_name="your Excel sheet name")

print(df)

Steps to Import an Excel File

Step 1: Install the required packages

If you haven’t already done so, install the Pandas and Openpyxl packages.

To install Pandas use:

pip install pandas

To install Openpyxl use:

pip install openpyxl

Step 2: Save the data in an Excel file

Suppose that you have the following table stored in an Excel file:

product_nameprice
computer700
tablet250
printer120
laptop1200
keyboard100

Save the Excel file as “my_products” (where the Excel file extension is .xlsx).

Step 3: Capture the file path

Capture the full path where the Excel file is stored on your computer.

For our example:

C:\Users\Ron\Desktop\my_products.xlsx

Where:

  • my_products is the file name
  • .xlsx is the Excel file extension

In the Python code below, you’ll need to modify the path name to reflect the location where the Excel file is stored on your computer.

Step 4: Apply the Python script

Here is the Python script for our example:

import pandas as pd

df = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx")

print(df)

Note that you should place “r” before the path string to address special characters, such as ‘\’, in the path.

Step 5: Run the Python script

Run the Python script (adjusted to your path), and you’ll get the following dataset:

  product_name  price
0     computer    700
1       tablet    250
2      printer    120
3       laptop   1200
4     keyboard    100

Notice that you got the same results as those that were stored in the Excel file.

Optional Step: Selecting subset of columns

Now what if you want to select a specific column or columns from the Excel file?

For example, what if you want to select only the product_name column?

If that’s the case, you can specify this column name as captured below:

import pandas as pd

data = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx") 

df = pd.DataFrame(data, columns=["product_name"])

print(df)

Run the script (after adjusting the file path), and you’ll get only the product_name column:

  product_name
0     computer
1       tablet
2      printer
3       laptop
4     keyboard

You can specify additional columns by separating their names using a comma, so if you want to include both the product_name and the price columns, you can use this syntax:

import pandas as pd

data = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx") 

df = pd.DataFrame(data, columns=["product_name", "price"])

print(df)

You’ll need to make sure that the column names specified in the code exactly match with the column names within the Excel file. Otherwise, you’ll get NaN values.

Conclusion

You just saw how to import an Excel file into Python using Pandas.

At times, you may need to import a CSV file into Python. If that’s the case, you may want to check the following tutorial that explains how to import a CSV file into Python using Pandas.

You may also check the Pandas Documentation to find out more about the different options that you may apply in regards to read_excel.

Leave a Comment