To import an Excel file into Python using Pandas:
import pandas as pd df = pd.read_excel(r"Path where the Excel file is stored\File name.xlsx") print(df)
And if you have a specific Excel sheet that you’d like to import, you may then apply:
import pandas as pd df = pd.read_excel(r"Path of Excel file\File name.xlsx", sheet_name="your Excel sheet name") print(df)
Steps to Import an Excel File
Step 1: Install the required packages
If you haven’t already done so, install the Pandas and Openpyxl packages.
To install Pandas use:
pip install pandas
To install Openpyxl use:
pip install openpyxl
Step 2: Save the data in an Excel file
Suppose that you have the following table stored in an Excel file:
product_name | price |
---|---|
computer | 700 |
tablet | 250 |
printer | 120 |
laptop | 1200 |
keyboard | 100 |
Save the Excel file as “my_products” (where the Excel file extension is .xlsx).
Step 3: Capture the file path
Capture the full path where the Excel file is stored on your computer.
For our example:
C:\Users\Ron\Desktop\my_products.xlsx
Where:
- my_products is the file name
- .xlsx is the Excel file extension
In the Python code below, you’ll need to modify the path name to reflect the location where the Excel file is stored on your computer.
Step 4: Apply the Python script
Here is the Python script for our example:
import pandas as pd df = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx") print(df)
Note that you should place “r” before the path string to address special characters, such as ‘\’, in the path.
Step 5: Run the Python script
Run the Python script (adjusted to your path), and you’ll get the following dataset:
product_name price
0 computer 700
1 tablet 250
2 printer 120
3 laptop 1200
4 keyboard 100
Notice that you got the same results as those that were stored in the Excel file.
Optional Step: Selecting subset of columns
Now what if you want to select a specific column or columns from the Excel file?
For example, what if you want to select only the product_name column?
If that’s the case, you can specify this column name as captured below:
import pandas as pd data = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx") df = pd.DataFrame(data, columns=["product_name"]) print(df)
Run the script (after adjusting the file path), and you’ll get only the product_name column:
product_name
0 computer
1 tablet
2 printer
3 laptop
4 keyboard
You can specify additional columns by separating their names using a comma, so if you want to include both the product_name and the price columns, you can use this syntax:
import pandas as pd data = pd.read_excel(r"C:\Users\Ron\Desktop\my_products.xlsx") df = pd.DataFrame(data, columns=["product_name", "price"]) print(df)
You’ll need to make sure that the column names specified in the code exactly match with the column names within the Excel file. Otherwise, you’ll get NaN values.
Conclusion
You just saw how to import an Excel file into Python using Pandas.
At times, you may need to import a CSV file into Python. If that’s the case, you may want to check the following tutorial that explains how to import a CSV file into Python using Pandas.
You may also check the Pandas Documentation to find out more about the different options that you may apply in regards to read_excel.