You can easily import an Excel file into Python using Pandas. In order to accomplish this goal, you’ll need to use read_excel:
import pandas as pd df = pd.read_excel(r'Path where the Excel file is stored\File name.xlsx') print(df)
Note that for an earlier version of Excel, you may need to use the file extension of ‘xls’
And if you have a specific Excel sheet that you’d like to import, you may then apply:
import pandas as pd df = pd.read_excel(r'Path of Excel file\File name.xlsx', sheet_name='your Excel sheet name') print(df)
Let’s now review an example that includes the data to be imported into Python.
The Data to be Imported into Python
Suppose that you have the following table stored in Excel (where the Excel file name is ‘products‘):
product_name | price |
computer | 700 |
tablet | 250 |
printer | 120 |
laptop | 1200 |
keyboard | 100 |
You may then follow the steps below to import the Excel file into Python.
Steps to Import an Excel File into Python using Pandas
Step 1: Capture the file path
First, capture the full path where the Excel file is stored on your computer.
For example, let’s suppose that an Excel file is stored under the following path:
C:\Users\Ron\Desktop\products.xlsx
In the Python code below, you’ll need to modify the path name to reflect the location where the Excel file is stored on your computer.
Don’t forget to include the file name (in our example, it’s ‘products‘ as highlighted in blue). You’ll also need to include the Excel file extension (in our case, it’s ‘.xlsx‘ as highlighted in green).
Step 2: Apply the Python code
Here is the Python code for our example:
import pandas as pd df = pd.read_excel(r'C:\Users\Ron\Desktop\products.xlsx') print(df)
Note that you should place “r” before the path string to address special characters, such as ‘\’. In addition, don’t forget to put the file name at the end of the path + ‘.xlsx’
Step 3: Run the Python code to import the Excel file
Run the Python code (adjusted to your path), and you’ll get the following dataset:
product_name price
0 computer 700
1 tablet 250
2 printer 120
3 laptop 1200
4 keyboard 100
Notice that you got the same results as those that were stored in the Excel file.
Note: you’ll have to install an additional package if you get the following error when running the code:
ImportError: Missing optional dependency ‘xlrd’
You may then use the PIP install approach to install openpyxl for .xlsx files:
pip install openpyxl
Optional Step: Selecting subset of columns
Now what if you want to select a specific column or columns from the Excel file?
For example, what if you want to select only the product_name column? If that’s the case, you can specify this column name as captured below:
import pandas as pd data = pd.read_excel(r'C:\Users\Ron\Desktop\products.xlsx') df = pd.DataFrame(data, columns=['product_name']) print(df)
Run the code (after adjusting the file path), and you’ll get only the product_name column:
product_name
0 computer
1 tablet
2 printer
3 laptop
4 keyboard
You can specify additional columns by separating their names using a comma, so if you want to include both the product_name and price columns, you can use this syntax:
import pandas as pd data = pd.read_excel(r'C:\Users\Ron\Desktop\products.xlsx') df = pd.DataFrame(data, columns=['product_name', 'price']) print(df)
You’ll need to make sure that the column names specified in the code exactly match with the column names within the Excel file. Otherwise, you’ll get NaN values.
Conclusion
You just saw how to import an Excel file into Python using Pandas.
At times, you may need to import a CSV file into Python. If that’s the case, you may want to check the following tutorial that explains how to import a CSV file into Python using Pandas.
You may also check the Pandas Documentation to find out more about the different options that you may apply in regards to read_excel.