In this short guide, you’ll see how to compare values between two Pandas DataFrames. You’ll also observe how to compare values from two imported files.
Steps to Compare Values Between two Pandas DataFrames
Step 1: Prepare the datasets to be compared
To start, let’s say that you have the following two datasets that you want to compare:
First Dataset:
product_1 | price_1 |
computer | 1200 |
monitor | 800 |
printer | 200 |
desk | 350 |
Second Dataset:
product_2 | price_2 |
computer | 900 |
monitor | 800 |
printer | 300 |
desk | 350 |
The ultimate goal is to compare the prices (i.e., price_1 vs. price_2).
Step 2: Create the two DataFrames
Based on the above data, you can then create the following two DataFrames using this code:
import pandas as pd data_1 = {'product_1': ['computer','monitor','printer','desk'], 'price_1': [1200,800,200,350] } df1 = pd.DataFrame(data_1) print(df1) data_2 = {'product_2': ['computer','monitor','printer','desk'], 'price_2': [900,800,300,350] } df2 = pd.DataFrame(data_2) print (df2)
Run the code in Python, and you’ll get these two DataFrames:
product_1 price_1
0 computer 1200
1 monitor 800
2 printer 200
3 desk 350
product_2 price_2
0 computer 900
1 monitor 800
2 printer 300
3 desk 350
Step 3: Compare the values between the two Pandas DataFrames
In this step, you’ll need to import the NumPy package.
You can then use this template to perform the comparison:
df1['new column that will contain the comparison results'] = np.where(condition,'value if true','value if false')
For our example, here is the syntax that you can add in order to compare the prices (i.e., price_1 vs. price_2) between the two DataFrames:
df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False')
You’ll notice that a new column (i.e., the ‘prices_match’ column) would be created under the first DataFrame (i.e., df1). This new column would contain the comparison results based on the following rules:
- If price_1 is equal to price_2, then assign the value of True
- Otherwise, assign the value of False
Here is the complete Python code that you can use to compare the prices between the two DataFrames:
import pandas as pd import numpy as np data_1 = {'product_1': ['computer','monitor','printer','desk'], 'price_1': [1200,800,200,350] } df1 = pd.DataFrame(data_1) data_2 = {'product_2': ['computer','monitor','printer','desk'], 'price_2': [900,800,300,350] } df2 = pd.DataFrame(data_2) df1['price_2'] = df2['price_2'] #add the price_2 column from df2 to df1 df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') #create a new column in df1 to check if prices match print(df1)
Run the code, and you’ll get the following price comparison:
product_1 price_1 price_2 prices_match
0 computer 1200 900 False
1 monitor 800 800 True
2 printer 200 300 False
3 desk 350 350 True
Note that in the above code, the price_2 column from the second DataFrame was also added to the first DataFrame in order to get a better view when comparing the prices.
Now what if you want to find the actual differences between the two prices?
(price_1) – (price_2)
In that case, you may add the following syntax to your code:
df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2'])
So the complete Python code would look like this:
import pandas as pd import numpy as np data_1 = {'product_1': ['computer','monitor','printer','desk'], 'price_1': [1200,800,200,350] } df1 = pd.DataFrame(data_1) data_2 = {'product_2': ['computer','monitor','printer','desk'], 'price_2': [900,800,300,350] } df2 = pd.DataFrame(data_2) df1['price_2'] = df2['price_2'] #add the price_2 column from df2 to df1 df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') #create a new column in df1 to check if prices match df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2']) #create a new column in df1 for price diff print(df1)
Once you run the code, you’ll get the actual differences between the prices:
product_1 price_1 price_2 prices_match price_diff
0 computer 1200 900 False 300
1 monitor 800 800 True 0
2 printer 200 300 False -100
3 desk 350 350 True 0
Compare Values from two Imported Files
Lastly, you’ll see how to compare values from two imported files.
Let’s say that you have the following data stored in a CSV file called file_1:
product_1 | price_1 |
computer | 1200 |
monitor | 800 |
printer | 200 |
desk | 350 |
While you have the data below stored in a second CSV file called file_2:
product_2 | price_2 |
computer | 900 |
monitor | 800 |
printer | 300 |
desk | 350 |
You can then import the above files into Python. In order to do so, you’ll need to specify the paths where the CSV files are stored on your computer. For demonstration purposes, let’s assume that the paths of the files are:
- file_1 is stored under this path: C:\Users\Ron\Desktop\Test\file_1.csv
- file_2 is stored under this path: C:\Users\Ron\Desktop\Test\file_2.csv
Once you imported the CSV files into Python, you’ll be able to assign each file into a DataFrame, where:
- file_1 will be assigned to df1
- file_2 will be assigned to df2
As before, the goal is to compare the prices (i.e., price_1 vs. price_2).
So here is the complete Python code to compare the values from the two imported files:
import pandas as pd import numpy as np data_1 = pd.read_csv(r'C:\Users\Ron\Desktop\Test\file_1.csv') df1 = pd.DataFrame(data_1) data_2 = pd.read_csv(r'C:\Users\Ron\Desktop\Test\file_2.csv') df2 = pd.DataFrame(data_2) df1['price_2'] = df2['price_2'] df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2']) print(df1)
Once you run the code in Python (adjusted to your paths), you’ll get the differences between the prices:
product_1 price_1 price_2 prices_match price_diff
0 computer 1200 900 False 300
1 monitor 800 800 True 0
2 printer 200 300 False -100
3 desk 350 350 True 0
Final note when comparing DataFrames
When you compare two DataFrames, you must ensure that the number of records in the first DataFrame matches with the number of records in the second DataFrame. In our example, each of the two DataFrames had 4 records, with 4 products and 4 prices.
If, for example, one of the DataFrames had 5 products, while the other DataFrame had 4 products, and you tried to run the comparison, you would then get the following error:
ValueError: Can only compare identically-labeled Series objects