How to Compare Values between two Pandas DataFrames

In this short guide, you’ll see how to compare values between two Pandas DataFrames. You’ll also observe how to compare values from two imported files.

Steps to Compare Values Between two Pandas DataFrames

Step 1: Prepare the datasets to be compared

To start, let’s say that you have the following two datasets that you want to compare:

First Dataset:

product_1 price_1
computer 1200
monitor 800
printer 200
desk 350

Second Dataset:

product_2 price_2
computer 900
monitor 800
printer 300
desk 350

The ultimate goal is to compare the prices (i.e., price_1 vs. price_2).

Step 2: Create the two DataFrames

Based on the above data, you can then create the following two DataFrames using this code:

import pandas as pd

data_1 = {'product_1': ['computer','monitor','printer','desk'],
                   'price_1': [1200,800,200,350]
                   }
df1 = pd.DataFrame(data_1)
print(df1)

data_2 = {'product_2': ['computer','monitor','printer','desk'],
                    'price_2': [900,800,300,350]
                    }
df2 = pd.DataFrame(data_2)
print (df2)

Run the code in Python, and you’ll get these two DataFrames:

  product_1  price_1
0  computer     1200
1   monitor      800
2   printer      200
3      desk      350
  product_2  price_2
0  computer      900
1   monitor      800
2   printer      300
3      desk      350

Step 3: Compare the values between the two Pandas DataFrames

In this step, you’ll need to import the NumPy package.

You can then use this template to perform the comparison:

df1['new column that will contain the comparison results'] = np.where(condition,'value if true','value if false')

For our example, here is the syntax that you can add in order to compare the prices (i.e., price_1 vs. price_2) between the two DataFrames:

df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False')

You’ll notice that a new column (i.e., the ‘prices_match’ column) would be created under the first DataFrame (i.e., df1). This new column would contain the comparison results based on the following rules:

  • If price_1 is equal to price_2, then assign the value of True
  • Otherwise, assign the value of False

Here is the complete Python code that you can use to compare the prices between the two DataFrames:

import pandas as pd
import numpy as np

data_1 = {'product_1': ['computer','monitor','printer','desk'],
                   'price_1': [1200,800,200,350]
                   }
df1 = pd.DataFrame(data_1)


data_2 = {'product_2': ['computer','monitor','printer','desk'],
                    'price_2': [900,800,300,350]
                    }
df2 = pd.DataFrame(data_2)


df1['price_2'] = df2['price_2'] #add the price_2 column from df2 to df1
df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') #create a new column in df1 to check if prices match

print(df1)

Run the code, and you’ll get the following price comparison:

  product_1  price_1  price_2  prices_match
0  computer     1200      900         False
1   monitor      800      800          True
2   printer      200      300         False
3      desk      350      350          True

Note that in the above code, the price_2 column from the second DataFrame was also added to the first DataFrame in order to get a better view when comparing the prices.

Now what if you want to find the actual differences between the two prices?

(price_1) – (price_2)

In that case, you may add the following syntax to your code:

df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2'])

So the complete Python code would look like this:

import pandas as pd
import numpy as np

data_1 = {'product_1': ['computer','monitor','printer','desk'],
                   'price_1': [1200,800,200,350]
                   }
df1 = pd.DataFrame(data_1)


data_2 = {'product_2': ['computer','monitor','printer','desk'],
                    'price_2': [900,800,300,350]
                    }
df2 = pd.DataFrame(data_2)


df1['price_2'] = df2['price_2'] #add the price_2 column from df2 to df1
df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') #create a new column in df1 to check if prices match
df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2']) #create a new column in df1 for price diff

print(df1)

Once you run the code, you’ll get the actual differences between the prices:

  product_1  price_1  price_2  prices_match  price_diff
0  computer     1200      900         False         300
1   monitor      800      800          True           0
2   printer      200      300         False        -100
3      desk      350      350          True           0

Compare Values from two Imported Files

Lastly, you’ll see how to compare values from two imported files.

Let’s say that you have the following data stored in a CSV file called file_1:

product_1 price_1
computer 1200
monitor 800
printer 200
desk 350

While you have the data below stored in a second CSV file called file_2:

product_2 price_2
computer 900
monitor 800
printer 300
desk 350

You can then import the above files into Python. In order to do so, you’ll need to specify the paths where the CSV files are stored on your computer. For demonstration purposes, let’s assume that the paths of the files are:

  • file_1 is stored under this path: C:\Users\Ron\Desktop\Test\file_1.csv
  • file_2 is stored under this path: C:\Users\Ron\Desktop\Test\file_2.csv

Once you imported the CSV files into Python, you’ll be able to assign each file into a DataFrame, where:

  • file_1 will be assigned to df1
  • file_2 will be assigned to df2

As before, the goal is to compare the prices (i.e., price_1 vs. price_2).

So here is the complete Python code to compare the values from the two imported files:

import pandas as pd
import numpy as np

data_1 = pd.read_csv(r'C:\Users\Ron\Desktop\Test\file_1.csv')
df1 = pd.DataFrame(data_1)

data_2 = pd.read_csv(r'C:\Users\Ron\Desktop\Test\file_2.csv')
df2 = pd.DataFrame(data_2)

df1['price_2'] = df2['price_2']
df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False')
df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2'])

print(df1)

Once you run the code in Python (adjusted to your paths), you’ll get the differences between the prices:

  product_1  price_1  price_2  prices_match  price_diff
0  computer     1200      900         False         300
1   monitor      800      800          True           0
2   printer      200      300         False        -100
3      desk      350      350          True           0

Final note when comparing DataFrames

When you compare two DataFrames, you must ensure that the number of records in the first DataFrame matches with the number of records in the second DataFrame. In our example, each of the two DataFrames had 4 records, with 4 products and 4 prices.

If, for example, one of the DataFrames had 5 products, while the other DataFrame had 4 products, and you tried to run the comparison, you would then get the following error:

ValueError: Can only compare identically-labeled Series objects