How to Compare Values in two Pandas DataFrames

In this short guide, I’ll show you how to compare values in two Pandas DataFrames.

Steps to Compare Values in two Pandas DataFrames

Step 1: Prepare the datasets to be compared

To start, let’s say that you have the following two datasets that you want to compare:

First Dataset:

Product1Price1
Computer1200
Phone800
Printer200
Desk350

Second Dataset:

Product2Price2
Computer900
Phone800
Printer300
Desk350

The ultimate goal is to compare the prices (i.e., Price1 vs. Price2).

Step 2: Create the two DataFrames

Based on the above data, you can then create the following two DataFrames using this code:

from pandas import DataFrame

firstProductSet = {'Product1': ['Computer','Phone','Printer','Desk'],
                   'Price1': [1200,800,200,350]
                   }
df1 = DataFrame(firstProductSet,columns= ['Product1', 'Price1'])
print(df1)

secondProductSet = {'Product2': ['Computer','Phone','Printer','Desk'],
                    'Price2': [900,800,300,350]
                    }
df2 = DataFrame(secondProductSet,columns= ['Product2', 'Price2'])
print (df2)

Run the code in Python and you’ll get the two DataFrames:

Pandas DataFrames

Step 3: Compare the Values

In this step, you’ll need to import the numpy package to perform the comparison.

You can then use this template to perform the comparison:

df1['new column that will store the comparison results'] = np.where(condition,'value if true','value if false')

For our example, here is the syntax that you can add in order to compare the prices (i.e., Price1 vs. Price2) under the two DataFrames:

df1['pricesMatch?'] = np.where(df1.Price1 == df2.Price2, 'True', 'False')

You’ll notice that a new column (i.e., the ‘pricesMatch?’ column) will be created under the first DataFrame (i.e., df1). This new column will display the comparison results based on the following rules:

  • If Price1 is equal to Price2, then assign the value of True
  • Otherwise, assign the value of False.

Here is the complete Python code that you can use to compare the prices from the two DataFrames:

from pandas import DataFrame
import numpy as np

firstProductSet = {'Product1': ['Computer','Phone','Printer','Desk'],
                   'Price1': [1200,800,200,350]
                   }
df1 = DataFrame(firstProductSet,columns= ['Product1', 'Price1'])


secondProductSet = {'Product2': ['Computer','Phone','Printer','Desk'],
                    'Price2': [900,800,300,350]
                    }
df2 = DataFrame(secondProductSet,columns= ['Product2', 'Price2'])


df1['Price2'] = df2['Price2'] #add the Price2 column from df2 to df1

df1['pricesMatch?'] = np.where(df1.Price1 == df2.Price2, 'True', 'False')  #create new column in df1 to check if prices match
print (df1)

Run the code and you’ll get the following price comparison:

Compare Values in two Pandas DataFrames

Note that in the above code, the Price2 column from the second DataFrame was also added to the first DataFrame in order to get a better view when comparing the prices.

Now what if you want to find the actual differences between the two prices?

Price1 – Price2

In that case, you may add the following syntax to your code:

df1['priceDiff?'] = np.where(df1.Price1 == df2.Price2, '0', df1.Price1 - df2.Price2) #create new column in df1 for price diff 

So the complete Python code would look like this:

from pandas import DataFrame
import numpy as np

firstProductSet = {'Product1': ['Computer','Phone','Printer','Desk'],
                   'Price1': [1200,800,200,350]
                   }
df1 = DataFrame(firstProductSet,columns= ['Product1', 'Price1'])


secondProductSet = {'Product2': ['Computer','Phone','Printer','Desk'],
                    'Price2': [900,800,300,350]
                    }
df2 = DataFrame(secondProductSet,columns= ['Product2', 'Price2'])


df1['Price2'] = df2['Price2'] #add the Price2 column from df2 to df1

df1['pricesMatch?'] = np.where(df1.Price1 == df2.Price2, 'True', 'False')  #create new column in df1 to check if prices match
df1['priceDiff?'] = np.where(df1.Price1 == df2.Price2, '0', df1.Price1 - df2.Price2) #create new column in df1 for price diff 
print (df1)

Once you run the code, you’ll get the actual differences between the prices:

Compare Values Pandas DataFrames

Final note when comparing DataFrames

When you compare two DataFrames, you must ensure that the number of records in the first DataFrame matches with the number of records in the second DataFrame. In our example, each of the two DataFrames had 4 records, with 4 products and 4 prices.

If, for example, one of the DataFrames had 5 products, while the other DataFrame had 4 products, and you tried to run the comparison, you would get the following error:

ValueError: Can only compare identically-labeled Series objects