How to Compare Values in two Pandas DataFrames

In this short guide, I’ll show you how to compare values in two Pandas DataFrames. I’ll also review how to compare values from two imported files.

Steps to Compare Values in two Pandas DataFrames

Step 1: Prepare the datasets to be compared

To start, let’s say that you have the following two datasets that you want to compare:

First Dataset:

Product1Price1
Computer1200
Phone800
Printer200
Desk350

Second Dataset:

Product2Price2
Computer900
Phone800
Printer300
Desk350

The ultimate goal is to compare the prices (i.e., Price1 vs. Price2).

Step 2: Create the two DataFrames

Based on the above data, you can then create the following two DataFrames using this code:

import pandas as pd

firstProductSet = {'Product1': ['Computer','Phone','Printer','Desk'],
                   'Price1': [1200,800,200,350]
                   }
df1 = pd.DataFrame(firstProductSet,columns= ['Product1', 'Price1'])
print(df1)

secondProductSet = {'Product2': ['Computer','Phone','Printer','Desk'],
                    'Price2': [900,800,300,350]
                    }
df2 = pd.DataFrame(secondProductSet,columns= ['Product2', 'Price2'])
print (df2)

Run the code in Python, and you’ll get these two DataFrames:

Pandas DataFrames

Step 3: Compare the values

In this step, you’ll need to import the numpy package.

You can then use this template to perform the comparison:

df1['new column that will contain the comparison results'] = np.where(condition,'value if true','value if false')

For our example, here is the syntax that you can add in order to compare the prices (i.e., Price1 vs. Price2) under the two DataFrames:

df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')

You’ll notice that a new column (i.e., the ‘pricesMatch?’ column) will be created under the first DataFrame (i.e., df1). This new column will contain the comparison results based on the following rules:

  • If Price1 is equal to Price2, then assign the value of True
  • Otherwise, assign the value of False.

Here is the complete Python code that you can use to compare the prices from the two DataFrames:

import pandas as pd
import numpy as np

firstProductSet = {'Product1': ['Computer','Phone','Printer','Desk'],
                   'Price1': [1200,800,200,350]
                   }
df1 = pd.DataFrame(firstProductSet,columns= ['Product1', 'Price1'])


secondProductSet = {'Product2': ['Computer','Phone','Printer','Desk'],
                    'Price2': [900,800,300,350]
                    }
df2 = pd.DataFrame(secondProductSet,columns= ['Product2', 'Price2'])


df1['Price2'] = df2['Price2'] #add the Price2 column from df2 to df1

df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')  #create new column in df1 to check if prices match
print (df1)

Run the code, and you’ll get the following price comparison:

Compare Values in two Pandas DataFrames

Note that in the above code, the Price2 column from the second DataFrame was also added to the first DataFrame in order to get a better view when comparing the prices.

Now what if you want to find the actual differences between the two prices?

Price1 – Price2

In that case, you may add the following syntax to your code:

df1['priceDiff?'] = np.where(df1['Price1'] == df2['Price2'], 0, df1['Price1'] - df2['Price2'])

So the complete Python code would look like this:

import pandas as pd
import numpy as np

firstProductSet = {'Product1': ['Computer','Phone','Printer','Desk'],
                   'Price1': [1200,800,200,350]
                   }
df1 = pd.DataFrame(firstProductSet,columns= ['Product1', 'Price1'])


secondProductSet = {'Product2': ['Computer','Phone','Printer','Desk'],
                    'Price2': [900,800,300,350]
                    }
df2 = pd.DataFrame(secondProductSet,columns= ['Product2', 'Price2'])


df1['Price2'] = df2['Price2'] #add the Price2 column from df2 to df1

df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')  #create new column in df1 to check if prices match
df1['priceDiff?'] = np.where(df1['Price1'] == df2['Price2'], 0, df1['Price1'] - df2['Price2']) #create new column in df1 for price diff 
print (df1)

Once you run the code, you’ll get the actual differences between the prices:

Compare Values Pandas DataFrames

Compare Values from two Imported Files

Lastly, you’ll see how to compare values from two imported files.

Let’s say that you have the following data stored in a CSV file called File_1:

While you have the data below stored in a second CSV file called File_2:

You can then import the above files into Python. In order to do so, you’ll need to specify the paths where the CSV files are stored on your computer. In my case, I stored:

  • File_1 under this path: C:\Users\Ron\Desktop\Test\File_1.csv
  • File_2 under this path: C:\Users\Ron\Desktop\Test\File_2.csv

Once you imported the CSV files into Python, you’ll be able to assign each file into a DataFrame, where:

  • File_1 will be assigned to df1
  • File_2 will be assigned to df2

As before, the goal is to compare the prices (i.e., Price1 vs. Price2).

So here is the complete Python code to compare the values from the two imported files:

import pandas as pd
import numpy as np

firstProductSet = pd.read_csv(r'C:\Users\Ron\Desktop\Test\File_1.csv')
df1 = pd.DataFrame(firstProductSet,columns= ['Product1', 'Price1'])

secondProductSet = pd.read_csv(r'C:\Users\Ron\Desktop\Test\File_2.csv') 
df2 = pd.DataFrame(secondProductSet,columns= ['Product2', 'Price2'])

df1['Price2'] = df2['Price2']
df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False')
df1['priceDiff?'] = np.where(df1['Price1'] == df2['Price2'], 0, df1['Price1'] - df2['Price2'])

print (df1)

Once you run the code in Python (adjusted to your paths), you’ll get the differences between the prices:

Final note when comparing DataFrames

When you compare two DataFrames, you must ensure that the number of records in the first DataFrame matches with the number of records in the second DataFrame. In our example, each of the two DataFrames had 4 records, with 4 products and 4 prices.

If, for example, one of the DataFrames had 5 products, while the other DataFrame had 4 products, and you tried to run the comparison, you would get the following error:

ValueError: Can only compare identically-labeled Series objects