Compare Values between two Pandas DataFrames

In this short guide, you’ll see how to compare values between two Pandas DataFrames.

The Steps

Step 1: Create two DataFrames

To start, create the two DataFrames below. Note that your two DataFrames must have the same number of rows before making the comparison.

import pandas as pd

# Create the first DataFrame
data_1 = {
"product_1": ["computer", "monitor", "printer", "desk"],
"price_1": [1200, 800, 200, 350],
}
df1 = pd.DataFrame(data_1)
print(df1)

# Create the second DataFrame
data_2 = {
"product_2": ["computer", "monitor", "printer", "desk"],
"price_2": [900, 800, 300, 350],
}
df2 = pd.DataFrame(data_2)
print(df2)

Run the code in Python, and you’ll get these two DataFrames:

  product_1  price_1
0  computer     1200
1   monitor      800
2   printer      200
3      desk      350
  product_2  price_2
0  computer      900
1   monitor      800
2   printer      300
3      desk      350

Step 2: Compare the values between the two Pandas DataFrames

You can use the NumPy package to compare the two DataFrames. Here is a general template:

df1["new comparison column"] = np.where(condition, "value if true", "value if false")

For example, to compare the prices (price_1 vs. price_2) between the two DataFrames created:

df1["prices_match"] = np.where(df1["price_1"] == df2["price_2"], "Yes", "No")

The complete code:

import pandas as pd
import numpy as np

# Create the first DataFrame
data_1 = {
"product_1": ["computer", "monitor", "printer", "desk"],
"price_1": [1200, 800, 200, 350],
}
df1 = pd.DataFrame(data_1)

# Create the second DataFrame
data_2 = {
"product_2": ["computer", "monitor", "printer", "desk"],
"price_2": [900, 800, 300, 350],
}
df2 = pd.DataFrame(data_2)

# Adding the price_2 column to df1 to get a better view when comparing prices
df1["price_2"] = df2["price_2"]

# The new prices_match column to be created in df1
df1["prices_match"] = np.where(df1["price_1"] == df2["price_2"], "Yes", "No")

print(df1)

Run the code, and you’ll get the following price comparison:

  product_1  price_1  price_2  prices_match
0  computer     1200      900            No
1   monitor      800      800           Yes
2   printer      200      300            No
3      desk      350      350           Yes

Now what if you want to find the actual differences between the two prices?

(price_1) – (price_2)

In that case, add the following line to your code:

df1["price_diff"] = np.where(df1["price_1"] == df2["price_2"], 0, df1["price_1"] - df2["price_2"])

So the complete Python code would look like this:

import pandas as pd
import numpy as np

# Create the first DataFrame
data_1 = {
"product_1": ["computer", "monitor", "printer", "desk"],
"price_1": [1200, 800, 200, 350],
}
df1 = pd.DataFrame(data_1)

# Create the second DataFrame
data_2 = {
"product_2": ["computer", "monitor", "printer", "desk"],
"price_2": [900, 800, 300, 350],
}
df2 = pd.DataFrame(data_2)

# Adding the price_2 column to df1 to get a better view when comparing prices
df1["price_2"] = df2["price_2"]

# The new prices_match comparison column to be created in df1
df1["prices_match"] = np.where(df1["price_1"] == df2["price_2"], "Yes", "No")

# The new price_diff comparison column to be created in df1
df1["price_diff"] = np.where(df1["price_1"] == df2["price_2"], 0, df1["price_1"] - df2["price_2"])

print(df1)

Once you run the code, you’ll get the actual differences between the prices:

  product_1  price_1  price_2  prices_match  price_diff
0  computer     1200      900            No         300
1   monitor      800      800           Yes           0
2   printer      200      300            No        -100
3      desk      350      350           Yes           0

Leave a Comment