In this short guide, you’ll see how to compare values between two Pandas DataFrames.
The Steps
Step 1: Create two DataFrames
To start, create the two DataFrames below. Note that your two DataFrames must have the same number of rows before making the comparison.
import pandas as pd
# Create the first DataFrame
data_1 = {
"product_1": ["computer", "monitor", "printer", "desk"],
"price_1": [1200, 800, 200, 350],
}
df1 = pd.DataFrame(data_1)
print(df1)
# Create the second DataFrame
data_2 = {
"product_2": ["computer", "monitor", "printer", "desk"],
"price_2": [900, 800, 300, 350],
}
df2 = pd.DataFrame(data_2)
print(df2)
Run the code in Python, and you’ll get these two DataFrames:
product_1 price_1
0 computer 1200
1 monitor 800
2 printer 200
3 desk 350
product_2 price_2
0 computer 900
1 monitor 800
2 printer 300
3 desk 350
Step 2: Compare the values between the two Pandas DataFrames
You can use the NumPy package to compare the two DataFrames. Here is a general template:
df1["new comparison column"] = np.where(condition, "value if true", "value if false")
For example, to compare the prices (price_1 vs. price_2) between the two DataFrames created:
df1["prices_match"] = np.where(df1["price_1"] == df2["price_2"], "Yes", "No")
The complete code:
import pandas as pd
import numpy as np
# Create the first DataFrame
data_1 = {
"product_1": ["computer", "monitor", "printer", "desk"],
"price_1": [1200, 800, 200, 350],
}
df1 = pd.DataFrame(data_1)
# Create the second DataFrame
data_2 = {
"product_2": ["computer", "monitor", "printer", "desk"],
"price_2": [900, 800, 300, 350],
}
df2 = pd.DataFrame(data_2)
# Adding the price_2 column to df1 to get a better view when comparing prices
df1["price_2"] = df2["price_2"]
# The new prices_match column to be created in df1
df1["prices_match"] = np.where(df1["price_1"] == df2["price_2"], "Yes", "No")
print(df1)
Run the code, and you’ll get the following price comparison:
product_1 price_1 price_2 prices_match
0 computer 1200 900 No
1 monitor 800 800 Yes
2 printer 200 300 No
3 desk 350 350 Yes
Now what if you want to find the actual differences between the two prices?
(price_1) – (price_2)
In that case, add the following line to your code:
df1["price_diff"] = np.where(df1["price_1"] == df2["price_2"], 0, df1["price_1"] - df2["price_2"])
So the complete Python code would look like this:
import pandas as pd
import numpy as np
# Create the first DataFrame
data_1 = {
"product_1": ["computer", "monitor", "printer", "desk"],
"price_1": [1200, 800, 200, 350],
}
df1 = pd.DataFrame(data_1)
# Create the second DataFrame
data_2 = {
"product_2": ["computer", "monitor", "printer", "desk"],
"price_2": [900, 800, 300, 350],
}
df2 = pd.DataFrame(data_2)
# Adding the price_2 column to df1 to get a better view when comparing prices
df1["price_2"] = df2["price_2"]
# The new prices_match comparison column to be created in df1
df1["prices_match"] = np.where(df1["price_1"] == df2["price_2"], "Yes", "No")
# The new price_diff comparison column to be created in df1
df1["price_diff"] = np.where(df1["price_1"] == df2["price_2"], 0, df1["price_1"] - df2["price_2"])
print(df1)
Once you run the code, you’ll get the actual differences between the prices:
product_1 price_1 price_2 prices_match price_diff
0 computer 1200 900 No 300
1 monitor 800 800 Yes 0
2 printer 200 300 No -100
3 desk 350 350 Yes 0