How to Join pandas DataFrames using Merge
In this tutorial, you will learn how to join/merge two DataFrames.
TLDR solution
df = df1.merge(right=df2, how='inner', on=['column_x'])
Examples
Suppose, you have the following two DataFrames on fishes:
import pandas as pd
data1 = {'fish': ['salmon', 'pufferfish', 'shark'],
'count': [100, 10, 1],
'boat_id': [0, 1, 2]
}
data2 = {'fish': ['salmon', 'pufferfish', 'mackerel'],
'avg_length_cm': [50.5, 10.1, 20.4]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print(f'df1:\n{df1}\n')
print(f'df2:\n{df2}')
df1:
fish count boat_id
0 salmon 100 0
1 pufferfish 10 1
2 shark 1 2
df2:
fish avg_length_cm
0 salmon 50.5
1 pufferfish 10.1
2 mackerel 20.4
Note that both DataFrames have a fish column. However, only two values overlap: salmon and pufferfish.
You can then join two DataFrames using the merge method.
Specify the how and on options to set the join type and the column to join on, respectively.
Inner Join Two DataFrames
If you only want to keep rows with fish that appear in both DataFrames, perform an inner join:
df = df1.merge(df2, how='inner', on='fish')
print(df)
fish count boat_id avg_length_cm
0 salmon 100 0 50.5
1 pufferfish 10 1 10.1
Left Join Two DataFrames
If you want to keep all rows of the first/left DataFrame, do a left join:
df = df1.merge(df2, how='left', on='fish')
print(df)
fish count boat_id avg_length_cm
0 salmon 100 0 50.5
1 pufferfish 10 1 10.1
2 shark 1 2 NaN
Rigth Join Two DataFrames
If you want to keep all rows of the second/right DataFrame, do a right join:
df = df1.merge(df2, how='right', on='fish')
print(df)
fish count boat_id avg_length_cm
0 salmon 100.0 0.0 50.5
1 pufferfish 10.0 1.0 10.1
2 mackerel NaN NaN 20.4
Outer Join Two DataFrames
If you want to keep all rows of both DataFrames, do an outer join:
df = df1.merge(df2, how='outer', on='fish')
print(df)
fish count boat_id avg_length_cm
0 salmon 100.0 0.0 50.5
1 pufferfish 10.0 1.0 10.1
2 shark 1.0 2.0 NaN
3 mackerel NaN NaN 20.4
That's it! You just learned how to merge two pandas DataFrames.