How to Join pandas DataFrames using Merge

In this tutorial, you will learn how to join/merge two DataFrames.

TLDR solution

df = df1.merge(right=df2, how='inner', on=['column_x'])

Examples

Suppose, you have the following two DataFrames on fishes:

import pandas as pd

data1 = {'fish': ['salmon', 'pufferfish', 'shark'],
        'count': [100, 10, 1],
        'boat_id': [0, 1, 2]
        }
data2 = {'fish': ['salmon', 'pufferfish', 'mackerel'],
        'avg_length_cm': [50.5, 10.1, 20.4]
        }

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

print(f'df1:\n{df1}\n')
print(f'df2:\n{df2}')
df1:
         fish  count  boat_id
0      salmon    100        0
1  pufferfish     10        1
2       shark      1        2

df2:
         fish  avg_length_cm
0      salmon           50.5
1  pufferfish           10.1
2    mackerel           20.4

Note that both DataFrames have a fish column. However, only two values overlap: salmon and pufferfish.

You can then join two DataFrames using the merge method. Specify the how and on options to set the join type and the column to join on, respectively.

Inner Join Two DataFrames

If you only want to keep rows with fish that appear in both DataFrames, perform an inner join:

df = df1.merge(df2, how='inner', on='fish')

print(df)
         fish  count  boat_id  avg_length_cm
0      salmon    100        0           50.5
1  pufferfish     10        1           10.1

Left Join Two DataFrames

If you want to keep all rows of the first/left DataFrame, do a left join:

df = df1.merge(df2, how='left', on='fish')

print(df)
         fish  count  boat_id  avg_length_cm
0      salmon    100        0           50.5
1  pufferfish     10        1           10.1
2       shark      1        2            NaN

Rigth Join Two DataFrames

If you want to keep all rows of the second/right DataFrame, do a right join:

df = df1.merge(df2, how='right', on='fish')

print(df)
         fish  count  boat_id  avg_length_cm
0      salmon  100.0      0.0           50.5
1  pufferfish   10.0      1.0           10.1
2    mackerel    NaN      NaN           20.4

Outer Join Two DataFrames

If you want to keep all rows of both DataFrames, do an outer join:

df = df1.merge(df2, how='outer', on='fish')

print(df)
         fish  count  boat_id  avg_length_cm
0      salmon  100.0      0.0           50.5
1  pufferfish   10.0      1.0           10.1
2       shark    1.0      2.0            NaN
3    mackerel    NaN      NaN           20.4

That's it! You just learned how to merge two pandas DataFrames.