How to Union pandas DataFrames using Concat
In this tutorial, you will learn how to union/stack two or more DataFrames.
TLDR solution
df = pd.concat([df1, df2, ...])
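The `...` above is only a placeholder for further DataFrames; it is not meant to be typed literally. As a minimal sketch, assuming three hypothetical one-row frames (`a`, `b`, `c`) that are not part of this tutorial's data, any number of frames can be collected in a list and passed to pd.concat:

```python
import pandas as pd

# Three hypothetical frames with identical columns (illustration only)
a = pd.DataFrame({'fish': ['salmon'], 'count': [100]})
b = pd.DataFrame({'fish': ['shark'], 'count': [1]})
c = pd.DataFrame({'fish': ['tuna'], 'count': [7]})

# pd.concat takes a list (or other iterable) of DataFrames and stacks them row-wise
frames = [a, b, c]
df = pd.concat(frames)
print(df)
```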
Step-by-Step Example
Suppose you have the following two DataFrames of fish data:
import pandas as pd
data1 = {'fish': ['salmon', 'pufferfish', 'shark'],
'count': [100, 10, 1],
'boat_id': [0, 1, 1]
}
data2 = {'fish': ['salmon', 'pufferfish'],
'count': [50, 10]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print(f'df1:\n{df1}\n')
print(f'df2:\n{df2}')
df1:
fish count boat_id
0 salmon 100 0
1 pufferfish 10 1
2 shark 1 1
df2:
fish count
0 salmon 50
1 pufferfish 10
Union Two DataFrames
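You can then union the two DataFrames using the pd.concat function. Because df2 has no boat_id column, its rows get NaN in that column, and the column is upcast to float: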
df = pd.concat([df1, df2])
print(df)
fish count boat_id
0 salmon 100 0.0
1 pufferfish 10 1.0
2 shark 1 1.0
0 salmon 50 NaN
1 pufferfish 10 NaN
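Note that the result keeps each frame's original row labels, so the index repeats (0, 1, 2, 0, 1). If those labels carry no meaning, one option is to pass ignore_index=True to pd.concat so the rows are renumbered during the union. A minimal sketch, assuming the same sample data as above (the name `stacked` is used here only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'fish': ['salmon', 'pufferfish', 'shark'],
                    'count': [100, 10, 1],
                    'boat_id': [0, 1, 1]})
df2 = pd.DataFrame({'fish': ['salmon', 'pufferfish'],
                    'count': [50, 10]})

# ignore_index=True discards the original row labels and numbers the rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)

print(stacked.index.tolist())      # [0, 1, 2, 3, 4]
print(stacked['boat_id'].dtype)    # float64, because of the NaN values from df2
```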
Clean the Data
Suppose you know that df2 contains data from boat 2.
Let's fill the missing boat_id values, drop duplicate rows, and reset the index:
df['boat_id'] = df['boat_id'].fillna(2).astype('int')
df = df.drop_duplicates()
df = df.reset_index(drop=True)
print(df)
fish count boat_id
0 salmon 100 0
1 pufferfish 10 1
2 shark 1 1
3 salmon 50 2
4 pufferfish 10 2
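If the boat is already known before the union, an alternative is to add the column to df2 first, which avoids the NaN values and the float conversion altogether. This is a sketch under the same assumptions as above (df2 comes from boat 2), not the only way to do it; `df2_tagged` is a name chosen here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'fish': ['salmon', 'pufferfish', 'shark'],
                    'count': [100, 10, 1],
                    'boat_id': [0, 1, 1]})
df2 = pd.DataFrame({'fish': ['salmon', 'pufferfish'],
                    'count': [50, 10]})

# assign returns a copy of df2 with a constant boat_id column; df2 itself is unchanged
df2_tagged = df2.assign(boat_id=2)

# Both frames now share the same columns, so boat_id stays an integer column
df = pd.concat([df1, df2_tagged], ignore_index=True)
print(df)
```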
That's it! You just learned how to union pandas DataFrames.