Two Ways to Apply an If-Condition on a pandas DataFrame
In this tutorial, you will learn two ways to apply an if condition on a DataFrame.
TLDR solution
df.loc[df['column'] == condition_value, 'target_column'] = then_value
df['target_column'] = df['column'].apply(lambda x: then_value if x == condition_value else else_value)
Method 1: Use the .loc Attribute
df.loc[condition] selects all rows where condition is true.
By also specifying a target_column and a then_value, you can create a column (or overwrite it, if it already exists) that holds a specific value wherever the condition is met.
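To make the selection step concrete, here is a minimal sketch on a made-up two-row DataFrame. The names toy, mask, and selected are assumptions for illustration only. It shows that the condition is a boolean Series, and that .loc uses it to pick rows and, with a column name, to write into them.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({'fish': ['salmon', 'shark'], 'caught_count': [100, 0]})

# The condition on its own is a boolean Series with one entry per row
mask = toy['caught_count'] >= 100

# .loc with the mask returns only the rows where the mask is True
selected = toy.loc[mask]

# Adding a column name and assigning writes a value into exactly those rows
toy.loc[mask, 'caught_count'] = 101
```

Here mask is True for the salmon row only, so selected holds that single row and only its caught_count is changed.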
Let's say you have data on caught fish.
- A recount has shown the number of caught pufferfish is actually 10 and not 5.
- You are asked to create a system that flags rows where the number is greater than or equal to 100.
A solution would need to do the following:
- If a row has "pufferfish" as the value in column fish, then set caught_count to 10.
- If a row has a caught_count greater than or equal to 100, then create a column called ge_100 and set its row value to True.
The following code achieves this:
if_then.py
import pandas as pd
data = {
    'fish': ['salmon', 'pufferfish', 'shark'],
    'caught_count': [100, 5, 0],
}
df = pd.DataFrame(data)
print(f"Before:\n{df}")
# Possible operators: equal ==, not equal !=,
# greater >, greater or equal >=, less <, less or equal <=,
# condition_value, then_value can be of any type
df.loc[df['fish'] == "pufferfish", 'caught_count'] = 10
df.loc[df['caught_count'] >= 100, 'ge_100'] = True
df.loc[df['caught_count'] < 100, 'ge_100'] = False
print(f"\nAfter:\n{df}")
Note that an equality condition needs a double equal sign (==), since you are checking for equality and not assigning a value (single equal sign). The output:
Before:
fish caught_count
0 salmon 100
1 pufferfish 5
2 shark 0
After:
fish caught_count ge_100
0 salmon 100 True
1 pufferfish 10 False
2 shark 0 False
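The third .loc line in the script matters: when .loc creates a new column, rows that do not match the condition hold missing values (NaN) until something is assigned to them. A minimal sketch of that behaviour, using the data as it stands after the pufferfish fix and a hypothetical variable name partial:

```python
import pandas as pd

# Same data as above, with the pufferfish count already corrected
partial = pd.DataFrame({'fish': ['salmon', 'pufferfish', 'shark'],
                        'caught_count': [100, 10, 0]})

# Only the matching row gets a value; the new column is NaN elsewhere
partial.loc[partial['caught_count'] >= 100, 'ge_100'] = True
missing_before = partial['ge_100'].isna().tolist()

# Assigning under the complementary condition fills the remaining rows
partial.loc[partial['caught_count'] < 100, 'ge_100'] = False
missing_after = partial['ge_100'].isna().tolist()

print(missing_before)
print(missing_after)
```

After the first assignment only the salmon row has a value; after the second, no row is missing one.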
Method 2: Apply a lambda Function
You can achieve the same by applying a lambda function instead:
if_then.py
import pandas as pd
data = {
    'fish': ['salmon', 'pufferfish', 'shark'],
    'caught_count': [100, 5, 0],
}
df = pd.DataFrame(data)
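# A conditional expression in a lambda always needs an else branch.
# Pass the whole row (axis=1) so rows that are not pufferfish keep their count.
df['caught_count'] = df.apply(
    lambda row: 10 if row['fish'] == "pufferfish" else row['caught_count'],
    axis=1,
)
# Flag counts of 100 or more
df['ge_100'] = df['caught_count'].apply(lambda x: True if x >= 100 else False)
print(f"\nAfter:\n{df}")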
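As a quick check that the two methods agree, the sketch below builds the data twice and compares the resulting columns. The names via_loc, via_apply, same_counts, and same_flags are made up for this check. The pufferfish fix uses a row-wise apply so that the other counts are kept.

```python
import pandas as pd

data = {'fish': ['salmon', 'pufferfish', 'shark'],
        'caught_count': [100, 5, 0]}

# Method 1: .loc assignments
via_loc = pd.DataFrame(data)
via_loc.loc[via_loc['fish'] == "pufferfish", 'caught_count'] = 10
via_loc.loc[via_loc['caught_count'] >= 100, 'ge_100'] = True
via_loc.loc[via_loc['caught_count'] < 100, 'ge_100'] = False

# Method 2: lambda functions
via_apply = pd.DataFrame(data)
via_apply['caught_count'] = via_apply.apply(
    lambda row: 10 if row['fish'] == "pufferfish" else row['caught_count'],
    axis=1,
)
via_apply['ge_100'] = via_apply['caught_count'].apply(lambda x: x >= 100)

# Compare plain Python lists, since the two ge_100 columns can differ in dtype
same_counts = via_loc['caught_count'].tolist() == via_apply['caught_count'].tolist()
same_flags = via_loc['ge_100'].tolist() == via_apply['ge_100'].tolist()
print(same_counts, same_flags)
```

Both comparisons should come out True. On larger DataFrames the .loc version is usually the faster of the two, because apply calls the lambda once per row or value.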
That's it! You just learned to apply an if condition on a DataFrame.