Two Ways to Apply an If-Condition on a pandas DataFrame

In this tutorial, you will learn two ways to apply an if condition to a DataFrame.

TLDR solution

df.loc[df['column'] == condition_value, 'target_column'] = then_value

df['target_column'] = df['column'].apply(lambda x: then_value if x == condition_value else else_value)

Method 1: Use the .loc Attribute

df.loc[condition] selects all rows where condition is true. If you also specify a target_column and assign a then_value, you create that column (or overwrite it, if it already exists) with the given value in every row where the condition is met.
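
To make the mechanics concrete, here is a minimal sketch of what the condition evaluates to on its own. The toy DataFrame, the value 'hit', and the names mask and selected are made up for illustration and are not part of the fish example that follows.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
df = pd.DataFrame({'column': ['a', 'b', 'a']})

# The comparison returns a boolean Series with one entry per row
mask = df['column'] == 'a'

# Passing the mask to .loc selects only the rows where it is True
selected = df.loc[mask]

# Adding a column label and assigning writes only into the matching rows
df.loc[mask, 'target_column'] = 'hit'

print(mask)
print(selected)
print(df)
```

Rows where the condition is False are not touched: they keep their existing value, or hold a missing value (NaN) if the column is new.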

Let's say you have data on caught fish.

  • A recount has shown that the number of caught pufferfish is actually 10, not 5.
  • You are asked to create a system that flags rows where the number is greater than or equal to 100.

A solution would need to do the following:

  • If a row has "pufferfish" as its value in the fish column, set caught_count to 10.
  • If a row has a caught_count greater than or equal to 100, set its value in a new column called ge_100 to True; otherwise, set it to False.

The following code achieves this:

if_then.py
import pandas as pd

data = {'fish': ['salmon', 'pufferfish', 'shark'],
        'caught_count': [100, 5, 0]
        }

df = pd.DataFrame(data)

print(f"Before:\n{df}")

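# Possible comparison operators: == (equal), != (not equal),
# > (greater), >= (greater or equal), < (less), <= (less or equal).
# condition_value and then_value can be of any type.

# Correct the pufferfish count
df.loc[df['fish'] == "pufferfish", 'caught_count'] = 10

# Flag rows at or above 100; rows below 100 are set explicitly,
# because unmatched rows in a new column would otherwise hold NaN
df.loc[df['caught_count'] >= 100, 'ge_100'] = True
df.loc[df['caught_count'] < 100, 'ge_100'] = False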

print(f"\nAfter:\n{df}")

Note that an equality condition needs a double equals sign (==), because you are checking for equality, not assigning a value (single equals sign). The output:

Before:
         fish  caught_count
0      salmon           100
1  pufferfish             5
2       shark             0

After:
         fish  caught_count ge_100
0      salmon           100   True
1  pufferfish            10  False
2       shark             0  False

Method 2: Apply a lambda Function

You can achieve the same by applying a lambda function instead:

if_then.py
import pandas as pd

data = {'fish': ['salmon', 'pufferfish', 'shark'],
        'caught_count': [100, 5, 0]
        }

df = pd.DataFrame(data)

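# A conditional expression inside a lambda always needs an else branch.
# Applying row-wise (axis=1) keeps the existing count for all other fish.
df['caught_count'] = df.apply(
    lambda row: 10 if row['fish'] == "pufferfish" else row['caught_count'],
    axis=1
)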
df['ge_100'] = df['caught_count'].apply(lambda x: True if x >= 100 else False)
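
As a quick sanity check, the sketch below builds the same data twice, applies one method to each copy, and compares the resulting values. The names df_loc, df_apply, same_counts and same_flags are illustrative, and the row-wise apply with axis=1 is just one way to leave the counts of the other fish unchanged. The values are compared as lists rather than with DataFrame.equals, because the dtype of the ge_100 column may differ between the two approaches.

```python
import pandas as pd

data = {'fish': ['salmon', 'pufferfish', 'shark'],
        'caught_count': [100, 5, 0]}

# Method 1: .loc with a boolean condition
df_loc = pd.DataFrame(data)
df_loc.loc[df_loc['fish'] == "pufferfish", 'caught_count'] = 10
df_loc.loc[df_loc['caught_count'] >= 100, 'ge_100'] = True
df_loc.loc[df_loc['caught_count'] < 100, 'ge_100'] = False

# Method 2: lambda functions with an explicit else branch
df_apply = pd.DataFrame(data)
df_apply['caught_count'] = df_apply.apply(
    lambda row: 10 if row['fish'] == "pufferfish" else row['caught_count'],
    axis=1
)
df_apply['ge_100'] = df_apply['caught_count'].apply(lambda x: x >= 100)

# Compare the values column by column
same_counts = df_loc['caught_count'].tolist() == df_apply['caught_count'].tolist()
same_flags = df_loc['ge_100'].tolist() == df_apply['ge_100'].tolist()

print(f"Counts match: {same_counts}")
print(f"Flags match: {same_flags}")
```

On a large DataFrame, the .loc version is usually the faster of the two, since apply calls the lambda once per row or value.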

That's it! You just learned to apply an if condition on a DataFrame.