Here are 4 ways to randomly select rows from a Pandas DataFrame:
(1) Randomly select a single row:
df = df.sample()
(2) Randomly select a specified number of rows. For example, to select 3 random rows, set n=3:
df = df.sample(n=3)
(3) Allow a random selection of the same row more than once (by setting replace=True):
df = df.sample(n=3, replace=True)
(4) Randomly select a specified fraction of the total number of rows. For example, if you have 8 rows, and you set frac=0.50, then you’ll get a random selection of 50% of the total rows, meaning that 4 rows will be selected:
df = df.sample(frac=0.50)
Let’s now see how to apply each of the above scenarios in practice.
The Example
To start with a simple example, let’s create a DataFrame with 8 rows:
import pandas as pd
data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)
print(df)
Run the code in Python, and you’ll get the following DataFrame:
Product Price Discount
0 ABC 630 No
1 DDD 790 Yes
2 XYZ 250 No
3 AAA 370 Yes
4 CCC 880 Yes
5 PPP 1250 No
6 NNN 550 No
7 RRR 700 Yes
The goal is to randomly select rows from the above DataFrame across the 4 scenarios below.
4 Scenarios to Randomly Select Rows from a DataFrame
Scenario 1: randomly select a single row
To randomly select a single row, simply add df = df.sample() to the code:
import pandas as pd
data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)
df = df.sample()
print(df)
As you can see, a single row was randomly selected (since the selection is random, you may get a different row each time you run the code):
Product Price Discount
4 CCC 880 Yes
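If you need the same row every time you run the code (for example, to make results reproducible), one option is the random_state parameter of df.sample(). The sketch below is a minimal illustration; the seed value 42 and the variable names first and second are arbitrary choices, not part of the original example:

```python
import pandas as pd

data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)

# Seeding the sampler with the same random_state (42 is an arbitrary seed)
# makes both calls select the same row
first = df.sample(random_state=42)
second = df.sample(random_state=42)

print(first.equals(second))  # True
```

Which row a given seed selects can depend on your pandas and NumPy versions, so the selected row itself is not shown here.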
Scenario 2: randomly select a specified number of rows
Let’s now randomly select 3 rows by setting n=3:
import pandas as pd
data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)
df = df.sample(n=3)
print(df)
You’ll now see 3 randomly selected rows:
Product Price Discount
1 DDD 790 Yes
6 NNN 550 No
2 XYZ 250 No
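Notice that the sampled rows come back in random order rather than in their original order. If you would rather keep the original row order, one possible approach is to chain sort_index() after the sample. This is a small sketch under that assumption; the variable name sampled is just for illustration:

```python
import pandas as pd

data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)

# Pick 3 random rows, then sort by index to restore the original row order
sampled = df.sample(n=3).sort_index()

print(sampled)
```

Because replace defaults to False, the 3 selected rows are always distinct.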
Scenario 3: allow a random selection of the same row more than once
You may set replace=True to allow a random selection of the same row more than once:
import pandas as pd
data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)
df = df.sample(n=3, replace=True)
print(df)
As you can see, the fifth row (with an index of 4) was randomly selected more than once:
Product Price Discount
6 NNN 550 No
4 CCC 880 Yes
4 CCC 880 Yes
Note that setting replace=True doesn’t guarantee that the same row will be selected more than once; it only makes a repeated selection possible.
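One case where replace=True becomes necessary is when you ask for more rows than the DataFrame contains. The sketch below illustrates this with n=10 on the 8-row DataFrame; the names raised and oversampled are illustrative only:

```python
import pandas as pd

data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)

# Without replacement, you cannot sample 10 rows out of 8
raised = False
try:
    df.sample(n=10)
except ValueError as err:
    raised = True
    print(f"Sampling without replacement failed: {err}")

# With replacement, the same request works, and at least one row
# must repeat because there are only 8 distinct rows
oversampled = df.sample(n=10, replace=True)
print(len(oversampled))  # 10
```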
Scenario 4: randomly select a specified fraction of the total number of rows
For the final scenario, let’s set frac=0.50 to get a random selection of 50% of the total rows:
import pandas as pd
data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)
df = df.sample(frac=0.50)
print(df)
You’ll now see that 4 rows, out of the total of 8 rows in the DataFrame, were selected:
Product Price Discount
7 RRR 700 Yes
3 AAA 370 Yes
4 CCC 880 Yes
2 XYZ 250 No
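A common use of frac is to shuffle the entire DataFrame by setting frac=1, which returns all of the rows in random order. The sketch below shows that idea next to the 50% case; the reset_index(drop=True) step and the names half and shuffled are optional additions for illustration:

```python
import pandas as pd

data = {
    "Product": ["ABC", "DDD", "XYZ", "AAA", "CCC", "PPP", "NNN", "RRR"],
    "Price": [630, 790, 250, 370, 880, 1250, 550, 700],
    "Discount": ["No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
}
df = pd.DataFrame(data)

# 50% of 8 rows gives 4 rows
half = df.sample(frac=0.50)
print(len(half))  # 4

# frac=1 returns every row in random order (a shuffle);
# reset_index(drop=True) then renumbers the rows from 0 to 7
shuffled = df.sample(frac=1).reset_index(drop=True)
print(shuffled)
```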
You can read more about df.sample() by visiting the Pandas Documentation.
Alternatively, you can check the following guide to learn how to randomly select columns from a Pandas DataFrame.