4 Ways to Randomly Select Rows from Pandas DataFrame

Here are 4 ways to randomly select rows from Pandas DataFrame:

(1) Randomly select a single row:

df = df.sample()

(2) Randomly select a specified number of rows. For example, to select 3 random rows, set n=3:

df = df.sample(n=3)

(3) Allow a random selection of the same row more than once (by setting replace=True):

df = df.sample(n=3, replace=True)

(4) Randomly select a specified fraction of the total number of rows. For example, if you have 8 rows and set frac=0.50, you’ll get a random selection of 50% of the total rows, meaning that 4 rows will be selected:

df = df.sample(frac=0.50)
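Each one-liner above assigns its result back to df, so the original DataFrame is replaced by the sample. df.sample() returns a new DataFrame and does not modify the one it is called on. To keep the full data, you can store each sample under its own name. A minimal sketch follows. The variable names are illustrative and are not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['ABC', 'DDD', 'XYZ', 'AAA', 'CCC', 'PPP', 'NNN', 'RRR'],
                   'Price': [630, 790, 250, 370, 880, 1250, 550, 700]})

single_row = df.sample()                     # (1) one random row
three_rows = df.sample(n=3)                  # (2) 3 distinct random rows
with_repeats = df.sample(n=3, replace=True)  # (3) 3 rows, repeats allowed
half_of_rows = df.sample(frac=0.50)          # (4) 50% of 8 rows = 4 rows

# df itself still has all 8 rows, because sample() returns a new object
print(len(df), len(single_row), len(three_rows), len(with_repeats), len(half_of_rows))
# 8 1 3 3 4
```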

Let’s now see how to apply each of the above scenarios in practice.

The Example

To start with a simple example, let’s create a DataFrame with 8 rows:

import pandas as pd

data = {'Product': ['ABC','DDD','XYZ','AAA','CCC','PPP','NNN','RRR'],
          'Price': [630,790,250,370,880,1250,550,700],
       'Discount': ['No','Yes','No','Yes','Yes','No','No','Yes']
        }

df = pd.DataFrame(data, columns=['Product', 'Price', 'Discount'])

print(df)

Run the code in Python, and you’ll get the following DataFrame:

[Output: the DataFrame with 8 rows (index 0 to 7) and the columns Product, Price and Discount]

The goal is to randomly select rows from the above DataFrame across the 4 scenarios below.

4 Scenarios to Randomly Select Rows from Pandas DataFrame

Scenario 1: randomly select a single row

To randomly select a single row, simply add df = df.sample() to the code:

import pandas as pd

data = {'Product': ['ABC','DDD','XYZ','AAA','CCC','PPP','NNN','RRR'],
          'Price': [630,790,250,370,880,1250,550,700],
       'Discount': ['No','Yes','No','Yes','Yes','No','No','Yes']
        }

df = pd.DataFrame(data, columns=['Product', 'Price', 'Discount'])

df = df.sample()

print(df)

As you can see, a single row was randomly selected:

[Output: a single randomly selected row]
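df.sample() returns a one-row DataFrame, not a single record, and it keeps that row's original index label. If you need the row itself as a Series (for example, to read a single value), one option is to take it with .iloc[0]. Here is a sketch. The variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['ABC', 'DDD', 'XYZ', 'AAA', 'CCC', 'PPP', 'NNN', 'RRR'],
                   'Price': [630, 790, 250, 370, 880, 1250, 550, 700],
                   'Discount': ['No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes']})

picked = df.sample()              # a DataFrame with exactly one row
row = picked.iloc[0]              # the same row as a Series
original_label = picked.index[0]  # its index label in df (one of 0 to 7)

print(type(picked).__name__, type(row).__name__, original_label)
print(row['Product'], row['Price'])
```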

Scenario 2: randomly select a specified number of rows

Let’s now randomly select 3 rows by setting n=3:

import pandas as pd

data = {'Product': ['ABC','DDD','XYZ','AAA','CCC','PPP','NNN','RRR'],
          'Price': [630,790,250,370,880,1250,550,700],
       'Discount': ['No','Yes','No','Yes','Yes','No','No','Yes']
        }

df = pd.DataFrame(data, columns=['Product', 'Price', 'Discount'])

df = df.sample(n=3)

print(df)

You’ll now see 3 randomly selected rows:

[Output: 3 randomly selected rows]
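Because replace defaults to False, the rows are drawn without replacement. As a result, the 3 rows are always distinct, and n cannot exceed the number of rows in the DataFrame. The sketch below checks both points. Asking for 10 rows from the 8-row DataFrame raises a ValueError:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['ABC', 'DDD', 'XYZ', 'AAA', 'CCC', 'PPP', 'NNN', 'RRR'],
                   'Price': [630, 790, 250, 370, 880, 1250, 550, 700],
                   'Discount': ['No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes']})

three_rows = df.sample(n=3)
# Sampling without replacement never picks the same index label twice
print(three_rows.index.is_unique)  # True

# Requesting more rows than exist (10 > 8) fails unless replace=True
too_many_failed = False
try:
    df.sample(n=10)
except ValueError as err:
    too_many_failed = True
    print('ValueError:', err)
```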

Scenario 3: allow a random selection of the same row more than once

You may set replace=True to allow a random selection of the same row more than once:

import pandas as pd

data = {'Product': ['ABC','DDD','XYZ','AAA','CCC','PPP','NNN','RRR'],
          'Price': [630,790,250,370,880,1250,550,700],
       'Discount': ['No','Yes','No','Yes','Yes','No','No','Yes']
        }

df = pd.DataFrame(data, columns=['Product', 'Price', 'Discount'])

df = df.sample(n=3, replace=True)

print(df)

Since the selection is random, your results will vary from run to run. In the run shown here, the second row (with an index of 1) was selected more than once:

[Output: 3 rows selected with replacement, with the row at index 1 appearing more than once]

Note that setting replace=True doesn’t guarantee that the same row will be selected more than once; it only makes repeats possible.
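With replace=True, each row is drawn independently, so n can also be larger than the number of rows. If you draw 20 rows from 8, at least one row must repeat. The index of the result then contains duplicate labels. If you need a clean 0 to n-1 index, reset_index(drop=True) is one way to get it. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['ABC', 'DDD', 'XYZ', 'AAA', 'CCC', 'PPP', 'NNN', 'RRR'],
                   'Price': [630, 790, 250, 370, 880, 1250, 550, 700],
                   'Discount': ['No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes']})

# 20 draws from 8 rows: repeats are guaranteed (pigeonhole principle)
raw = df.sample(n=20, replace=True)
print(len(raw))                 # 20
print(raw.index.is_unique)      # False

# How many times each original row was picked (varies from run to run)
counts = raw.index.value_counts()
print(counts)

# Optional: replace the repeated labels with a fresh 0..19 index
clean = raw.reset_index(drop=True)
print(clean.index.is_unique)    # True
```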

Scenario 4: randomly select a specified fraction of the total number of rows

For the final scenario, let’s set frac=0.50 to get a random selection of 50% of the total rows:

import pandas as pd

data = {'Product': ['ABC','DDD','XYZ','AAA','CCC','PPP','NNN','RRR'],
          'Price': [630,790,250,370,880,1250,550,700],
       'Discount': ['No','Yes','No','Yes','Yes','No','No','Yes']
        }

df = pd.DataFrame(data, columns=['Product', 'Price', 'Discount'])

df = df.sample(frac=0.50)

print(df)

You’ll now see that 4 rows, out of the total of 8 rows in the DataFrame, were selected:

[Output: 4 randomly selected rows out of 8]
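The number of rows returned with frac is the fraction times the row count, so frac=0.50 on 8 rows gives 0.50 * 8 = 4. A related, commonly used case is frac=1. It returns every row in random order, which shuffles the whole DataFrame, and reset_index(drop=True) then renumbers the rows. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['ABC', 'DDD', 'XYZ', 'AAA', 'CCC', 'PPP', 'NNN', 'RRR'],
                   'Price': [630, 790, 250, 370, 880, 1250, 550, 700],
                   'Discount': ['No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes']})

half = df.sample(frac=0.50)
print(len(half))  # 4, i.e. 0.50 * 8

# frac=1 keeps all 8 rows, only their order is randomized
shuffled = df.sample(frac=1).reset_index(drop=True)
print(len(shuffled))                                        # 8
print(sorted(shuffled['Product']) == sorted(df['Product']))  # True
```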

You can read more about df.sample() by visiting the Pandas Documentation.
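One parameter in that documentation worth knowing is random_state. If you pass the same integer seed, df.sample() returns the same rows every time, which is handy for reproducible examples and tests. A sketch follows. The seed value 42 is arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['ABC', 'DDD', 'XYZ', 'AAA', 'CCC', 'PPP', 'NNN', 'RRR'],
                   'Price': [630, 790, 250, 370, 880, 1250, 550, 700],
                   'Discount': ['No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes']})

# Same seed, same rows
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
print(first.equals(second))  # True

# Without a seed, each call may return different rows
print(df.sample(n=3).index.tolist())
```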

Alternatively, you can check the following guide to learn how to randomly select columns from Pandas DataFrame.