How to Randomly Select Columns from Pandas DataFrame

Depending on your needs, you may use any of the 4 techniques below to randomly select columns from a Pandas DataFrame:

(1) Randomly select a single column:

df = df.sample(axis='columns')

(2) Randomly select a specified number of columns. For example, to select 3 random columns, set n=3:

df = df.sample(n=3, axis='columns')

(3) Allow a random selection of the same column more than once (by setting replace=True):

df = df.sample(n=3, axis='columns', replace=True)

(4) Randomly select a specified fraction of the total number of columns. For example, if you have 6 columns and you set frac=0.50, then 50% of the columns (3 columns) will be randomly selected:

df = df.sample(frac=0.50, axis='columns')
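Since each of the techniques above returns a different result on every run, it can help to make the selection repeatable while testing. The sketch below uses the random_state parameter of df.sample() for that purpose. The small DataFrame and the seed value 42 are assumptions chosen only for illustration:

```python
import pandas as pd

# Hypothetical DataFrame, used only for illustration
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# Passing the same integer seed makes both calls select the same columns
first = df.sample(n=2, axis='columns', random_state=42)
second = df.sample(n=2, axis='columns', random_state=42)

print(list(first.columns))
print(list(first.columns) == list(second.columns))  # True
```

Omit random_state (or change the seed) whenever you want a fresh random selection.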

In the next section, you’ll see how to apply each of the above cases in practice.

The Example

To begin with a simple example, let’s create a DataFrame with 6 columns that contain data about boxes:

import pandas as pd

boxes = {'Color': ['Blue','Blue','Green','Green','Green','Red','Red','Red'],
         'Shape': ['Square','Square','Square','Rectangle','Rectangle','Rectangle','Square','Rectangle'],
      'Material': ['Wood','Cardboard','Wood','Wood','Wood','Cardboard','Cardboard','Wood'],
        'Length': [15,25,25,15,15,15,20,25],
         'Width': [8,5,5,4,8,8,5,4],
        'Height': [30,35,35,40,30,35,40,40]
        }

df = pd.DataFrame(boxes, columns = ['Color','Shape','Material','Length','Width','Height'])

print(df)

Run the code in Python, and you’ll get a DataFrame with 8 rows and 6 columns: Color, Shape, Material, Length, Width and Height.

The goal is to randomly select columns from the above DataFrame across 4 different cases.

4 Cases to Randomly Select Columns from Pandas DataFrame

Case 1: randomly select a single column

To randomly select a single column, simply add df = df.sample(axis='columns') to the code:

import pandas as pd

boxes = {'Color': ['Blue','Blue','Green','Green','Green','Red','Red','Red'],
         'Shape': ['Square','Square','Square','Rectangle','Rectangle','Rectangle','Square','Rectangle'],
      'Material': ['Wood','Cardboard','Wood','Wood','Wood','Cardboard','Cardboard','Wood'],
        'Length': [15,25,25,15,15,15,20,25],
         'Width': [8,5,5,4,8,8,5,4],
        'Height': [30,35,35,40,30,35,40,40]
        }

df = pd.DataFrame(boxes, columns = ['Color','Shape','Material','Length','Width','Height'])

# Randomly select a single column (n defaults to 1)
df = df.sample(axis='columns')

print(df)

Run the code, and you’ll see that a single column was randomly selected. Because the selection is random, the column you get may differ from run to run.
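Note that the result of Case 1 is still a DataFrame (with one column), not a Series. As a minimal sketch, assuming a smaller DataFrame and the illustrative variable names picked, col_name and series, here is one way to retrieve the name of the selected column and its values as a Series:

```python
import pandas as pd

# Smaller DataFrame, used only for illustration
df = pd.DataFrame({'Color': ['Blue', 'Green', 'Red'],
                   'Length': [15, 25, 20],
                   'Width': [8, 5, 4]})

picked = df.sample(axis='columns')  # DataFrame with exactly one column
col_name = picked.columns[0]        # name of the randomly selected column
series = picked[col_name]           # the same data as a Series

print(type(picked).__name__, picked.shape)  # DataFrame (3, 1)
print(col_name)
```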

Case 2: randomly select a specified number of columns

Let’s suppose that you want to randomly select 3 columns from the DataFrame. In that case, you’ll need to set n=3:

import pandas as pd

boxes = {'Color': ['Blue','Blue','Green','Green','Green','Red','Red','Red'],
         'Shape': ['Square','Square','Square','Rectangle','Rectangle','Rectangle','Square','Rectangle'],
      'Material': ['Wood','Cardboard','Wood','Wood','Wood','Cardboard','Cardboard','Wood'],
        'Length': [15,25,25,15,15,15,20,25],
         'Width': [8,5,5,4,8,8,5,4],
        'Height': [30,35,35,40,30,35,40,40]
        }

df = pd.DataFrame(boxes, columns = ['Color','Shape','Material','Length','Width','Height'])

# Randomly select 3 different columns
df = df.sample(n=3, axis='columns')

print(df)

Run the code, and you’ll see that 3 columns were randomly selected. The specific columns, and their order, may differ from run to run.
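One thing to keep in mind for Case 2: without replacement, n cannot be larger than the number of columns, and pandas raises a ValueError if it is. The sketch below demonstrates this on a hypothetical 3-column DataFrame, and shows one possible workaround (capping n with min), which is an assumption about what you may want rather than the only option:

```python
import pandas as pd

# Hypothetical DataFrame with only 3 columns
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})

# Asking for 5 columns out of 3 (without replacement) fails
try:
    df.sample(n=5, axis='columns')
    error_raised = False
except ValueError as exc:
    error_raised = True
    print(f"ValueError: {exc}")

# One possible workaround: cap n at the number of available columns
n = min(5, df.shape[1])
subset = df.sample(n=n, axis='columns')
print(subset.shape[1])  # 3
```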

Case 3: allow a random selection of the same column more than once

What if you want to allow the random selection of the same column more than once?

In such a case, you’ll need to set replace=True in the code:

import pandas as pd

boxes = {'Color': ['Blue','Blue','Green','Green','Green','Red','Red','Red'],
         'Shape': ['Square','Square','Square','Rectangle','Rectangle','Rectangle','Square','Rectangle'],
      'Material': ['Wood','Cardboard','Wood','Wood','Wood','Cardboard','Cardboard','Wood'],
        'Length': [15,25,25,15,15,15,20,25],
         'Width': [8,5,5,4,8,8,5,4],
        'Height': [30,35,35,40,30,35,40,40]
        }

df = pd.DataFrame(boxes, columns = ['Color','Shape','Material','Length','Width','Height'])

# Randomly select 3 columns; the same column may be picked more than once
df = df.sample(n=3, axis='columns', replace=True)

print(df)

In one such run, the ‘Length’ column was randomly selected more than once. Your own result may differ, as the selection is random.

Note that setting replace=True doesn’t guarantee that the same column will be selected more than once. It only makes it possible.
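To see repeated columns reliably, you can ask for more columns than the DataFrame has, which is only allowed when replace=True. The sketch below assumes a hypothetical 3-column DataFrame. It also shows one way to detect the repeats and to keep only the first occurrence of each column:

```python
import pandas as pd

# Hypothetical DataFrame with 3 columns
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# With replacement, n may exceed the number of columns
sampled = df.sample(n=5, axis='columns', replace=True)

# Drawing 5 times from 3 columns must repeat at least one of them
has_repeats = sampled.columns.duplicated().any()
print(list(sampled.columns))
print(has_repeats)  # True

# Keep only the first occurrence of each repeated column
deduped = sampled.loc[:, ~sampled.columns.duplicated()]
print(list(deduped.columns))
```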

Case 4: randomly select a specified fraction of the total number of columns

Suppose that you want to randomly select a specified fraction of the total number of columns.

For example, if you set frac=0.50, then 50% of the total number of columns will be selected (meaning that 3 columns, out of the total of 6 columns, will be randomly selected):

import pandas as pd

boxes = {'Color': ['Blue','Blue','Green','Green','Green','Red','Red','Red'],
         'Shape': ['Square','Square','Square','Rectangle','Rectangle','Rectangle','Square','Rectangle'],
      'Material': ['Wood','Cardboard','Wood','Wood','Wood','Cardboard','Cardboard','Wood'],
        'Length': [15,25,25,15,15,15,20,25],
         'Width': [8,5,5,4,8,8,5,4],
        'Height': [30,35,35,40,30,35,40,40]
        }

df = pd.DataFrame(boxes, columns = ['Color','Shape','Material','Length','Width','Height'])

# Randomly select 50% of the columns (3 out of 6)
df = df.sample(frac=0.50, axis='columns')

print(df)

Run the code, and you’ll see that 3 columns were indeed randomly selected.
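Two related behaviors of frac are worth knowing: setting frac=1 returns all the columns in a random order, and frac cannot be combined with n in the same call. Here is a minimal sketch, assuming a hypothetical 4-column DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with 4 columns
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3], 'D': [4]})

half = df.sample(frac=0.50, axis='columns')   # 2 of the 4 columns
shuffled = df.sample(frac=1, axis='columns')  # all 4 columns, random order

print(half.shape[1])             # 2
print(sorted(shuffled.columns))  # ['A', 'B', 'C', 'D']

# n and frac cannot be used together
try:
    df.sample(n=2, frac=0.50, axis='columns')
    both_rejected = False
except ValueError:
    both_rejected = True
print(both_rejected)  # True
```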

You can read more about df.sample() by visiting the Pandas Documentation.

Alternatively, you can check the following guide to learn how to randomly select rows from Pandas DataFrame.