How to Convert Pandas DataFrame to NumPy Array

Here are two approaches to convert a Pandas DataFrame to a NumPy array:

(1) First approach:

df.to_numpy()

(2) Second approach:

df.values

Steps to Convert Pandas DataFrame to a NumPy Array

Step 1: Create a DataFrame

To start with a simple example, let’s create a DataFrame with 3 columns. The 3 columns will contain only numeric data (i.e., integers):

import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

print(df)
print(type(df))

Run the code, and you’ll get the following Pandas DataFrame:

   Age  Birth Year  Graduation Year
0   25        1995             2016
1   47        1973             2000
2   38        1982             2005
<class 'pandas.core.frame.DataFrame'>

Step 2: Convert the DataFrame to a NumPy Array

You can use the first approach of df.to_numpy() to convert the DataFrame to a NumPy array:

df.to_numpy()

Here is the complete code to perform the conversion:

import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.to_numpy()

print(my_array)
print(type(my_array))

As you can see, the DataFrame is now converted to a NumPy array:

[[  25 1995 2016]
 [  47 1973 2000]
 [  38 1982 2005]]
<class 'numpy.ndarray'>

Alternatively, you can use the second approach of df.values to convert the DataFrame to a NumPy array (note that the Pandas documentation recommends df.to_numpy() over df.values):

import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.values

print(my_array)
print(type(my_array))

You’ll get the same NumPy array:

[[  25 1995 2016]
 [  47 1973 2000]
 [  38 1982 2005]]
<class 'numpy.ndarray'>
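
As a quick sanity check, the two approaches can be compared directly. The following is a minimal sketch that reuses the same DataFrame; the variable names array_from_method, array_from_attribute and same_result are introduced here only for illustration, and np.array_equal is NumPy's function for testing whether two arrays have the same shape and elements:

```python
import numpy as np
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

# Convert the same DataFrame using both approaches
array_from_method = df.to_numpy()
array_from_attribute = df.values

# True when both arrays have the same shape and the same elements
same_result = np.array_equal(array_from_method, array_from_attribute)

print(same_result)
```

For this DataFrame the comparison should report that both arrays hold identical values.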

Step 3 (optional): Check the Data Type

Once you have converted the DataFrame to an array, you can check its dtype by adding print(my_array.dtype) at the bottom of the code:

import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.to_numpy()

print(my_array)
print(type(my_array))
print(my_array.dtype)

For the above example, the dtype is integer (int64):

[[  25 1995 2016]
 [  47 1973 2000]
 [  38 1982 2005]]
<class 'numpy.ndarray'>
int64
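
If a different dtype is needed, df.to_numpy() also accepts an optional dtype parameter. The sketch below assumes that floating-point values are wanted, purely as an illustration; the variable name float_array is hypothetical:

```python
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

# Request a specific dtype during the conversion instead of the inferred one
float_array = df.to_numpy(dtype='float64')

print(float_array)
print(float_array.dtype)
```

The values themselves are unchanged; only their representation differs, so each integer is stored as a floating-point number and the reported dtype becomes float64.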

Convert a DataFrame with Mixed Data Types

What if you have a DataFrame with mixed data types (e.g., string/object and integer)?

For example, let’s create another DataFrame with a mixture of strings and numeric data:

import pandas as pd

data = {'Name': ['Jon', 'Maria', 'Bill'],
        'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

print(df)
print(type(df))

This is what the DataFrame looks like:

    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
<class 'pandas.core.frame.DataFrame'>

Let’s now convert the above DataFrame to a NumPy array, and then check the dtype:

import pandas as pd

data = {'Name': ['Jon', 'Maria', 'Bill'],
        'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.to_numpy()

print(my_array)
print(type(my_array))
print(my_array.dtype)

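A NumPy array holds a single dtype, so when the columns mix strings and integers, the values are stored as generic Python objects. As you can see, the dtype in this case is object: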

[['Jon' 25 1995 2016]
 ['Maria' 47 1973 2000]
 ['Bill' 38 1982 2005]]
<class 'numpy.ndarray'>
object
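
If only the numeric columns are needed, one option is to select them before converting, so that the resulting array keeps a numeric dtype. This is a minimal sketch that uses the DataFrame's select_dtypes method; the variable name numeric_array is introduced here for illustration, and dropping the 'Name' column is an assumption about what the analysis requires:

```python
import pandas as pd

data = {'Name': ['Jon', 'Maria', 'Bill'],
        'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

# Keep only the numeric columns (the 'Name' column is excluded), then convert
numeric_array = df.select_dtypes(include='number').to_numpy()

print(numeric_array)
print(numeric_array.dtype)
```

Because the string column is left out, the array contains only the three integer columns and no longer falls back to the object dtype.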

You can read more about df.to_numpy() by visiting the Pandas Documentation.
