Here are two approaches to convert a Pandas DataFrame to a NumPy array:
(1) First approach:
df.to_numpy()
(2) Second approach:
df.values
Steps to Convert Pandas DataFrame to a NumPy Array
Step 1: Create a DataFrame
To start with a simple example, let’s create a DataFrame with 3 columns. The 3 columns will contain only numeric data (i.e., integers):
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

print(df)
print(type(df))
Run the code, and you’ll get the following Pandas DataFrame:
   Age  Birth Year  Graduation Year
0   25        1995             2016
1   47        1973             2000
2   38        1982             2005
<class 'pandas.core.frame.DataFrame'>
Step 2: Convert the DataFrame to a NumPy Array
You can use the first approach of df.to_numpy() to convert the DataFrame to a NumPy array:
df.to_numpy()
Here is the complete code to perform the conversion:
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.to_numpy()

print(my_array)
print(type(my_array))
As you can see, the DataFrame is now converted to a NumPy array:
[[  25 1995 2016]
 [  47 1973 2000]
 [  38 1982 2005]]
<class 'numpy.ndarray'>
Alternatively, you can use the second approach of df.values to convert the DataFrame to a NumPy array (note that the Pandas documentation recommends df.to_numpy() over df.values):
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.values

print(my_array)
print(type(my_array))
You’ll get the same NumPy array:
[[  25 1995 2016]
 [  47 1973 2000]
 [  38 1982 2005]]
<class 'numpy.ndarray'>
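To confirm that the two approaches really produce the same result for this DataFrame, you can compare the arrays directly. The following is a minimal sketch; the variable names array_from_method, array_from_attribute and same_contents are chosen here for illustration and are not part of the original example:

```python
import numpy as np
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

# Convert the same DataFrame using both approaches
array_from_method = df.to_numpy()
array_from_attribute = df.values

# np.array_equal returns True when shapes and all elements match
same_contents = np.array_equal(array_from_method, array_from_attribute)

print(same_contents)
print(array_from_method.shape)
```

For this 3-row, 3-column DataFrame the comparison should print True, followed by the shape (3, 3).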
Step 3 (optional): Check the Data Type
Once you have converted the DataFrame to an array, you can check the dtype by adding print(my_array.dtype) at the bottom of the code:
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.to_numpy()

print(my_array)
print(type(my_array))
print(my_array.dtype)
For the above example, the dtype is integer (int64):
[[  25 1995 2016]
 [  47 1973 2000]
 [  38 1982 2005]]
<class 'numpy.ndarray'>
int64
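If you need a dtype other than the one Pandas infers, df.to_numpy() accepts an optional dtype parameter. The sketch below assumes you want floats rather than integers; the name float_array is only illustrative:

```python
import numpy as np
import pandas as pd

data = {'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

# Request a specific dtype during the conversion
float_array = df.to_numpy(dtype='float64')

print(float_array)
print(float_array.dtype)
```

The values are unchanged, but they are now stored as float64 instead of int64.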
Convert a DataFrame with Mixed Data Types
What if you have a DataFrame with mixed data types (e.g., string/object and integer)?
For example, let’s create another DataFrame with a mixture of strings and numeric data:
import pandas as pd

data = {'Name': ['Jon', 'Maria', 'Bill'],
        'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

print(df)
print(type(df))
This is what the DataFrame looks like:
    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
<class 'pandas.core.frame.DataFrame'>
Let’s now convert the above DataFrame to a NumPy array, and then check the dtype:
import pandas as pd

data = {'Name': ['Jon', 'Maria', 'Bill'],
        'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

my_array = df.to_numpy()

print(my_array)
print(type(my_array))
print(my_array.dtype)
As you can see, the dtype in this case is object, because a NumPy array holds a single dtype and object is the only one that can represent both the strings and the integers:
[['Jon' 25 1995 2016]
 ['Maria' 47 1973 2000]
 ['Bill' 38 1982 2005]]
<class 'numpy.ndarray'>
object
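If you would rather keep a numeric dtype, one option is to convert only the numeric columns. The sketch below uses df.select_dtypes(include='number') to drop the 'Name' column before the conversion; the name numeric_array is only illustrative, and whether leaving out the string column is acceptable depends on your use case:

```python
import pandas as pd

data = {'Name': ['Jon', 'Maria', 'Bill'],
        'Age': [25, 47, 38],
        'Birth Year': [1995, 1973, 1982],
        'Graduation Year': [2016, 2000, 2005]
        }

df = pd.DataFrame(data)

# Keep only the numeric columns, then convert them to a NumPy array
numeric_array = df.select_dtypes(include='number').to_numpy()

print(numeric_array)
print(numeric_array.dtype)
print(numeric_array.shape)
```

The resulting array has 3 rows and 3 columns, and its dtype is an integer type rather than object.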
You can read more about df.to_numpy() by visiting the Pandas Documentation.
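One of the optional parameters described there is copy. As a minimal sketch, assuming you want an array that you can modify without any risk of touching the original DataFrame, you can pass copy=True (the name independent_array is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 47, 38]})

# copy=True guarantees the array does not share memory with the DataFrame
independent_array = df.to_numpy(copy=True)

# Changing the array leaves the DataFrame untouched
independent_array[0, 0] = 99

print(independent_array[0, 0])
print(df.loc[0, 'Age'])
```

The array now holds 99 in its first cell, while the DataFrame still holds 25.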