How to Convert NumPy Array to Pandas DataFrame

In this short guide, you’ll see how to convert a NumPy array to a Pandas DataFrame.

Here are the steps:

Steps to Convert a NumPy Array to Pandas DataFrame

Step 1: Create a NumPy Array

For example, let’s create the following NumPy array that contains only numeric data (i.e., integers):

import numpy as np

my_array = np.array([[11, 22, 33], [44, 55, 66]])

print(my_array)
print(type(my_array))

Run the code in Python, and you’ll get the following NumPy array:

[[11 22 33]
 [44 55 66]]
<class 'numpy.ndarray'>
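Before converting, it can help to check the array's shape. For a 2D array, each row becomes a DataFrame row and each column becomes a DataFrame column, so the shape tells you how many column labels you will need. A minimal check:

```python
import numpy as np

my_array = np.array([[11, 22, 33], [44, 55, 66]])

# shape is (rows, columns): here 2 rows and 3 columns
print(my_array.shape)

# The integer dtype is platform-dependent (int64 on most systems,
# int32 on Windows with NumPy versions before 2.0)
print(my_array.dtype)
```
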

Step 2: Convert the NumPy Array to Pandas DataFrame

You can now convert the NumPy array to a Pandas DataFrame using the following syntax:

import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'])

print(df)
print(type(df))

You’ll now get a DataFrame with 3 columns:

   Column_A  Column_B  Column_C
0        11        22        33
1        44        55        66
<class 'pandas.core.frame.DataFrame'>
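If you omit the columns argument, pandas labels the columns 0, 1, 2, and so on. If you do pass columns, the list needs exactly one label per array column. The sketch below builds the labels from the array's shape; the Column_0-style naming is just an illustrative choice, not something pandas requires:

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

# Without column names, pandas falls back to default integer labels: 0, 1, 2
df_default = pd.DataFrame(my_array)
print(df_default.columns.tolist())

# Generate one label per array column (illustrative naming scheme)
labels = [f'Column_{i}' for i in range(my_array.shape[1])]
df = pd.DataFrame(my_array, columns=labels)
print(df.columns.tolist())

# Passing the wrong number of labels raises a ValueError
try:
    pd.DataFrame(my_array, columns=['Column_A', 'Column_B'])
except ValueError as err:
    print('ValueError:', err)
```
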

Step 3 (optional): Add an Index to the DataFrame

What if you’d like to add an index to the DataFrame?

For instance, let’s add the following index to the DataFrame:

index=['Item_1', 'Item_2']

Here is the complete code to convert the array to a DataFrame with that index:

import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'], index=['Item_1', 'Item_2'])

print(df)
print(type(df))

You’ll now see the index on the left side of the DataFrame:

        Column_A  Column_B  Column_C
Item_1        11        22        33
Item_2        44        55        66
<class 'pandas.core.frame.DataFrame'>
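You can also attach the index after the DataFrame has been created by assigning a list to df.index. As with the index argument, the number of labels must match the number of rows. A minimal sketch:

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'])

# Replace the default 0, 1 index with custom labels (one per row)
df.index = ['Item_1', 'Item_2']

# Rows can now be selected by label
print(df.loc['Item_2', 'Column_B'])
```
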

Array Contains a Mix of Strings and Numeric Data

Let’s now create a new NumPy array that contains a mixture of strings and numeric data, with the array's dtype set to object:

import numpy as np

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

print(my_array)
print(type(my_array))
print(my_array.dtype)

Here is the new array with an object dtype:

[['Jon' 25 1995 2016]
 ['Maria' 47 1973 2000]
 ['Bill' 38 1982 2005]]
<class 'numpy.ndarray'>
object
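The dtype=object argument matters here. Without it, NumPy coerces every element of a mixed list to a common type, which for strings and integers is a Unicode string dtype, so the numbers become text. A quick comparison:

```python
import numpy as np

rows = [['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]]

# No dtype given: NumPy promotes every element to a fixed-width Unicode string type
as_strings = np.array(rows)
print(as_strings.dtype.kind)    # 'U' means Unicode string
print(type(as_strings[0, 1]))   # the age 25 is now the string '25' (numpy.str_)

# dtype=object: each element keeps its original Python type
as_objects = np.array(rows, dtype=object)
print(type(as_objects[0, 1]))   # <class 'int'>
```
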

You can then use the following syntax to convert the NumPy array to a DataFrame:

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))

Here is the new DataFrame:

    Name Age Birth Year Graduation Year
0    Jon  25       1995            2016
1  Maria  47       1973            2000
2   Bill  38       1982            2005
<class 'pandas.core.frame.DataFrame'>

Let’s check the data types of all the columns in the new DataFrame by adding print(df.dtypes) to the code:

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))
print(df.dtypes)

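Currently, all the columns in the DataFrame have the object dtype. The numbers are still stored as Python integers, but pandas treats those columns as generic objects rather than as numeric data: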

    Name Age Birth Year Graduation Year
0    Jon  25       1995            2016
1  Maria  47       1973            2000
2   Bill  38       1982            2005
<class 'pandas.core.frame.DataFrame'>
Name               object
Age                object
Birth Year         object
Graduation Year    object
dtype: object
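The object dtype does not mean the numbers were turned into text. Because the array was created with dtype=object, the cells still hold Python int objects; pandas simply stores them in a generic object column instead of a numeric one. You can confirm this by inspecting a single cell:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# The column dtype is object...
print(df['Age'].dtype)

# ...but each cell is still a Python int, not a string
print(type(df.loc[0, 'Age']))
```
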

What if you’d like to convert some of the columns in the DataFrame from the object dtype to integers?

For example, let’s suppose that you’d like to convert the last 3 columns in the DataFrame to integers.

To do that, you can apply astype(int) to each of those columns:

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

df['Age'] = df['Age'].astype(int)
df['Birth Year'] = df['Birth Year'].astype(int)
df['Graduation Year'] = df['Graduation Year'].astype(int)

print(df)
print(type(df))
print(df.dtypes)

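The exact integer type that astype(int) produces depends on your platform and NumPy version. It is int32 on Windows with NumPy versions before 2.0 (as in the output below), and int64 on 64-bit Linux and macOS, as well as on 64-bit Windows with NumPy 2.0 or later: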

    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
<class 'pandas.core.frame.DataFrame'>
Name               object
Age                 int32
Birth Year          int32
Graduation Year     int32
dtype: object
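Because the width of int varies by platform, you may prefer to request a specific integer type explicitly. A minimal sketch that converts all three columns in one call by passing a column-to-dtype mapping to astype:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# Map each column name to the exact dtype wanted; 'int64' is the same on every platform
df = df.astype({'Age': 'int64', 'Birth Year': 'int64', 'Graduation Year': 'int64'})

print(df.dtypes)
```

Columns not named in the mapping (here, Name) are left unchanged.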

Alternatively, you can use apply(int), which calls int on each value and lets pandas infer the column type. This gives you int64 for those last 3 columns:

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

df['Age'] = df['Age'].apply(int)
df['Birth Year'] = df['Birth Year'].apply(int)
df['Graduation Year'] = df['Graduation Year'].apply(int)

print(df)
print(type(df))
print(df.dtypes)

As you can see, the last 3 columns in the DataFrame are now int64:

    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
<class 'pandas.core.frame.DataFrame'>
Name               object
Age                 int64
Birth Year          int64
Graduation Year     int64
dtype: object
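Another option is pd.to_numeric, which parses the values in a column and chooses a numeric dtype from the data (int64 for these whole numbers). A minimal sketch that applies it to each of the three columns in a loop:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

numeric_cols = ['Age', 'Birth Year', 'Graduation Year']

# Convert each column; by default, a value that cannot be parsed raises an error
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```

Unlike astype(int), pd.to_numeric can also handle numeric strings such as '25', and its errors='coerce' option turns unparseable values into NaN instead of raising.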

You can read more about Pandas DataFrames by visiting the Pandas Documentation.
