How to Convert NumPy Array to Pandas DataFrame

In this short guide, you’ll see how to convert a NumPy array to a Pandas DataFrame.

Here are the complete steps.

Steps to Convert a NumPy Array to a Pandas DataFrame

Step 1: Create a NumPy Array

For example, let’s create the following NumPy array that contains only numeric data (i.e., integers):

import numpy as np

my_array = np.array([[11, 22, 33], [44, 55, 66]])

print(my_array)
print(type(my_array))

Run the code in Python, and you’ll get the following NumPy array, followed by its type:

[[11 22 33]
 [44 55 66]]
<class 'numpy.ndarray'>
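Before converting, it can help to check the array’s shape and dtype: the number of column labels you pass to pandas in the next step has to match the second number in the shape. A minimal sketch using the same array:

```python
import numpy as np

my_array = np.array([[11, 22, 33], [44, 55, 66]])

# shape is (rows, columns); the DataFrame will have the same layout
n_rows, n_cols = my_array.shape
print(n_rows, n_cols)  # 2 3

# The integer width depends on platform and NumPy version (int64 or int32)
print(my_array.dtype)
```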

Step 2: Convert the NumPy Array to a Pandas DataFrame

You can now convert the NumPy array to a Pandas DataFrame using the following syntax:

import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'])

print(df)
print(type(df))

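You’ll now get a DataFrame with 3 columns (Column_A, Column_B and Column_C) and the default row index (0 and 1). The last line of the output, <class 'pandas.core.frame.DataFrame'>, confirms the new type.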
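The columns argument is optional. The sketch below, using the same array, shows that pandas falls back to the labels 0, 1, 2 when it is omitted, and that it raises a ValueError when the number of labels doesn’t match the number of array columns:

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

# Without columns=..., pandas numbers the columns 0, 1, 2
df_default = pd.DataFrame(my_array)
print(df_default.columns.tolist())  # [0, 1, 2]

# Passing 2 labels for a 3-column array is rejected
try:
    pd.DataFrame(my_array, columns=['Column_A', 'Column_B'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```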

Step 3 (optional): Add an Index to the DataFrame

What if you’d like to replace the default index (0, 1) with your own labels?

For instance, let’s add the following index to the DataFrame:

index = ['Item_1', 'Item_2']

So here is the complete code to convert the array to a DataFrame with an index:

import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array,
                  columns=['Column_A', 'Column_B', 'Column_C'],
                  index=['Item_1', 'Item_2'])

print(df)
print(type(df))

You’ll now see the index labels Item_1 and Item_2 on the left side of the DataFrame, in place of the default 0 and 1.
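You can also attach the labels after the DataFrame has been created, by assigning a list to df.index, and then look up values by label with df.loc. A small sketch:

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])
df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'])

# Replace the default 0/1 index; the list length must equal the number of rows
df.index = ['Item_1', 'Item_2']

# Label-based lookup: row 'Item_2', column 'Column_B'
print(df.loc['Item_2', 'Column_B'])  # 55
```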

Array Contains a Mix of Strings and Numeric Data

Let’s now create a new NumPy array that contains a mixture of strings and numeric data. The dtype of this array is set to object, so each value keeps its own Python type:

import numpy as np

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

print(my_array)
print(type(my_array))
print(my_array.dtype)

Running the code prints the array, its type (<class 'numpy.ndarray'>) and its dtype, which is now object.
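The dtype=object argument matters here. Without it, NumPy looks for a single common type for all the values and converts the numbers to strings. The sketch below compares the two arrays built from the same rows:

```python
import numpy as np

rows = [['Jon', 25, 1995, 2016],
        ['Maria', 47, 1973, 2000],
        ['Bill', 38, 1982, 2005]]

obj_array = np.array(rows, dtype=object)  # each element keeps its Python type
str_array = np.array(rows)                # every element is cast to a Unicode string

print(obj_array.dtype)       # object
print(str_array.dtype.kind)  # U (Unicode string)

# The same cell holds an int in one array and a string in the other
print(repr(obj_array[0, 1]), repr(str_array[0, 1]))
```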

You can then use the following syntax to convert the NumPy array to a DataFrame:

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))

The result is a DataFrame with the columns Name, Age, Birth Year and Graduation Year, and one row each for Jon, Maria and Bill.

Let’s check the data types of all the columns in the new DataFrame by adding print(df.dtypes) to the code:

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))
print(df.dtypes)

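Currently, all the columns in the DataFrame have the object dtype. That includes Age, Birth Year and Graduation Year: the DataFrame inherits the array’s object dtype, even though those columns hold whole numbers.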
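If the values are already Python integers stored as objects (as they are here, thanks to dtype=object), DataFrame.infer_objects() is another option: it re-examines object columns and converts those holding only integers to int64, while the Name column stays non-numeric. Note that it does not parse strings such as '25', so it would not help with an array created without dtype=object. A sketch:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# Soft conversion: object columns that hold only Python ints become int64
df = df.infer_objects()
print(df.dtypes)
```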

What if you’d like to convert some of the columns in the DataFrame from the object dtype to integers?

For example, suppose that you’d like to convert the last 3 columns in the DataFrame to integers.

To achieve this goal, you can use astype(int) as captured below:

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# Convert each numeric column from object to integer
df['Age'] = df['Age'].astype(int)
df['Birth Year'] = df['Birth Year'].astype(int)
df['Graduation Year'] = df['Graduation Year'].astype(int)

print(df)
print(type(df))
print(df.dtypes)

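Using astype(int) converts those 3 columns to NumPy’s default integer type. On Linux and macOS that is int64; on Windows it is int32 with NumPy versions before 2.0 (and int64 from NumPy 2.0 on). If you need the same width everywhere, name it explicitly, for example astype('int64').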
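To get the same integer width on every platform, and to convert several columns in one call, you can pass DataFrame.astype a dictionary that maps column names to dtypes. A sketch, assuming int64 is the width you want:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# One astype call with an explicit width instead of the platform-dependent int
int_cols = ['Age', 'Birth Year', 'Graduation Year']
df = df.astype({col: 'int64' for col in int_cols})

print(df.dtypes)
```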

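Alternatively, you can use apply(int), which calls Python’s int() on each value. Pandas stores the resulting Python integers as int64, so you’ll get int64 for those last 3 columns regardless of platform. Keep in mind that apply() works value by value, so it is slower than astype() on large columns: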

import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# Call Python's int() on every value of each numeric column
df['Age'] = df['Age'].apply(int)
df['Birth Year'] = df['Birth Year'].apply(int)
df['Graduation Year'] = df['Graduation Year'].apply(int)

print(df)
print(type(df))
print(df.dtypes)

As you can see, the last 3 columns in the DataFrame are now int64.
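A third option is pd.to_numeric, applied to each column. Unlike int(), it lets you choose what happens to values that can’t be parsed: the default errors='raise' stops with an error, while errors='coerce' turns them into NaN (which would make that column float64). A sketch with the default setting:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

int_cols = ['Age', 'Birth Year', 'Graduation Year']

# Replace each column with its numeric version; bad values would raise here
for col in int_cols:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```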

You can read more about Pandas DataFrames by visiting the Pandas Documentation.