In this short guide, you’ll see how to convert a NumPy array to a Pandas DataFrame.
Here are the steps:
Steps to Convert a NumPy Array to Pandas DataFrame
Step 1: Create a NumPy Array
For example, let’s create the following NumPy array that contains only numeric data (i.e., integers):
import numpy as np

my_array = np.array([[11, 22, 33], [44, 55, 66]])

print(my_array)
print(type(my_array))
Run the code in Python, and you’ll get the following NumPy array:
[[11 22 33]
 [44 55 66]]
<class 'numpy.ndarray'>
Step 2: Convert the NumPy Array to Pandas DataFrame
You can now convert the NumPy array to a Pandas DataFrame using the following syntax:
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'])

print(df)
print(type(df))
You’ll now get a DataFrame with 3 columns:
   Column_A  Column_B  Column_C
0        11        22        33
1        44        55        66
<class 'pandas.core.frame.DataFrame'>
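The columns argument is optional. As a minimal sketch (the name df_default is introduced here purely for illustration), leaving it out makes pandas fall back to integer column labels, and when you do pass names, their count should match the second dimension of the array:

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

# No column names given: pandas labels the columns 0, 1, 2
df_default = pd.DataFrame(my_array)
print(df_default.columns.tolist())  # [0, 1, 2]

# The array has 2 rows and 3 columns, so 3 column names are expected
print(my_array.shape)  # (2, 3)
```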
Step 3 (optional): Add an Index to the DataFrame
What if you’d like to add an index to the DataFrame?
For instance, let’s add the following index to the DataFrame:
index=['Item_1', 'Item_2']
So here is the complete code to convert the array to a DataFrame with an index:
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array,
                  columns=['Column_A', 'Column_B', 'Column_C'],
                  index=['Item_1', 'Item_2'])

print(df)
print(type(df))
You’ll now see the index on the left side of the DataFrame:
        Column_A  Column_B  Column_C
Item_1        11        22        33
Item_2        44        55        66
<class 'pandas.core.frame.DataFrame'>
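One practical use of a labeled index is selecting rows by name. The sketch below is an illustration that goes slightly beyond the steps above: it rebuilds the same DataFrame and uses df.loc to look up values by index label (the variable name row is hypothetical):

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array,
                  columns=['Column_A', 'Column_B', 'Column_C'],
                  index=['Item_1', 'Item_2'])

# Select a whole row by its index label
row = df.loc['Item_2']
print(row['Column_B'])  # 55

# Select a single value by index label and column name
print(df.loc['Item_1', 'Column_C'])  # 33
```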
Array Contains a Mix of Strings and Numeric Data
Let’s now create a new NumPy array that contains a mixture of strings and numeric data (where the dtype for this array will be set to object):
import numpy as np

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

print(my_array)
print(type(my_array))
print(my_array.dtype)
Here is the new array with an object dtype:
[['Jon' 25 1995 2016]
 ['Maria' 47 1973 2000]
 ['Bill' 38 1982 2005]]
<class 'numpy.ndarray'>
object
You can then use the following syntax to convert the NumPy array to a DataFrame:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))
Here is the new DataFrame:
    Name Age Birth Year Graduation Year
0    Jon  25       1995            2016
1  Maria  47       1973            2000
2   Bill  38       1982            2005
<class 'pandas.core.frame.DataFrame'>
Let’s check the data types of all the columns in the new DataFrame by adding df.dtypes to the code:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))
print(df.dtypes)
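Currently, all the columns in the DataFrame have the object dtype, because the source array was created with dtype=object (the numbers are stored as generic Python objects rather than as a native integer type):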
    Name Age Birth Year Graduation Year
0    Jon  25       1995            2016
1  Maria  47       1973            2000
2   Bill  38       1982            2005
<class 'pandas.core.frame.DataFrame'>
Name               object
Age                object
Birth Year         object
Graduation Year    object
dtype: object
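An object dtype does not mean the values are strings. As a quick check (a sketch, assuming the columns keep the object dtype shown above; name_value and age_value are names made up for this example), you can inspect the Python type of individual cells:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# Pull out single cells and look at the Python type of each stored value
name_value = df.loc[0, 'Name']
age_value = df.loc[0, 'Age']

print(type(name_value))  # a string
print(type(age_value))   # a Python integer held in an object column
```

The numbers are already integers; only the column dtype is generic, which is why a conversion such as astype(int) works without any parsing.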
What if you’d like to convert some of the columns in the DataFrame from the object dtype to integers?
For example, let’s suppose that you’d like to convert the last 3 columns in the DataFrame to integers.
To achieve that goal, you can use astype(int) as captured below:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

df['Age'] = df['Age'].astype(int)
df['Birth Year'] = df['Birth Year'].astype(int)
df['Graduation Year'] = df['Graduation Year'].astype(int)

print(df)
print(type(df))
print(df.dtypes)
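On the setup used here, astype(int) gives int32 for those 3 columns. The result is platform-dependent: astype(int) uses NumPy's default integer type, which is int32 on Windows with NumPy versions before 2.0 and int64 on most other setups: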
    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
<class 'pandas.core.frame.DataFrame'>
Name               object
Age                 int32
Birth Year          int32
Graduation Year     int32
dtype: object
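If you want the same integer width on every platform, one option is to name the dtype explicitly instead of passing int. The following is a sketch rather than part of the original steps; it also passes a dictionary to astype so that all 3 columns are converted in a single call:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# Map each column to an explicit dtype; 'int64' does not depend on the platform
df = df.astype({'Age': 'int64',
                'Birth Year': 'int64',
                'Graduation Year': 'int64'})

print(df.dtypes)
```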
Alternatively, you can use apply(int), which calls Python's int on each value and gives you int64 for those last 3 columns:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

df['Age'] = df['Age'].apply(int)
df['Birth Year'] = df['Birth Year'].apply(int)
df['Graduation Year'] = df['Graduation Year'].apply(int)

print(df)
print(type(df))
print(df.dtypes)
As you can see, the last 3 columns in the DataFrame are now int64:
    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
<class 'pandas.core.frame.DataFrame'>
Name               object
Age                 int64
Birth Year          int64
Graduation Year     int64
dtype: object
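A further option, not covered in the steps above, is to let pandas pick better dtypes for object columns with infer_objects(). This sketch assumes the same object-dtype DataFrame as before; the name df_inferred is introduced only for this example:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# infer_objects() returns a new DataFrame; columns holding only integers become int64
df_inferred = df.infer_objects()

print(df_inferred.dtypes)
```

This avoids listing the columns one by one, at the cost of leaving the choice of dtype to pandas.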
You can read more about Pandas DataFrames by visiting the Pandas Documentation.