In this short guide, I’ll show you how to create a DataFrame in Julia.
To start, here is a template that you can use to create your DataFrame:
using DataFrames

df = DataFrame(Column1 = ["Value1", "Value2", "Value3", ...],
               Column2 = ["Value1", "Value2", "Value3", ...],
               Column3 = ["Value1", "Value2", "Value3", ...],
               ...)
In the next section, I’ll review the steps to create a DataFrame in Julia from scratch.
Steps to Create a DataFrame in Julia from Scratch
Step 1: Install the DataFrames package
To install the DataFrames package, open the Julia command line (the REPL).
Type the following code in the command line, and then press ENTER:
using Pkg
Finally, to complete the installation of the DataFrames package, type the code below, and then press ENTER:
Pkg.add("DataFrames")
The installation may take a minute or so to complete.
Step 2: Create a DataFrame in Julia
You can use the following template to create a DataFrame in Julia:
using DataFrames

df = DataFrame(Column1 = ["Value1", "Value2", "Value3", ...],
               Column2 = ["Value1", "Value2", "Value3", ...],
               Column3 = ["Value1", "Value2", "Value3", ...],
               ...)
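The keyword-argument template only works when each column name is a valid Julia identifier. If a column name contains a space, DataFrames.jl also accepts `"name" => values` pairs. A minimal sketch, where the column name "Annual Salary" is just a hypothetical illustration:

```julia
using DataFrames

# Each pair maps a column name (a string) to a vector of values.
# String names may contain characters, such as spaces, that keyword arguments cannot.
df = DataFrame("Name" => ["Jon", "Bill"], "Annual Salary" => [30000, 45000])

# A column whose name contains a space is selected with a string index
println(df[:, "Annual Salary"])
```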
For example, let’s say that you collected the data below, and your goal is to create a DataFrame based on that data.
Name  | Age | Salary
Jon   | 22  | 30000
Bill  | 43  | 45000
Maria | 81  | 60000
Julia | 52  | 50000
Mark  | 27  | 55000
This is the complete code to create the DataFrame in Julia (note that there is no need to use quotes around numeric values, unless you wish to capture those values as strings):
using DataFrames

df = DataFrame(Name = ["Jon", "Bill", "Maria", "Julia", "Mark"],
               Age = [22, 43, 81, 52, 27],
               Salary = [30000, 45000, 60000, 50000, 55000])
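To confirm how the values were captured, you can inspect the shape, the column names and the element type of each column. The sketch below also builds a small hypothetical DataFrame, `df_str`, with quoted ages, to show that quoting numbers stores them as strings:

```julia
using DataFrames

df = DataFrame(Name = ["Jon", "Bill", "Maria", "Julia", "Mark"],
               Age = [22, 43, 81, 52, 27],
               Salary = [30000, 45000, 60000, 50000, 55000])

println(size(df))   # (rows, columns), here (5, 3)
println(names(df))  # column names, returned as strings

# Element type of each column: String for Name, an integer type for Age and Salary
println([eltype(col) for col in eachcol(df)])

# Quoted numbers are stored as strings, not as integers
df_str = DataFrame(Age = ["22", "43"])
println(eltype(df_str.Age))
```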
Copy the code into Julia.
Once you’re ready, run the code and you’ll get a DataFrame with 5 rows and 3 columns (Name, Age and Salary).
Alternatively, you can run the same code in a Jupyter Notebook (this requires the IJulia package). You will get the same DataFrame.
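If your data arrives one record at a time rather than column by column, another option is to start from an empty DataFrame with typed columns and append rows with `push!`. A sketch, assuming the same three columns as above:

```julia
using DataFrames

# Empty DataFrame whose columns have fixed element types
df = DataFrame(Name = String[], Age = Int[], Salary = Int[])

# A tuple is matched to the columns by position
push!(df, ("Jon", 22, 30000))
push!(df, ("Bill", 43, 45000))

# A NamedTuple is matched to the columns by name
push!(df, (Name = "Maria", Age = 81, Salary = 60000))

println(df)
```

Appending rows is convenient for incremental data, but when all the values are already available, passing whole columns to `DataFrame(...)` as shown earlier is simpler.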
Calculate the maximum value in a DataFrame
Once you have your DataFrame, you can start performing a variety of operations and calculations.
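Two common operations are filtering rows and sorting. A short sketch using the DataFrames.jl functions `filter` and `sort` on the data from this guide; the age threshold of 40 is an arbitrary example:

```julia
using DataFrames

df = DataFrame(Name = ["Jon", "Bill", "Maria", "Julia", "Mark"],
               Age = [22, 43, 81, 52, 27],
               Salary = [30000, 45000, 60000, 50000, 55000])

# Keep only the rows where Age is greater than 40 (returns a new DataFrame)
older = filter(:Age => >(40), df)

# Sort all rows by Salary, highest first (returns a new DataFrame)
by_salary = sort(df, :Salary, rev = true)

println(older)
println(by_salary)
```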
For simplicity, let’s say that you want to find the maximum salary in the DataFrame.
You can use the following code to get that maximum value:
using DataFrames

df = DataFrame(Name = ["Jon", "Bill", "Maria", "Julia", "Mark"],
               Age = [22, 43, 81, 52, 27],
               Salary = [30000, 45000, 60000, 50000, 55000])

# maximum (from Julia's Base library) returns the largest value in the Salary column
max_value = maximum(df.Salary)
println(max_value)
Run the code and you’ll get the maximum salary, which is 60000.
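The same approach works for other summary statistics. The sketch below also uses `argmax` to look up who earns the maximum salary, and `mean` from Julia's Statistics standard library (it must be loaded separately with `using Statistics`):

```julia
using DataFrames
using Statistics  # standard library that provides mean

df = DataFrame(Name = ["Jon", "Bill", "Maria", "Julia", "Mark"],
               Age = [22, 43, 81, 52, 27],
               Salary = [30000, 45000, 60000, 50000, 55000])

# argmax returns the row index of the (first) largest salary,
# which is then used to look up the matching name
top_row = argmax(df.Salary)
println(df[top_row, :Name])   # Maria

println(minimum(df.Salary))   # 30000
println(mean(df.Salary))      # 48000.0
```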