The following template can be used to create a DataFrame in Julia:
using DataFrames

df = DataFrame(column_1 = ["value_1", "value_2", "value_3", ...],
               column_2 = ["value_1", "value_2", "value_3", ...],
               column_3 = ["value_1", "value_2", "value_3", ...],
               ...
               )
In the next section, you’ll see the steps to create a DataFrame in Julia from scratch.
Steps to Create a DataFrame in Julia from Scratch
Step 1: Install the DataFrames package
If you haven’t already done so, install the DataFrames package in Julia:
using Pkg
Pkg.add("DataFrames")
Step 2: Create a DataFrame in Julia
You can then use the following template to create a DataFrame in Julia:
using DataFrames

df = DataFrame(column_1 = ["value_1", "value_2", "value_3", ...],
               column_2 = ["value_1", "value_2", "value_3", ...],
               column_3 = ["value_1", "value_2", "value_3", ...],
               ...
               )
For example, let’s say that you have the following data, and your goal is to create a DataFrame based on that data:
product_id | product_name | price
1          | Oven         | 800
2          | Microwave    | 250
3          | Dishwasher   | 700
4          | Refrigerator | 1400
5          | Toaster      | 120
The complete code to create this DataFrame in Julia is as follows:
using DataFrames

df = DataFrame(product_id = [1, 2, 3, 4, 5],
               product_name = ["Oven", "Microwave", "Dishwasher", "Refrigerator", "Toaster"],
               price = [800, 250, 700, 1400, 120]
               )

print(df)
Note that there is no need to use quotes around numeric values, unless you wish to capture those values as strings.
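To illustrate the point about quotes, here is a minimal sketch that stores the same prices once as numbers and once as strings. The names df_numeric and df_text are made up for this illustration and are not part of the example above.

```julia
using DataFrames

# The same prices stored two ways: unquoted (integers) and quoted (strings).
# df_numeric and df_text are hypothetical names used only for this sketch.
df_numeric = DataFrame(price = [800, 250, 700])
df_text = DataFrame(price = ["800", "250", "700"])

# Inspect the element type of each column.
println(eltype(df_numeric.price))
println(eltype(df_text.price))

# Arithmetic works directly on the integer column.
total_numeric = sum(df_numeric.price)
println(total_numeric)

# The string column has to be parsed into integers before it can be summed.
total_text = sum(parse.(Int, df_text.price))
println(total_text)
```

Both totals come out to 1750, but only the unquoted column can be summed without a conversion step, which is why numeric data is usually best left unquoted.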
Step 3: Run the code in Julia
Run the code, and you’ll get the following DataFrame:
product_id | product_name | price
1          | Oven         | 800
2          | Microwave    | 250
3          | Dishwasher   | 700
4          | Refrigerator | 1400
5          | Toaster      | 120
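If you want to confirm that the DataFrame was built as intended, a quick check of its shape and column names can help. The sketch below assumes a recent DataFrames release, where names returns the column names as strings. The variables dims and cols are introduced here only for illustration.

```julia
using DataFrames

df = DataFrame(product_id = [1, 2, 3, 4, 5],
               product_name = ["Oven", "Microwave", "Dishwasher", "Refrigerator", "Toaster"],
               price = [800, 250, 700, 1400, 120]
               )

# dims and cols are hypothetical helper variables for this sketch.
dims = size(df)    # tuple of (number of rows, number of columns)
cols = names(df)   # column names

println(dims)
println(cols)

# Show only the first two rows instead of the whole table.
println(first(df, 2))
```

For this data, the shape is 5 rows by 3 columns, matching the table above.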
Calculate the maximum value using the DataFrames package
Once you have your DataFrame, you can start performing a variety of operations and calculations on it.
For simplicity, let’s say that you want to derive the maximum price in the DataFrame.
You can then use the following code to derive the maximum value:
using DataFrames

df = DataFrame(product_id = [1, 2, 3, 4, 5],
               product_name = ["Oven", "Microwave", "Dishwasher", "Refrigerator", "Toaster"],
               price = [800, 250, 700, 1400, 120]
               )

max_value = maximum(df.price)

print(max_value)
Run the code and you’ll get the maximum price of 1400.
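As a possible next step, you may also want to know which product carries that maximum price. One way to sketch this is with Julia's built-in argmax function, which returns the position of the largest element. The variable names max_row, top_product, and min_value are chosen here for illustration only.

```julia
using DataFrames

df = DataFrame(product_id = [1, 2, 3, 4, 5],
               product_name = ["Oven", "Microwave", "Dishwasher", "Refrigerator", "Toaster"],
               price = [800, 250, 700, 1400, 120]
               )

# Position of the highest price within the price column.
max_row = argmax(df.price)

# Look up the product name stored at that same position.
top_product = df.product_name[max_row]
println(top_product)

# The minimum price can be derived the same way as the maximum.
min_value = minimum(df.price)
println(min_value)
```

With the data above, the most expensive product is the Refrigerator, and the minimum price is 120.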