# Example of Multiple Linear Regression in R

In this short guide, you’ll see an example of multiple linear regression in R.

Here are the topics to be reviewed:

• Collecting and capturing the data in R
• Checking for linearity
• Applying the multiple linear regression model in R

## The Steps

### Step 1: Collect and capture the data in R

Imagine that you have a fictitious economy, and your goal is to predict the index_price (the dependent variable) based on two independent/input variables:

• interest_rate
• unemployment_rate

To capture the full dataset in R:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)
```
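As an optional extra step, the five vectors can be gathered into a single data frame, which makes it easier to inspect the dataset before modelling. This is only a sketch: the name `economy` is a hypothetical choice, and the rest of this guide keeps working with the standalone vectors.

```r
# Same observations as in the guide, written with rep() where values repeat
year <- c(rep(2017, 12), rep(2016, 12))
month <- c(12:1, 12:1)
interest_rate <- c(2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                   2, rep(1.75, 11))
unemployment_rate <- c(5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                       6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1)
index_price <- c(1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719)

# `economy` is a hypothetical name; columns are named after the vectors
economy <- data.frame(year, month, interest_rate, unemployment_rate, index_price)

str(economy)      # column types and number of observations
head(economy, 3)  # first three rows
```

If every vector has the same length, `data.frame()` produces one row per month; a length mismatch raises an error, which is a useful early check on the data entry.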

### Step 2: Check for linearity

Before you apply a linear regression model, you’ll need to verify that a linear relationship exists between the dependent variable and each independent variable.

Here, the goal is to check that a linear relationship exists between:

• The index_price (dependent variable) and the interest_rate (independent variable); and
• The index_price (dependent variable) and the unemployment_rate (independent variable)

A quick way to check for linearity is by using scatter plots.

To plot the relationship between the index_price and the interest_rate:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

plot(x = interest_rate, y = index_price)
```

The scatter plot suggests a roughly linear relationship between the index_price and the interest_rate. Specifically, when the interest rate goes up, the index price also goes up.

To plot the relationship between the index_price and the unemployment_rate:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

plot(x = unemployment_rate, y = index_price)
```

You’ll now see that a linear relationship also exists between the index_price and the unemployment_rate: when the unemployment rate goes up, the index price goes down (still a linear relationship, but with a negative slope).
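The two visual checks can be complemented with a number. The sketch below computes the Pearson correlation between the index price and each input using base R's `cor()`. The names `r_interest` and `r_unemployment` are illustrative only. A value close to +1 or -1 is consistent with a strong linear association, but it does not replace the scatter plot, since a curved relationship can also produce a high correlation.

```r
interest_rate <- c(2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                   2, rep(1.75, 11))
unemployment_rate <- c(5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                       6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1)
index_price <- c(1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719)

# Pearson correlation of the dependent variable with each input
r_interest <- cor(index_price, interest_rate)
r_unemployment <- cor(index_price, unemployment_rate)

# The sign should match the slope seen in each scatter plot:
# positive for interest_rate, negative for unemployment_rate
print(c(interest_rate = r_interest, unemployment_rate = r_unemployment))
```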

### Step 3: Apply the multiple linear regression in R

Use the following template to perform the multiple linear regression in R:

```r
model <- lm(Dependent variable ~ First independent variable + Second independent variable + ...)
summary(model)
```

Here is the full code to apply the multiple linear regression in R:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)
summary(model)
```

Once you run the code, you’ll get the following summary:

```
Call:
lm(formula = index_price ~ interest_rate + unemployment_rate)

Residuals:
     Min       1Q   Median       3Q      Max
-158.205  -41.667   -6.248   57.741  118.810

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         1798.4      899.2   2.000  0.05861 .
interest_rate        345.5      111.4   3.103  0.00539 **
unemployment_rate   -250.1      117.9  -2.121  0.04601 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 70.56 on 21 degrees of freedom
Multiple R-squared:  0.8976,    Adjusted R-squared:  0.8879
F-statistic: 92.07 on 2 and 21 DF,  p-value: 4.043e-11
```

You can use the coefficients in the summary above (the Estimate column) to build the multiple linear regression equation as follows:

index_price = (Intercept) + (interest_rate coef)*X1 + (unemployment_rate coef)*X2

Here X1 is the interest_rate and X2 is the unemployment_rate. Once you plug in the numbers from the summary:

index_price = (1798.4) + (345.5)*X1 + (-250.1)*X2
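To illustrate how the equation is used, the sketch below predicts the index price for one pair of inputs, first with `predict()` and then by hand from `coef(model)`. The inputs (an interest rate of 2.75 and an unemployment rate of 5.3) are an arbitrary choice for illustration, and `new_obs`, `pred`, and `manual` are hypothetical names. With the rounded coefficients from the summary, the arithmetic is 1798.4 + 345.5*2.75 + (-250.1)*5.3 = 1422.995, so the value R prints should be close to 1423; it will differ slightly because the summary rounds the coefficients.

```r
interest_rate <- c(2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                   2, rep(1.75, 11))
unemployment_rate <- c(5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                       6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1)
index_price <- c(1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719)

model <- lm(index_price ~ interest_rate + unemployment_rate)

# Illustrative inputs; column names must match the predictors in the formula
new_obs <- data.frame(interest_rate = 2.75, unemployment_rate = 5.3)

# Prediction computed by R from the fitted model
pred <- predict(model, newdata = new_obs)

# The same prediction assembled by hand from the unrounded coefficients
b <- coef(model)
manual <- b["(Intercept)"] + b["interest_rate"] * 2.75 + b["unemployment_rate"] * 5.3

print(pred)
print(manual)
```

Both printed values should agree, since `predict()` applies the same equation using the unrounded coefficients.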

Some additional stats to consider in the summary:

1. Adjusted R-squared reflects the fit of the model after adjusting for the number of predictors, where a higher value generally indicates a better fit
2. The Intercept coefficient is the Y-intercept: the predicted index_price when both interest_rate and unemployment_rate equal zero
3. The interest_rate coefficient is the change in Y for a one-unit increase in the interest rate (everything else held constant)
4. The unemployment_rate coefficient is the change in Y for a one-unit increase in the unemployment rate (everything else held constant)
5. Std. Error is the estimated standard deviation of each coefficient estimate; a smaller value relative to the coefficient indicates a more precise estimate
6. Pr(>|t|) is the p-value for the test that the coefficient equals zero. A p-value below 0.05 is conventionally considered statistically significant
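The statistics listed above can also be read programmatically rather than copied from the printed summary. The following is a minimal sketch using the object returned by `summary()`; the variable names `s`, `coef_table`, and `significant` are illustrative, and the 0.05 cutoff is the conventional threshold mentioned in the list, not a fixed rule.

```r
interest_rate <- c(2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                   2, rep(1.75, 11))
unemployment_rate <- c(5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                       6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1)
index_price <- c(1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719)

model <- lm(index_price ~ interest_rate + unemployment_rate)
s <- summary(model)

# Adjusted R-squared as a single number
print(s$adj.r.squared)

# Coefficient table as a matrix with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coef_table <- coef(s)
print(coef_table[, c("Estimate", "Std. Error")])

# Flag the coefficients whose p-value falls below the 0.05 threshold
significant <- coef_table[, "Pr(>|t|)"] < 0.05
print(significant)
```

Given the summary shown earlier, the two slope coefficients should be flagged as significant at this threshold, while the intercept (p-value 0.05861) should not.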