In this short guide, you’ll see an example of multiple linear regression in R.

Here are the topics to be reviewed:

- Collecting and capturing the data in R
- Checking for linearity
- Applying the multiple linear regression model in R

## The Steps

### Step 1: Collect and capture the data in R

Imagine that you have a fictitious economy, and your goal is to predict the index_price (the dependent variable) based on two independent/input variables:

- interest_rate
- unemployment_rate

To capture the full dataset in R:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)
```
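As an optional alternative, the same values can be gathered into a single data frame, which keeps the columns together and makes it easy to confirm that all 24 observations were captured. This is a minimal sketch; the name `df` is an assumption and is not used elsewhere in this guide.

```r
# Same data as above, gathered into one data frame (the name `df` is illustrative)
df <- data.frame(
  year = rep(c(2017, 2016), each = 12),   # 12 months of 2017, then 12 of 2016
  month = rep(12:1, times = 2),           # December down to January, twice
  interest_rate = c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
                    2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75),
  unemployment_rate = c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
                        6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1),
  index_price = c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
                  1047,965,943,958,971,949,884,866,876,822,704,719)
)

str(df)    # column types and a preview of the values
head(df)   # first six rows
nrow(df)   # number of observations
```

With a data frame, the columns are accessed as `df$index_price`, `df$interest_rate`, and so on.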

### Step 2: Check for linearity

Before you apply a linear regression model, you’ll need to verify that a **linear relationship** exists between the dependent variable and each of the independent variables.

Here, the goal is to check that a linear relationship exists between:

- The index_price (dependent variable) and the interest_rate (independent variable); and
- The index_price (dependent variable) and the unemployment_rate (independent variable)

A quick way to check for linearity is by using **scatter plots**.

To plot the relationship between the index_price and the interest_rate:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

plot(x = interest_rate, y = index_price)
```

Notice that an approximately linear relationship exists between the index_price and the interest_rate. Specifically, when the interest rate goes up, the index price tends to go up as well.
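To make the trend easier to judge by eye, one option is to overlay a simple least-squares line on the scatter plot. This is only a visual aid, assuming a single-predictor fit is a reasonable summary of the cloud of points; the name `simple_fit` and the line color are illustrative choices.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

plot(x = interest_rate, y = index_price)

# Fit index_price on interest_rate alone, then draw the fitted line over the points
simple_fit <- lm(index_price ~ interest_rate)
abline(simple_fit, col = "red")
```

If the points scatter evenly around the line without an obvious curve, a linear term is a plausible choice for this variable.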

To plot the relationship between the index_price and the unemployment_rate:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

plot(x = unemployment_rate, y = index_price)
```

You’ll now see that a linear relationship also exists between the index_price and the unemployment_rate – when the unemployment rates go up, the index price goes down (here you still have a linear relationship, but with a *negative slope*).
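As a numeric complement to the two scatter plots, the Pearson correlation coefficient gives a rough measure of how strong each linear relationship is and in which direction it runs. A minimal sketch using base R's `cor()`; the names `r_interest` and `r_unemployment` are illustrative.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Pearson correlation: values near +1 or -1 suggest a strong linear relationship
r_interest <- cor(index_price, interest_rate)
r_unemployment <- cor(index_price, unemployment_rate)

r_interest       # positive: index_price rises with interest_rate
r_unemployment   # negative: index_price falls as unemployment_rate rises
```

A correlation coefficient summarizes only the linear part of a relationship, so it is best read alongside the plots rather than in place of them.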

### Step 3: Apply the multiple linear regression in R

Use the following template to perform the multiple linear regression in R:

```r
# Template: replace the placeholder names with your own variables
model <- lm(dependent_variable ~ first_independent_variable + second_independent_variable + ...)

summary(model)
```

Here is the full code to apply the multiple linear regression in R:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Regress index_price on both input variables
model <- lm(index_price ~ interest_rate + unemployment_rate)

summary(model)
```

Once you run the code, you’ll get the following summary:

```
Call:
lm(formula = index_price ~ interest_rate + unemployment_rate)

Residuals:
     Min       1Q   Median       3Q      Max
-158.205  -41.667   -6.248   57.741  118.810

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         1798.4      899.2   2.000  0.05861 .
interest_rate        345.5      111.4   3.103  0.00539 **
unemployment_rate   -250.1      117.9  -2.121  0.04601 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 70.56 on 21 degrees of freedom
Multiple R-squared:  0.8976,    Adjusted R-squared:  0.8879
F-statistic: 92.07 on 2 and 21 DF,  p-value: 4.043e-11
```

You can use the coefficients in the summary above (the Estimate column) to build the multiple linear regression equation as follows:

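$$\text{index\_price} = (\text{Intercept}) + (\text{interest\_rate coef}) \cdot X_{1} + (\text{unemployment\_rate coef}) \cdot X_{2}$$

Here $X_{1}$ is the interest rate and $X_{2}$ is the unemployment rate.

And once you plug in the numbers from the summary:

$$\text{index\_price} = (1798.4) + (345.5) \cdot X_{1} + (-250.1) \cdot X_{2}$$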
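To see the equation in action, the sketch below predicts the index price for one hypothetical pair of inputs (an interest rate of 2.75 and an unemployment rate of 5.3, chosen only for illustration) in two ways: with `predict()` and by plugging the fitted coefficients into the equation by hand. With the rounded coefficients from the summary, the arithmetic is 1798.4 + 345.5 × 2.75 − 250.1 × 5.3 = 1422.995, so both results should land close to 1423. The names `new_data`, `predicted`, `b`, and `manual` are assumptions made for this example.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)

# Hypothetical inputs; column names must match the variable names in the formula
new_data <- data.frame(interest_rate = 2.75, unemployment_rate = 5.3)
predicted <- predict(model, newdata = new_data)

# The same calculation written out from the unrounded coefficients
b <- coef(model)
manual <- b[["(Intercept)"]] +
  b[["interest_rate"]] * 2.75 +
  b[["unemployment_rate"]] * 5.3

predicted
manual
```

The two values agree because `predict()` evaluates exactly this equation; any small gap from the hand calculation above comes from rounding the coefficients to one decimal place.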

Some additional stats to consider in the summary:

- **Adjusted R-squared** reflects the fit of the model, where a higher value generally indicates a better fit
- **Intercept coefficient** is the Y-intercept, that is, the predicted index_price when both input variables equal zero
- **interest_rate coefficient** is the change in Y due to a change of one unit in the interest rate (everything else held constant)
- **unemployment_rate coefficient** is the change in Y due to a change of one unit in the unemployment rate (everything else held constant)
- **Std. Error** is the standard error of each coefficient estimate, where a smaller value indicates a more precise estimate
- **Pr(>|t|)** is the *p-value*. A p-value of less than 0.05 is conventionally considered to be statistically significant
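These statistics can also be pulled out of the summary object programmatically instead of being read off the printed output. The sketch below uses base R accessors; the name `model_summary` is illustrative, and the confidence intervals from `confint()` are an extra not shown in the summary above, included as one way to express the uncertainty that Std. Error describes.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)
model_summary <- summary(model)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef(model_summary)

# Adjusted R-squared as a single number
model_summary$adj.r.squared

# p-values only, one per coefficient
coef(model_summary)[, "Pr(>|t|)"]

# 95% confidence intervals for the coefficients
confint(model, level = 0.95)
```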