In this short guide, you’ll see an example of multiple linear regression in R.

Here are the topics to be reviewed:

- Collecting and capturing the data in R
- Checking for linearity
- Applying the multiple linear regression model in R

## Steps to apply the multiple linear regression in R

### Step 1: Collect and capture the data in R

Let’s start with a simple example where the goal is to predict the index_price (the dependent variable) of a fictitious economy based on two independent/input variables:

- interest_rate
- unemployment_rate

The following code can then be used to capture the data in R:

```r
# Monthly observations for a fictitious economy (24 rows: 2017 first, then 2016)
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)
```
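As an optional extra that the steps in this guide do not rely on, the vectors can be gathered into a single data frame so that the columns stay aligned and are easier to inspect. The sketch below shows one way to do that; the name `economy` is a hypothetical choice made for illustration.

```r
# Same vectors as in the guide
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Combine the vectors into one data frame (the name `economy` is hypothetical);
# column names are taken from the variable names
economy <- data.frame(year, month, interest_rate, unemployment_rate, index_price)

# Inspect the structure and the first few rows
str(economy)
head(economy)
```

With a data frame in hand, a model can be fitted by passing it through the `data` argument of `lm()`, for example `lm(index_price ~ interest_rate + unemployment_rate, data = economy)`.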

### Step 2: Check for linearity

Before you apply a linear regression model, you’ll need to verify that several assumptions are met. Most notably, you’ll need to make sure that a *linear* relationship exists between the dependent variable and each independent variable.

A quick way to check for linearity is by using scatter plots.

For our example, we’ll check that a linear relationship exists between:

- The index_price (dependent variable) and the interest_rate (independent variable); and
- The index_price (dependent variable) and the unemployment_rate (independent variable)

Here is the code to plot the relationship between the index_price and the interest_rate:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Scatter plot of index_price against interest_rate
plot(x = interest_rate, y = index_price)
```

You’ll notice that indeed a linear relationship exists between the index_price and the interest_rate. Specifically, when interest rates go up, the index price also goes up.

And for the second case, you can use the code below in order to plot the relationship between the index_price and the unemployment_rate:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Scatter plot of index_price against unemployment_rate
plot(x = unemployment_rate, y = index_price)
```

You’ll now see that a linear relationship also exists between the index_price and the unemployment_rate – when the unemployment rates go up, the index price goes down (here we still have a linear relationship, but with a negative slope).
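As a numeric complement to the scatter plots, the direction and strength of each relationship can be summarized with a correlation coefficient. The sketch below uses base R's `cor()`, which computes the Pearson correlation by default; the names `cor_interest` and `cor_unemployment` are hypothetical. A value close to 1 or -1 is consistent with a linear relationship, but it does not replace looking at the plots.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Pearson correlation between each independent variable and index_price
cor_interest <- cor(interest_rate, index_price)
cor_unemployment <- cor(unemployment_rate, index_price)

# A positive value matches the upward-sloping plot,
# a negative value matches the downward-sloping plot
print(cor_interest)
print(cor_unemployment)
```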

### Step 3: Apply the multiple linear regression in R

You may now use the following template to perform the multiple linear regression in R:

```r
# Template: replace the placeholder names with your own variables
model <- lm(dependent_variable ~ first_independent_variable + second_independent_variable + ...)
summary(model)
```

Using the template for our example:

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Fit the multiple linear regression and print its summary
model <- lm(index_price ~ interest_rate + unemployment_rate)
summary(model)
```

Once you run the code in R, you’ll get the following summary:

```
Call:
lm(formula = index_price ~ interest_rate + unemployment_rate)

Residuals:
     Min       1Q   Median       3Q      Max
-158.205  -41.667   -6.248   57.741  118.810

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         1798.4      899.2   2.000  0.05861 .
interest_rate        345.5      111.4   3.103  0.00539 **
unemployment_rate   -250.1      117.9  -2.121  0.04601 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 70.56 on 21 degrees of freedom
Multiple R-squared:  0.8976, Adjusted R-squared:  0.8879
F-statistic: 92.07 on 2 and 21 DF,  p-value: 4.043e-11
```

You can use the values in the Estimate column of the Coefficients table above to build the multiple linear regression equation as follows:

index_price = (Intercept) + (interest_rate coef)*X_{1} + (unemployment_rate coef)*X_{2}

Here X_{1} is the interest rate and X_{2} is the unemployment rate.

And once you plug the numbers from the summary:

index_price = (1798.4) + (345.5)*X_{1} + (-250.1)*X_{2}
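To see the equation at work, here is a short sketch that predicts the index price for one new observation. The input values (an interest rate of 2.75 and an unemployment rate of 5.3) and the names `new_interest_rate`, `new_unemployment_rate`, `manual_prediction` and `predicted` are assumptions made for illustration. By hand, with the rounded coefficients: 1798.4 + 345.5*2.75 - 250.1*5.3 = 1422.995. R works with the unrounded coefficients, so its result may differ slightly in the decimals.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)

# Hypothetical new observation (values chosen for illustration)
new_interest_rate <- 2.75
new_unemployment_rate <- 5.3

# Option 1: plug the values into the regression equation by hand
b <- coef(model)
manual_prediction <- b[["(Intercept)"]] +
  b[["interest_rate"]] * new_interest_rate +
  b[["unemployment_rate"]] * new_unemployment_rate

# Option 2: let predict() evaluate the same equation;
# the column names in newdata must match the names used in the formula
predicted <- predict(
  model,
  newdata = data.frame(
    interest_rate = new_interest_rate,
    unemployment_rate = new_unemployment_rate
  )
)

print(manual_prediction)
print(unname(predicted))
```

Both options evaluate the same fitted equation, so the two printed values should agree.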

Some additional stats to consider in the summary:

- **Adjusted R-squared** reflects the fit of the model, adjusted for the number of independent variables; a higher value generally indicates a better fit
- **Intercept coefficient** is the Y-intercept, that is, the predicted index_price when both independent variables equal zero
- **interest_rate coefficient** is the change in Y due to a change of one unit in the interest rate (everything else held constant)
- **unemployment_rate coefficient** is the change in Y due to a change of one unit in the unemployment rate (everything else held constant)
- **Std. Error** is the estimated standard error of each coefficient; a smaller value relative to the coefficient indicates a more precise estimate
- **Pr(>|t|)** is the *p-value*. A p-value of less than 0.05 is conventionally considered to be statistically significant
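The statistics listed above can also be read from the fitted model in code rather than from the printed summary. The sketch below uses the object returned by `summary()`; the names `model_summary`, `coef_table`, `adj_r_squared`, `p_values` and `significant` are hypothetical, and the 0.05 cutoff is the conventional threshold mentioned above.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)
model_summary <- summary(model)

# Coefficient table as a matrix with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coef_table <- coef(model_summary)
print(coef_table)

# Adjusted R-squared as a single number
adj_r_squared <- model_summary$adj.r.squared
print(adj_r_squared)

# Names of the terms whose p-value is below the conventional 0.05 threshold
p_values <- coef_table[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]
print(significant)
```

In this example the two independent variables fall below the 0.05 threshold, while the intercept (p-value of about 0.059 in the summary) does not.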