Example of Multiple Linear Regression in R

In this short guide, you’ll see an example of multiple linear regression in R.

Here are the topics to be reviewed:

  • Collecting and capturing the data in R
  • Checking for linearity
  • Applying the multiple linear regression model in R

The Steps

Step 1: Collect and capture the data in R

Imagine that you have a fictitious economy, and your goal is to predict the index_price (the dependent variable) based on two independent/input variables:

  • interest_rate
  • unemployment_rate

To capture the full dataset in R:

year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)
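
Optionally, you can bundle the vectors into a single data frame. This keeps the columns aligned and lets you inspect the data with str() and summary(). This step is an optional addition to the original example, and the name df is only an illustrative choice.

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Combine the vectors into one data frame (df is a hypothetical name);
# column names are taken from the variable names
df <- data.frame(year, month, interest_rate, unemployment_rate, index_price)

# Inspect the structure (24 monthly rows, 5 columns) and basic statistics
str(df)
summary(df)
```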

Step 2: Check for linearity

Before you apply a linear regression model, check that the relationship between the dependent variable and each independent variable is approximately linear.

Here, the goal is to check that a linear relationship exists between:

  • The index_price (dependent variable) and the interest_rate (independent variable); and
  • The index_price (dependent variable) and the unemployment_rate (independent variable)

A quick way to check for linearity is by using scatter plots.

To plot the relationship between the index_price and the interest_rate:

plot(x = interest_rate, y = index_price)

The scatter plot suggests a linear relationship between the index_price and the interest_rate: when interest rates go up, the index price also tends to go up (a positive slope).
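
To make the trend easier to see, you can overlay a least-squares line on the scatter plot with base R's abline(). This is a sketch. The one-predictor model fit_ir (a hypothetical name) is only a visual aid and is separate from the multiple regression in Step 3.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Scatter plot with labelled axes
plot(x = interest_rate, y = index_price,
     xlab = "interest_rate", ylab = "index_price")

# Fit a simple one-predictor regression purely as a visual guide
fit_ir <- lm(index_price ~ interest_rate)

# Draw the fitted line on top of the points
abline(fit_ir, col = "blue")

# The slope's sign gives the direction of the trend
coef(fit_ir)[["interest_rate"]]
```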

To plot the relationship between the index_price and the unemployment_rate:

plot(x = unemployment_rate, y = index_price)

The plot also suggests a linear relationship between the index_price and the unemployment_rate, but with a negative slope: when the unemployment rate goes up, the index price tends to go down.
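
Scatter plots are a visual check. As an optional numeric complement, not part of the original steps, you can compute the Pearson correlation coefficient with cor(). Values near +1 or -1 suggest a strong linear relationship, and the sign gives the direction of the slope.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Pearson correlation: the sign is the direction, the magnitude the strength
r_interest <- cor(interest_rate, index_price)
r_unemployment <- cor(unemployment_rate, index_price)

r_interest      # positive: index_price rises with interest_rate
r_unemployment  # negative: index_price falls as unemployment_rate rises
```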

Step 3: Apply the multiple linear regression model in R

Use the following template to perform the multiple linear regression in R:

model <- lm(dependent_variable ~ first_independent_variable + second_independent_variable + ...)
summary(model)
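
If your variables live in a data frame, lm() also accepts a data argument, so the formula can refer to column names directly. Below is a minimal sketch, assuming a data frame named df (a hypothetical name) built from the same vectors.

```r
# Hypothetical data frame holding the variables used in the model
df <- data.frame(
  interest_rate = c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75),
  unemployment_rate = c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1),
  index_price = c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)
)

# Same model as the template; column names are looked up in df
model <- lm(index_price ~ interest_rate + unemployment_rate, data = df)
summary(model)
```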

Here is the full code to apply the multiple linear regression in R:

year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)
summary(model)

Once you run the code, you’ll get the following summary:

Call:
lm(formula = index_price ~ interest_rate + unemployment_rate)

Residuals:
     Min       1Q   Median       3Q      Max 
-158.205  -41.667   -6.248   57.741  118.810 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)         1798.4      899.2   2.000  0.05861 . 
interest_rate        345.5      111.4   3.103  0.00539 **
unemployment_rate   -250.1      117.9  -2.121  0.04601 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 70.56 on 21 degrees of freedom
Multiple R-squared:  0.8976,    Adjusted R-squared:  0.8879 
F-statistic: 92.07 on 2 and 21 DF,  p-value: 4.043e-11
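
Rather than reading values off the printed summary, you can pull them out of the fitted objects. coef() returns the estimates. The object returned by summary() stores the adjusted R-squared in adj.r.squared and the coefficient table (estimate, standard error, t value, p-value) in coefficients.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)
model_summary <- summary(model)

# Coefficient estimates: (Intercept), interest_rate, unemployment_rate
coef(model)

# Adjusted R-squared (printed as 0.8879 in the summary)
model_summary$adj.r.squared

# Full coefficient table as a numeric matrix
model_summary$coefficients
```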

You can use the coefficient estimates in the summary above to build the multiple linear regression equation as follows:

index_price = (Intercept) + (interest_rate coef)*X1 + (unemployment_rate coef)*X2

where X1 is the interest_rate and X2 is the unemployment_rate.

Plugging in the estimates from the summary gives:

index_price = (1798.4) + (345.5)*X1 + (-250.1)*X2
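
To see the equation in action, you can plug in values for X1 and X2. The sketch below uses predict() with a new data frame. The inputs interest_rate = 2.75 and unemployment_rate = 5.3 are illustrative choices (they match the first month in the dataset). The manual calculation checks that predict() applies the same equation.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)

# Illustrative inputs: X1 = interest_rate, X2 = unemployment_rate
new_data <- data.frame(interest_rate = 2.75, unemployment_rate = 5.3)

# Prediction from the fitted model
predicted <- predict(model, newdata = new_data)

# Same result by hand: intercept + coef1 * X1 + coef2 * X2
b <- coef(model)
manual <- b[["(Intercept)"]] + b[["interest_rate"]] * 2.75 + b[["unemployment_rate"]] * 5.3

predicted
manual
```

With the rounded coefficients from the summary, the hand calculation is 1798.4 + 345.5*2.75 - 250.1*5.3 ≈ 1423.0. predict() uses the unrounded estimates, so its value differs slightly.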

Some additional stats to consider in the summary:

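  1. Adjusted R-squared is the share of the variation in index_price explained by the model, adjusted for the number of independent variables. A higher value generally indicates a better fit
  2. The (Intercept) coefficient is the Y-intercept: the predicted index_price when both interest_rate and unemployment_rate are zero (here an extrapolation far outside the observed data)
  3. The interest_rate coefficient is the expected change in index_price for a one-unit increase in the interest rate, with the unemployment rate held constant
  4. The unemployment_rate coefficient is the expected change in index_price for a one-unit increase in the unemployment rate, with the interest rate held constant
  5. Std. Error measures the precision of each coefficient estimate; a smaller standard error means a more precise estimate
  6. Pr(>|t|) is the p-value for the test that the coefficient is zero. A p-value below 0.05 is conventionally considered statistically significant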
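
To tie the standard errors and p-values to something more concrete, you can compute confidence intervals with confint(). For lm models, it builds each interval as the estimate plus or minus a t quantile times the standard error. You can also extract the p-value column directly. A 95% interval that excludes zero corresponds to a p-value below 0.05 for that coefficient.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)

# 95% confidence intervals for each coefficient
ci <- confint(model, level = 0.95)
ci

# p-values from the coefficient table of the summary
p_values <- summary(model)$coefficients[, "Pr(>|t|)"]
p_values

# Which coefficients pass the conventional 0.05 threshold?
p_values < 0.05
```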
