Example of Multiple Linear Regression in R

In this short guide, you’ll see an example of multiple linear regression in R.

Here are the topics to be reviewed:

  • Collecting and capturing the data in R
  • Checking for linearity
  • Applying the multiple linear regression model in R

Steps to apply the multiple linear regression in R

Step 1: Collect and capture the data in R

Let’s start with a simple example where the goal is to predict the index_price (the dependent variable) of a fictitious economy based on two independent/input variables:

  • interest_rate
  • unemployment_rate

The following code captures the data in R (year and month are recorded for reference but are not used in the model):

year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719) 
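As an optional variation, the vectors above can be combined into a single data frame, which keeps the columns aligned and makes the data easier to inspect. This is only a sketch: the name df is an assumption introduced here for illustration and is not used elsewhere in this guide.

```r
year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Combine the vectors into one data frame (df is a hypothetical name)
df <- data.frame(year, month, interest_rate, unemployment_rate, index_price)

str(df)   # column types and a short preview of each column
head(df)  # first six rows
nrow(df)  # number of observations
```

If you go this route, lm() also accepts the data frame through its data argument, for example lm(index_price ~ interest_rate + unemployment_rate, data = df).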

Step 2: Check for linearity

Before you apply a linear regression model, you’ll need to verify that several assumptions are met. Most notably, you’ll need to make sure that a linear relationship exists between the dependent variable and each independent variable.

A quick way to check for linearity is by using scatter plots.

For our example, we’ll check that a linear relationship exists between:

  • The index_price (dependent variable) and the interest_rate (independent variable); and
  • The index_price (dependent variable) and the unemployment_rate (independent variable)

Here is the code to plot the relationship between the index_price and the interest_rate:

year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)        
                
plot(x = interest_rate, y = index_price) 

You’ll notice that indeed a linear relationship exists between the index_price and the interest_rate. Specifically, when interest rates go up, the index price also goes up.

And for the second case, you can use the code below in order to plot the relationship between the index_price and the unemployment_rate:

year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)        
                
plot(x = unemployment_rate, y = index_price) 

You’ll now see that a linear relationship also exists between the index_price and the unemployment_rate: when the unemployment rate goes up, the index price goes down. This is still a linear relationship, but with a negative slope.
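As a numeric complement to the two scatter plots, you could also compute the Pearson correlation between each independent variable and the index_price using cor(). This is a rough sketch rather than a formal test of linearity: a value close to 1 or -1 is consistent with a strong linear relationship, but the plots should still be inspected for curvature. The names cor_interest and cor_unemployment are introduced here for illustration.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Pearson correlation (the default method of cor) for each pair
cor_interest <- cor(interest_rate, index_price)
cor_unemployment <- cor(unemployment_rate, index_price)

# A positive value matches the upward slope in the first plot,
# a negative value matches the downward slope in the second plot
print(cor_interest)
print(cor_unemployment)
```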

Step 3: Apply the multiple linear regression in R

You may now use the following template to perform the multiple linear regression in R:

model <- lm(Dependent variable ~ First independent Variable + Second independent variable + ...)
summary(model)

Using the template for our example:

year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016)
month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)        
            
model <- lm(index_price ~ interest_rate + unemployment_rate)
summary(model)

Once you run the code in R, you’ll get the following summary:

Call:
lm(formula = index_price ~ interest_rate + unemployment_rate)

Residuals:
     Min       1Q   Median       3Q      Max 
-158.205  -41.667   -6.248   57.741  118.810 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)         1798.4      899.2   2.000  0.05861 . 
interest_rate        345.5      111.4   3.103  0.00539 **
unemployment_rate   -250.1      117.9  -2.121  0.04601 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 70.56 on 21 degrees of freedom
Multiple R-squared:  0.8976,    Adjusted R-squared:  0.8879 
F-statistic: 92.07 on 2 and 21 DF,  p-value: 4.043e-11

You can use the coefficients in the Estimate column of the summary above to build the multiple linear regression equation as follows:

index_price = (Intercept) + (interest_rate coef)*X1 + (unemployment_rate coef)*X2

Here X1 is the interest rate and X2 is the unemployment rate. Once you plug in the numbers from the summary:

index_price = (1798.4) + (345.5)*X1 + (-250.1)*X2
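To illustrate how the equation can be used, the sketch below predicts the index price for an interest rate of 2.75 and an unemployment rate of 5.3 (the values in the first observation of the data). It does so twice: once by plugging the rounded coefficients into the equation, and once with predict() on the fitted model. The names new_data, manual_prediction and model_prediction are introduced here for illustration.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)

# Manual calculation with the rounded coefficients from the summary:
# 1798.4 + 950.125 - 1325.53 = 1422.995
manual_prediction <- 1798.4 + 345.5 * 2.75 + (-250.1) * 5.3

# The same prediction from the fitted model; the column names in
# new_data must match the variable names used in the formula
new_data <- data.frame(interest_rate = 2.75, unemployment_rate = 5.3)
model_prediction <- predict(model, newdata = new_data)

print(manual_prediction)
print(model_prediction)
```

The two results should agree closely. Any small difference comes from the summary rounding the coefficients to one decimal place, while predict() uses the full-precision estimates.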

Some additional stats to consider in the summary:

  1. Adjusted R-squared reflects the fit of the model, adjusted for the number of independent variables; a higher value generally indicates a better fit
  2. Intercept coefficient is the Y-intercept, that is, the predicted index_price when both the interest_rate and the unemployment_rate equal zero
  3. interest_rate coefficient is the change in Y due to a change of one unit in the interest rate (everything else held constant)
  4. unemployment_rate coefficient is the change in Y due to a change of one unit in the unemployment rate (everything else held constant)
  5. Std. Error is the estimated standard error of each coefficient; a smaller value relative to the estimate indicates a more precise estimate
  6. Pr(>|t|) is the p-value for the test that the coefficient equals zero. A p-value of less than 0.05 is conventionally considered to be statistically significant
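The statistics listed above can also be pulled out of the fitted model programmatically instead of being read off the printed summary. The following is a minimal sketch using coef(), summary() and confint() from base R. The name model_summary is introduced here for illustration, and the confidence intervals are an optional extra that the summary itself does not print.

```r
interest_rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
unemployment_rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
index_price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719)

model <- lm(index_price ~ interest_rate + unemployment_rate)
model_summary <- summary(model)

# Coefficient estimates (intercept and the two slopes)
print(coef(model))

# Matrix with the columns Estimate, Std. Error, t value and Pr(>|t|)
print(model_summary$coefficients)

# Adjusted R-squared as a single number
print(model_summary$adj.r.squared)

# 95% confidence intervals for the coefficients, which are derived
# from the estimates and their standard errors
print(confint(model, level = 0.95))
```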