Simulating Data for a Regression Model in R

Simulating data for use in a linear regression model is fairly straightforward in R because of how the language’s vector operations work.

For instance, to simulate 1000 observations for the following population model:

Yi = 5 + X1i(0.03) + X2i(0.05) + Εi
Εi ~ N(0, 0.25)

We could run the following R code:

n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 5 + x1*.03 + x2*.05 + rnorm(n, m=0, sd=.25)

Notice that to calculate y, we only had to enter our regression model. If we calculated a regression using our simulated data, we should see something similar to the following:

Call:
lm(formula = y ~ x1 + x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.71103 -0.17190 -0.00386  0.17086  0.84411 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 5.010316   0.007763 645.370  < 2e-16 ***
x1          0.038880   0.007751   5.016 6.24e-07 ***
x2          0.049713   0.007548   6.586 7.28e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2455 on 997 degrees of freedom
Multiple R-squared:  0.06377,	Adjusted R-squared:  0.06189 
F-statistic: 33.95 on 2 and 997 DF,  p-value: 5.436e-15

Notice that the coefficients and the residual standard error are close to those in our population model.

Share
"Like" our Facebook page for more posts about applied statistics, research methods, and coding:

Leave a Reply

Your email address will not be published. Required fields are marked *

© AcuPsy 2015       info@acupsy.com       Load our Facebook page   Load our Twitter page   Load our LinkedIn page