Information Technology > QUESTIONS & ANSWERS > ISYE 6501 19SP01 HW-05 Q8.1, Questions with accurate answers, Graded A+ 2022/2023 (All)
ISYE 6501 19SP01 HW-05 Q8.1 Describe a situation or problem from your job, everyday life, current events, etc., for which a linear regression model would be appropriate. List some (up to 5) predict ... ors that you might use. Solution 8.1 I would use linear regression model to forecast how much money I should allocate for gas using my car for a long trip. I would use the record of weekly mileage of my car as the predictor and weekly money I spend on gas as the response. The linear model should be like following. ‘Money spend on gas’ = a_0 + a_1 * ‘mileage’, where a_1 is the coefficient & a_0 is the intercept. Q8.2 Using crime data from http://www.statsci.org/data/general/uscrime.txt (file uscrime.txt, description at http://www.statsci.org/data/general/uscrime.html ), use regression (a useful R function is lm or glm) to predict the observed crime rate in a city with the following data: M = 14.0 So = 0 Ed = 10.0 Po1 = 12.0 Po2 = 15.5 LF = 0.640 M.F = 94.0 Pop = 150 NW = 1.1 U1 = 0.120 U2 = 3.6 Wealth = 3200 Solution 8.2 The uscrume.txt file has 15 predictors (a.k.a factor) and 47 data points. lm() is used to generated linear regression models with different combinations of predictors. Below is the summary of the lm models Model | R^2 | Adj R^2 | formula ===== | ====== | ======= | ======= model1 | 0.803 | 0.708 | (all predictors) model2 | 0.766 | 0.731 | M + Ed + Po1 + U2 + Ineq + Prob model3 | 0.7 | 0.672 | M + Ed + Po1 + Ineq model4 | 0.666 | 0.642 | Ed + Po1 + Ineq model5 | 0.738 | 0.706 | M + Ed + Po1 + Ineq + Prob model6 | 0.73 | 0.697 | M + Ed + Po1 + U2 + Ineq Since each model uses different predictor formula, we should use Adj. R^2 value, instead of R^2, for model quality comparison. ‘model1’, which uses all predictors in the modeling generation, has Adj.R^2 value of 0.708. Not all of the predictors are significant. Those significant predictors of ‘model1’ are used to generate the rest of the lm models. ‘model2’’s predictors are M, Ed, Po1, U2, Ineq and Prob. It has the best Adj.R^2 (0.731) among all models. And, it output shows all predictors are significant (p <0.05)Sample output of ‘model1’ Call: lm(formula = Crime ~ ., data = data) Residuals: Min 1Q Median 3Q Max -395.74 -98.09 -6.69 112.99 512.67 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -5.984e+03 1.628e+03 -3.675 0.000893 *** M 8.783e+01 4.171e+01 2.106 0.043443 * So -3.803e+00 1.488e+02 -0.026 0.979765 Ed 1.883e+02 6.209e+01 3.033 0.004861 ** Po1 1.928e+02 1.061e+02 1.817 0.078892 . Po2 -1.094e+02 1.175e+02 -0.931 0.358830 LF -6.638e+02 1.470e+03 -0.452 0.654654 M.F 1.741e+01 2.035e+01 0.855 0.398995 Pop -7.330e-01 1.290e+00 -0.568 0.573845 NW 4.204e+00 6.481e+00 0.649 0.521279 U1 -5.827e+03 4.210e+03 -1.384 0.176238 U2 1.678e+02 8.234e+01 2.038 0.050161 . Wealth 9.617e-02 1.037e-01 0.928 0.360754 Ineq 7.067e+01 2.272e+01 3.111 0.003983 ** Prob -4.855e+03 2.272e+03 -2.137 0.040627 * Time -3.479e+00 7.165e+00 -0.486 0.630708 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 209.1 on 31 degrees of freedom Multiple R-squared: 0.8031, Adjusted R-squared: 0.7078 F-statistic: 8.429 on 15 and 31 DF, p-value: 3.539e-07 Sample output of ‘model2’ Call: lm(formula = Crime ~ M + Ed + Po1 + U2 + Ineq + Prob, data = data) Residuals: Min 1Q Median 3Q Max -470.68 -78.41 -19.68 133.12 556.23 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -5040.50 899.84 -5.602 1.72e-06 *** M 105.02 33.30 3.154 0.00305 ** Ed 196.47 44.75 4.390 8.07e-05 *** Po1 115.02 13.75 8.363 2.56e-10 *** U2 89.37 40.91 2.185 0.03483 * Ineq 67.65 13.94 4.855 1.88e-05 *** Prob -3801.84 1528.10 -2.488 0.01711 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 200.7 on 40 degrees of freedom Multiple R-squared: 0.7659, Adjusted R-squared: 0.7307 F-statistic: 21.81 on 6 and 40 DF, p-value: 3.418e-11We use the the following test data to get a predicted crime value from lm models. M = 14.0 So = 0 Ed = 10.0 Po1 = 12.0 Po2 = 15.5 LF = 0.640 M.F = 94.0 Pop = 150 NW = 1.1 U1 = 0.120 U2 = 3.6 Wealth = 3200 The summary of predict value per lm models lm model1 - predicted crime value = 155 lm model2 - predicted crime value = 1304 lm model3 - predicted crime value = 1321 lm model4 - predicted crime value = 1304 lm model5 - predicted crime value = 1326 lm model6 - predicted crime value = 1300 The predicted crime value of model1 is 155, which is way too small comparing the statistics of crime data, where is min. crime value is 342. Therefore, model1 may have problem. All other models has predicted crime value around 1300, which seems to be a reasonable figure w.r.t the statistics of crime data #sample of statistics of crime data stats summary of Crime values: Min. 1st Qu. Median Mean 3rd Qu. Max. 342 658 831 905 1058 1993 Later on, we use cv.lm() function to cross validate the models generated by lm() function above. I use m=4 (4-fold) for the cv.lm() for cross-validation. Below is the summary of cross validation results. Model | lm adj. R^2 | cv.lm Adj R^2 | num of predictors ===== | =========== | ============= | ================= model1 | 0.708 | 0.214 | 15 model2 | 0.731 | 0.677 | 6 model3 | 0.672 | 0.639 | 4 model4 | 0.642 | 0.596 | 3 model5 | 0.706 | 0.649 | 5 model6 | 0.697 | 0.654 | 5 We can see the model1’s cv.lm() adj. R^2 value (0.214) is much small the lm() adj. R^2 (0.708). This implies that model1 has lots of over-fitting and does not have good prediction quality. For model2, its cv.lm() adj. R^2 value (0.677) is still smaller than lm() adj. value (0.677), which suggests there is over-fitting, but the quality of this model is not as bad as the model1.code for solution 8.2 #setup environment rm(list = ls()) set.seed(123) data <- read.table("uscrime.txt", stringsAsFactors = FALSE, header = TRUE) #view data head(data) ## M So Ed Po1 Po2 LF M.F Pop NW U1 U2 Wealth Ineq ## 1 15.1 1 9.1 5.8 5.6 0.510 95.0 33 30.1 0.108 4.1 3940 26.1 ## 2 14.3 0 11.3 10.3 9.5 0.583 101.2 13 10.2 0.096 3.6 5570 19.4 ## 3 14.2 1 8.9 4.5 4.4 0.533 96.9 18 21.9 0.094 3.3 3180 25.0 ## 4 13.6 0 12.1 14.9 14.1 0.577 99.4 157 8.0 0.102 3.9 6730 16.7 ## 5 14.1 0 12.1 10.9 10.1 0.591 98.5 18 3.0 0.091 2.0 5780 17.4 ## 6 12.1 0 11.0 11.8 11.5 0.547 96.4 25 4.4 0.084 2.9 6890 12.6 ## Prob Time Crime ## 1 0.084602 26.2011 791 ## 2 0.029599 25.2999 1635 ## 3 0.083401 24.3006 578 ## 4 0.015801 29.9012 1969 ## 5 0.041399 21.2998 1234 ## 6 0.034201 20.9995 682 cat('total number of predictor:', length(data) -1, '\n') ## total number of predictor: 15 cat('number of data point:', nrow(data), '\n') ## number of data point: 47 Try different combination of predictors and evaluate Generate the linear regression model using all predictors in the data and find which predictors are significant model1 <- lm( Crime ~ ., data = data) summary(model1) ## ## Call: ## lm(formula = Crime ~ ., data = data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -395.74 -98.09 -6.69 112.99 512.67 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -5.984e+03 1.628e+03 -3.675 0.000893 *** ## M 8.783e+01 4.171e+01 2.106 0.043443 * ## So -3.803e+00 1.488e+02 -0.026 0.979765 ## Ed 1.883e+02 6.209e+01 3.033 0.004861 ** ## Po1 1.928e+02 1.061e+02 1.817 0.078892 . ## Po2 -1.094e+02 1.175e+02 -0.931 0.358830 ## LF -6.638e+02 1.470e+03 -0.452 0.654654 ## M.F 1.741e+01 2.035e+01 0.855 0.398995 ## Pop -7.330e-01 1.290e+00 -0.568 0.573845 ## NW 4.204e+00 6.481e+00 0.649 0.521279 ## U1 -5.827e+03 4.210e+03 -1.384 0.176238 ## U2 1.678e+02 8.234e+01 2.038 0.050161 . ## Wealth 9.617e-02 1.037e-01 0.928 0.360754 ## Ineq 7.067e+01 2.272e+01 3.111 0.003983 ** ## Prob -4.855e+03 2.272e+03 -2.137 0.040627 * ## Time -3.479e+00 7.165e+00 -0.486 0.630708 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 209.1 on 31 degrees of freedom ## Multiple R-squared: 0.8031, Adjusted R-squared: 0.7078 ## F-statistic: 8.429 on 15 and 31 DF, p-value: 3.539e-07 Generate the linear regression model using predictors which has Pr(>|t|) value smaller than 0.1 from summary of model1 model2 = lm(Crime ~ M + Ed + Po1 + U2 + Ineq + Prob, data=data ) summary(model2) ## ## Call: ## lm(formula = Crime ~ M + Ed + Po1 + U2 + Ineq + Prob, data = data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -470.68 -78.41 -19.68 133.12 556.23 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -5040.50 899.84 -5.602 1.72e-06 *** ## M 105.02 33.30 3.154 0.00305 ** ## Ed 196.47 44.75 4.390 8.07e-05 *** ## Po1 115.02 13.75 8.363 2.56e-10 *** [Show More]
Last updated: 3 years ago
Preview 1 out of 15 pages
Buy this document to get the full access instantly
Instant Download Access after purchase
Buy NowInstant download
We Accept:
GEORGIA TECH BUNDLE, ALL ISYE 6501 EXAMS, HOMEWORKS, QUESTIONS AND ANSWERS, NOTES AND SUMMARIIES, ALL YOU NEED
By bundleHub Solution guider 3 years ago
$60
59
Can't find what you want? Try our AI powered Search
Connected school, study & course
About the document
Uploaded On
Sep 03, 2022
Number of pages
15
Written in
All
This document has been written for:
Uploaded
Sep 03, 2022
Downloads
0
Views
133
Scholarfriends.com Online Platform by Browsegrades Inc. 651N South Broad St, Middletown DE. United States.
We're available through e-mail, Twitter, Facebook, and live chat.
FAQ
Questions? Leave a message!
Copyright © Scholarfriends · High quality services·