
Georgia Tech WEEK 3 HOMEWORK – SAMPLE SOLUTIONS IMPORTANT NOTE, Graded A+


WEEK 3 HOMEWORK – SAMPLE SOLUTIONS

IMPORTANT NOTE: These homework solutions show multiple approaches and some optional extensions for most of the questions in the assignment. You don’t need to submit all of this in your assignments; it’s included here just to help you learn more. Remember, the main goal of the homework assignments, and of the entire course, is to help you learn as much as you can and develop your analytics skills as much as possible!

Question 1

Using crime data from http://www.statsci.org/data/general/uscrime.txt (description at http://www.statsci.org/data/general/uscrime.html), test to see whether there is an outlier in the last column (number of crimes per 100,000 people). Is the lowest-crime city an outlier? Is the highest-crime city an outlier? Use the grubbs.test function in the outliers package in R.

Here’s one possible solution. Please note that a good solution doesn’t have to try all of the possibilities in the code; they’re shown to help you learn, but they’re not necessary. The file HW3-Q1-fall.R contains R code and some explanation for the following approach.

First, because the Grubbs test assumes normality, we start by running a normality test that you’ll probably remember from basic statistics: the Shapiro-Wilk test. The test actually suggests that the data is not normally distributed (p = 0.001882). However, the Q-Q plot below suggests that the non-normality comes from the tails, which might mean the test is being affected by potential outliers. The middle of the distribution looks normal, so we’ll go ahead with the Grubbs test.

Figure 1. Q-Q plot of the Crime column.

Note here that this is really a judgment call. On the one hand, it could be that the Shapiro-Wilk test is identifying that the tails, especially on the upper end, really are not normally distributed, enough so that the extreme values aren’t really outliers; they’re just part of the distribution.
On the other hand, it could be that the distribution really is close enough to normal, and the reason it fails the Shapiro-Wilk test is that there’s outlying data. The Grubbs test’s validity depends on which of these is closer to the truth. In this case, let’s go on with the Grubbs test. At worst, it will either show that there aren’t outliers or identify potential outliers. In the latter case (if this were more than a homework assignment), we would investigate those data points more carefully to determine whether they seem like a real part of the distribution or whether they’re real outliers.

It turns out that the lowest-crime city is unlikely to be an outlier (the p-value is so close to 1 that it just comes up as 1). On the other hand, the highest-crime city might be an outlier (p = 0.079), and if we remove it, the second-highest-crime city also appears to be an outlier (p = 0.028). The box-and-whisker plot below shows the outliers more clearly.

Figure 3. Box-and-whisker plot of the Crime column.

Note that some people tried to determine whether there was an outlier on the high end and on the low end simultaneously, using the “type=11” parameter in the grubbs.test() R function. The problem with this approach is that the answer it returns is “no”, because they’re not both outliers (as we saw, the lowest-crime city isn’t an outlier). That result hides the fact that the highest-crime city probably is an outlier (and in fact, so is the second-highest). So using the “type=10” parameter is generally a better approach: it tests one side, and adding the “opposite=TRUE” parameter tests the other side. See HW2-Q3.R for details.

Question 2

Describe a situation or problem from your job, everyday life, current events, etc., for which a Change Detection model would be appropriate. Applying the CUSUM technique, how would you choose the critical value and the threshold?

Here’s one answer. Bird flu is a common disease in certain parts of East Asia, the Middle East and West Africa.
Public health organizations (such as the CDC) want to identify a potential outbreak as soon as possible so that action can be taken to stop its spread. CUSUM can be used to monitor the number of cases of this disease and detect a change when the number of cases rises above a threshold, indicating a possible outbreak. Ideally, the CUSUM statistic S_t should stay below the threshold if an epidemic is not going to occur, and quickly cross the threshold if an epidemic is going to occur. In some uses of CUSUM, a typical choice for the threshold T is 5 standard deviations, while the critical value C is half a standard deviation. But in the case of a potentially deadly epidemic, those “standard” values are probably too conservative: the cost of a false alarm is much lower than the cost of waiting too long to detect a change. Therefore, to save lives, the values of both T and C should be lower (and can be calibrated based on previous data).

Question 3

1. Using July through October daily-high-temperature data for Atlanta for 1996 through 2015, use a CUSUM approach to identify when unofficial summer ends (i.e., when the weather starts cooling off) each year. That involves finding a good critical value and threshold to use across all years. You can get the data that you need online, for example at http://www.iweathernet.com/atlanta-weather-records
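The CUSUM logic from Question 2, which Question 3 applies in the other direction, can be sketched in a few lines. This is a minimal Python illustration (the homework’s own files are in R), assuming the usual one-sided form S_t = max(0, S_{t-1} + (x_t - μ - C)) for detecting an increase and S_t = max(0, S_{t-1} + (μ - x_t - C)) for detecting a decrease, with an alarm the first time S_t ≥ T. The function name cusum_first_alarm, the baseline mean μ, and every number below are hypothetical and chosen only for illustration; they are not the homework data or the official solution.

```python
from typing import Optional, Sequence


def cusum_first_alarm(
    values: Sequence[float],
    mu: float,
    c: float,
    t: float,
    detect: str = "increase",
) -> Optional[int]:
    """Return the index of the first observation where S_t >= t, or None.

    Hypothetical helper (not from the homework files):
      detect="increase": S_t = max(0, S_{t-1} + (x_t - mu - c))
      detect="decrease": S_t = max(0, S_{t-1} + (mu - x_t - c))
    """
    if detect not in ("increase", "decrease"):
        raise ValueError("detect must be 'increase' or 'decrease'")
    s = 0.0
    for i, x in enumerate(values):
        # Deviation from the baseline in the direction we want to detect.
        deviation = x - mu if detect == "increase" else mu - x
        # Subtract the critical value c so small fluctuations don't pile up,
        # and clamp at zero so S_t never goes negative.
        s = max(0.0, s + deviation - c)
        if s >= t:
            return i
    return None


# Hypothetical weekly case counts; assumed baseline mean 10 and standard
# deviation 2, so the "standard" choices are C = 0.5 * 2 = 1 and T = 5 * 2 = 10.
cases = [9, 11, 10, 8, 12, 10, 14, 15, 16, 18]
print(cusum_first_alarm(cases, mu=10, c=1, t=10))  # 8
# A lower threshold, as argued for a deadly epidemic, raises the alarm sooner.
print(cusum_first_alarm(cases, mu=10, c=1, t=5))  # 7

# Hypothetical daily highs (°F) with an assumed summer baseline of 88;
# detecting a decrease flags when the weather starts cooling off.
temps = [89, 90, 87, 88, 86, 84, 83, 82, 80]
print(cusum_first_alarm(temps, mu=88, c=1, t=10, detect="decrease"))  # 7
```

Lowering C lets smaller deviations accumulate, and lowering T trips the alarm earlier; both buy faster detection at the price of more false alarms, which is the trade-off described in the Question 2 answer. In practice μ, C and T would be calibrated on historical data rather than picked by hand.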

Document information: uploaded Sep 03, 2022; 8 pages; seller: bundleHub Solution guider.