STAT 200: Introduction to Statistics
Homework #6
1. (3 points): Many high school students take the AP tests in different subject areas. In 2007, of the
144,796 students who took the biology exam 84,199 of them were female. In that same year, of the
211,693 students who took the calculus AB exam 102,598 of them were female ("AP exam scores,"
2013). Estimate the difference in the proportion of female students taking the biology exam and female
students taking the calculus AB exam using a 90% confidence level. Interpret the results.
First, we should start off by writing down what we know (always a good place to start). If you write
down what you know and what you are trying to solve (purpose of analysis), you can usually determine
what method you need to use to solve the problem.
n1 = 144,796 n2 = 211,693
x1 = 84,199 x2 = 102,598
C = 0.90, so α = 1 - C = 1 - 0.90 = 0.10 (remember, α is the total area in the tail(s), and the confidence
level is the area in the middle)
Purpose of Analysis: For this problem, we are to estimate a 90% confidence interval on the difference in
proportion of female students taking the biology and calculus AB exam.
This difference is given by: p1 – p2
Thus, we are computing a confidence interval for the difference between two population proportions.
We need to compute the margin of error and add/subtract it from the difference in the sample
proportions; we can follow the example on page 267 in the Kozak textbook.
i.) State the random variable and the parameters in words.
x1 = number of female students taking the biology exam
x2 = number of female students taking the calculus AB exam
p1 = proportion of female students taking the biology exam
p2 = proportion of female students taking the calculus AB exam
ii.) State and check the assumptions for confidence interval
a.) A simple random sample of 144,796 students taking the biology exam is taken. A simple random
sample of 211,693 students taking the calculus AB exam is taken. Both samples include
all students who took each exam during a particular year. This isn't really a random sample unless the
year was chosen at random, so this assumption may not have been met.
b.) The samples are independent since they come from different tests.
c.) The assumptions for the binomial distribution are satisfied in both populations, since there are
only two responses (female = success, male = failure), there are a fixed number of trials, the
probability of a success is the same, and the trials are independent.
d.) Since the values:
x1 = 84,199
n1 - x1 = 144,796 - 84,199 = 60,597
x2 = 102,598
n2 - x2 = 211,693 - 102,598 = 109,095
are all greater than or equal to 5, we assume both sampling distributions of p̂1 and p̂2 can be
approximated with a normal distribution.
iii.) Find the sample statistic and confidence interval
a.) First, we need to compute the sample proportion values:
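$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{84199}{144796} = 0.5815 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{102598}{211693} = 0.4847$$
$$\hat{q}_1 = 1 - \hat{p}_1 = 1 - \frac{84199}{144796} = 0.4185 \qquad \hat{q}_2 = 1 - \hat{p}_2 = 1 - \frac{102598}{211693} = 0.5153$$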
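The sample proportions above can be double-checked with a few lines of Python. This is only a sketch for verifying the arithmetic; the variable names (p_hat1, q_hat1, and so on) are illustrative and not part of the assignment.

```python
# Counts taken from the problem statement.
x1, n1 = 84_199, 144_796    # females / all students, biology exam
x2, n2 = 102_598, 211_693   # females / all students, calculus AB exam

# Sample proportions of successes (female) and failures (male).
p_hat1 = x1 / n1
p_hat2 = x2 / n2
q_hat1 = 1 - p_hat1
q_hat2 = 1 - p_hat2

print(f"p_hat1 = {p_hat1:.4f}, q_hat1 = {q_hat1:.4f}")
print(f"p_hat2 = {p_hat2:.4f}, q_hat2 = {q_hat2:.4f}")
```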
b.) In the next step, we need to determine our critical z score and margin of error:
Remember, the z-critical value is the cutoff line which gives the % confidence level (in this case,
90%) in the middle of the graph. Graphically, this looks like:
If we refer to question 2e from homework 4, we should remember how to solve this problem.
Note that for confidence intervals, we always use the positive value (since we always want our
margin of error to be a positive value). Remember that the area of interest for this problem is
α = 1 – 0.90 = 0.10 (one minus the confidence level); however, there are 2 tails, so we divide this
value by 2 to get the same area in each tail: 0.10 / 2 = 0.05. Thus, we want the z-score that leaves
an area of 0.05 in each tail.
To quickly get the z-critical value, we can put this into Excel:
“=NORM.S.INV(0.05)” = -1.645 (just remember to use the absolute value of this answer)
To get the positive z-value directly, enter the area to the left of the upper cutoff instead,
1 – 0.05 = 0.95:
“=NORM.S.INV(0.95)” = 1.645
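If Excel is not available, the same critical value can be found in Python. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; its inv_cdf method plays the same role as NORM.S.INV for the standard normal distribution.

```python
from statistics import NormalDist

confidence = 0.90
alpha = 1 - confidence      # total area in both tails
tail_area = alpha / 2       # area in each tail

# Inverse CDF of the standard normal (mean 0, standard deviation 1).
z_lower = NormalDist().inv_cdf(tail_area)         # negative cutoff
z_critical = NormalDist().inv_cdf(1 - tail_area)  # positive cutoff

print(f"z_lower = {z_lower:.3f}, z_critical = {z_critical:.3f}")
```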
Now, we use this value in the Margin of Error equation:
$$E = z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = 1.645 \sqrt{\frac{(0.5815)(0.4185)}{144796} + \frac{(0.4847)(0.5153)}{211693}} = 0.0028$$
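The margin of error can be checked the same way. This sketch recomputes the sample proportions from the raw counts rather than from the rounded values, so intermediate results may differ slightly in later decimal places; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 84_199, 144_796    # biology exam
x2, n2 = 102_598, 211_693   # calculus AB exam

p_hat1 = x1 / n1
p_hat2 = x2 / n2

# Critical value for a 90% confidence level (0.05 in each tail).
z_critical = NormalDist().inv_cdf(0.95)

# Standard error of the difference between two sample proportions.
standard_error = sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
margin_of_error = z_critical * standard_error

print(f"E = {margin_of_error:.4f}")
```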
c.) The last step is to put it all together to compute the confidence interval on p1 – p2:
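$$(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E$$
$$(0.5815 - 0.4847) - 0.0028 < p_1 - p_2 < (0.5815 - 0.4847) + 0.0028$$
$$0.0941 < p_1 - p_2 < 0.0996$$
(The endpoints are computed from the unrounded sample proportions, so the lower endpoint differs in the
last digit from what the rounded values above give.)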
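Putting the steps together, the whole interval can be wrapped in one small function. This is a sketch under the same normal-approximation assumptions checked in part ii; the function name two_proportion_ci is hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 using the normal approximation."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    # Positive critical value: area (1 - confidence) / 2 in each tail.
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_critical * sqrt(
        p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
    )
    difference = p_hat1 - p_hat2
    return difference - margin, difference + margin


lower, upper = two_proportion_ci(84_199, 144_796, 102_598, 211_693, 0.90)
print(f"{lower:.4f} < p1 - p2 < {upper:.4f}")
```

Because the function works from the unrounded proportions, its endpoints should agree with the 9.41% and 9.96% figures quoted in the real-world interpretation.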
iv.) Statistical Interpretation: There is a 90% chance that the interval 0.0941 < (p1 − p2) < 0.0996
contains the true difference in proportions.
v.) Real World Interpretation: The proportion of female students taking the biology exam is
anywhere from 9.41 to 9.96 percentage points higher than the proportion of female students
taking the calculus AB exam.
[Figure: standard normal curve with a central area of 0.90 between -z and z]
2. (3 points): Are there more children diagnosed with Autism Spectrum Disorder (ASD) in states that have
larger urban areas than in states that are mostly rural? In the state of Pennsylvania, a fairly urban state, there
are 245 eight-year-olds diagnosed with ASD out of 18,440 eight-year-olds evaluated. In the state of Utah,
a fairly rural state, there are 45 eight-year-olds diagnosed with ASD out of 2,123 eight-year-olds
evaluated ("Autism and developmental," 2008).
Is there enough evidence to show that the proportion of children diagnosed with ASD in Pennsylvania is
more than the proportion in Utah? Why or why not? Test at the 1% level.
First, we should start off by writing down what we know (always a good place to start). If you write
down what you know and what you are trying to solve (purpose of analysis), you can usually determine
what method you need to use to solve the problem.
n1 = 18,440 n2 = 2,123
x1 = 245 x2 = 45
α = 0.01
Purpose of Analysis: For this problem, we are testing whether the proportion of children diagnosed with
ASD in Pennsylvania is more than that of Utah.
Thus, we are completing a Hypothesis Test for Two Population Proportions. We can follow the example
on pages 268-269 in the Kozak textbook. NOTE: Since these are proportions, we will use the standard
normal distribution.
i.) State the random variable and the parameters in words.
x1 = number of children diagnosed with Autism Spectrum Disorder (ASD) in Pennsylvania
x2 = number of children diagnosed with ASD in Utah
p1 = proportion of children diagnosed with ASD in Pennsylvania
p2 = proportion of children diagnosed with ASD in Utah
ii.) State the null and alternative hypotheses and the level of significance
$$H_0: p_1 = p_2 \quad \text{or} \quad H_0: p_1 - p_2 = 0$$
$$H_A: p_1 > p_2 \quad \text{or} \quad H_A: p_1 - p_2 > 0$$
$$\alpha = 0.01$$
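For reference, a right-tailed test of these hypotheses could be computed as sketched below. The remaining steps of the solution are not shown in this section, so the pooled form of the two-proportion z statistic used here is an assumption (it is the common textbook choice when the null hypothesis is p1 = p2), and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 245, 18_440   # diagnosed / evaluated, Pennsylvania
x2, n2 = 45, 2_123     # diagnosed / evaluated, Utah
alpha = 0.01

p_hat1 = x1 / n1
p_hat2 = x2 / n2

# Assumption: pooled proportion, since the null hypothesis says p1 = p2.
p_pooled = (x1 + x2) / (n1 + n2)
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p_hat1 - p_hat2) / standard_error

# Right-tailed test (alternative p1 > p2): p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)
reject_null = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```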
iii.) State and check the assumptions for a hypothesis test
a.) A simple random sample of diagnoses for 18,440 eight-year-olds in Pennsylvania is taken. A
simple random sample of diagnoses for 2,123 eight-year-olds in Utah is taken. Both samples were
taken for the same year. Thus, unless the year was chosen at random, this assumption may
not have been met.
b.) The samples are independent since they are from different states.
c.) The assumptions for the binomial distribution are satisfied in both populations, since there are
only two responses, there are a fixed number of trials, the probability of a success is the same, and
the trials are independent.
d.) Since the values:
x1 = 245 n1 - x1 = 18,440 - 245 = 18,195
x2 = 45 n2 - x2 = 2,123 - 45 = 2,078
are all greater than or equal to 5, we assume both sampling distributions of p̂1 and p̂2 can be
approximated with a normal distribution.
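The "at least 5 successes and 5 failures" check in part d.) is easy to automate. The helper below is a hypothetical illustration (the name normal_approximation_ok is not from the textbook) and is applied to the counts from both problems.

```python
def normal_approximation_ok(successes, trials, minimum=5):
    """Return True when both successes and failures reach the minimum count."""
    failures = trials - successes
    return successes >= minimum and failures >= minimum


# Problem 1: females among biology and calculus AB exam takers.
exam_checks = [
    normal_approximation_ok(84_199, 144_796),
    normal_approximation_ok(102_598, 211_693),
]

# Problem 2: ASD diagnoses in Pennsylvania and Utah.
state_checks = [
    normal_approximation_ok(245, 18_440),
    normal_approximation_ok(45, 2_123),
]

print(all(exam_checks), all(state_checks))
```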