UCLA Statistics 10 Practice Final Prof. Lew

The time between eruptions of "Old Faithful" geyser in Yellowstone National Park is random, but is related to the duration of the last eruption. The table below shows the times for a random sample of 10 eruptions.

Eruption | Duration of the last eruption (minutes) | Time to the next eruption (minutes) | Air temperature (degrees)
--- | --- | --- | ---
1 | 2 | 50 | 70
2 | 1.8 | 57 | 85
3 | 3.7 | 55 | 75
4 | 2.2 | 47 | 77
5 | 2.1 | 53 | 70
6 | 2.4 | 50 | 43
7 | 2.6 | 62 | 48
8 | 2.8 | 57 | 70
9 | 3.3 | 72 | 79
10 | 3.5 | 62 | 63
Average | 2.64 | 56.5 | 68
Standard deviation | 0.63118935 | 7.00357052 | 12.6570139
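As a quick check on the summary rows, here is a minimal Python sketch using only the standard library. It assumes the reported standard deviations use the divisor n (the population formula), which is what `statistics.pstdev` computes; `statistics.stdev`, which divides by n - 1, would give slightly larger values.

```python
from statistics import mean, pstdev

# Data from the table above (10 eruptions).
duration = [2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5]   # minutes
time_next = [50, 57, 55, 47, 53, 50, 62, 57, 72, 62]         # minutes
temperature = [70, 85, 75, 77, 70, 43, 48, 70, 79, 63]       # degrees

for name, column in [("duration", duration),
                     ("time to next", time_next),
                     ("temperature", temperature)]:
    # pstdev divides by n, matching the SDs reported in the table.
    print(f"{name:>13}: average = {mean(column):.2f}, SD = {pstdev(column):.8f}")
```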

1. What is the correlation between duration and the time to the next eruption?

.5565 or about .56
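One way to get the answer above is the course definition of r as the average of the products of the two variables in standard units. The sketch below assumes that definition, with SDs computed using divisor n to match the table; the helper names `standard_units` and `correlation` are illustrative, not from the course materials.

```python
from statistics import mean, pstdev

duration = [2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5]
time_next = [50, 57, 55, 47, 53, 50, 62, 57, 72, 62]

def standard_units(xs):
    """Convert each value to standard units: (value - average) / SD, SD with divisor n."""
    avg, sd = mean(xs), pstdev(xs)
    return [(x - avg) / sd for x in xs]

def correlation(xs, ys):
    """r = average of the products of the paired values in standard units."""
    return mean(zx * zy for zx, zy in zip(standard_units(xs), standard_units(ys)))

r = correlation(duration, time_next)
print(f"r = {r:.4f}")
```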

 

2. What is the regression equation for predicting time to the next eruption from the duration of an eruption?

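time = 6.1747*(duration) + 40.1988

Interpretation: the slope says that each additional minute of eruption duration is associated with an additional 6.1747 minutes, on average, of waiting for the next eruption. It is the rate of change. The intercept is a bit nonsensical here: if the duration were zero, the predicted time to the next eruption would be 40.1988 minutes, but a zero-minute eruption is not an eruption, and zero lies well outside the observed durations (1.8 to 3.7 minutes).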
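The slope and intercept follow from r and the SDs: slope = r × SD_y / SD_x, and the line passes through the point of averages. A minimal sketch reusing the table data; `predict_time` is a hypothetical helper name, and the last line reproduces the prediction asked for in question 4.

```python
from statistics import mean, pstdev

duration = [2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5]
time_next = [50, 57, 55, 47, 53, 50, 62, 57, 72, 62]

# Covariance with divisor n, consistent with the SDs in the table.
avg_x, avg_y = mean(duration), mean(time_next)
cov = mean((x - avg_x) * (y - avg_y) for x, y in zip(duration, time_next))
r = cov / (pstdev(duration) * pstdev(time_next))

# Slope = r * SD_y / SD_x; the line passes through the point of averages.
slope = r * pstdev(time_next) / pstdev(duration)
intercept = avg_y - slope * avg_x

def predict_time(d):
    """Predicted minutes until the next eruption after an eruption lasting d minutes."""
    return slope * d + intercept

print(f"time = {slope:.4f}*(duration) + {intercept:.4f}")
print(f"prediction after a 3-minute eruption: {predict_time(3):.2f} minutes")
```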

 

3. You just missed an eruption. The sign at the geyser says "Next Eruption in 50 minutes." Can you tell me what the duration of the previous eruption was in minutes?

Explain why you can or why you cannot do this.

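No. The regression equation was estimated for predicting time from duration, not the reverse. Solving that equation for duration does not give the best prediction of duration; you would need a different equation, the regression of duration on time, to predict duration.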
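To see why the equation cannot simply be run backwards, this sketch computes both regression lines from the table (the variable names are illustrative, not from the answer key). Solving the time-on-duration line for a 50-minute wait gives about 1.6 minutes, while the regression of duration on time predicts about 2.3 minutes.

```python
from statistics import mean, pstdev

duration = [2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5]
time_next = [50, 57, 55, 47, 53, 50, 62, 57, 72, 62]

avg_x, avg_y = mean(duration), mean(time_next)
sd_x, sd_y = pstdev(duration), pstdev(time_next)
r = mean((x - avg_x) * (y - avg_y) for x, y in zip(duration, time_next)) / (sd_x * sd_y)

# Line for predicting time from duration (the one in question 2).
slope_time = r * sd_y / sd_x
int_time = avg_y - slope_time * avg_x

# Line for predicting duration from time: a different regression, slope r * SD_x / SD_y.
slope_dur = r * sd_x / sd_y
int_dur = avg_x - slope_dur * avg_y

wait = 50
inverted = (wait - int_time) / slope_time   # naively solving the question-2 line for duration
regressed = slope_dur * wait + int_dur      # regression of duration on time
print(f"inverting the time line: {inverted:.2f} min; regressing duration on time: {regressed:.2f} min")
```

The two answers differ because each regression line pulls its predictions toward the average of the variable being predicted.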

 

4. Suppose the park ranger tells you the eruption you just missed lasted 3 minutes. How long must you wait around to see the next eruption?

6.1747 * 3 + 40.1988 ≈ 58.72, or about 59 minutes.

 

5. Suppose the hourly wage for American workers is normally distributed with an average of $12.98 and a standard deviation of $5.21.

 

A. What percentage of American workers earn more than $20 per hour?

You should get Z = (20 - 12.98)/5.21 ≈ 1.347, close enough to call it 1.35. The area between -1.35 and 1.35 is 82.30%, so the percentage earning more than $20 per hour is (100 - 82.30)/2 ≈ 8.85%, or about 8.9%.
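The table lookup can be cross-checked against the exact normal curve with `statistics.NormalDist` (Python 3.8+). Because z is not rounded to 1.35 here, expect a value near 8.9% rather than exactly 8.85%.

```python
from statistics import NormalDist

# Hourly wages modeled as normal with average $12.98 and SD $5.21 (from the problem).
wages = NormalDist(mu=12.98, sigma=5.21)

z = (20 - wages.mean) / wages.stdev
share_above_20 = 1 - wages.cdf(20)   # area to the right of $20
print(f"z = {z:.3f}, percentage above $20 = {100 * share_above_20:.2f}%")
```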

 

B. A simple random sample of 121 American workers is drawn from the population. What is the chance that the sample average will fall between $12 per hour and $13.50 per hour?

The standard error for the average here is

$$SE = \frac{\sqrt{121} \times 5.21}{121} = \frac{5.21}{11} \approx 0.4736$$

 

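The resulting Z-score is (13.50 - 12.98)/0.4736 ≈ 1.097 (close enough to 1.10) for $13.50, and (12 - 12.98)/0.4736 ≈ -2.07 (about -2.05 in the normal table) for $12.

From the normal table, the area between -2.05 and 2.05 is 95.96%, and the area between -1.10 and 1.10 is 72.87%. The area from -2.05 to 0 is half of the first, and the area from 0 to 1.10 is half of the second, so sum them and divide by 2 to get the final area: (95.96 + 72.87)/2 = 84.415%, or about 84%.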
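A sketch of the same calculation, assuming (as the answer does) that the sample average is approximately normal, centered at the population average, with SE = SD/sqrt(n). The exact curve gives roughly 84%, in line with the table-based 84.415%.

```python
from math import sqrt
from statistics import NormalDist

pop_sd, n = 5.21, 121
se_avg = sqrt(n) * pop_sd / n   # SE of the sum divided by n, which equals SD / sqrt(n)

# Central limit theorem: the sample average is approximately normal.
sample_avg = NormalDist(mu=12.98, sigma=se_avg)
chance = sample_avg.cdf(13.50) - sample_avg.cdf(12.00)
print(f"SE = {se_avg:.4f}, chance = {100 * chance:.1f}%")
```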

 

C. Suppose the sample average of the 121 workers is actually $13.25 with a standard deviation of $6.88. Please test the hypothesis that salaries are increasing over time. State a null hypothesis and an alternative, perform a test, state a p-value, and use a 1% level of significance to make a decision. Please state your conclusion in simple English.

 

Null: average is $12.98

Alternative: average > $12.98

Test: using the population SD of $5.21 for the standard error,

$$Z = \frac{13.25 - 12.98}{\sqrt{121} \times 5.21 / 121} = \frac{0.27}{0.4736} \approx 0.57,$$

which is about .55 in the normal table.

The area between -0.55 and 0.55 is 41.77%, so the p-value is (100 - 41.77)/2 = 29.115%, or about 29%.

At the 1% level of significance, we would not reject the null because 29.115% > 1%

Conclusion: a sample average of $13.25 is well within the range of chance variation around $12.98, so there is no evidence that salaries have been increasing over time.
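The one-sided z-test can be written out in a few lines. This sketch follows the answer in using the population SD of $5.21 for the standard error; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

null_avg, pop_sd, n = 12.98, 5.21, 121
observed_avg = 13.25

se_avg = pop_sd / sqrt(n)
z = (observed_avg - null_avg) / se_avg
p_value = 1 - NormalDist().cdf(z)   # one-sided: alternative is average > 12.98

alpha = 0.01
print(f"z = {z:.2f}, p = {100 * p_value:.1f}%, reject null: {p_value < alpha}")
```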

6. There seems to be a "gender gap" in political party preference in the United States, with women more likely than men to prefer Democratic candidates. A psychologist selects a large random sample of registered voters, both men and women. She asks every voter whether they voted for the Democratic or Republican candidate in the last election. Is this an observational study or an experiment? Why? Suppose she was going to create a scatter diagram using her two variables. Which variable is the independent variable and which one is the dependent variable? Would it be possible to calculate a meaningful correlation for these two variables? Explain why or why not.

Observational study. The subjects, not the researcher, determine their "treatment" (their gender); the researcher only observes and records it.

The independent variable is gender; the dependent variable is the vote (Democratic or Republican).

Given what you have learned in Statistics 10, it is NOT possible to calculate a meaningful correlation. Correlation is most appropriate for two numeric variables with a linear relationship; here both variables are categorical, so arguably neither condition holds. If you have more advanced statistics knowledge from another class, show me how you would do it (it is doable).

 

Credit card companies have been harshly criticized for issuing credit cards to college students who then use the cards and wind up with credit problems even before they are old enough to drink (legally...). Nine undergraduates, selected at random, were asked about their current financial situation. Negative amounts are amounts owed to credit card companies; positive amounts are bank account balances. An amount of zero or larger indicates that the student has no credit card debt problem. Assume the financial state is normally distributed.

The financial states of the 9 are:

-3500, -500, -3999, -1800, 17000, -200, -2750, -3750, -1000

 

7. Calculate the mean and median of this list.

The mean is -499/9 ≈ -55.44, and the median (the 5th of the 9 values in sorted order) is -1800.
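The mean and median can be checked with the standard library. Note how the single large balance of 17000 pulls the mean up close to zero while the median stays at -1800.

```python
from statistics import mean, median

# Financial states of the 9 students (negative = owed to credit card companies).
states = [-3500, -500, -3999, -1800, 17000, -200, -2750, -3750, -1000]

print(f"mean = {mean(states):.2f}, median = {median(states)}")
```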

 

8. Credit card companies do admit that some students are poor credit risks and should not have credit cards, but that most students are responsible and do not have debt problems related to credit cards. Using the information from this sample of 9 students, test the credit card companies' claim. State the null and alternative hypotheses, perform a test, use a 5% level of significance as your rule, state the resulting p-value and give us your conclusions.

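Null: the average financial state of all undergraduates is 0 (on average, no credit card debt problem).

Alternative: the average is less than 0.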

$$Z = \frac{-55.44 - 0}{\sqrt{9} \times 6176.74 / 9} \approx -0.03$$

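This is not significant at the 5% level. The one-sided p-value is about 49% (essentially 50%), far above 5%, so we do not reject the null. The sample gives no evidence against the credit card companies' claim that students, on average, do not have credit card debt problems. (Strictly, with only 9 observations a t-test is more appropriate: using SD+ ≈ 6551.4 and 8 degrees of freedom gives t ≈ -0.03, and the conclusion is the same.)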
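A sketch of the z-test from the answer, alongside the t statistic that uses SD+ (divisor n - 1). Python's standard library has no t distribution, so only the t statistic is computed here; its p-value would need a t table or a library such as SciPy.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev, stdev

states = [-3500, -500, -3999, -1800, 17000, -200, -2750, -3750, -1000]
n = len(states)

# z-test as in the answer: SD with divisor n, SE = SD / sqrt(n).
se = pstdev(states) / sqrt(n)
z = (mean(states) - 0) / se
p_value = NormalDist().cdf(z)   # one-sided: alternative is average < 0

# Small-sample variant: SD+ (divisor n - 1), compared with t on n - 1 = 8 degrees of freedom.
t = (mean(states) - 0) / (stdev(states) / sqrt(n))

print(f"z = {z:.2f}, one-sided p = {100 * p_value:.1f}%, t = {t:.2f}")
```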

 

9. A poll on women's issues interviewed 1,025 women and 472 men randomly selected from the United States. The poll found that 47% of the women said they do not get enough time for themselves.

(a) Construct a 90% confidence interval for the percentage of women who say they do not get enough time for themselves.

The standard error is sqrt(.47 × .53 / 1025) ≈ 1.5589%, so the confidence interval is 47% ± (1.60 × 1.5589%), or about 44.51% to 49.49%. I would also accept 47% ± (1.65 × 1.5589%), about 44.43% to 49.57%, because 1.645 is the z-value usually used for a 90% interval.
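The interval with z ≈ 1.645 can be computed directly; `NormalDist().inv_cdf(0.95)` returns the z-value that leaves 5% in each tail. This is the usual normal-approximation interval for a percentage, sketched under that assumption.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat = 1025, 0.47
se = sqrt(p_hat * (1 - p_hat) / n)      # about 0.0156, i.e. 1.5589%
z_star = NormalDist().inv_cdf(0.95)     # about 1.645 for a 90% interval

low, high = p_hat - z_star * se, p_hat + z_star * se
print(f"90% CI: {100 * low:.2f}% to {100 * high:.2f}%")
```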

(b) Your friend is taking Statistics 10 next quarter (don't worry, I'm not teaching it again this academic year...). Explain to your friend why we can't just say that 47% of all adult women in the U.S. do not get enough time for themselves.

You might simply say that 47% is a sample percentage, but what you would really like to know is the population percentage, which you don't have. As long as we are working only with samples, our estimate will be off from the true percentage by chance error.

We do have tools, like the confidence interval, that let us make statements such as "I am 90% confident that the interval from 44.51% to 49.49% covers the true percentage." This means that if we could repeat this sampling procedure 100 times, about 90 of the resulting intervals would capture the "truth" and about 10 would not. My hope is that this is one of those 90.
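The "90 out of 100" interpretation can be illustrated by simulation. This sketch assumes a hypothetical population in which exactly 47% of women say they do not get enough time for themselves, models each sample of 1,025 as independent draws (a close approximation when the population is huge), and counts how often the 90% interval covers 47%. The function name `coverage` and its defaults are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_p=0.47, n=1025, level=0.90, reps=1000, seed=1):
    """Fraction of simulated confidence intervals that cover true_p (a hypothetical value)."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(reps):
        # One simulated sample: count the "yes" answers among n independent draws.
        count = sum(rng.random() < true_p for _ in range(n))
        p_hat = count / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z_star * se <= true_p <= p_hat + z_star * se:
            hits += 1
    return hits / reps

covered_share = coverage()
print(f"share of intervals covering the true percentage: {covered_share:.3f}")
```

Each individual interval either covers 47% or it does not; the 90% describes how often the procedure succeeds over many repetitions.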

 

Your company advertises that it ships 90% of its orders on time, that is, within 5 working days. The average shipping time of all orders is 3.1 days with a standard deviation of 0.4 days. You select a simple random sample (SRS) of 21 of the 10,000 orders received in the past week for an audit. The audit reveals that 18 of the 21 orders were shipped within 5 working days.

 

10. What is the sample percentage of orders shipped on time and what is the standard error for the percentage of orders shipped on time?

sample percentage = 18/21 * 100 = 85.71%

$$\text{standard error} = \frac{\sqrt{21} \times \sqrt{0.90 \times 0.10}}{21} \times 100\% \approx 6.5465\%$$
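The two numbers can be reproduced as follows. As in the answer, the SE is computed from the claimed 90% rather than from the sample percentage, since 90% is the value being tested.

```python
from math import sqrt

n, shipped_on_time = 21, 18
claimed = 0.90

sample_pct = 100 * shipped_on_time / n
# SE for the percentage, computed from the claimed 90%.
se_pct = 100 * sqrt(n) * sqrt(claimed * (1 - claimed)) / n
print(f"sample percentage = {sample_pct:.2f}%, SE = {se_pct:.4f}%")
```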

 

11. A lawyer approaches you and says "Aha! You claim 90% but in your own sample the percentage is lower than that. So your 90% claim is wrong." Does the lawyer have enough evidence to sue you for false advertising? Perform a test and use a 5% level of significance as your rule. Explain why the results of your test refute or do not refute your 90% claim.

$$Z = \frac{85.71\% - 90\%}{6.5465\%} \approx -0.6553, \text{ or about } -0.65$$

The p-value associated with Z = -.65 is (100 - 48.43)/2 = 25.785%. In other words, if the true on-time rate were 90%, a sample percentage of 85.71% or lower would occur about 26% of the time, which is well above 5% and fairly common. There is not enough evidence to reject your 90% claim, so the lawyer does not have a case. Tell the lawyer to go away.
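The test statistic and one-sided p-value, sketched with `statistics.NormalDist`. With only 21 orders and about 2 expected late ones, the normal approximation is rough, so the p-value should be read as approximate.

```python
from math import sqrt
from statistics import NormalDist

n, shipped_on_time, claimed = 21, 18, 0.90
sample_pct = 100 * shipped_on_time / n
se_pct = 100 * sqrt(claimed * (1 - claimed) / n)

z = (sample_pct - 100 * claimed) / se_pct
p_value = NormalDist().cdf(z)   # one-sided: is the true on-time percentage below 90%?
print(f"z = {z:.2f}, p = {100 * p_value:.1f}%, reject at 5%: {p_value < 0.05}")
```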

 

12.

  1. It widens
  2. They might not like the width of the 99% confidence interval. They might not need to be that confident (they are willing to live with 90%). They might have a large sample size and that might reduce the need for such high levels of confidence.
  3. Sure. It would be approximately 5.3 divided by 2.
  4. It depends on the parameter. If I'm estimating a sum, the standard error of the sum is smaller when n is smaller, so choose 36. If I'm estimating an average or a percentage, the standard error is smaller when n is larger, so choose 49. I asked you to think about it in terms of the confidence interval because, at any given level of confidence, a smaller standard error means a narrower interval: that happens with smaller n for a sum, and with larger n for an average or a percentage.
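To illustrate the last answer, this sketch compares the square-root law for sums and averages at n = 36 and n = 49. The box SD of 10 is a hypothetical value chosen only for illustration; the comparison holds for any positive SD.

```python
from math import sqrt

def se_sum(sd, n):
    """Standard error of the sum of n draws: sqrt(n) * SD."""
    return sqrt(n) * sd

def se_average(sd, n):
    """Standard error of the average of n draws: SD / sqrt(n)."""
    return sd / sqrt(n)

sd = 10  # hypothetical SD of the box, for illustration only
for n in (36, 49):
    print(f"n = {n}: SE of sum = {se_sum(sd, n):.1f}, SE of average = {se_average(sd, n):.3f}")
```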