UCLA Statistics 10 Practice Final Page 1 Prof. Lew

A recent report from a private research organization pointed out that at current rates of human consumption, the earth's drinkable water supply will be totally consumed by the year 2025. This report went on to note that "The rich get richer, the poor have children" and then blamed impoverished nations for the depletion of natural resources. Here is some information from the 1997 CIA World Factbook for 10 countries.

Country                 Total Fertility Rate    Per Capita GDP    Female Literacy Rate

Afghanistan             6.07                    $800              15%
Cambodia                5.81                    $710              22%
Costa Rica              2.85                    $5,500            95%
Indonesia               2.66                    $3,770            78%
Italy                   1.16                    $19,600           96%
Jordan                  4.94                    $5,000            49%
Nigeria                 6.17                    $1,380            47%
Russia                  1.35                    $5,200            97%
United Arab Emirates    3.62                    $23,800           80%
United States           2.06                    $28,600           97%

AVERAGE                 3.669                   $9,436            67.6%
STAND. DEV.             1.8486                  $9,888            30.27%

Total Fertility Rate: i.e. average number of children born per woman.
Per Capita GDP: per capita Gross Domestic Product (GDP) in dollars, a measure of wealth.
Female Literacy Rate: i.e. women over age 15 who can read and write.

1. What is the correlation between gross domestic product and fertility?

The average of the products of GDP and Fertility = 24272.69

The correlation is:

r = (24272.69 - (9436 * 3.669)) / (9888 * 1.8486) = -.5661, or about -.57
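The arithmetic above can be checked from the raw table. The sketch below is only a verification aid, not part of the exam answer; the helper names (mean, pop_sd, correlation) are made up for illustration, and the SD uses divisor n so that it matches the STAND. DEV. row.

```python
from math import sqrt

# Data copied from the table (1997 CIA World Factbook figures, as given).
fertility = [6.07, 5.81, 2.85, 2.66, 1.16, 4.94, 6.17, 1.35, 3.62, 2.06]
gdp = [800, 710, 5500, 3770, 19600, 5000, 1380, 5200, 23800, 28600]

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    # SD with divisor n (not n - 1), matching the STAND. DEV. row.
    m = mean(values)
    return sqrt(mean([(v - m) ** 2 for v in values]))

def correlation(xs, ys):
    # r = (average of products - product of averages) / (SD of x * SD of y)
    avg_of_products = mean([x * y for x, y in zip(xs, ys)])
    return (avg_of_products - mean(xs) * mean(ys)) / (pop_sd(xs) * pop_sd(ys))

r = correlation(gdp, fertility)
print(round(r, 4))
```

The printed value should agree with the hand calculation to about four decimal places.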

2. What is the regression equation for predicting fertility from per capita Gross Domestic Product?

slope = (-.5661 * 1.8486) / 9888 = -.0001058

intercept = 3.669 - (-.0001058 * 9436) = 4.667

fertility = -.0001058(GDP) + 4.667

 

3. Cuba's GDP is $1,480. What is the predicted fertility rate for Cuba?

 

predicted fertility = -.0001058(1480) + 4.667 = about 4.51
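As a rough check, the regression line and the Cuba prediction can be recomputed from the summary statistics. This is a sketch under the assumption that the rounded summary values from the table are good enough; predict_fertility is a hypothetical helper name.

```python
# Summary statistics taken from the table and from the answer to question 1.
r = -0.5661
avg_gdp, sd_gdp = 9436, 9888
avg_fert, sd_fert = 3.669, 1.8486

# Regression of fertility on GDP: slope = r * SD(y) / SD(x),
# and the line passes through the point of averages.
slope = r * sd_fert / sd_gdp
intercept = avg_fert - slope * avg_gdp

def predict_fertility(gdp_dollars):
    return slope * gdp_dollars + intercept

cuba_predicted = predict_fertility(1480)
print(slope, intercept, cuba_predicted)
```

Because the slope is not rounded before it is reused here, the intercept and the prediction can differ from the hand calculation in the third decimal place.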

 

4. Suppose I told you that Cuba's fertility rate is actually 1.54. Is the result you got in (3) different from 1.54? If it is different, give us some reasons why your predicted result might be different from 1.54. If there is no difference, give us some reasons why your prediction is exactly on target.

The line is just the best-fitting line; it's not expected to go through the points exactly. It is, however, expected to go through the mean value of the Y variable for each level of X. What is implied is that there is a distribution of Y for each X, and Cuba happens to be on the low end of that distribution.

Another thing to consider is the literacy information that was given: it actually has a higher correlation with fertility than GDP does. In other words, there may be a variable left out of the regression that better explains fertility.

 

Your company advertises that it ships 90% of its orders on time, that is, within 5 working days. The average shipping time of all orders is 3.1 days with a standard deviation of 0.4 days. You select a simple random sample (SRS) of 21 of the 10,000 orders received in the past week for an audit. The audit reveals that 18 of the 21 orders were shipped within 5 working days.

5. What is the sample percentage of orders shipped on time and what is the standard error for the percentage of orders shipped on time?

sample percentage = 18/21 * 100 = 85.71%

standard error = (SQRT(21) * SQRT(.90 * .10) / 21) * 100 = 6.5465%

 

6. A lawyer approaches you and says "Aha! You claim 90% but in your own sample the percentage is lower than that. So your 90% claim is wrong." Does the lawyer have enough evidence to sue you for false advertising? Perform a test and use a 5% level of significance as your rule. Explain why the results of your test refute or do not refute your 90% claim.

Z = (85.71% - 90%) / 6.5465% = -.6553, or about -.70

The probability value (p-value) associated with Z = -.70 is 24%. In other words, if the true on-time rate were 90%, a sample percentage of 85.71% or lower would have a 24% chance of occurring, which is fairly frequent and well above the 5% level of significance. There is not enough evidence to reject your claim of 90%.
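The test can also be sketched without a table, using the normal curve directly. The helper normal_cdf is an illustrative name, built on the standard library's math.erf. Because this keeps Z unrounded instead of rounding it to -.70, the p-value comes out slightly higher than the 24% read from the table; the conclusion is the same.

```python
from math import sqrt, erf

def normal_cdf(z):
    # Area under the standard normal curve to the left of z.
    return 0.5 * (1 + erf(z / sqrt(2)))

claimed = 0.90          # advertised on-time rate (the null hypothesis)
n = 21                  # orders audited
on_time = 18            # orders shipped within 5 working days

sample_pct = 100 * on_time / n
# SE for a percentage, computed under the null hypothesis of 90%.
se_pct = 100 * sqrt(claimed * (1 - claimed)) / sqrt(n)
z = (sample_pct - 100 * claimed) / se_pct
p_value = normal_cdf(z)  # one-sided: chance of a sample this low or lower

print(round(sample_pct, 2), round(se_pct, 4), round(z, 4))
print("reject 90% claim at 5% level:", p_value < 0.05)
```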

 

 

 

 

7. There are 20,000 restaurants in the County of Los Angeles. A sample of 200 restaurants is drawn at random. The average monthly sales per restaurant are $12,500 and the standard deviation is $5,600.

Pick one of the choices below and fill in the blanks

(i) A 90% confidence interval for the average monthly sales in the sample is

_______ to _________.

(ii) A 90% confidence interval for the average monthly sales in the county is

$11,847 to $13,153. CORRECT

(iii) 90% of the restaurants in the county have average monthly sales between

_______ to _________.
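The interval in choice (ii) can be reproduced as follows. This is a sketch that assumes the usual normal-table multiplier of 1.65 for 90% confidence and ignores any finite-population correction, which is what the stated answer implies.

```python
from math import sqrt

n = 200            # restaurants sampled
avg = 12500        # sample average monthly sales, in dollars
sd = 5600          # sample standard deviation, in dollars
z90 = 1.65         # assumed table multiplier: middle 90% of the normal curve

se_avg = sd / sqrt(n)          # standard error of the sample average
low = avg - z90 * se_avg
high = avg + z90 * se_avg

print(round(low), round(high))
```

The interval is a statement about the county average, which is why (ii) is the correct choice: the sample average is known exactly, and the spread of individual restaurants is described by the SD rather than the SE.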

 

A black box has a slot on top that is just wide enough to insert your hand. The box has six tickets in it. The tickets are labeled "1, 2, 3, 4, 5, 6". 10 draws will be made at random with replacement from this box.

8. What is the expected value for the average of the 10 draws and what is the standard error?

The expected value for the average is the box average, which is the average of the 6 numbers, or 3.5.

The standard deviation of the box is SQRT( ((1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6) - (3.5)^2 ) and equals 1.7078.

The standard error is (SQRT(10) * 1.7078) / 10 = .5401
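The box calculations are short enough to verify directly. The variable names below are illustrative only.

```python
from math import sqrt

box = [1, 2, 3, 4, 5, 6]
draws = 10

box_avg = sum(box) / len(box)
# SD of the box: square root of (average of squares - square of average).
box_sd = sqrt(sum(t ** 2 for t in box) / len(box) - box_avg ** 2)
# SE of the average of the draws = SD of box / SQRT(number of draws),
# which equals (SQRT(10) * SD) / 10 as written above.
se_avg = box_sd / sqrt(draws)

print(box_avg, round(box_sd, 4), round(se_avg, 4))
```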

9. After 10 draws, your average is 3.9. Your best friend comes by and draws ten at random with replacement too. Your friend's average is 5.7. What is the chance that the average will be between 3.9 and 5.7?

Z scores:

(3.9 - 3.5) / .5401 = .74, or about .75

(5.7 - 3.5) / .5401 = 4.07, or about 4.10

The associated table areas are 54.67 and 99.9959 (each is the percentage between -Z and +Z, so the difference is halved to get one side). This translates into a chance of about (99.9959 - 54.67) / 2, or 22.66%.
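The same chance can be approximated without rounding the Z scores to table values. This sketch uses the normal approximation, as the answer does; normal_cdf is a hypothetical helper built on math.erf. The result should land close to the 22.66% found from the table.

```python
from math import sqrt, erf

def normal_cdf(z):
    # Area under the standard normal curve to the left of z.
    return 0.5 * (1 + erf(z / sqrt(2)))

expected_avg = 3.5
se_avg = 0.5401   # standard error of the average of 10 draws (question 8)

z_low = (3.9 - expected_avg) / se_avg
z_high = (5.7 - expected_avg) / se_avg
chance = normal_cdf(z_high) - normal_cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(100 * chance, 1))
```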

10. You noticed that when your friend drew from the box, s/he seemed to draw a lot of tickets labeled "6". What is the chance of having an average of 5.7 from 10 draws from this box? Is there evidence to suggest that your friend might have "X-ray vision (like Superman)" or could it just be luck? State the null hypothesis and alternative hypothesis, perform a test, state the p-value and give us your conclusions.

H0: average = 3.5

Ha: average > 3.5

The appropriate test is a Z test.

Z = (5.7 - 3.5) / .5401 = 4.07, or about 4.10

The chance of getting a Z score of 4.10 or more (or an average of 5.7 or more) results in a p-value of about .0021%, or about 2 times in 100,000. Your friend couldn't be this lucky, so I would reject the null in favor of the alternative hypothesis. Your friend isn't normal.
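For a tail this small, the upper-tail area is better computed with the complementary error function than as 1 minus a value near 1. The sketch below keeps Z unrounded, so the p-value is of the same order as the table-based figure but not identical; upper_tail is an illustrative name.

```python
from math import sqrt, erfc

def upper_tail(z):
    # Area under the standard normal curve to the right of z.
    return 0.5 * erfc(z / sqrt(2))

z = (5.7 - 3.5) / 0.5401
p_value = upper_tail(z)   # one-sided, matching Ha: average > 3.5

print(round(z, 2), p_value)
print("reject H0 at 5% level:", p_value < 0.05)
```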

 

11. What is the chance that on the first four draws you will pick the ticket labeled "6" each time?

 

(1/6) x (1/6) x (1/6) x (1/6) = 1/1296, or about .08%

 

Twelve jurors were selected at random from a large pool of prospective jurors. The twelve were shown a crime scene and then asked to give it a brutality rating where -5 = no brutality whatsoever; +5= the most brutal crime ever; and a rating of 0 = about average in brutality.

The twelve scores were:

5, -3, 1.5, -1.5, 2, 0, 3, 4, 1, -3.5, 2, -5

12. Calculate the median and standard deviation of this list.

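The sorted list is -5, -3.5, -3, -1.5, 0, 1, 1.5, 2, 2, 3, 4, 5. With 12 scores, the median is the average of the two middle values, 1 and 1.5, which is 1.25.

The average of the list is .4583 and the standard deviation is 2.9893.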
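These summary values can be checked with Python's statistics module. Note the choice of pstdev (divisor n), which is what matches the 2.9893 given here; stdev (divisor n - 1) would give the SD+ used later in question 14.

```python
from statistics import mean, median, pstdev

scores = [5, -3, 1.5, -1.5, 2, 0, 3, 4, 1, -3.5, 2, -5]

avg = mean(scores)
med = median(scores)      # average of the two middle values for an even count
sd = pstdev(scores)       # population-style SD, divisor n

print(sorted(scores))
print(round(avg, 4), med, round(sd, 4))
```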

13. Suppose a criminologist came in and transformed the scores by adding 5 to each of them and then multiplying each one by 9. What is the inter-quartile range of this list now? What is the standard deviation now?

Original list: -5, -3.5, -3, -1.5, 0, 1, 1.5, 2, 2, 3, 4, 5

Transformed list: 0, 13.5, 18, 31.5, 45, 54, 58.5, 63, 63, 72, 81, 90

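IQR original = 4.75 (upper quartile 2.5 minus lower quartile -2.25)

New IQR = 42.75 (9 times the original)

New SD = 26.9035 (9 times the original)

Adding 5 shifts every score by the same amount and leaves the spread unchanged; only the multiplication by 9 changes the IQR and the SD.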
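The effect of the transformation can be sketched as below. The quartile convention is an assumption: quartiles are taken as the medians of the lower and upper halves of the sorted list, which reproduces the 4.75 above; other conventions give slightly different IQRs. The function name iqr is illustrative.

```python
from statistics import median, pstdev

scores = [5, -3, 1.5, -1.5, 2, 0, 3, 4, 1, -3.5, 2, -5]
transformed = [(x + 5) * 9 for x in scores]

def iqr(values):
    # Assumed convention: quartiles are the medians of the lower and
    # upper halves of the sorted list (middle value dropped if n is odd).
    s = sorted(values)
    half = len(s) // 2
    return median(s[-half:]) - median(s[:half])

print(iqr(scores), iqr(transformed))
print(round(pstdev(scores), 4), round(pstdev(transformed), 4))
```

Both measures of spread scale by the multiplier 9 and ignore the added 5.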

 

 

14. Test the hypothesis that the larger pool of jurors would judge that this crime is about average in brutality using the information from this sample of 12 jurors. State the null and alternative hypotheses, perform a test, state the resulting p-value and give us your conclusions. USE THE ORIGINAL UNTRANSFORMED LIST.

 

H0: average = 0

Ha: average > 0 (seems like our sample thinks the crime is brutal)

Use a t-test: the population SD is unknown and your sample size is < 26.

You will need to take the sample SD of 2.9893 and change it to SD+:

SD+ = 2.9893 * SQRT(12/11) = 3.1222

The SE of the average is (SQRT(12) * 3.1222) / 12 = .9013

The t-test then is:

t = (.4583 - 0) / .9013 = .5085 with 11 degrees of freedom, where .4583 is the sample average.

If you look this up, the p-value is greater than 25%, so you would NOT reject the hypothesis that the population of jurors would find the crime to be about average in brutality. In other words, while the sample average is greater than zero (which suggests that these jurors think the crime is above average in brutality), it can't support the argument that all jurors think so. The difference between this sample of 12 and zero can be attributed to chance error.
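The t statistic and an approximate p-value can be computed with the standard library alone. This is only a sketch: the tail area is obtained by numerically integrating the Student t density with Simpson's rule, and the helper names t_density and upper_tail are made up for illustration. A statistics package would normally be used instead.

```python
from math import sqrt, gamma, pi
from statistics import mean, stdev

scores = [5, -3, 1.5, -1.5, 2, 0, 3, 4, 1, -3.5, 2, -5]
n = len(scores)
df = n - 1

sd_plus = stdev(scores)            # SD+ (divisor n - 1)
se_avg = sd_plus / sqrt(n)
t_stat = (mean(scores) - 0) / se_avg

def t_density(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=1000):
    # Simpson's rule on [0, t] (steps must be even, t >= 0);
    # the upper tail is 0.5 minus that area.
    h = t / steps
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

p_value = upper_tail(t_stat, df)
print(round(sd_plus, 4), round(se_avg, 4), round(t_stat, 4))
print("p-value above 25%:", p_value > 0.25)
```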

The Public Health Service studied the effects of wine drinking on cholesterol in a large sample of representative households in the United States and in France. For men and for women in each age group in both countries, those who drank moderate amounts of wine had lower cholesterol levels than those who drank no wine. But in the U.S., those who drank no wine had lower cholesterol levels than those who drank large amounts of wine. In France, those who drank no wine had higher cholesterol levels than those who drank large amounts of wine.

 

15. Why did they study men and women and the different age groups separately?

The researchers are interested in the relationship between wine consumption and cholesterol levels. Age and gender affect both wine consumption and cholesterol. Wine consumption is different for different age groups and men tend to drink more than women. Cholesterol levels are different for different age groups and for men and women. The researchers studied these groups separately to control for these confounding factors.

[NOTE: identifying that confounding is the problem is key to answering this question correctly]

 

16. The lesson one learns from this study seems to be: if you drink lots of wine and are concerned about your cholesterol levels, you should live in France (if you do not already), and if you do not drink wine (and are concerned about your cholesterol levels), you should live in the United States. Is this correct? Explain. Be brief.

This is not correct. This is an observational study and we cannot conclude that wine drinkers who move to or live in France can reduce their cholesterol nor can we conclude that people who don't drink wine should move to or live in the US if they want to have low cholesterol. There could be unobserved factors (such as a stressful lifestyle) that could explain why wine in large amounts doesn't seem to help Americans like it seems to help the French.

[NOTE: identifying that this is an observational study, being able to discuss the problems associated with them and then providing an illustration to communicate your point is key here.]