## STAT 13

(Sec. 1a-1c)

Introduction to Statistical Methods for the Life and Health Sciences

## Instructor: Ivo Dinov, Asst. Prof.

Departments of Statistics & Neurology

Due Date: Friday, Mar. 14, 2003, turn in after lecture

See the HW submission rules. On the front page include the following header.

• (HW_7_1) In 1996 the New Zealand Consumers' Institute conducted a survey on home computer use. A total of 7,400 subscribers to Consumer Magazine were randomly selected and sent a survey form. Of those surveyed, 2,730 had a computer for personal use at home. The respondents who had a home computer were given a list of computer activities and were asked to indicate all of those that they engaged in. They were also asked to indicate the number of hours per week that they used their computer. The Consumers' Institute used the results to draw conclusions about subscribers who own a home computer. The results of the survey are given in the two tables below:

| Computer Activities | Number | Computer Use (Hours Per Week) | Number |
|---|---|---|---|
| Word-processing | 2621 | Not Used | 27 |
| Games | 1502 | Used For Less Than 2 Hours | 328 |
| Spreadsheets | 819 | Over 2 and Up To 7 hours (incl) | 764 |
| Accounting | 655 | Over 7 and Up To 14 hours (incl) | 710 |
| Databases | 437 | Over 14 and Up To 21 hours (incl) | 546 |
| Internet | 328 | Over 21 and Up To 28 hours (incl) | 109 |
| Drawing | 300 | Over 28 and Up To 35 hours (incl) | 136 |
| Desktop Publishing | 246 | Over 35 and Up To 44 hours (incl) | 55 |
| Fax/Answering Machine | 82 | Over 44 hours | 55 |
| Total | ??? | Total | 2,730 |

• Is this survey representative of the home computer use of New Zealanders? Briefly justify your answer.
• What proportion of the respondents use their computers for drawing?
• What percentage of the respondents use their computers between 21 and 28 hours (inclusive) per week?
• Is there any evidence from the survey to suggest a significant difference between the proportion of respondents who use their computer for drawing and the proportion of respondents who use their computer for desktop publishing? Carry out a Z-test to investigate this. Calculate a 95% confidence interval for the difference in these proportions. Interpret your results.
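For the drawing versus desktop publishing comparison, both proportions come from the same 2,730 respondents, who could tick several activities, so the two estimates are not independent. The sketch below shows one possible way to set up the Z-test and the 95% interval under that design. It assumes a conservative standard error, obtained because the number of respondents who ticked both activities is not reported; the function name `diff_same_sample` is hypothetical and not part of the course materials.

```python
import math

def diff_same_sample(x1, x2, n, z_crit=1.96):
    """Z-test and CI for p1 - p2 when two yes/no items come from ONE sample.

    Assumption: the count answering "yes" to both items is unknown, so the
    standard error uses the upper bound
        Var(p1_hat - p2_hat) <= (min(p1 + p2, q1 + q2) - (p1 - p2)**2) / n,
    which follows from P(both) >= max(0, p1 + p2 - 1).
    """
    p1, p2 = x1 / n, x2 / n
    diff = p1 - p2
    bound = min(p1 + p2, (1 - p1) + (1 - p2))
    se = math.sqrt((bound - diff ** 2) / n)
    z = diff / se                                  # test statistic for H0: p1 - p2 = 0
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided Normal tail area
    ci = (diff - z_crit * se, diff + z_crit * se)  # approximate 95% interval
    return diff, se, z, p_value, ci

# Counts from the survey table: drawing = 300, desktop publishing = 246, n = 2730.
diff, se, z, p_value, ci = diff_same_sample(300, 246, 2730)
print(f"difference = {diff:.4f}, se = {se:.4f}, z = {z:.2f}, P-value = {p_value:.3f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the bound ignores any overlap between the two groups, the interval is wider than it would be if the overlap were known, so conclusions drawn from it err on the cautious side.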

• (HW_7_2) Tuberculosis (TB) is known to be a highly contagious disease. In 1995 a study was carried out on a random sample of 1074 Spanish prisoners. The study investigated factors that might be associated with the tuberculosis infection. The results follow.
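
| Variable | Category | Prisoners With Tuberculosis | Total Number Of Prisoners |
|---|---|---|---|
| Gender | Male | 556 | 984 |
| | Female | 36 | 90 |
| Race | White | 496 | 886 |
| | Gypsy | 74 | 152 |
| | Other | 22 | 36 |
| Intravenous Drug Users | Yes | 361 | 629 |
| | No | 231 | 445 |
| HIV Positive | Yes | 186 | 294 |
| | No | 406 | 780 |
| Re-imprisonment | Yes | 272 | 456 |
| | No | 320 | 618 |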

Is there any evidence to suggest that the race of the prisoner (White or Gypsy) makes any difference to whether they contracted tuberculosis? Carry out a significance test to answer this question and then calculate an appropriate 95% confidence interval.

Let $p_W$ be the proportion of White prisoners infected with TB and $p_G$ be the proportion of Gypsy prisoners infected with TB.

• Identify the parameter $\theta$ (theta).
• State the hypotheses.
• Write down the estimate and its value.
• Find the P-value.
• Interpret the P-value.
• Calculate a 95% confidence interval for the parameter.
• Interpret the 95% confidence interval.
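The steps above can be sketched in code. This is a minimal illustration, assuming that White and Gypsy prisoners are treated as two independent samples, that the parameter is the difference in proportions (White minus Gypsy) with null value 0, and that the unpooled standard error is used for both the test and the interval (a pooled standard error is a common alternative for the test). The function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2, z_crit=1.96):
    """Two-sided Z-test and CI for p1 - p2 from two independent samples.

    Assumption: unpooled standard error sqrt(p1*q1/n1 + p2*q2/n2).
    """
    p1, p2 = x1 / n1, x2 / n2
    estimate = p1 - p2                              # point estimate of the difference
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = estimate / se                               # test statistic under H0: difference = 0
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided Normal tail area
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return estimate, se, z, p_value, ci

# Counts from the table: White 496 of 886 with TB, Gypsy 74 of 152 with TB.
estimate, se, z, p_value, ci = two_proportion_z(496, 886, 74, 152)
print(f"estimate = {estimate:.4f}, se = {se:.4f}, z = {z:.2f}, P-value = {p_value:.3f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

An interval that contains 0 and a P-value above the usual 0.05 cut-off tell the same story, so the two outputs can be used to check each other when writing the interpretation.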

• (HW_7_3) The manager of an importing company purchased a new machine for packaging rice. The specifications of the machine claim that the amount of rice put in each package will be Normally distributed with an average amount of rice as specified and a standard deviation of 2.7 grams. The machine is set to fill packets with 506 grams of rice. The manager requires the machine to produce packets containing rice weighing within the range 500–512 grams for at least 95% of the packets.
• Assuming that the specifications for the machine are accurate, calculate the probability that a packet of rice contains the desired amount of rice. Will the manager's requirements be met?
• Let $\bar{X}$ be the mean weight from a sample of 25 packets of rice. Assume that the specifications for the machine are accurate.
• Give the distribution and parameters for $\bar{X}$.
• Calculate the interval within which the central 95% of values of $\bar{X}$ should fall.
• Let $\hat{P}$ be the proportion of packets of rice outside the 500–512 gram weight range from a sample of 800 packets of rice. Assume that the specifications for the machine are accurate.
• Give the distribution and parameters for $\hat{P}$.
• Was the central limit theorem needed to answer the question above?
• Calculate the interval within which the central 95% of values of $\hat{P}$ should fall.
• A sample of 25 packets of rice was filled and weighed (sample size = 25, sample mean = 504.5 grams, sample standard deviation = 3.93 grams). The resulting weights, in grams, were as follows:
 502.8 507.6 515 499.2 511.9 503.4 506.3 505.5 502.6 501.9 500.8 502.9 503.9 510.3 502.4 502.9 505.4 510.4 501.6 501.4 498.3 504 504.4 504.5 503.9
• Create a stem-and-leaf plot of the data (either by hand or by computer).
• Does the sample of weights appear to have an exactly Normal distribution? Briefly justify your answer.
• Are there any features of the stem-and-leaf plot that suggest major departures from the Normal distribution? Briefly justify your answer.
• Based on the above answers, is it likely that the sample data has come from a Normal distribution?
• Assuming that the specifications of the machine are correct, would it be unusual to get a sample of 25 packets of rice with a mean weight of 504.5 grams? Does getting a sample of 25 packets of rice with a mean weight of 504.5 grams cast any doubts on the specifications of the machine? [Hint: refer to part (b).]
• A sample of 800 packets of rice was produced. Of these, 26 were found to be outside the 500–512 gram weight range.
Assuming that the specifications of the machine are correct, would it be unusual to get a sample of 800 packets of rice with 26 of the packets falling outside the 500–512 gram weight range? Does getting a sample of 800 packets of rice with 26 of the packets falling outside the 500–512 gram weight range cast any doubts on the specifications of the machine? [Hint: refer to part (c).]
• Write a brief report (a couple of paragraphs) discussing whether or not the given specifications of the machine appear to be correct.
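The Normal-model calculations in this problem can be checked numerically. The sketch below is one way to do so, assuming the machine specifications hold exactly (mean 506 grams, standard deviation 2.7 grams), that the sample proportion is treated as approximately Normal, and that "unusual" means falling outside the central 95% interval. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

MU, SIGMA = 506.0, 2.7       # machine setting and claimed standard deviation (grams)
LOW, HIGH = 500.0, 512.0     # manager's acceptable weight range (grams)
Z95 = NormalDist().inv_cdf(0.975)   # multiplier for a central 95% interval, about 1.96

# Single packet: probability that its weight falls inside the acceptable range.
packet = NormalDist(MU, SIGMA)
p_inside = packet.cdf(HIGH) - packet.cdf(LOW)

# Mean of n = 25 packets: Normal with mean MU and sd SIGMA / sqrt(n).
n_mean = 25
se_mean = SIGMA / math.sqrt(n_mean)
mean_interval = (MU - Z95 * se_mean, MU + Z95 * se_mean)

# Proportion outside the range in n = 800 packets: approximately Normal
# with mean p_outside and sd sqrt(p_outside * (1 - p_outside) / n).
n_prop = 800
p_outside = 1 - p_inside
se_prop = math.sqrt(p_outside * (1 - p_outside) / n_prop)
prop_interval = (p_outside - Z95 * se_prop, p_outside + Z95 * se_prop)

# Compare the observed sample results with the central 95% intervals.
observed_mean = 504.5
observed_prop = 26 / 800
mean_is_unusual = not (mean_interval[0] <= observed_mean <= mean_interval[1])
prop_is_unusual = not (prop_interval[0] <= observed_prop <= prop_interval[1])

print(f"P(inside range) = {p_inside:.4f}")
print(f"central 95% for the sample mean: ({mean_interval[0]:.2f}, {mean_interval[1]:.2f})")
print(f"central 95% for the sample proportion: ({prop_interval[0]:.4f}, {prop_interval[1]:.4f})")
print(f"observed mean unusual? {mean_is_unusual}; observed proportion unusual? {prop_is_unusual}")
```

The two checks can disagree, since the sample mean is sensitive to a shift in the machine's centre while the out-of-range proportion depends on both centre and spread, which is worth addressing in the report.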

• (HW_7_4) The U.S. Bureau of the Census recently published statistics on educational attainment of the non-institutional population of the United States, based on the March 1998 Current Population Survey. 172,214 people were surveyed and classified by age group and highest educational qualification attained. The following table summarises the results of the survey.
Counts by level of education (rows) and age group (columns):

| Level of Education | 25–34 | 35–44 | 45–54 | 55–64 | Over 64 | Total |
|---|---|---|---|---|---|---|
| Did not complete high school | 4,754 | 5,326 | 4,341 | 4,558 | 10,580 | 29,559 |
| Completed high school | 12,569 | 15,136 | 10,943 | 8,311 | 11,215 | 58,174 |
| Attended university for between 1 and 3 years | 19,587 | 20,450 | 14,921 | 7,379 | 8,478 | 70,815 |
| Attended university for 4 or more years | 2,444 | 3,548 | 3,854 | 2,007 | 1,813 | 13,666 |
| Total | 39,354 | 44,460 | 34,059 | 22,255 | 32,086 | 172,214 |

• Which age category accounted for the:
• lowest number of Americans who did not complete high school?
• highest percentage of Americans who attended university for at least one year? Show your work.
• What percentage of Americans in this survey:
• did not attend university?
• only completed high school or were aged between 35 and 44 years?
• Given that a randomly chosen American in this survey had only completed high school, what is the probability that he or she is at most 54 years of age?
• Among Americans in this survey aged between 25 and 34 years, what proportion did not attend university?
• Is the proportion of people aged 25–34 who attended university for 4 or more years statistically different from the proportion of people over 64 who did not complete high school?
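Several of the questions above reduce to reading marginal, joint, and conditional proportions off the two-way table. The sketch below shows one way to organise those calculations, with the counts copied from the table; the dictionary layout, the shortened row labels, and the variable names are illustrative assumptions.

```python
AGES = ["25-34", "35-44", "45-54", "55-64", "Over 64"]
COUNTS = {
    "Did not complete high school": [4754, 5326, 4341, 4558, 10580],
    "Completed high school": [12569, 15136, 10943, 8311, 11215],
    "University, 1-3 years": [19587, 20450, 14921, 7379, 8478],
    "University, 4+ years": [2444, 3548, 3854, 2007, 1813],
}

# Marginal totals recomputed from the cell counts.
row_totals = {level: sum(cells) for level, cells in COUNTS.items()}
col_totals = [sum(col) for col in zip(*COUNTS.values())]
grand_total = sum(row_totals.values())

# Conditional probability: P(age at most 54 | completed high school only).
hs = COUNTS["Completed high school"]
p_young_given_hs = sum(hs[:3]) / row_totals["Completed high school"]

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B),
# with A = completed high school only and B = aged 35-44.
i = AGES.index("35-44")
p_hs_or_35_44 = (row_totals["Completed high school"] + col_totals[i] - hs[i]) / grand_total

print(f"grand total = {grand_total}")
print(f"P(age <= 54 | high school only) = {p_young_given_hs:.4f}")
print(f"P(high school only or aged 35-44) = {p_hs_or_35_44:.4f}")
```

Recomputing the margins from the cells, rather than typing in the printed totals, doubles as a check that the table was transcribed correctly.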