Midterm 1 Solutions
Summary of class conversation
In case you missed it, we discussed some of the difficulties of the midterm
in class today.
1) The SD question (Question #4) came out of the blue.
2) Many questions were worded in unfamiliar ways.
3) The simulation question: it was hard to match the response to the different
questions.
4) The test emphasized definitions/concepts more than calculations and "doing".
I think these are all valid observations. One thing I want you to be
able to do is solve problems in a novel setting or circumstance. For
that reason, I always try to put a problem on the exam that is solvable
given what you've learned, but unfamiliar. Also, questions are worded
differently because, well, because I word them, and not the authors of the
book. Still, I try to use the same terminology. But I'll try
to give more quizzes so you'll see more questions asked differently.
Number (4) above is an interesting observation. To help you study,
here's how I think of this class. The most important thing you should
know is how to interpret results. This means I place an emphasis on
your being able to explain graphs and statistics and analyses (we'll see
more of these later) in the context of the problem, and to be able to explain
consequences. Second most important are "Concepts/Definitions". You
should know what words mean and understand what concepts they represent and
how those concepts are used. Question 4 on the MT is a good example
of testing the concept of the standard deviation. The least important
are calculations. You are expected to be able to get the correct numbers
when called on, but you will get more credit on a problem for interpreting
and explaining than you will for the number. For that reason, I try
to de-emphasize purely number-oriented questions. (This is why putting
a "normal curve" question on the exam was less important to me than putting
on a question about interpreting 2-way tables (#1c) or a simulation -- which
has both calculation and interpretation.)
So I'm going to continue to put "new" problems on the exam, and I can't really
change the way I write questions. Our problem, then, is to see what
we (me and the TAs) can do to help you prepare for this. Below are
some general suggestions.
How to improve your score next time:
1. Read the book. I'm trying hard to use the same language in the tests
as in the book, and also the same structure. So for example, when so many
people wonder how to answer the separate questions on the simulation, I suspect
that part of the problem (at least for some people) is they haven't read
that section of the book.
2. Attend lecture. The lecture emphasizes some points, and de-emphasizes
others, and you can use this to focus your studies. For example, given that
we spent 2 days on simulations, you should expect them to play an important
role in the exam.
3. Pay attention to "vocabulary" words. Statistics, like many disciplines,
has lots of "jargon". Learning the subject means learning what its words
mean. The "Key Concepts" section of the book emphasizes important terms
and concepts, and you should know these cold. For example, questions
1b&c on the midterm were straightforward vocabulary questions.
We were also testing for vocabulary/terminology in the simulation problem.
Do you know what a response variable is? If so, then you should
be able to answer that question correctly.
4. Read the exam all the way through before answering questions. Some
people had to write a LOT more for the simulation problem because they didn't
read the whole question (on both pages) before beginning.
The most commonly missed questions on the exam were 1c and the simulation.
Question 4 was next, and Question 2 seemed to be answered pretty well.
Question 1c tests your ability to use two-way tables. If you missed
this, or got it right but don't understand it, you should study this some
more. This means ASK QUESTIONS. Both TAs have reported that
their office hours are not well attended, and I always have time in my office
hours.
The simulation was missed for a variety of reasons, but the most common was
that people gave the right answer to the wrong question. Too many people
thought that the question was asking you to estimate the chances of getting
bitten by a malarial mosquito. But you weren't asked to estimate this; you were
told that it was 10%. Your job was to estimate the number of bites
until the first malarial mosquito. Another common mistake was made
in the "State your conclusion" section. Some people suggested that,
since the outcome was variable, the chance of being bitten by a malarial mosquito
was changing. In fact, the chance is always 10%, but the point of doing
simulations is to see that even if the chance stays the same, the actual
outcomes vary. This is what Statistics is all about: variable
outcomes. Another common mistake in the analysis section was to make up
your own, sometimes rather bizarre, analysis. The entire first week
of the class was spent on what to do to summarize data. Now you've
got data: 4 observations. Summarize them. Don't invent
something new -- stick to the tried and true.
Solutions
1. The following table summarizes a random sample of 2,002 Los Angeles
residents who filled out their census forms in the 2000 census. The
variables are self-reported race and highest level of educational attainment.
The K-8th category, for example, includes people whose highest level of attainment
was some grade between kindergarten and 8th grade (inclusive). Although
the racial categories were self-reported, I have combined several categories
together, and so these are not the names used in the census. The sample
is not representative of all LA, but of only a few census tracts in the Long
Beach area.
|                       | White | African Amer. | Asian/Pac Isld. | Two or more | Other | Total |
| None or preschool     |    54 |             5 |              16 |          11 |    54 |   140 |
| K-8th grade           |   207 |            38 |              51 |          29 |    59 |   384 |
| 9-12                  |   315 |            60 |              57 |          40 |   165 |   637 |
| at least some college |   501 |            79 |             143 |          44 |    74 |   841 |
| Total                 |  1077 |           182 |             267 |         124 |   352 |  2002 |
a) What type of variables are Educational Attainment and Race?
They are categorical variables.
b) Find the marginal distribution of educational attainment.
For full credit you need to include the values and their frequencies:
None 140/2002= .07
K-8 384/2002=.19
9-12 637/2002= .32
College 841/2002=.42
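The arithmetic above can be checked with a short script. This is a minimal sketch, not part of the exam: the counts are copied from the table in Question 1, and the names `counts` and `marginal_distribution` are made up for illustration.

```python
# Counts copied from the two-way table in Question 1.
# Column order: White, African Amer., Asian/Pac Isld., Two or more, Other.
counts = {
    "None or preschool":     [54, 5, 16, 11, 54],
    "K-8th grade":           [207, 38, 51, 29, 59],
    "9-12":                  [315, 60, 57, 40, 165],
    "at least some college": [501, 79, 143, 44, 74],
}

def marginal_distribution(table):
    """Return each row's share of the grand total (hypothetical helper)."""
    row_totals = {level: sum(row) for level, row in table.items()}
    grand_total = sum(row_totals.values())
    return {level: total / grand_total for level, total in row_totals.items()}

education_marginal = marginal_distribution(counts)
for level, share in education_marginal.items():
    print(f"{level:<22} {share:.2f}")
```

The marginal distribution uses only the row totals, so race drops out of the calculation entirely.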
c) (5) A sociologist wishes to compare the educational attainment of Whites
and Asians. Which should he compute: row percentages or column
percentages or cell percentages? Why?
Column percentages. These give the distribution of educational attainment
within each race. The races have different numbers of members, and
if we're comparing educational attainment we don't want the fact that there
are simply more Whites than Asians to affect our conclusions.
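To see what column percentages look like for this comparison, here is a rough sketch using the White and Asian/Pac Isld. columns of the table in Question 1. The helper name `column_percentages` is hypothetical; the counts come straight from the table.

```python
education_levels = ["None or preschool", "K-8th grade", "9-12", "at least some college"]

# Counts copied from the White and Asian/Pac Isld. columns of the table.
column_counts = {
    "White":           [54, 207, 315, 501],
    "Asian/Pac Isld.": [16, 51, 57, 143],
}

def column_percentages(column):
    """Divide each cell by its column total (hypothetical helper)."""
    total = sum(column)
    return [100 * count / total for count in column]

white_pct = column_percentages(column_counts["White"])
asian_pct = column_percentages(column_counts["Asian/Pac Isld."])

for level, w, a in zip(education_levels, white_pct, asian_pct):
    print(f"{level:<22} White {w:5.1f}%   Asian/Pac Isld. {a:5.1f}%")
```

Each column adds up to 100%, so the two groups can be compared directly even though there are about four times as many Whites as Asians in the sample.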
2. These data are again drawn from the 2000 census for the LA area. Shown are the
reported personal incomes (dollars), excluding incomes of 0 or less.
a) (5) Briefly compare the distributions of men and women in one or two
sentences. The vertical axis shows relative frequency. The vertical
line in the histogram indicates the median value for each group.
(I can't get the picture to copy into this document, so you'll have to examine
your tests.)
Your answer should include references to center, spread, and shape of the
distributions.
The men have a slightly higher median income. Although both distributions
are right-skewed, the men have more outliers and, it appears, a greater range
of incomes than do the women.
b) YELLOW: (5) True or false and explain: a majority of people have
above average incomes.
False: The right-skewed distribution means that the average income is greater
than the median. This means that fewer than half of the people are above the
average. You get 2 points for getting the "false", 3 points for the
explanation.
b) BLUE: True or false and explain: a majority of people have less
than average incomes.
True: the right-skewed distribution means that the median is less
than the average. So more than half of the people are below the average.
You get 2 points for "true" and 3 for the explanation.
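A small numeric illustration may help here. The incomes below are invented for this sketch (they are NOT the census data from the exam); they are simply a right-skewed list with one very large value.

```python
import statistics

# Hypothetical incomes in dollars, invented for illustration only.
incomes = [12000, 18000, 22000, 25000, 30000, 34000, 40000, 55000, 90000, 250000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Fraction of people whose income is below the average.
share_below_mean = sum(1 for x in incomes if x < mean_income) / len(incomes)

print(f"mean   = {mean_income:,.0f}")
print(f"median = {median_income:,.0f}")
print(f"share below the mean = {share_below_mean:.0%}")
```

In this made-up list the one large income pulls the mean (57,600) well above the median (32,000), and 8 of the 10 people fall below the mean.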
c)(5) A student claims that we should expect about 68% of all people
in this sample to have incomes within one standard deviation of the mean.
Do you agree? Explain why or why not.
No. The 68% figure applies when the distribution is approximately
symmetric and bell-shaped (normal). But this is very skewed.
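As a rough check of this idea, the sketch below compares a bell-shaped list of values with a strongly right-skewed one. Both lists are artificial stand-ins built from evenly spaced quantiles (an assumption of this example, not the exam's income data), and the helper `fraction_within_one_sd` is hypothetical.

```python
import math
import statistics

def fraction_within_one_sd(values):
    """Share of values lying within one SD of the mean (hypothetical helper)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if mean - sd <= v <= mean + sd)
    return inside / len(values)

n = 1000
probabilities = [(i + 0.5) / n for i in range(n)]

# Bell-shaped stand-in: evenly spaced quantiles of a standard normal curve.
normal_curve = statistics.NormalDist(0, 1)
bell_shaped = [normal_curve.inv_cdf(p) for p in probabilities]

# Right-skewed stand-in: evenly spaced quantiles of an exponential curve.
right_skewed = [-math.log(1 - p) for p in probabilities]

bell_fraction = fraction_within_one_sd(bell_shaped)
skewed_fraction = fraction_within_one_sd(right_skewed)
print(f"bell-shaped:  {bell_fraction:.2f}")
print(f"right-skewed: {skewed_fraction:.2f}")
```

The bell-shaped list should come out close to 68%, while the skewed list lands noticeably higher, so the 68% figure cannot be assumed for skewed data.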
3. Malaria is a serious illness transmitted by mosquitos that can
cause death. Suppose in a certain part of the world that 10% of all
mosquitos carry malaria, and that if a malarial mosquito bites you, you will
get malaria. About how many mosquitos would have to bite you before
you get malaria? Design a simulation:
a)(2) Identify the component to be repeated.
The component to be repeated is a single mosquito bite.
b) (2) Using the random numbers provided below, explain how you would model
the outcome.
Let a 0 represent malaria, and 1-9 represent no malaria, although any similar
coding that gives malaria a 10% chance will work just as well.
c) (2) Explain how you will simulate the trial.
Select a random number to represent a mosquito bite. Continue until
the first 0. This constitutes 1 trial.
d) (2) State the response variable.
The number of random numbers selected until (and including) the first 0.
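Steps (a) through (d) can be sketched on a computer as well. This is only an illustration under stated assumptions: it uses Python's random number generator instead of the printed random digits, and the function names `bites_until_malaria` and `run_trials` are made up for this example.

```python
import random
import statistics

def bites_until_malaria(rng):
    """One trial: draw digits 0-9 until the first 0, and count the draws."""
    bites = 0
    while True:
        bites += 1                 # component: one mosquito bite
        digit = rng.randrange(10)  # model: 0 = malaria, 1-9 = no malaria
        if digit == 0:
            return bites           # response variable: bites up to and including the 0

def run_trials(n_trials, seed=0):
    """Repeat the trial n_trials times with a seeded generator."""
    rng = random.Random(seed)
    return [bites_until_malaria(rng) for _ in range(n_trials)]

few = run_trials(4)
many = run_trials(20000)
print("4 trials:", few)
print("mean of 20000 trials:", round(statistics.mean(many), 2))
```

With only 4 trials the results jump around a lot from run to run. With many trials the average typically settles near 10 (that is, 1/0.10), which shows how much a small simulation can vary.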
YELLOW
e) (1) Using the random numbers provided, run 4 trials. Start by using
the first random number provided. Write the current value of the response
variable below each random digit.
36254 44136 69138 65665 0 *** trial 1 = 21 bites *** 6335 490 *** trial 2
= 7 bites *** 97 81683 81153 30 *** trial 3 = 14 bites *** 667 63846
25339 45818 98380 *** trial 4 = 23 bites *** 61014 88448 26114 20167 69682
84572 64490
f) (5) Analyze the response variable.
Our outcomes were 7, 14, 21, 23. The average is 16.25, and the median is
17.5.
g) (5) State your conclusions.
It takes about 16 to 17 bites, on average.
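The YELLOW trials can be double-checked by reading the printed digits with a short script. This is a sketch, with the digit string retyped from part (e) without the trial annotations; the function name `trials_from_digits` is hypothetical.

```python
import statistics

# Random digits from the YELLOW version of part (e), annotations removed.
yellow_digits = (
    "36254 44136 69138 65665 06335 49097 81683 81153 30667 63846 "
    "25339 45818 98380 61014 88448 26114 20167 69682 84572 64490"
)

def trials_from_digits(digits, n_trials):
    """Count digits up to and including each 0 (0 = malarial bite)."""
    results = []
    bites = 0
    for ch in digits:
        if not ch.isdigit():
            continue              # skip the spaces between groups
        bites += 1
        if ch == "0":
            results.append(bites)
            bites = 0
            if len(results) == n_trials:
                break
    return results

outcomes = trials_from_digits(yellow_digits, 4)
print(outcomes)                      # [21, 7, 14, 23]
print(statistics.mean(outcomes))     # 16.25
print(statistics.median(outcomes))   # 17.5
```

The summary uses only the mean and median, which is all the "Analyze" step asks for with four observations.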
BLUE
e) (1) Using the random numbers provided, run 4 trials. Start by using
the first random number provided. Write the current value of the response
variable below each random digit.
0 *** trial 1 = 1 bite *** 3674 31651 72812 66486 16663 63846 69287
34278 31735 5980 *** trial 2 = 48 bites *** 6
84771 0 *** trial 3 = 7 bites *** 4691 67681 74357 21924 22359 76580 *** trial 4
= 29 bites *** 63157 68254 85323
f) (5) Analyze the response variable.
Outcomes were 1, 7, 29, 48.
The average is 21.25, the median is 18.
g) (5) State your conclusions.
It takes about 18 bites, although there's quite a bit of variability.
YELLOW (BLUE is same question, but different order)
4. Choose 4 digits from among 0,1,2,3,4,5,6,7,8,9, repeats allowed
so that
a) (2) the standard deviation is as large as possible
0, 0, 9, 9
b) (2) the standard deviation is as small as possible
2,2,2,2 or any set of four repeats
c) (2) Is your answer to (a) unique? Yes or no; no explanation needed.
Yes.
(Any other combination will result in a smaller SD. This is the only
one that gives the largest.)
d) (2) Is your answer to (b) unique? Yes or no; no explanation needed.
No.
(Any set of 4 repetitions will provide the same SD of 0 -- and this is the
smallest possible SD.)
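The answers to Question 4 can be confirmed by brute force, since there are only 715 ways to choose 4 digits when order doesn't matter. The sketch below is an illustration, not the intended exam method. It ranks the choices by an integer that equals 16 times the population variance, which gives the same ordering as the SD whether you divide by n or n-1. The name `scaled_variance` is made up for this example.

```python
from itertools import combinations_with_replacement

def scaled_variance(values):
    """n^2 times the population variance, kept as an exact integer."""
    n = len(values)
    return n * sum(v * v for v in values) - sum(values) ** 2

# Every way to choose 4 digits from 0-9 with repeats allowed, ignoring order.
choices = list(combinations_with_replacement(range(10), 4))

largest = max(scaled_variance(c) for c in choices)
smallest = min(scaled_variance(c) for c in choices)

max_sets = [c for c in choices if scaled_variance(c) == largest]
min_sets = [c for c in choices if scaled_variance(c) == smallest]

print(max_sets)       # [(0, 0, 9, 9)]
print(len(min_sets))  # 10
```

Only one choice reaches the largest SD, while ten choices (four copies of any single digit) tie for the smallest SD of 0.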