Student question:

> I know it's a little late, but if you have time can you please explain to me how the SE can be equal to both sqrt(N) * S.D. and sqrt(pq)/sqrt(n) * 100%. Isn't pq the SD also? Are N and n the same thing? I'm really confused.

>

Professor response:

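The first is the SE of the count (the sum of the draws); the second is the SE of the percentage. The standard deviation of a two-outcome (0/1) box is equal to sqrt(p*q), so the SD is sqrt(pq), not pq itself. N and n are the same thing: the number of draws. But there are several SEs: the SE for the count is sqrt(n) * SD, and the SE for the percentage is (SE for the count / n) * 100%, which works out to sqrt(pq)/sqrt(n) * 100%.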
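Here is a minimal Python sketch of the two formulas. It assumes an illustrative 50-50 box and 100 draws; those numbers are not from the question, and the function names are made up for the example. It shows that converting the SE for the count into a percent gives the same answer as sqrt(pq)/sqrt(n) * 100%.

```python
import math

def sd_box(p):
    """SD of a 0/1 box with a fraction p of 1's: sqrt(p * q)."""
    return math.sqrt(p * (1 - p))

def se_count(n, p):
    """SE for the count (sum of n draws): sqrt(n) * SD of box."""
    return math.sqrt(n) * sd_box(p)

def se_percent(n, p):
    """SE for the percentage: sqrt(p * q) / sqrt(n) * 100%."""
    return sd_box(p) / math.sqrt(n) * 100

# Illustrative (assumed) values: a 50-50 box, 100 draws
n, p = 100, 0.5
from_count = se_count(n, p) / n * 100   # turn the count SE into a percent
direct = se_percent(n, p)
print(from_count, direct)
```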

 

Student question:

> Professor Cochran,

>

> I was watching a program on Dateline last night and they conducted a poll on people's opinion concerning the issue of DNA cloning. The top of the screen showed a +/- factor of 4.5. Is this value the standard error, or is it the standard deviation? What exactly is the difference between the two?!?
> I'm really frustrated with my progress in class. When I'm in lecture I feel like I have a full understanding of what is going on. But when I try to apply the theories in real life or when I'm using them to solve questions on our homework assignments, all my knowledge seems to go out the window.
>
> All the theories seem so abstract to me. I know that sounds strange because statistics deals with tangible things like data, percentages, and numbers. But I can't seem to make the connection between theory and application. Watching the program last night made me realize just how lost I am in this class. I can't even tell the difference between the SE and the SD!!!!!!
>
> Please help.
>
> a drowning student desperately trying to stay afloat,

>

Professor response:

Ms. ,

I'm glad you asked. Actually, you are closer to knowing the answer than you think. You _know_ there is a difference between SD and SE (and you may not have even known that SD and SE existed a couple of months ago).

SD is an average of the variation in our observations. SE is an estimate of the average variation in the mean of our observations. The answer to your question is that it is an SE, but here is the reasoning you can use. When the information they are trying to convey involves a mean (and the percent of people who feel a certain way can be thought of as an expected value or mean: if I toss a coin 50 times, I expect to see 25 heads, or 50% heads, and that is the mean of my expected distribution) and the concern is with how accurate that estimate is, it will be bracketed with an SE. If instead they are trying to communicate the typical range of values (like "68% of Americans have between 10.6 and 17.4 years of education", that is, 14 years plus or minus 3.4 years), then it will be an SD, an estimate of the expected average variation in the population.
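To see the SD/SE difference with the coin example, here is a small Python sketch. It assumes a fair coin modeled as a 0/1 box and 50 tosses, and it uses an arbitrary simulation size and seed. The box SD is 50 percentage points. The percentage of heads in 50 tosses, by contrast, typically varies by only about 7 points, and that smaller figure is the SE.

```python
import math
import random

# Box for a fair coin: one ticket marked 1 (heads), one marked 0 (tails)
box = [0, 1]
p = 0.5
sd_box = math.sqrt(p * (1 - p))        # SD of the box: 0.5, i.e. 50 percentage points
n = 50                                 # 50 tosses, as in the coin example
se_pct = sd_box / math.sqrt(n) * 100   # SE for the percentage of heads, about 7.07

# Check by simulation: repeat the 50-toss experiment many times
random.seed(1)
pcts = []
for _ in range(20_000):
    heads = sum(random.choice(box) for _ in range(n))
    pcts.append(100 * heads / n)
mean_pct = sum(pcts) / len(pcts)
sd_of_pcts = math.sqrt(sum((x - mean_pct) ** 2 for x in pcts) / len(pcts))

print(f"SD of box: {sd_box * 100:.1f} percentage points")
print(f"SE for % heads (formula): {se_pct:.2f}")
print(f"spread of simulated % heads: {sd_of_pcts:.2f}")
```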

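With polls, as a rule of thumb, it is always an SE, or should be. (Strictly speaking, many polls report a margin of error of about 2 SEs, for roughly 95% confidence, but either way the number is built from the SE, not the SD.)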

Sometimes, though, the analyst, like you, is unsure and will forward the wrong number to the boss. The boss, who might be you one day, needs to know which is which or risk _severe_ (:)) embarrassment. So another way to know for sure that it is an SE and not an SD is to quickly calculate a rough estimate of the SD. That is the SD of the box: the square root of (fraction of yeses * fraction of nos), which you can express as a percentage. If the reported number is much smaller than that, it is an SE for the percentage, because the SE is the SD divided by the square root of the sample size. But if you do the math for the Dateline show, your calculation won't give you exactly 4.5. The reason is that polls are generally cluster samples, and the estimate of the variance is a little more complicated than that. The rule of thumb is that the 4.5 should be a little bigger than what you calculate with the simple random sample formula, but still much, much smaller than the SD.

Dr. C.

 

Student question:

> Professor Cochran,

>

> I have a question as to how to figure out the average for a box model with large #'s. For example, if you have a box that looks something like this: 30,000 (1) 12,000 (0). How do you find the average so that you could then calculate the expected value as # of draws x box average? I understand how to do the steps, but I do not understand the math for getting an average for sample surveys.

>

> A Student

>

>

Professor response:

Ms. ,

The size of the numbers doesn't matter (big numbers just make things harder to calculate, so we don't tend to use them in examples). The average of the box above would be (30000*1) + (12000*0) divided by the number of tickets in the box, which is 42000. Another way to calculate it is 30000/42000. Either way it's about .71. The math looks about right conceptually, right? Because (lopping off 3 zeros) we've got about 30 1's and 12 0's, and that should average to around 3/4 of the value of the 1 (because 30 is roughly 3/4 of 42).

If we wanted to know what the sum of a random sample (with replacement) of 100 draws out of this box is expected to be, it's 100*average, or 100*.71 = about 71 1's drawn out of the box (more precisely, 71.43).
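The same arithmetic can be written as a short Python sketch. It uses exact fractions, so the box average comes out as 30000/42000 = 5/7 (about .714) and the expected sum of 100 draws as about 71.43.

```python
from fractions import Fraction

ones, zeros = 30_000, 12_000
box_size = ones + zeros                       # 42,000 tickets in the box

# Average of the box: (sum of all tickets) / (number of tickets)
box_avg = Fraction(ones * 1 + zeros * 0, box_size)   # same as 30000/42000

draws = 100
expected_sum = draws * box_avg                # expected sum = draws x box average

print(float(box_avg))
print(float(expected_sum))
```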

There are two numbers to think about. The first is the number of tickets in the box, which is the size of your population, as in 'in a population of 50,000 high school seniors'. The second is the size of the sample (or the number of draws from the box), as in 'a simple random sample of 300 high school seniors'.

Putting it all together (if I can make it look right in an email), the diagram goes like this:

A researcher polled 300 hs seniors drawn randomly from the 50,000 who g...
    (300 = # of draws from a box containing 50,000)
She found that 60% supported the rights of students to choose their own..
    (60% = the average of the sample, used to create our expectation of the structure of the box)

 

So the box, we predict, should contain 30,000 1's and 20,000 0's. (We can make this prediction because of the law of large numbers. It states that as we make more and more draws from a box, the average we observe will tend to move closer and closer to the average of the box. With a few hundred draws we can expect the two to be close, though not exactly equal, as long as everything is done fairly: no bias, no changing probabilities in the box, no hanky-panky in our sampling, etc.)
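Here is a small Python simulation of the law of large numbers for the predicted box (30,000 1's and 20,000 0's). The seed and the checkpoints are arbitrary choices. The running average should wander at first and settle near .60 as the draws pile up.

```python
import random

random.seed(0)
box = [1] * 30_000 + [0] * 20_000    # predicted box: 60% 1's

total = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
running = {}
for k in range(1, 100_001):
    total += random.choice(box)      # one draw, with replacement
    if k in checkpoints:
        running[k] = total / k       # average of the draws so far
        print(f"after {k:>6} draws: average = {running[k]:.3f}")
```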

But there is still some chance that we are not exactly on the mark of what the real box looks like. We've just created an expected box. So we can take this a step further and estimate a range of values for what the real box might be.

We can do this by invoking the central limit theorem. It says that if we take repeated samples of the same (large enough) size from a box, the means of all those samples will be approximately normally distributed. That distribution is centered on the mean of the box, and its spread is the standard error (SE), so we can use the normal curve to work out chances.
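Here is a Python sketch of the central limit theorem claim, assuming the box really is 60% 1's. It draws many samples of 300 (the number of repetitions and the seed are arbitrary). It then checks three things about the sample percentages: they center on 60%, they spread out by about one SE, and they land within 2 SEs of 60% roughly 95% of the time.

```python
import math
import random

random.seed(42)
p, n, reps = 0.6, 300, 10_000     # assumed box: 60% 1's; 300 draws per sample

pcts = []
for _ in range(reps):
    ones = sum(random.random() < p for _ in range(n))   # one sample of 300 draws
    pcts.append(100 * ones / n)

center = sum(pcts) / reps
spread = math.sqrt(sum((x - center) ** 2 for x in pcts) / reps)
se = math.sqrt(p * (1 - p)) / math.sqrt(n) * 100        # SE from the formula, about 2.83

within_2se = sum(abs(x - 100 * p) <= 2 * se for x in pcts) / reps
print(f"center of sample %'s: {center:.2f}  (box: {100 * p:.0f})")
print(f"spread of sample %'s: {spread:.2f}  (formula SE: {se:.2f})")
print(f"share within 2 SE of 60%: {within_2se:.3f}")
```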

Well, we only took 1 sample of 300 students. But that sample is one element of the sampling distribution of the means (the repeated samplings from the box I just described). Because it is an element of this distribution, it's as good an estimate as any for the average of the distribution. (It's like having money in your pocket: if I pull one coin or bill out of it, I can use that to estimate the average of your pocket without knowing anything else. If it's a quarter, I'll estimate 25 cents; if it's a $5,000 bill (do they exist???), I would estimate $5,000. It's better than guessing off the top of my head.) So we set .60 as the estimated average of the sampling distribution of the means (which, by the theorem, would also be the average of the box).

Then we have to calculate the SE of the sampling distribution. We do this in three steps (according to the book):

1> estimate the SD of the box: (1 - 0) * square root[(30000/50000)*(20000/50000)] = .49
2> estimate what the SE of the count would be out of this box with 300 draws: square root(300) * SD of box = 17.32*.49 = 8.49
3> estimate the SE for the percent: SE for count/number of draws * 100% = 8.49/300 * 100% = 2.8%
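Here are the three steps written out in Python. This is a sketch that follows the book's recipe with the predicted box from above. The printout rounds to two decimals, so step 3 shows 2.83% where the text rounds to 2.8%.

```python
import math

box_ones, box_zeros = 30_000, 20_000     # predicted box from the 60% sample result
box_size = box_ones + box_zeros
draws = 300

# Step 1: SD of a 0/1 box = (big - small) * sqrt(fraction of 1's * fraction of 0's)
sd_box = (1 - 0) * math.sqrt((box_ones / box_size) * (box_zeros / box_size))

# Step 2: SE for the count (sum) of 1's in 300 draws = sqrt(draws) * SD of box
se_count = math.sqrt(draws) * sd_box

# Step 3: SE for the percent = SE for count / number of draws * 100%
se_pct = se_count / draws * 100

print(f"SD of box:      {sd_box:.2f}")
print(f"SE for count:   {se_count:.2f}")
print(f"SE for percent: {se_pct:.2f}%")
```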

Now we have an estimate of the average of the sampling distribution of the means (.6) and an estimate of the average spread of this distribution (SE = .028). And this distribution is approximately normal. So if we go out 2 SEs in either direction, we will include about 95% of all the values in this distribution.

So we can say with 95% confidence that the percent of high school seniors (in the box) who support the rights of students... is 60% plus or minus 5.6%.
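The interval arithmetic can be written as a Python sketch. It rounds the SE to 2.8% first, as the text does. Without that rounding, the interval comes out to about 54.3% to 65.7%.

```python
import math

sample_pct = 60.0                  # 60% of the 300 seniors in the sample
n = 300
p_hat = sample_pct / 100

se_pct = math.sqrt(p_hat * (1 - p_hat)) / math.sqrt(n) * 100   # about 2.83
se_rounded = round(se_pct, 1)      # 2.8, as in the text

low = sample_pct - 2 * se_rounded
high = sample_pct + 2 * se_rounded
print(f"95% interval: {low:.1f}% to {high:.1f}%")
```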

Now, we never saw the box. We only drew 300 times out of it. We used the mean we observed, and our estimate of how much that result might vary due to chance, to place a bet that the box really has somewhere between 54.4% and 65.6% in favor of the rights of students. We can be wrong in our bet; we WILL be wrong about 5% of the time. The actual, real % in the box is either in that interval or it isn't. The population in the box does not change; what changes is the values we draw from it. (It's just like drawing a coin or bill from your pocket and making a guess: the amount in your pocket doesn't change, but the outcome of my draw does.)
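Here is a Python simulation of what "we WILL be wrong about 5% of the time" means, assuming a box that really is 60% 1's. It builds a "sample % plus or minus 2 SE" interval from each of many simulated samples of 300 (the seed and repetition count are arbitrary). It then counts how often the interval misses the box's true 60%. The miss rate should come out in the neighborhood of 5%.

```python
import math
import random

random.seed(7)
true_p, n, reps = 0.6, 300, 5_000     # assumed "real" box: 60% 1's

misses = 0
for _ in range(reps):
    p_hat = sum(random.random() < true_p for _ in range(n)) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # SE estimated from the sample itself
    if not (p_hat - 2 * se <= true_p <= p_hat + 2 * se):
        misses += 1                                # this interval missed the box's value

miss_rate = misses / reps
print(f"share of intervals that missed: {miss_rate:.3f}")
```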

This is pretty powerful stuff. We went from observing values in 300 people to making a prediction, backed by two theorems, about a population (box) we can never see. It may not seem like a big thing, because intuitively all of us already do that (we meet 3 people on a floor in the dorm and rather quickly assume that everyone on that dorm floor is the same sort of person). But here is the mathematical basis for why we can do that. It is also a caution to temper our sweeping generalizations about things around us: sometimes what we see is not exactly what the whole population is like, and there is spread, or diversity, within populations and within our samples of populations.

But as you can see, the size of the numbers doesn't change the formulas. The key is to figure out what is an element of the sample, what is an element of the population, and how to construct the box.

Good luck in your studying.

Dr. C.

 

Student question:

> Prof. Cochran,

> I have a question about exercise Set B in Chapter 20 #2c. The question asks to find how many Democrats are between 39% and 41% of the registered voters. I know that 40% are Democrats with an SE of about 1.5%. So then wouldn't they be 1 SE away and have a chance of 68%? Please help. The correct answer is 48%?

>

>

Professor response:

Ms. ,

40% + 1.5% = 41.5%, right? 41% - 40% = 1%, right? OK... the SE is 1.5%, but the question asks what happens if we go out 1% (not 1 SE). So if we go 1 percent to the left of the mean (which is 40%), that is about 2/3 of an SE (1 is 2/3 of 1.5). The question is asking: what is the chance that the sample will show a percentage within about 2/3 of an SE of the population percentage?

Well, looking at the normal table, about 48% of the time, when we draw a simple random sample of 1000 from a population where 40% have a certain characteristic, we will observe a value within about 2/3 of an SE (between 39% and 41%). (Using the unrounded SE, about 1.55%, 1% is about 0.65 SE, and that is the value that gives about 48% from the table.)
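Here is a Python sketch of the calculation. It uses math.erf for the normal-curve area instead of the printed table. With the unrounded SE (about 1.55%) the answer is about 48%. Using the rounded 1.5% gives closer to 50%, which is why the rounding matters here.

```python
import math

def normal_area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return math.erf(z / math.sqrt(2))

n, p = 1000, 0.40                                        # sample size and box fraction from the exercise
se_pct = math.sqrt(p * (1 - p)) / math.sqrt(n) * 100     # about 1.55%
z = (41 - 40) / se_pct                                   # how many SEs is 1 percentage point?

print(f"SE = {se_pct:.2f}%, z = {z:.2f}, chance = {100 * normal_area_within(z):.1f}%")
print(f"with SE rounded to 1.5%: {100 * normal_area_within(1 / 1.5):.1f}%")
```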

The trick here, where I think you are getting confused, is that there are several uses of the word 'percent'. There is the percent in the sample, the SE of the percent, the percent to the left and right of the mean that you are concerned with, and then the percent (area) under the normal curve. Walk through each of them and be sure that you can tell the concepts apart. (How do you do this? Ask yourself questions about each one of them and answer those questions, preferably aloud, as if you were explaining it to another person.) Of course, if you have a roommate... make sure they've gone out, or they'll think you're nuts!

Good luck with your studying,

Dr. C.