E-mails Prior to Final

(Real questions asked by other students just like you -- look here for answers to common questions.)

Hello Dr. Cochran,

I would like to know the difference between the correction factor and the SD+? When do we use the Standard error for the percentage?

I presume you are referring to the correction factor from earlier in the course, which was used in calculating an SE...that corrects for sampling without replacement from a small population. The SD+ is a correction to the SD of a sample when that sample is small (< 100); that is, it is an unbiased estimate of the size of the average deviation from the mean in a sample. Using the SE for %: well...in a lot of situations...that's a pretty big question. We use it whenever we are concerned with the average amount of chance error in our estimate of the mean (a percentage can be thought of as a mean--50% is the mean number of heads in coin flips, for example). So we would use the SE for the percentage in confidence intervals and in test statistics.
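To make the two corrections concrete, here is a quick Python sketch (the sample values are made up for illustration; they are not from the course):

```python
import math

# plain SD of a sample
def sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# SD+ : the small-sample correction to the SD (use when n < 100 or so)
def sd_plus(xs):
    n = len(xs)
    return sd(xs) * math.sqrt(n / (n - 1))

# correction factor for an SE when sampling WITHOUT replacement
# from a small population of size N (n = sample size)
def correction_factor(N, n):
    return math.sqrt((N - n) / (N - 1))

sample = [1, 2, 3, 4, 5]           # made-up data
print(sd(sample))                  # 1.414...
print(sd_plus(sample))             # 1.581... (a bit bigger, as it should be)
print(correction_factor(100, 25))  # 0.870... (shrinks the SE)
```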
Dr. Cochran:

I am still having trouble with the regression method, both as explained in the book and in class (using the formula for the slope of the regression line). I think I am confused by the relationship between the estimate for the difference between the average of the score and its new value, and the z-score. These seem to be the same thing (?), but I still get confused by the method to use when working out a predicted value for y from a given value of x. I have gone through the examples in the book and my notes from lecture. Could you briefly outline the method for me, or refer me to a place where this is located?

Thank you,

Let's start with where I think you 'know' something. You know there is a linear equation, and you know how to solve for unknowns if you know all the other pieces of information. So if I give you the correlation and the SDs of both y and x, from that you can calculate the slope. And if I give you the means of x and y, you can calculate the intercept. And if I give you a value of x, you can calculate (now having 3 + 2 + 1 pieces of information) the predicted value of y. That's my assumption about what you know. If I'm wrong, then what I say next might confuse you--so read through that again and be sure you know that stuff (it's high school math with a little new stat thrown in, but that's all).

Now in lecture, a way to do all this differently was presented. This method made use of percentiles and ranks. Here, we said we can look at a value of x and think of it in terms of its distance from the mean of x. So a score is a certain number of SDs away from the mean. And we 'know' that if a value of x is some number of SDs from the mean of x, the associated value of y is r times that many SDs of y from the mean of y (p. 151 in the text). So given a value of x, we can calculate the estimated y value without ever developing the linear equation as we had to do above--the advantage? I could, using only knowledge of x's deviation (never knowing the value of x or the mean of x), calculate an expected value of y. We can also use knowledge of a person's ranking in a distribution and convert that into SDs, or use SDs to convert it into percentiles (here we know nothing about the score or the mean of either x or y). Look at the web--lecture 16. You're right. These are all the same thing in many ways, because it is simply one equation. But the first method requires all 6 pieces of information to generate an answer. These other methods can be used when you have fewer than 6 pieces of information. In the world of being coddled by textbooks and classes, you've always had all 6 pieces of information. These latter methods simply show you that you now have ways to jury-rig answers with less--this is more likely to happen in the real world--and the skill to develop now is to think about what method to use when I know all 6? when I only know these 3? when I only know 1? and so on.
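The two routes can be sketched in a few lines of Python (the summary statistics here are invented for illustration; both routes land on the same prediction):

```python
# invented summary statistics: height (x) predicting weight (y)
mean_x, sd_x = 70.0, 3.0
mean_y, sd_y = 170.0, 20.0
r = 0.6

# route 1: build the whole linear equation (needs all 6 pieces)
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x
x = 73.0
y_hat_equation = slope * x + intercept

# route 2: think in SD units (needs only x's deviation, r, and y's mean and SD)
z_x = (x - mean_x) / sd_x   # how many SDs x is from its mean
z_y = r * z_x               # predicted number of SDs y is from its mean
y_hat_sd_units = mean_y + z_y * sd_y

print(y_hat_equation, y_hat_sd_units)  # same answer both ways: 182.0
```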

Dr. C.

we have a few questions:

1. what is a regression line?

2. what is the difference between a regression line and a SD line?

3. when do you use them?

4. could you please explain the problem on page 7 of lecture 17 in the "class notes" handbook?

we are very confused.

thank you!

THE TA RESPONDS

These are pretty broad questions to answer over email, and I'm not sure what your exact questions about these concepts are, but we'll try...

For 1-3 (note -- SD(y) is the standard deviation of y, and similarly for SD(x)): The idea is that we try to fit a line to a scattergram, or we want to try to summarize the points using a line. There are different ways of doing this. One is to use the SD line, which has slope SD(y)/SD(x). Another option is to use the regression line, which has slope r*(SD(y)/SD(x)). Both lines go through the point (Xbar, Ybar). It turns out that the regression line does a better job of summarizing the scattergram.
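In code, the two slopes look like this (the SD and r values are made up):

```python
sd_x, sd_y, r = 2.5, 10.0, 0.4   # made-up values

sd_line_slope = sd_y / sd_x            # slope of the SD line
regression_slope = r * sd_y / sd_x     # slope of the regression line

# |r| < 1, so the regression line is always the shallower of the two;
# both lines pass through the point of averages (Xbar, Ybar)
print(sd_line_slope, regression_slope)  # 4.0 1.6
```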

There are a lot of aspects to this question... I don't know where the confusion is, so I'm not sure what part to talk about. Have you looked at the notes? She explains some of these concepts in more detail around lecture 16. As far as when to use them, it might be helpful to look at homework problems and problems done in lecture to see some examples.

For 4, I think you are talking about the question "What % of 24.5 yr olds have cholesterol level of 100 or less?"

Here's another interpretation of the question--First, think of the scattergram with cholesterol on the vertical axis and age along the horizontal axis. Now, think of the vertical strip of points corresponding to people of age 24.5. We want to know what percent of these people (or points) have a cholesterol level of 100 or less. We use a normal approximation. For the mean (Xbar), we use the estimated mean cholesterol level for 24.5-year-olds. This is computed in the notes on the top part of the page. For the SE, we use the rms error, which is computed on that page as well. We get the z-score as usual--(X-Xbar)/SE. From there, use the normal table to get the percent that lies below the z-score (since we are looking for a level of 100 OR LESS). For more on this, see Chapter 11, section 5 of your book.
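The same steps in Python, with placeholder numbers, since the notes' actual strip mean and rms for this problem aren't reproduced here (both values below are assumed):

```python
from statistics import NormalDist

# placeholder values standing in for the lecture-17 computation:
strip_mean = 180.0   # estimated mean cholesterol for 24.5-year-olds (assumed)
rms = 40.0           # rms error of the regression (assumed)

z = (100 - strip_mean) / rms           # z-score for a level of 100
pct_or_less = NormalDist().cdf(z) * 100
print(round(pct_or_less, 1))           # percent at 100 OR LESS in the strip
```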

I hope this helps. If you have more specific questions, I'll check my mail again at 6. but I'm leaving soon after that. Thanks and hope it goes well tomorrow!

Prof Cochran,

I'm having great difficulty understanding and doing the r.m.s. problems where you use the normal curve inside a vertical strip. When I work out the problems, they seem to turn out wrong. I've been checking them over and over, but I can't seem to find out what I'm doing wrong.

Can you walk me through the problems that I am having such a hard time with? Please?

On page 198, from exercise set E, problems #2(b) and #3(b). I think these are the same 'type' of problem, and I tried to model my work on the example, but I keep coming up with the wrong answer.

Thanks,

Feeling Frustrated.

 

Dear Student,

For prob 2:

step 1> What is the average height of sons of 6-foot fathers? 6-foot fathers are (72-68)/2.7 = 1.48 SDs up, so their sons on average are .5*1.48 = .74 SDs above average, or .74*2.7 + 69 = 71.0 inches tall

step 2> What is the expected average deviation from that estimate? square root(1 - .5*.5) * SD of sons = 2.34

Step 3> What percent of sons in this vertical slice are 6 feet tall or taller? z = (72 - new mean)/new deviation = (72-71)/2.34 = .43. This cuts off about 33% in the tail

Problem 3 is a little more tricky.

You are looking for the % in the slice that have forearms between 17.5 and 18.5 inches.

Step 1> What is the average arm length of men 68 inches tall? 68 is 0 SDs up, so their arm length should be 0*.8 = 0 SDs away, or 18 inches

Step 2> What is the expected deviation? new deviation in slice = square root(1 - .8*.8) * SD of arm length = .6

Step 3> What percent of 68-inch-tall men have forearms between 17.5 and 18.5 inches? That's like: z = (18.5-18)/.6 = .83. See? We need a slice of 1 inch (.5 to either side), and .5 is equivalent to a z of .83, which cuts off about 60% between -.83 and +.83.
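Both problems follow the same three steps, which can be sketched in Python (the numbers are the ones from the replies above; treat it as a check, not a substitute for working it by hand):

```python
import math
from statistics import NormalDist

norm = NormalDist()

def strip(mean_y, sd_y, r, z_x):
    # steps 1 and 2: the new mean and the new (rms) deviation in the slice
    new_mean = mean_y + r * z_x * sd_y
    rms = sd_y * math.sqrt(1 - r * r)
    return new_mean, rms

# problem 2: sons of 6-foot fathers
new_mean, rms = strip(mean_y=69, sd_y=2.7, r=0.5, z_x=(72 - 68) / 2.7)
pct_tall = (1 - norm.cdf((72 - new_mean) / rms)) * 100
print(round(new_mean, 1), round(rms, 2), round(pct_tall))   # 71.0 2.34 33

# problem 3: forearms of 68-inch-tall men (68 is 0 SDs up)
new_mean, rms = strip(mean_y=18, sd_y=1.0, r=0.8, z_x=0.0)
pct_band = (norm.cdf((18.5 - 18) / rms) - norm.cdf((17.5 - 18) / rms)) * 100
print(round(pct_band))                                       # 60
```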

Keep trying

Dr. C.

Dr. Cochran,

I have a few questions for you.

1. When do you make a chi-square test and when do you make a z-test? I don't know how to figure out which one to use just by looking at the word problem.

2. On page 485 of the book, the authors mention something about a left

handed tail. I don't understand that whole section. Can you please explain it to me?

3. In chapter 29 the authors say that large samples can be bad and can skew the data. Is that actually what they are saying, or have I interpreted it wrong?

Please answer these questions if you have time.

Thanks,

1. chi-square analyzes counts (each person or subject thrown into one and only one cell), and z and t analyze differences in means of the sample or samples (each person contributes a score, but you are only analyzing the deviation of the mean of the group from some standard)

2. here the issue was that you 'expect' chance variation to happen. The left tail of a chi-square (very, very rarely used--I've never seen it before in 20 years of work, but as he points out, you can do it) evaluates departures toward an absence of chance error: the data are too good to be true. You can think of it as flipping a coin 10 times and getting 5 heads, then repeating this process 100 more times and always getting the same result--it is too unlikely to be true

3. the issue with large samples is that N is in the denominator, so the math appears to be very, very precise--but, well, one way to think about it is that there is a certain amount of bias (our design tries to make it zero, but it won't be exactly that), and when we can get chance to be very, very small (remember early in the course you learned that if you repeat a chance process many times, the deviation you observe as a sum is large but the percent away from the expected is very, very small--well, it's the latter that ends up in our test statistic's denominator), then we can find significance with even trivial differences. It's like using too powerful an instrument to do something. For example, I'm analyzing some data now where my sample is 9400 women. When I divide them into two groups, the difference between 33.2% and 33.6% is 'significant'--that is, unlikely to be due to chance...but my hunch is that bias could easily account for a percent or two. If my sample instead was 200, then I'd have to see 25% vs 35% or so to find significance. That would make me feel more comfortable that the difference was real and also meaningful
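You can see the sample-size effect directly: the same small gap between two percentages gives a tiny z at a modest n and a 'significant' z at a huge n (the numbers below are made up to illustrate the point; they are not the 9400-women dataset):

```python
import math

def z_for_two_percentages(p1, p2, n1, n2):
    # two-sample z: the difference divided by the SE of the difference
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# the same 2-point gap at two very different sample sizes (made-up numbers)
z_small = z_for_two_percentages(0.33, 0.35, 100, 100)
z_big = z_for_two_percentages(0.33, 0.35, 10000, 10000)
print(round(z_small, 2), round(z_big, 2))  # -0.3 -2.99
```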

good luck tomorrow

Dr. C.

Hello. I have two questions:

1. When doing a chi-squared test, is it ever possible to have a two tailed chi-square problem? I understand how a two-tailed test arises for a z and t test but I am not quite sure for the chi-square.

2. I am also having some difficulty understanding the formation of the alternative hypothesis for the chi-square. Rather than H(sub1): f(sub0)-f(sub1) = 0 for at least one cell, why isn't it something like H(sub1): f(sub0)-f(sub1) ≠ 0 for at least one cell? Why is it not emphasizing the difference rather than the similarity?

Thank you,

1> The null hypothesis for t and z is that the means are exactly equal. One can very clearly see that one mean could be more or one mean could be less. The null under chi-square is that the frequencies for each cell are exactly the frequencies expected, for all cells. The alternative is that for at least one cell, the frequency observed is not the frequency expected. But think about how the chi-square is calculated, literally. We look at each cell and estimate a deviation, and whatever deviation there is going one direction also shows up going the other direction (like if you have a two-by-two--males and females got sick at the picnic or did not, and 100 of 200 people got sick--then if more men got sick than expected, you must have fewer women getting sick than expected; the system is closed, one cell's gain is another cell's loss). And then we sum up these deviations. Nowhere are we tracking whether one group is higher or lower. So the tail we generally figure relates to going toward more deviation than expected. The other chi-square tail, that we generally don't consider (the left side), is actually a tail that says we have less deviation than expected (like if our data were too perfect...Freedman has one example of that in the book). So the two tails are sort of like t and z: the left side is that the observed deviates less than expected, the right side is that the deviation is more than expected.

But it doesn't translate with the same meaning as it has for t and z. I guess we could do it, though it's hard to imagine a research hypothesis we would be interested in that would call for it. Take the class example of asking 100 students what their year in school was. They are what they are. At one extreme, the counts are too close to the expected 25%-25%-25%-25% split; at the other extreme, they are too far away from it. It is the latter that we usually are asking about (Does year in school vary among UCLA students using the student store?), not a two-tailed chi-square hypothesis (Are the years in school either too far from, or suspiciously too close to, the split we expect?)

Now for t and z, the research hypothesis (is there variation or difference?) relates to either tail---but not for the Chi-square test.
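The 'closed system' point from 1> is easy to see in a few lines of Python (the picnic counts are made up):

```python
# made-up 2x2 picnic table: 200 people, 100 got sick
observed = {("men", "sick"): 60, ("men", "well"): 40,
            ("women", "sick"): 40, ("women", "well"): 60}
row = {"men": 100, "women": 100}
col = {"sick": 100, "well": 100}
total = 200

chi_sq = 0.0
for (sex, status), obs in observed.items():
    expected = row[sex] * col[status] / total
    print(sex, status, obs - expected)          # +10, -10, -10, +10: one cell's
    chi_sq += (obs - expected) ** 2 / expected  # gain is another cell's loss

print(chi_sq)  # 8.0 -- squaring throws away direction; only deviation is left
```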

2> it's fsubObserved-fsubExpected not equal to 0, for at least one cell

Yer welcome,

Dr. C.

I have a few questions regarding the homework.

1. In Chapter 27, #2, how do you calculate the SEs for those two samples? I used the square root of the sample size (100) times the SD (1.4). This gives me 14. From this, I divide by 100 to find the SE of the average, and got .14. I also calculated the SE for the second group using the same method, and got .967. When I squared those, I got .977, and your answer key said 1.8. Could you tell me what I am doing wrong for that problem?

2. In problem #6 of the same chapter, I also did not get 1.6 and 2.7 for the SE values. The only way that I would get that is by using the t-test method of finding the SE. However, in this problem n = 500, so isn't the population too big for the t-test?

3. In prob. #3, Chapter 28, how did you calculate the expected values in the table?

1> I don't quite know where you are getting this strategy from...how about this: the SE for the diff = square root(sd*sd/n for 1st group + sd*sd/n for 2nd group) = square root(15.3*15.3/100 + 16.1*16.1/250) = 1.8. Or: the SE for the average of the 1st group = square root(100)*15.3/100 = 1.53, the SE for the average of the 2nd group = square root(250)*16.1/250 = 1.02, and the SE for the diff = square root(1.53*1.53 + 1.02*1.02) = 1.8
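That computation, line for line, in Python:

```python
import math

n1, sd1 = 100, 15.3   # 1st group
n2, sd2 = 250, 16.1   # 2nd group

se1 = sd1 / math.sqrt(n1)          # SE for the average of the 1st group
se2 = sd2 / math.sqrt(n2)          # SE for the average of the 2nd group
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)   # SE for the difference

print(round(se1, 2), round(se2, 2), round(se_diff, 1))  # 1.53 1.02 1.8
```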

2> you're probably making the same mistake as above...the SE for the percentage in the 1st group = square root(250)*square root(18/250 * 232/250)/250 = 1.6%

3> for the observed cell containing 679...one margin is 679+103+114 = 896, the other is 679+63+42 = 784, and the total count for the table is 1074, so the expected is 896*784/1074 = 654.1
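That expected-count rule in a couple of lines of Python:

```python
# expected count for a cell = (row total * column total) / grand total
row_total = 679 + 103 + 114   # the margin containing the observed 679
col_total = 679 + 63 + 42     # the other margin
grand_total = 1074

expected = row_total * col_total / grand_total
print(round(expected, 1))     # 654.1
```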

Dr. C.

Dear Professor Cochran,

Hi .... How are you? I've been looking at the lecture notes on the web, and I have a question about one of the example problems. It is in the lecture titled t-tests, and it is the question about the students and whether or not the new teaching style leads them to better testing. In the lecture notes on the web, the research hypothesis is that students who had the new teaching style did BETTER on the tests. However, in the alternative hypothesis you stated that the mean for the new-teaching-style students is not equal to that of the other 9th grade students. This implies that the test is two-tailed, which you wrote on the web. I am confused because I thought that by saying BETTER, you are implying that it is a one-tailed test. I think in class you also used a one-tailed test. Can you please explain the difference and email me back? Thanks

Dear Student,

Yes, but as I said in class, when one does an intervention with human beings, one worries about causing harm as well as benefit, and so tests are generally two-tailed, even if the hypothesis does not say so explicitly. On the final, I will be absolutely clear in any hypothesis I state in a question.

Dr. C.

hi, professor cochran,

i do not understand significance tests. why do you make a null hypothesis? how do you determine whether or not the null hypothesis can be rejected on the basis of 'p'? what *is* 'p' in relation to the normal curve? i just don't understand.

Why the null...first, do you know what the null is? In general it is a hypothesis of no difference--no deviation except for chance (I'm assuming that you have down the idea that chance adds or subtracts some amount to each observation, always, without fail, and that each observation is made up of its true value + chance + bias). So the null states that whatever deviation we observe from what we expect is due to chance alone (not a real difference in true value, not bias). If and only if the null is correct, then a z-test or a t-test has a known distribution, and the area under the curve is known. A given area cuts off a known percent of the distribution--that percent is P. So if the null is correct, then the z we obtain from a z-test can be compared to the table in the back of the book to find out what percent of the distribution is cut off. So for example, if you get a z of about 2 (1.96), that is associated with an area of 95.45 (look in your book's table), meaning that 100 - 95.45, or approximately 5%, is outside that area in the two tails. If we have a two-tailed alternative hypothesis (for example, there is a difference and we don't specify whether it is bigger or smaller), then we choose P = .05, or a z value of about 2, as a cutoff for deciding whether to reject our null hypothesis. If we reject our null (that what we observe is what we expect), then we must accept the logical alternative (what we observe is not what we expect). The logical alternative, or the alternative hypothesis, is our research hypothesis in statistical form. So today, we thought that giving people a drug would make their ESP ability improve so that it was different from those who did not get the drug.
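The z-to-P lookup can be checked with Python's built-in normal curve (a sketch of the table lookup, not a replacement for the table in the book):

```python
from statistics import NormalDist

norm = NormalDist()

def two_tailed_p(z):
    # chance of a z this big or bigger, in either tail, if the null is true
    return 2 * (1 - norm.cdf(abs(z)))

print(round(two_tailed_p(1.96), 3))  # 0.05 -- the usual cutoff
print(round(two_tailed_p(1.0), 2))   # 0.32 -- well within chance
```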

We have to test the null and not the alternative, because only the null is testable. A z-test, for example, divides the difference between what we observe and what we expect by a weight (the SE) for how much deviation we expect on average due to chance. Under the null, the numerator is 0 except for chance deviation, and since we divide by an estimate of our chance deviation, the z should be close to 0, plus or minus some small number (generally < 2 or so). Under the alternative, the numerator is not zero (think about why this is so...), and the numerator contains both true difference and difference due to chance. We just don't know what this number should be. So we can't evaluate the problem.

Hang in there. Keep reading the book; keep coming to class. This stuff is hard to learn, but it is the heart of the statistical technique.

Dr. C.

Dr. Cochran

on question 4 of the practice final questions, as far as I can tell, I am doing everything correctly. My answer does not make sense to me, and I have looked at many other examples to see what I am doing wrong. I can't see what I am doing wrong. I got that the z-score equals 29.74. Maybe that is correct, but it seems high. Should we be getting z-scores like that?

Sure, why not? The difference is a mixture of chance error and true difference (hs vs college). If the chance error in estimating the mean is small, as it would be with such a large sample, but the true difference is large, then z can be huge. Like I said in class, these questions were cast-offs and so were not edited by the TAs and myself--normally we would remove a result like that (by changing the numbers around), because the large z would throw students off on a test. But there is no reason why you can't obtain a value like that (imagine timing the 100-yard dash in a group of college sprinters vs 4th graders--you'd get a real whopping big z value).

Dr. C.

I am having trouble understanding percentile ranks for joint distributions (in the regression chapters). I get confused about where the final percent comes from. The examples in the book leave out a couple of steps, because I think they assume that the reader understands where the numbers are coming from. Could you please give me an example that includes all the steps?

thank you,

a student

Ms. Student,

That's a very broad question; much of a long answer from me would probably not address what is confusing you. Can you give me a problem in the book where you are unclear?

The basic idea is that regression (like correlation) reflects the extent to which scores share similar percentiles. So if I am 1 SD up on height, and height and weight perfectly correlate (r = 1.0), and we know by definition the slope predicting weight from height is r*SD_weight/SD_height, then I will be 1 SD up on weight. The key here is to remember that 'given my SDs on the predictor variable, if I multiply that by the correlation, it gives me what I expect my SDs to be on the new variable of interest'. Now SDs you can easily translate into percentiles and vice versa. So the steps would be:

1> Take the percentile for the variable you are using to predict the percentile on the other variable and convert it to an SD using the normal table

2> Multiply the SD by the correlation to get the SD you expect for the new variable

3> Convert the new SD back to a percentile using the normal table

If the problem is that you don't know how to convert from SDs to percentiles, then the place you got lost was in Chap 5. Go back and work through that again, and then face the regression material.
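The three steps, using Python's built-in normal table in both directions (the percentile and the correlation below are made up):

```python
from statistics import NormalDist

norm = NormalDist()

def predicted_percentile(pctile_x, r):
    z_x = norm.inv_cdf(pctile_x / 100)   # 1> percentile -> SDs (normal table)
    z_y = r * z_x                        # 2> multiply by the correlation
    return norm.cdf(z_y) * 100           # 3> SDs -> percentile (table again)

# e.g., someone at the 90th percentile on the predictor, with r = 0.5:
print(round(predicted_percentile(90, 0.5)))  # about the 74th percentile on y
```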

Let me know how you're doing...

Dr. C.

Good afternoon Prof. Cochran

I went and looked at the grading sheet on your office door today, and I am not quite clear on it. I think I know what the percentile means, but I was wondering what it means for grades. If I remember correctly, I am in the fifty-fourth percentile. What does that mean gradewise in the class? Thank you for your time; I just want a grasp on how I am doing.

P.S. Do you have any suggestions for studying? Before this last midterm I did almost all the practice problems in the book, went over the homework, and studied the overheads book, yet I still did not do well. Any pointers?

In the syllabus (which is also on the web), I gave the grading curve, which unfortunately I don't know off the top of my head. The grading curve translates more or less directly into percentiles: if the top 26% of the class are in the A range, then having a percentile greater than or equal to the 74th puts you in the A range. Take a look at the syllabus.

Well, relatively speaking, you did about as well on the last exam as you did on the first. That's good, because it suggests that you'll do about the same on the final (whereas others who are more variable are likely to do worse...how come? Think about it...if everyone keeps their highest score, the highest is highest for effort reasons and chance, and chances are chance won't contribute as much the next time). Two strategies: 1> when you are studying, ask yourself how confident you are that you know the material--if you're not, go back and restudy it. 2> apply the material...try to explain it to others, even if just phantom others in your room; look around your world and look for where to apply things. Without seeing where you ran into trouble on the exam, I can't be much more helpful than that. Come on by my or the TAs' office hours and let's go over things to get more specific.

Dr. C.

Professor Cochran,

I'm currently enrolled in your Stat 10 class. I am concerned about the scores that I got on the 2 midterms. I feel that I've studied sufficiently for both midterms, but I don't seem to be getting the results back on the midterms. I always do perfectly on the solving parts, but I don't do as well on the multiple choice part of the exam.

So far, I've been preparing for the exams by studying the lecture notes and homeworks and trying to understand the concepts. Am I doing something wrong? Because going into the exam, I feel pretty confident that I know the material, but the way you ask your multiple choice questions always gets me confused. I was wondering if you can suggest a better way of studying so that I may do well on the final. Thank you for your time.

Ms. Student,

Calculations are easier, sometimes, because they don't require that you know concepts--just that you know where to start, how to work through it, and when to stop. The multiple choice, on the other hand, requires that you can step back and say what you are looking at. So, yes, for many students it is harder. Here are some suggestions. 1> I have posted a couple of things on the web under exam review, one of which is a set of multiple choice questions from an earlier final. That should give you more practice with stepping back and saying what something means. 2> When you do the work, take a step back and ask what things are. Don't just worry about getting the right answer mathematically. Try explaining stat concepts to others, even if only to an imaginary person in your room. All of this will find the holes in your understanding of things. Go to discussion section; use office hours, mine and the TAs'. Ask "what if" questions about the world around you. Can you find someone using a standard error? What does a standard deviation look like in a social situation? Things like that.

Let me know how it goes.

Dr. C.

hello

i was looking through the reader today and i do not quite understand something. i am talking about lecture 21, pgs. 118-19 (ninth graders and test scores). at first you point out the desired significance level to be P = 0.05; however, at the end of the problem it says that we are looking for a value of 2.1 to reject the null.

i am just a little confused. thank you for your time and help!

sincerely

Ms. Student.

The 2.1 refers to the t (was it a t-test? I'm at home and I don't have the overheads book with me), and that value is associated with a P of .05. Another way of saying it: a t-value that large or larger has a probability of 5% or less of occurring simply by chance alone. We set up our hypothesis testing so that if what we observed has a 5% chance or less of occurring when the null hypothesis is true (there is no true difference), then we reject the null as a plausible explanation of what we have observed.

Good luck with your studying.

Dr. C.

Hi Prof. Cochran,

In the course reader you wrote that we need to know how to calculate standard errors of the means. Is this the SE for the average or something else?

Thanks for your help,

It's the SE for the average. (The average is the mean, so the standard error of the means is the SD of the sampling distribution of the means--the theoretical distribution we would observe if we randomly sampled, with replacement, a large number of equal-size samples from the box and made a new distribution out of the means of each sample. It's the estimate of the variation in the mean we expect to see due to chance.)
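You can watch that definition happen with a small simulation (a die standing in for the box; the details are illustrative):

```python
import random, statistics

random.seed(0)                      # repeatable run

box = [1, 2, 3, 4, 5, 6]            # the box model: one die
n = 25                              # draws per sample

# theory: SE for the average = SD of the box / sqrt(number of draws)
se_theory = statistics.pstdev(box) / n ** 0.5

# build the sampling distribution of the means, then take its SD
means = [statistics.mean(random.choices(box, k=n)) for _ in range(10_000)]
se_simulated = statistics.pstdev(means)

print(round(se_theory, 3), round(se_simulated, 3))  # the two agree closely
```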

good luck

Dr. C.

hi professor,

i couldn't make your office hours cause i had a final but i was wondering if you had time....

1. what exactly does a 95% CI define? (i feel like there are a lot of discrepancies in the text)

2. question 20.12--why couldn't you use the bootstrap method and have the expected be the % that the population showed?

3. is there a cutoff for when a box is so lopsided that normal approximation won't work?

4. did we cover section 22.5? cause a few answers to assigned problems depended on it

5. when calculating r with cov which SD do u square (x or y)

6. is correlation only applicable to longitudinal data (why won't it work for cross-sectional)?

7. problem 10.5 b- how is this causation vs. just association

thanks so much

1> an interval that 95 out of 100 times will cover the population parameter--but this time we do it, it may not. Just like we think a coin will come up heads about half the time, so we flip it twice and expect 1 head, on average, but this one time we flip it twice, well, it may or may not do this. The uncertainty is in our construction of the interval (95 out of 100 times we make a good interval, one that covers the population parameter), and any one time we make it, who knows if it is correct or not.
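A simulation makes the 'construction' idea concrete (a 0-1 box standing in for yes/no answers; everything here is illustrative):

```python
import random

random.seed(0)                     # repeatable run

true_pct = 50.0                    # the box really is 50% ones
n, trials = 400, 2000

covered = 0
for _ in range(trials):
    p = sum(random.choice([0, 1]) for _ in range(n)) / n   # sample fraction
    se = 100 * (p * (1 - p) / n) ** 0.5                    # SE in % points
    low, high = 100 * p - 2 * se, 100 * p + 2 * se         # the interval
    covered += low <= true_pct <= high

print(covered / trials)   # close to 0.95 -- but any one interval may miss
```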

2> i don't know what question you mean here...

3> yeah, but it depends also on how many draws. normal approximations work pretty well except with small numbers of draws. think about a coin flip...it doesn't take long for the 50% average to show up.

4> no...nothing depends on this that i know of

5> neither. to calculate r, you might have to multiply x * y, but you never have to really calculate the covariance of the two

6> correlation is used a lot, and a lot with cross-sectional data ...it just can't prove causation

7> i don't quite understand which problem you are talking about. chap 10, prob 5, part b shows a mathematically inappropriate use of r

Dr. C.

Hi, Prof. Cochran. I have some questions.

1> I'm still not completely sure about the definition of a 95% confidence interval. Does the interval apply only to the population, and not the sample?

2> I cannot do p. 198, 3b. I know that we need to use 67.5 and 68.5 for the numbers, but after I used those numbers, I still cannot come up with the correct answer.

3> on p. 387, #8, why is the answer true? I thought that the 95% means either the sample is in the interval or not, but no chance should be involved. Please explain.

4> Lastly, on p. 484-485, #5, how did they get the answers for those questions? I really have no clue. Sorry for asking so many questions, but please respond as soon as possible. Thank you.

1> The interval is centered around the mean of the sample. It is a guess, with 95% certainty, of where the population mean might be; but the population mean is where it is--the uncertainty is in our minds, not in the truth of what it is. So the interval is created based on the sample and is a guess about the population.

2>on p. 198, 3b...the rms error = SDy*square root ( 1 - r*r) ...I don't see what you're seeing. Maybe you have the wrong problem?

3> well, no, it doesn't quite work that way. You have a box...the box is real and fixed (like a pair of dice) and you sample from it. As soon as you engage in sampling, you inject chance into the process (like tossing a pair of dice)...so the campus has 54% women (like the numbers on a pair of dice, it's fixed), you sample (inject chance) and calculate an estimate of how much chance there might be (SE = 1.6%) in your sampling. You have a 95% chance that the sample percentage lies within plus or minus 2 SE of the population percentage, just like if I toss a coin 100 times I expect the number of heads to be within 2*square root(100)*square root(.5*.5) = 10 heads of 50, i.e., 50% +- 10%, 95% of the time. You agree that the coin is fixed at what it is, 50/50 heads, right? You also agree that if I toss it 100 times, the outcome is constrained by the nature of the coin but is variable, right? It's true that the outcome of this sampling process will be what it will be, but we can have expectations (or notions about the constraints on the outcome based on the box)--that's not quite the same as the box itself, which is fixed before we even come on the scene. That's why we say, going from the sample to the box, the population parameter is either in the interval or not, but when we go from the box to the sample, we can say there's a 95% chance the sample mean is in that interval.

4> Hah, tricky, huh? Ok, the z-test is a test of deviation due to chance when the null hypothesis is true. So all he's doing here is showing what happens with repeated samplings from a box where the average is 50. You would expect that the samples should have, on average, means close to and surrounding 50, right? z = (mean observed - 50)/SE. So we expect our z values to cluster around 0, within plus or minus 2, 95% of the time. So, (a) is just: do you know how to calculate a z-test--if the observed is greater than 50, should z be positive? (b) if we expect our z's to have a nice normal distribution centered at zero, wouldn't half be greater than zero? (c) a z of 2 is 2 SEs out, leaving about 2.5% in that tail, so we expect 2.5% of the 100 outcomes to have a z greater than 2--there are 3--pretty darn close to what we expect. (d) z = 2 in the table in the book is 95.45. In terms of percentile, that's 95.45 + (100 - 95.45)/2 = 97.7%. That leaves about 2.3% in the right-hand tail, so the P of z = 2 is .023. The difference between 2.5% and 2.3% is just the difference between doing sloppy math and exact math--if you set your decision for rejecting the null hypothesis at P < .05, then being neat or sloppy is not a deal breaker (and the sloppiness on both Freedman's part and mine reflects an awareness that the math formulas are more precise than real-life statistics, where chance and bias are going on).

Good luck with your studying.

Dr. C.

Prof. Cochran,

I'm having problems finding the SD when just given percentages. Can you help me with these two problems?

p. 510, Chp. 27, Set B, #6) Drug Abuse Survey....how do you find the SE? how do you find the SD from the given percentages?

p. 542, Chp. 28, Set C, #6) Demographers think about 55% of newborns are male. 569 out of 1000 consecutive births are male. (1-sample z-test.) How do you find the SD?

Think about it: for sample 1 there are 700 people (N), so there are .219*700 users and (1-.219)*700 nonusers. From this you can calculate the SD and then the SE. The SD is just the square root of (.219*(1-.219))

SD? The SD is just the square root of (.569*(1-.569))
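The SD-from-a-percentage rule as a tiny Python helper (the numbers are from the two problems above):

```python
import math

# a 0-1 box with fraction p of ones has SD = sqrt(p * (1 - p));
# the SE for the sample percentage is then SD / sqrt(n), in % points
def se_for_percentage(p, n):
    return 100 * math.sqrt(p * (1 - p)) / math.sqrt(n)

print(round(se_for_percentage(0.219, 700), 2))   # drug-survey sample 1
print(round(se_for_percentage(0.569, 1000), 2))  # 569 male births out of 1000
```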

good luck.

dr. c.

I did the practice final. I have some questions, and I was wondering if you could either give me the answers or tell me if my answers are correct or not.

1. (e) either b or c. --how is this chart with replacement? Are these random draws from the box of individual student scores, and are they the average of the entire country?

2. (a) Probability due to chance. P is greater than 5%.

3. (d) The sample size for Austria is smaller than Australia's, but the SD is bigger. --Can you expand on the correct answer, because I'm not sure if the answer is (d) or (b)?

1> if it were a random draw, wouldn't you expect the countries to show up more than once, unless the universe of countries is rather large?

2> i don't have the question in front of me...but if P is greater than 5%, then what you are saying is that there is more than a 5% chance that we would observe this if only chance is operating

3> the SE is tighter the larger the sample size

good luck

Dr. C.