Data Analysis for High School Teachers
Winter, 2000
Instructor: Robert Gould
email: rgould@stat.ucla.edu
phone: 310-206-3381
office: Math Sciences Building, 6151
Assistant: Allen Martin
email: aemartin@flash.net
Web Page: http://www.stat.ucla.edu/~rgould/x401w00
Prerequisites: This is a "second" course in Statistics, designed for those with some knowledge of the basics. It is intended for those who teach Statistics in the high schools. Thus, one year of experience teaching Statistics at the high school level, or a comparable amount of college-level course work, is required and assumed.
Purpose: It is my intent to provide you with some "real-life" experience in doing Statistics. That last sentence will mean different things to different statisticians, and so you will get a fairly idiosyncratic version of statistics. To some extent, this can't be helped, because unlike, say, Mathematics, Statistics is more craft than science, and so can be a very personal activity. In practice, what this means is that this course will in no way be comprehensive, or even try to be, and may not even be representative of what "most" statisticians do.
It is important to all of us, I think, that your experience in this class help you in teaching statistics to your own students. I am not an expert in pedagogy, and know very little about educational theory, and so this class will not be about that. I have never taught high school, and so can offer no lesson plans or activities for your students. But I hope that by giving you some insight into what (at least some) statisticians really do, you'll have a better sense of what to teach your students.
I want this class to be as useful as possible for you, and for that reason I plan to spend some time each week talking about topics that have come up in your class, and also about designing and/or planning activities for your class. Allen Martin has agreed to assist in this course, and I hope he will serve as what we can rather grandly describe as a "liaison" between the Statistics community (me) and the High School Teachers community (you). With his help, and by discussing things together, we will direct this course towards topics most meaningful to you.
The central activity of this class will be data analysis. Each week we will consider real data sets. There might be short lectures to introduce relevant techniques, and there will definitely be discussions about the hits and misses of various techniques. But still we will spend class time working at the computer analyzing data.
Outline:
I have not included an outline for this course, because I first wanted to spend some time together to decide how best to proceed. My plan is to begin by introducing some data sets which we will analyze together. Ideally, these will lead to topics to explore and examine.
Text: It is impossible for me to select a textbook for this course for several reasons. (1) We probably have a wide variety of statistical and mathematical experience, and will have a hard time finding a book accessible to everyone, even in a class of only 3 people. (2) As far as I can tell, no one has ever written a text for a class like this. In fact, textbooks on applied statistics are few and far between and those that exist do not, in my opinion, deserve much attention. (3) Most applied statistics books focus on a particular technique (for example, Regression or Linear Models or Time Series).
Because realism is a major theme of this course, I looked for texts with real data sets and realistic problems. If you would like to purchase any of these books as reference guides, feel free to do so. However, I am enthusiastic about only the Chatfield book, and even in this case must recommend it with some reservations.
So we will make do with handouts. But here are some books you might want to look over:
(1) Chatfield, Christopher, Problem Solving: A Statistician's Guide, Chapman and Hall. I think this is a great book, although perhaps it is not for everyone and certainly not for all occasions. This book does not contain any techniques or "how to"s. The first half of the book offers a philosophy of statistics and, more importantly, guiding principles for approaching statistical problems. It is also filled with helpful practical details, including what to look for in software, useful books and resources, and how to write a report. In my opinion, some of the chapters in this first half of the book are invaluable. The second half of the book consists of data sets and exercises, with discussion. Some of these data sets are somewhat artificial, some of the solutions are idiosyncratic, and some use quite advanced techniques. The explanations are often rather curt, and assume the reader is familiar with the techniques but maybe inexperienced in their application.
Before you rush out and buy this book, note that it is based on a paper which I will pass out in class. Still, the book goes into more detail and might make a nice souvenir.
(2) Cox, D.R. and Snell, E.J. Applied Statistics: Principles and Examples. Chatfield's book owes a debt to this one both in its structure and its general philosophy. However, in my mind, Chatfield comes out on top. Cox and Snell's book also has two sections. The first is a "how to" and explains many principles and concerns of applied statistics. The second half gives data and examples of analyses. Both books have interesting first halves, but Chatfield's comes across as more prescriptive and Cox and Snell's more descriptive, at least to my way of thinking. I prefer Chatfield's approach, but this is a matter of personal taste. Also, I think that to really get much out of Cox and Snell, you need to already have a fair amount of experience. Certainly, understanding their examples sometimes requires considerable sophistication. Still, I think the data sets are valuable if they are used as practice problems, and while the explanations might be intimidating, they are occasionally illuminating.
Note that neither of these two books is a good reference if you want to learn or brush up on particular techniques.
(3) Box, George E.P., Hunter, William G., and Hunter, J. Stuart, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. Wiley & Sons. This book focuses on designed and controlled experiments. Most of the examples and data sets are realistic, but not real. The applications are oriented towards engineering and industry, and sometimes (for example, in the discussion of the scientific process) the explanations are fairly idealized. Also, this book assumes familiarity with probability and statistical theory. With that said, it gives formulas for standard tests and techniques, and offers advice that sounds based on considerable experience (for example, when one should and shouldn't use the t-test). Also, it has a nice explanation of when to block and not to block while designing experiments.
(4) Ramsey and Schafer, The Statistical Sleuth: A course in methods of data analysis. Duxbury. I'm not that fond of the overall layout of the book (which to my way of thinking encourages an approach that looks for the right data to be applied to the techniques, rather than showing you the techniques you can apply to solve your problems), and the explanations are not always that clear. But this is written at a more accessible level than the Box, Hunter & Hunter book, and has a wider variety of examples. The book has many nice case studies which provide a good context for the techniques discussed, and it has formulas for those wishing to look up certain techniques. In short, I think it might be a good reference book, but I wouldn't rely on it to learn a topic. The worst fault is a tendency to quickly sweep perplexing problems under the rug. For example, in the section on logistic regression: "The natural choice of link for a normally distributed response...is the identity link." This begs the question, of course, of why this is the natural choice. Another example in Chapter 3: they discuss applying the t-test to compare two groups. They admit the data are not symmetrically distributed, but say that "it doesn't matter" with no explanation of why it does not.
(5) Others: We'll keep looking throughout the quarter. One source is software manuals, which sometimes are very helpful not just for learning how to apply a technique, but for understanding the technique.
Computers: We will be using Macs. iMacs to be precise. This is not a philosophical statement so much as a practical matter. However, everything we do can easily be transferred to a PC. We will spend some time at the first meeting getting acquainted with the computers.
Software: "Real" statisticians need to be acquainted with a variety of software packages, primarily because no single package will do everything you need. (In my experience many statisticians (myself included) are surprisingly ignorant of a large number of packages.) The more "famous" professional packages, SAS, SYSTAT, SPSS, STATA (for some reason, statistical packages love acronyms) are so complicated they require courses unto themselves. They are quite powerful, and often include features designed to provide for advanced data manipulation. We do not have time to learn how to use such a package, and so we will be using some simpler, and yet fairly comprehensive packages.
The lab computers are equipped with the student version of DataDesk. The student version limits the number of variables and size of the data set allowed, and also restricts the types of analyses that can be done. Nonetheless, it will probably do everything we need. If not, we will switch to the "full" version of DataDesk. The student version comes bundled with ActivStats and retails for about $40.
For pedagogical purposes, we will also use a program called ARC. ARC is designed for linear models only, but is also very useful for exploratory and descriptive analyses. ARC runs on both PCs and Macs and is available for free at:
http://www.umn.edu/arc/
Grading:
Your grade will be based on a project turned in no later than March 14 (which is the last class). Projects should be 5-10 page papers that do one of the following:
a) Report on a data analysis
b) Report on the design of an experiment and its results
c) Present a lesson plan involving data analysis
Details should be discussed with either me or Allen.
There will also be various "homework" assignments that will not be graded, but will be discussed each week.