There are two major dichotomies that will appear again and again and again in this course, and so it's worth struggling over them a bit at the very beginning.
Populations vs. Samples
The first, and perhaps most difficult to understand, is that of the population and the sample. Both are collections of objects (groups of people, tax returns, poker hands), although we're usually more interested in measurements taken on these objects than in the objects themselves (for example, the heights of the people, the incomes on the tax returns, or the number of Aces in the poker hands). Populations might also be abstract, such as the group of all people who have ever been born or ever will be born.
A population is the collection of all of the objects of interest. Typically, it is a group we wish to study and know more about. For example: all U.S. voters, all children in South America, all stocks traded on the Pacific Stock Exchange, all left shoes in California.
A sample, on the other hand, is a collection of objects from a population. The students in this Statistics class, for example, comprise a sample of the population of all UCLA students. (Also a sample of all U.S. college students.)
One researcher's sample might be another researcher's population. For example, I might be interested in knowing the ages of just the students in my class, and so I might think of you all as my population of students. And then any smaller group would be a sample from that population. On the other hand, I might want to know the ages of all U.S. college students, and so then this class would just be a sample of that much larger population.
Perhaps it's occurred to you that populations might be quite large. Particularly if they are abstract (all people ever born) or infinitely large, it might be impossible to list all of the members of the population, much less take measurements on them. For this reason, samples are almost always much smaller and more manageable.
An important feature of populations is that, for most purposes (or at least some of the most interesting ones), the entire population is unknown! Usually what I mean by this is that it is so large that we can't possibly know everything about it, though it may be more than just its size that makes it difficult to know all about it. For example, it would take an incredible effort for me to learn the ages of all students currently enrolled at a U.S. college or university. By the time I assembled this information, the list might even have changed! So I'm going to have to rely on my knowledge of a smaller, more manageable sample to make inferences about the population.
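To make the idea concrete, here is a small sketch (my own illustration in Python, not part of the text, using a made-up population of ages) showing how the average of a modest random sample can stand in for the average of a population that is far too large to survey directly.

    import random

    random.seed(1)

    # Pretend this is the full population of student ages; in reality it
    # would be far too large (and too changeable) to list.
    population = [random.randint(18, 30) for _ in range(100_000)]

    # A manageable sample of 100 students drawn at random.
    sample = random.sample(population, 100)

    pop_mean = sum(population) / len(population)
    sample_mean = sum(sample) / len(sample)

    print(f"population mean: {pop_mean:.2f}")
    print(f"sample mean:     {sample_mean:.2f}")

The two averages typically come out close to one another, which is exactly why a sample is worth having when the full population is out of reach.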
This takes us to dichotomy number 2.
Descriptive vs. Inferential Statistics
We often find ourselves with a list of numbers representing measurements on some objects. For example, a list of shoe sizes, stock quotes, IQs. Perhaps the list represents a population, but more often than not it's a sample from a larger population. In any event, we use Descriptive Statistics to, well, describe the list. Why do we need a description? Primarily to summarize, but also to point out features that we think are important or telling about the list. For example, at a glance, what can you make of the list:
-5 0 10 -1 2 -4 9 18 1 -8
You will soon learn that lists can often be adequately summarized by just two features: their "center" and their "spread". For example, the average of this list is 2.2, and the difference between the biggest and smallest numbers is 26. So the two numbers, 2.2 and 26, tell us quite a bit about the list.
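If you'd like to check those two numbers yourself, here is a minimal sketch (in Python, my choice of language rather than the text's) that computes the average and the biggest-minus-smallest spread of the list above.

    # The list from the text, and its two summaries: the "center" (average)
    # and the "spread" (biggest number minus smallest number).
    data = [-5, 0, 10, -1, 2, -4, 9, 18, 1, -8]

    center = sum(data) / len(data)   # 22 / 10 = 2.2
    spread = max(data) - min(data)   # 18 - (-8) = 26

    print(center, spread)            # 2.2 26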
There are also graphical summaries, literally pictures of the list. We'll see examples of these soon enough, too.
In general, descriptive statistics not only give a "snapshot" of the list, but sometimes even point to patterns or trends that were otherwise hidden. Remember John Snow and the Broad Street Pump? Or Florence Nightingale's statistics that demonstrated that many more soldiers died of disease than of war injuries? These are examples of descriptive statistics.
Before the twentieth century, descriptive statistics were really all there were.
Another type of statistic is the inferential statistic. This isn't really so much a number as a procedure, a model, an entire way of thinking about numbers. Inferential statistics are descriptions of samples that are meant to give us insight about the population. Here's the Oxford English Dictionary's (2nd ed.) definition of inference:
the forming of a conclusion from data or premisses, either by inductive or deductive methods; reasoning from something known or assumed to something else which follows from it; = ILLATION. Also (with pl.), a particular act of inferring; the logical form in which this is expressed.
Obviously, the success of your inference depends a lot on how representative your sample is of the population. If I stand on a street corner in Beverly Hills and ask people what their incomes are, I will probably not get a good picture of the incomes of all residents of the U.S. We will talk about the need for a representative sample next, in Chapter 3.
But another important part of inference is having an idea of how large your errors might be. Put differently, it's important to know how much trust you can place in your inference. One thing that inferential statistics provide us is a quantification of our error in making an inference. To understand how to do this, we will first have to study some basic Probability (Chapter 4).
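As a preview of that idea (again my own illustration, not the text's), the sketch below draws many random samples from the same made-up population and looks at how much the sample averages vary from one sample to the next. That variation gives an informal first picture of how large the error of any single sample might be.

    import random

    random.seed(2)

    # A made-up population of ages, as in the earlier sketch.
    population = [random.randint(18, 30) for _ in range(100_000)]

    # Draw 1,000 independent samples of 100 and record each sample's average.
    sample_means = []
    for _ in range(1000):
        sample = random.sample(population, 100)
        sample_means.append(sum(sample) / len(sample))

    print(f"smallest sample average: {min(sample_means):.2f}")
    print(f"largest sample average:  {max(sample_means):.2f}")

How tightly those averages cluster is precisely the kind of question that the probability tools of Chapter 4 will let us answer more carefully.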