© 2004, S. D. Cochran. All rights reserved.

 

ASKING QUESTIONS, COLLECTING DATA

 A. Overview 

Science is concerned with posing questions that can be answered, generally by translating questions into something that we can count or measure. Not all questions can be answered this way ("Does God exist?"). But many questions that seem to be unanswerable could be answered if asked correctly. Statistics is simply one tool that scientists use to ask and answer the questions that they have about the world.

You are all already, in a sense, scientists and statisticians. If you went to the Student Store to buy your textbooks, would it be better to do it at lunch time or at 8:00 am? How do you know? Have you observed that the lines are longer at lunch than early in the day? Notice that you automatically translate shorter lines into better lines. You may not know for certain that the lines are actually shorter at 8:00 am than at lunch hour, but you know from your other experiences in the world that this is likely to be so.

But you could also ask this question in a way that could not be answered. If you went to the Student Store to buy your books, would it be better to do so on Wednesday? Notice that the question implies not only a quantifiable answer but also a comparison to something you have not specified, something that is not known. We cannot answer questions like this. 

B. The methodology of research design involves: 

1. Designing the question that you want to ask 

2. Collecting data or observations that help to answer the question

3. Making sense of the data 

4. Using what you have found to answer your question 

The clearer sense we have of what we are trying to do, the better chance we have of actually finding out what is of interest to us. 

C. Specifying our research question 

The first step in conducting research is to translate our inclinations, hunches, suspicions, and beliefs into a precise question.

Examples: Is this drug effective? Does lowering the interest rate cause inflation?

The second step is to look closely at the question we have asked and assure ourselves that we know what an answer to the question would look like.

Example: Is this drug effective? Do we know exactly what drug we are referring to, how big a dose, given to whom? Can we define what we mean by effective? Do we mean effective for everyone? Is it a cure? What about side effects? 

D. Collecting data 

We have to make three major decisions when we collect data 

  1. We have to know how we are going to either view or tinker with the phenomena we wish to study--that is, we have to have a research design.
  2. We have to decide exactly what type of data we need to answer our research question--that is, we need to choose the right instrumentation. (For example: If we wanted to know whether or not UCLA students like pizza more than salads, finding out that they think tuition is too high does not help us to answer the question.)
  3. We have to decide where the right place is to obtain the information we need--that is, we need a sampling plan.

For example: If we wanted to know how UCLA students felt about the current debate over alcohol use on campus, we would have to devise a way of collecting information from UCLA students. It wouldn't do to survey USC students. 

E. Sampling plans 

Normally when we ask a question, the answer we desire is very general. Another way of saying it is that we seek an answer that we can generalize: we want to go from the particulars of what we observe to something bigger.

If we wanted to know how UCLA students feel about an increase in their tuition, then we are talking about every one of you and also every other UCLA student who is not here in this room. If we ask a hundred students the question about tuition, we want to be able to say on the basis of those 100 responses what UCLA students feel, not just what these 100 students feel.

Basic definitions 

Population--the entire set of things (people, animals, events) we want our answer to apply to. Example: All UCLA students. 

This set can be defined. That is, we can specify who or what is in this set and who or what is not

Sample--a subset of the population. Generally, research uses information gathered from samples because most of the time it is not feasible to collect information from the whole population, and also because it is simply not necessary, for reasons that you will learn during this course.

A numerical fact about a sample is called a statistic. If we were to ask the students in this class what their major is, the percentage of students who are English majors would be a statistic generated from the class sample. 

A numerical fact about a population is called a parameter. For example, we could, hypothetically, find out how many college students in this country are English majors. This would be a population parameter. But you can imagine the headache in doing that. Instead, we use statistics, such as the percent of English majors in this room, to estimate the population parameter. If we have planned our study well, our statistics can closely approximate parameters.
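
The difference between a parameter and a statistic can be seen in a small simulation. The sketch below is only an illustration: the population of 10,000 students, the 800 English majors, and the helper `percent_english` are all made-up assumptions, not real data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 students, 800 of them English majors
# (invented numbers, chosen only for illustration).
population = ["English"] * 800 + ["Other"] * 9200


def percent_english(group):
    """Percentage of English majors in a list of majors."""
    return 100 * group.count("English") / len(group)


# Parameter: a numerical fact about the whole population.
parameter = percent_english(population)

# Statistic: the same numerical fact, computed from a sample of 100 students.
sample = random.sample(population, 100)
statistic = percent_english(sample)

print(f"parameter = {parameter:.1f}%")
print(f"statistic = {statistic:.1f}%")
```

The parameter here is 8.0% by construction. The statistic will usually land near that value but will rarely match it exactly, and it changes from one sample to the next.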

Sampling plans refer to how we draw or select our sample from the population 

One method is simple random sampling--everyone in the population has an equal chance of being selected into the sample, and every possible sample of a given size is equally likely to be drawn.
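
A minimal sketch of simple random sampling, assuming a tiny hypothetical population of 20 student ID numbers and a sample size of 5. It uses Python's standard `random.sample`, which draws distinct members without replacement, and repeats the draw many times to check that each student is selected about equally often.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

N = 20  # population size (hypothetical)
n = 5   # sample size (hypothetical)
population = list(range(N))  # student ID numbers 0..19

# Draw many simple random samples and count how often each student is chosen.
counts = Counter()
trials = 20_000
for _ in range(trials):
    counts.update(random.sample(population, n))

# Each student should appear in roughly n/N = 25% of the samples.
for student in population:
    print(student, round(counts[student] / trials, 3))
```

No student is favored over any other: the selection rates all hover around n/N, which is what "equal chance" means in practice.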

But there are many other ways of doing sampling 

F. Bias occurs when the statistic we observe is systematically off from the population parameter it is meant to estimate

Bias can occur because of problems in sampling plans 

Bias can occur because of instrumentation problems 

Bias can occur because of design problems
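
Bias from a sampling plan can be sketched with a simulation. Everything in it is an assumption made for illustration: the 10,000 students, the split between students found in a humanities library and everyone else, and the proportions of English majors in each group. Plan A draws a simple random sample from the whole population; Plan B samples only from the library.

```python
import random
from statistics import mean

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population: 8% English majors overall, but English majors
# are much more common among students found in the humanities library.
library_goers = ["English"] * 500 + ["Other"] * 1500   # 2,000 students
everyone_else = ["English"] * 300 + ["Other"] * 7700   # 8,000 students
population = library_goers + everyone_else


def percent_english(group):
    """Percentage of English majors in a list of majors."""
    return 100 * group.count("English") / len(group)


parameter = percent_english(population)

# Plan A: simple random samples of 100 from the whole population.
plan_a = [percent_english(random.sample(population, 100)) for _ in range(2000)]

# Plan B: samples of 100 drawn only from students in the library.
plan_b = [percent_english(random.sample(library_goers, 100)) for _ in range(2000)]

print(f"parameter       = {parameter:.1f}%")
print(f"average, plan A = {mean(plan_a):.1f}%")
print(f"average, plan B = {mean(plan_b):.1f}%")
```

Individual samples under Plan A miss the parameter by chance, but their average sits close to 8%. Under Plan B the average sits close to 25%, and taking more samples does not fix it: the error comes from where we looked, not from bad luck.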