2004, S. D. Cochran. All rights reserved.

SUMMARIZING DATA GRAPHICALLY 

  1. Data are the building blocks of research
  1. Researchers begin by recording observations and then must organize them so that the data can tell their story 
  2. Researchers don't record all they see, but only those events in the domains that interest them--these domains are called variables 
  1. Each observation consists of two components 
  1. Where or who it comes from--this is referred to as the case 
  2. What is observed--this is called the value 

The value is the observation we record for the case in a specific domain or variable 
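The case/variable/value vocabulary can be sketched as a small lookup table. The case names, variable names, and values below are hypothetical and serve only to illustrate the terms.

```python
# Hypothetical records: each outer key is a case, each inner key is a variable,
# and each inner entry is the value observed for that case on that variable.
observations = {
    "case_1": {"GENDER": "Female", "INTENT": 1},
    "case_2": {"GENDER": "Male", "INTENT": 4},
}

def value_of(case, variable):
    """Return the value recorded for one case on one variable."""
    return observations[case][variable]

print(value_of("case_2", "INTENT"))
```

One row per case and one column per variable is a common way to lay out data, though it is only one possible arrangement.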

  1. Types of variables 
  1. Statisticians make distinctions among types of variables, because different types of variables can be analyzed in different ways. 
  2. The first distinction is whether the values a variable takes on are qualitative or quantitative 
  1. Qualitative variables--these are domains where the observed value for any case has no particular order 

These are also called nominal (or name) variables because they are nominally scaled 

Qualitative variables are also called categorical variables because at most we can sort them into categories 

We could sort them into categories like this:

Category Count
Females 2
Males 2

Qualitative variables are also called discrete variables because observations can take on only certain fixed values. In the example above, you are either SHARIKA or BRAD; there is no in-between point. 
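Sorting a qualitative variable into categories amounts to counting how many cases share each value. Here is a minimal sketch; the individual observations are an assumption chosen to match the counts in the table above.

```python
from collections import Counter

# Illustrative values consistent with the category counts above;
# the individual observations themselves are assumed.
gender = ["Female", "Male", "Female", "Male"]

counts = Counter(gender)
for category in sorted(counts):
    print(category, counts[category])
```

Sorting the category names here is only for tidy printing; a nominal variable has no inherent order.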

  1. Quantitative variables--these are domains where the observed value for any case has a natural ordering 

The precision with which we can order values of variables differs 

  1. Some variables have values that can only be put in order--these are ordinal variables (they have ordinal scaling). We can rank them from first to last, smallest to biggest, or closest to farthest, but we can't say how far apart the values are 
Evaluation Count %
Hate pizza 1 25%
It's OK 1 25%
Love pizza 2 50%

Notice that we can't tell how far the value OK is from HATE. Is OK really closer to LOVE than to HATE?
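A frequency table like the one above can be built by counting responses while keeping the categories in their rank order. This is a sketch only: the list of responses is assumed, chosen to reproduce the counts shown.

```python
from collections import Counter

# Categories listed from lowest to highest; the order is the only
# information an ordinal scale gives us.
levels = ["Hate pizza", "It's OK", "Love pizza"]

# Illustrative responses that reproduce the counts in the table above.
responses = ["Love pizza", "Hate pizza", "Love pizza", "It's OK"]

counts = Counter(responses)
total = len(responses)
table = [(level, counts[level], 100 * counts[level] / total) for level in levels]

for level, count, percent in table:
    print(f"{level:<12}{count:>3}{percent:>6.0f}%")

# Ranks can be compared, but differences between ranks carry no meaning.
ranks = {level: rank for rank, level in enumerate(levels, start=1)}
```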

  1. Some variables can be put in order and we can also say something about how far apart the values are from each other--these are called interval variables (they have interval scaling) 

1 2 3 4 5
Definitely won't (1) to Definitely will (5)

We find that on this new variable, INTENT, the observations had the following values: 1, 4, 5, 5. 

Notice that we can now say that the value 1 is farther from 5 than the value 4 is from 5; that is, we can say something about the interval distance, though not with much precision. 
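The comparison of distances can be checked directly on the INTENT values. The helper name `distance` is introduced here for illustration only.

```python
intent = [1, 4, 5, 5]  # INTENT values reported in the text

def distance(a, b):
    """Interval distance between two scale values."""
    return abs(a - b)

print(distance(1, 5))  # 4
print(distance(4, 5))  # 1
```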

Interval-scaled variables are said to be continuous. In the example above, our students could conceivably have intentions that fell between any two of the numbers. Another way of thinking about it: we could have used a 10-point scale or a 100-point scale. The point is that with a continuous variable, you can never measure the value precisely--there is always some finer measurement you could have made. The values of the variable are said to have a continuous distribution.

  1. For a very few variables, we can put them in order, we can say how far apart the values are (how wide the interval is) and we can say where 0 is--these are called ratio variables (they have a ratio scale) 

When a variable has ratio scaling we can say something meaningful about the ratio of any two values.  
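One way to see why a true zero matters: on a ratio scale the ratio of two values survives a change of units, while on an interval scale it does not. The sketch below uses hypothetical ages and temperatures that are not part of the original examples.

```python
def ratio(a, b):
    return a / b

# Ratio scale (hypothetical ages): zero means "no age at all",
# so converting years to months leaves the ratio unchanged.
age_years = (20, 10)
age_months = tuple(12 * a for a in age_years)
print(ratio(*age_years), ratio(*age_months))

# Interval scale (hypothetical temperatures): zero is an arbitrary point,
# so the ratio changes when Celsius is converted to Fahrenheit.
def c_to_f(c):
    return c * 9 / 5 + 32

temp_c = (20, 10)
temp_f = tuple(c_to_f(c) for c in temp_c)
print(ratio(*temp_c), ratio(*temp_f))
```

Twenty years is twice ten years in any unit, but 20 degrees is "twice" 10 degrees only on the Celsius scale.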

  1. When a variable has an underlying continuous distribution, we can step down in the hierarchy and treat the values we measure as discrete, but we can't go the other way 

After we collect data, any variable can be rescaled to a lower level of precision in the hierarchy of scaling. That is, ratio-scaled variables can be treated as interval, ordinal, or nominal. But we cannot go in the other direction.
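Stepping down the hierarchy can be sketched with a handful of hypothetical ages. Each step discards information, which is why the process cannot be reversed.

```python
ages = [25, 12, 40, 18]  # hypothetical ratio-scaled values, in years

# Ordinal: keep only the rank order (1 = youngest).
order = sorted(ages)
ranks = [order.index(a) + 1 for a in ages]

# Categorical: keep only a group label.
groups = ["under 18" if a < 18 else "18 and above" for a in ages]

print(ranks)
print(groups)

# Going back is impossible: several different ages share one label.
distinct_ages_in_group = {a for a, g in zip(ages, groups) if g == "18 and above"}
print(len(distinct_ages_in_group))
```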

  1. Distribution of a variable 
  1. Definition: The pattern of variation of a variable. A distribution is the set of values that a variable takes on, together with how often it takes on each. 
  1. Distributions are commonly organized by grouping values into classes.  
  1. Each class has boundaries, or limits, that separate it from other classes. These are called the left-sided and right-sided boundaries 
  1. Each observation falls into a unique class. By convention, classes include observations that have the left-sided boundary value 
  • Example: If we divide AGE into two classes, 0-18 years and 18 and above, the '18' is the right-sided boundary for the first class interval and the left-sided boundary for the second class interval. By convention, 18-year-olds are included in the second class, unless specifically stated otherwise. 
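The left-sided convention from the AGE example can be sketched with a small lookup. The function name `class_of` and the label strings are illustrative assumptions.

```python
from bisect import bisect_right

# Left-sided boundaries of the two classes in the AGE example.
left_boundaries = [0, 18]
labels = ["0-18", "18 and above"]

def class_of(age):
    """Return the class for an age; each class includes its left-sided boundary."""
    if age < left_boundaries[0]:
        raise ValueError("age is below the first class")
    return labels[bisect_right(left_boundaries, age) - 1]

print(class_of(17.9))
print(class_of(18))
```

An age of exactly 18 lands in the second class, as the convention requires.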
  1. Histograms
  1. A histogram is a graphical display of data that translates frequencies or counts of values into percentages of area 
  1. The total area under the histogram is 100%. Remember the formula for the area of a rectangle is base X height and for a triangle is 1/2 base X height 
  1. Steps to making a histogram 
  1. Group data into class intervals 
  2. Determine what percentage of the responses fall into each interval 
  3. Determine the height of each rectangle by dividing the percentage by the width of the class interval, so that the area (base X height) equals the percentage 
  4. Draw the histogram
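The steps above can be sketched as follows. The ages and class boundaries are hypothetical, every observation is assumed to fall inside the outermost boundaries, and the "drawing" is reduced to a printed table so the sketch needs no plotting library.

```python
from bisect import bisect_right

ages = [3, 7, 12, 15, 18, 22, 30, 45]  # hypothetical observations
edges = [0, 18, 50]                     # hypothetical class boundaries

# Step 1: group data into class intervals (left-sided boundary included).
counts = [0] * (len(edges) - 1)
for age in ages:
    counts[bisect_right(edges, age) - 1] += 1

# Step 2: percentage of observations in each interval.
percents = [100 * c / len(ages) for c in counts]

# Step 3: height = percentage / width, so that base x height = percentage.
widths = [right - left for left, right in zip(edges, edges[1:])]
heights = [p / w for p, w in zip(percents, widths)]

# Step 4: "draw" the histogram as a table of rectangles.
for left, right, p, h in zip(edges, edges[1:], percents, heights):
    print(f"{left:>3}-{right:<3} {p:5.1f}%  height {h:.2f}% per year")

# The rectangle areas should add up to 100%.
total_area = sum(h * w for h, w in zip(heights, widths))
```

Both classes hold half of the observations here, yet the wider class gets the shorter rectangle, because percentage is represented by area rather than height.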