Statistics 10
Lecture 20


A t-test problem for review

Suppose your present commute to work results in an average travel time of 40 minutes per trip.

A new route has been suggested by a friend who claims that it will save you time. Suppose you tried this route on 10 randomly chosen occasions and these are the resulting times (in minutes):

44, 38.5, 37.5, 39, 38.2, 36, 42, 36.5, 36, 34

Do these data establish the claim that the new route is shorter at the 5% level of significance? How about at the 1% level of significance?

What is the null hypothesis?
What is the alternative hypothesis?
What is the appropriate statistical test?

Perform the test:

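t =        38.17 - 40
   ------------------------  = -1.9471, or about -1.95
    (SQRT(10) x 2.9721) / 10

Here 38.17 is the average of the 10 trip times, and 2.9721 is their standard deviation computed with n - 1 = 9 in the denominator. The denominator, (SQRT(10) x 2.9721) / 10 = 2.9721 / SQRT(10), about 0.94, is the SE of the average. The test has 10 - 1 = 9 degrees of freedom.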

What do you conclude?
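As a check on the arithmetic, here is a minimal Python sketch of the test using only the standard library. It assumes the one-tailed setup the problem suggests (null hypothesis: the new route's long-run average is 40 minutes; alternative: it is less than 40), with 9 degrees of freedom. The cutoffs -1.833 (5%) and -2.821 (1%) are copied from a standard t-table rather than computed.

```python
import math
import statistics

# Trip times (minutes) on the new route, from the problem statement
times = [44, 38.5, 37.5, 39, 38.2, 36, 42, 36.5, 36, 34]
mu0 = 40  # average time on the present route: the value under the null hypothesis

n = len(times)
avg = statistics.mean(times)       # about 38.17
sd_plus = statistics.stdev(times)  # divides by n - 1 = 9; about 2.9721
se = sd_plus / math.sqrt(n)        # SE of the average; equals sqrt(n) * sd_plus / n
t = (avg - mu0) / se

# Lower-tail (one-tailed) cutoffs for 9 degrees of freedom, copied from a t-table
cutoffs = {0.05: -1.833, 0.01: -2.821}

print(f"t = {t:.4f}")
for level, cutoff in cutoffs.items():
    decision = "reject" if t < cutoff else "do not reject"
    print(f"{level:.0%} level: cutoff {cutoff}, {decision} the null hypothesis")
```

With these data the sketch prints t = -1.9471. Since -1.9471 lies below -1.833 but above -2.821, the result is significant at the 5% level but not at the 1% level.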

Correlation (Chapter 8)

A. Basic Definitions

  1. Scatterplot or Scatter Diagram

    A scatterplot is a two-dimensional plot of data. The horizontal axis is called x, and the vertical axis is called y.

    Each point on a scatterplot shows two values, an x value and a y value, and represents a single case. A single case could be a single person or object, or it could be a matched pair (e.g., father-son, twins, husband-wife).

    Handout

  2. Positive and negative relationships

    There is a POSITIVE relationship if above-average values of x are associated with above-average values of y. Conversely, there is a NEGATIVE relationship if above-average values of x are associated with below-average values of y.
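    One way to see this definition at work: for each point, multiply (x - average x) by (y - average y). Points on the same side of both averages give positive products; points above one average and below the other give negative products. The average of these products is the numerator of the correlation formula in section B, and its sign gives the direction of the relationship. This sketch uses the five points from Example 1 in section D, chosen here for illustration; the helper name side is made up.

```python
# The five points from Example 1 in section D
points = [(2, 7), (3, 3), (5, 1), (8, 4), (13, 2)]

n = len(points)
avg_x = sum(x for x, _ in points) / n  # 6.2
avg_y = sum(y for _, y in points) / n  # 3.4

def side(deviation):
    """Describe a value relative to its average (hypothetical helper)."""
    if deviation > 0:
        return "above"
    if deviation < 0:
        return "below"
    return "at"

products = []
for x, y in points:
    dx, dy = x - avg_x, y - avg_y
    products.append(dx * dy)
    print(f"({x}, {y}): x {side(dx)} average, y {side(dy)} average, product {dx * dy:+.2f}")

# Same-side points push the average product up; opposite-side points push it down
avg_product = sum(products) / n
if avg_product > 0:
    direction = "positive"
elif avg_product < 0:
    direction = "negative"
else:
    direction = "none"
print(f"average product = {avg_product:.2f} ({direction})")
```

    Here three of the five points fall on the same side of both averages, but the two opposite-side points, (2, 7) and (13, 2), have much larger products, so the average product is -3.88 and the relationship is negative.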

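  3. Warning! Scatter diagrams show only association, and association does not mean causation. For example, fires with more firefighters present tend to have more fire damage, but the firefighters do not cause the damage: bigger fires bring both more firefighters and more damage.

    In the social sciences, x and y are usually called the INDEPENDENT and DEPENDENT variables, respectively. They are given these names because the independent variable is thought to influence the dependent variable.

    Nothing in the data stops us from reversing the roles of the two variables; the scatter diagram itself does not say which one influences the other.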

B. Another Statistic: The correlation coefficient

  1. The CORRELATION COEFFICIENT, denoted r, measures how closely the data cluster around a straight line; in other words, it measures the strength of linear association. It is a numerical summary of the scatter diagram.

    The correlation coefficient can take values from -1 to +1. Values near zero mean that the data are not close to a straight line. Values near +1 or -1 mean that the data are very close to a straight line; the sign gives the direction of the relationship.

  2. Formulas

    Your text gives a very long formula for calculating the correlation coefficient (pp. 132-134), and I am not certain how useful it is. Instead, read the technical note on p. 134; its formula is reproduced here:

          (average of the products xy) - (average of x) × (average of y)
    r =  ----------------------------------------------------------------
               (standard deviation of x) × (standard deviation of y)

    Here each standard deviation is computed by dividing by the number of points n (not n - 1).
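    A minimal Python sketch comparing two routes to r: the standard-units method (convert each variable to standard units, multiply, and average the products) and the shortcut formula above. Both use standard deviations that divide by n. The function names are made up for this example, and the data are the left-hand dataset from Example 2 in section D.

```python
import math

def sd_n(values):
    """Standard deviation dividing by n (hypothetical helper)."""
    avg = sum(values) / len(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))

def r_standard_units(xs, ys):
    """Average of the products of x and y, each converted to standard units."""
    n = len(xs)
    avg_x, avg_y = sum(xs) / n, sum(ys) / n
    sd_x, sd_y = sd_n(xs), sd_n(ys)
    return sum(((x - avg_x) / sd_x) * ((y - avg_y) / sd_y) for x, y in zip(xs, ys)) / n

def r_shortcut(xs, ys):
    """(average of xy - average of x * average of y) / (SD of x * SD of y)."""
    n = len(xs)
    avg_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return (avg_xy - (sum(xs) / n) * (sum(ys) / n)) / (sd_n(xs) * sd_n(ys))

# Left-hand dataset from Example 2
xs = [1, 1, 2, 3, 5, 7, 11, 13, 13]
ys = [2, 3, 6, 5, 9, 8, 8, 4, 7]

print(round(r_standard_units(xs, ys), 3))  # 0.415
print(round(r_shortcut(xs, ys), 3))        # 0.415
```

    The two functions agree because the average of (x - average x)(y - average y) equals (average of xy) - (average of x)(average of y).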
    
    

C. Properties of r summarized

D. Examples

  1. Given the five points {(2,7), (3,3), (5,1), (8,4), (13,2)}, find r.

    Answer: r = -0.47.

                 x      y     product of x & y
               ----    ----   ----------------
                 2      7          14
                 3      3           9
                 5      1           5
                 8      4          32
                13      2          26
    
    Average:    6.2     3.4        17.2
    SD:      3.9699  2.0591    (each SD divides by n = 5)

    r =   17.2 - (6.2 × 3.4)       17.2 - 21.08       -3.88
          ------------------  =  ----------------  =  ------  = -0.47
           3.9699 × 2.0591            8.1744          8.1744
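    The table's arithmetic can be checked with a short Python sketch using only the standard library. statistics.pstdev divides by n, matching the SDs in the table; statistics.stdev divides by n - 1 and would not give r when plugged into this formula.

```python
import statistics

# Example 1 data
xs = [2, 3, 5, 8, 13]
ys = [7, 3, 1, 4, 2]

avg_x = statistics.mean(xs)                              # 6.2
avg_y = statistics.mean(ys)                              # 3.4
avg_xy = statistics.mean(x * y for x, y in zip(xs, ys))  # 17.2
sd_x = statistics.pstdev(xs)                             # about 3.9699 (divides by n)
sd_y = statistics.pstdev(ys)                             # about 2.0591 (divides by n)

r = (avg_xy - avg_x * avg_y) / (sd_x * sd_y)
print(f"r = {r:.2f}")  # r = -0.47
```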
    

  2. The dataset on the left has a correlation of r = 0.415. Find the correlation for the dataset on the right.
        Left           Right
       x     y        x     y
      ---   ---      ---   ---
       1     2        4   -12
       1     3        6   -12
       2     6       12   -11
       3     5       10   -10
       5     9       18    -8
       7     8       16    -6
      11     8       16    -2
      13     4        8     0
      13     7       14     0

    Since the right dataset is just a transformation of the left one (the "new" x = 2 × (old y), and the "new" y = (old x) - 13), its correlation is the same as for the left dataset: r = 0.415. Swapping the roles of x and y does not change r.

    Note: Adding the same constant to every value of x (or of y), or multiplying every value by the same positive constant, does not change the correlation. Multiplying by a negative constant flips the sign of r.
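    A quick Python check of these claims, as a sketch: it rebuilds the right-hand dataset from the left one, compares the two correlations, and shows that multiplying x by a negative constant (here -3, chosen arbitrarily) flips the sign of r. The helper name r_from_formula is made up; it implements the shortcut formula from section B with standard deviations that divide by n.

```python
import statistics

def r_from_formula(xs, ys):
    """(average of xy - average of x * average of y) / (SD of x * SD of y), SDs dividing by n."""
    avg_xy = statistics.mean(x * y for x, y in zip(xs, ys))
    numerator = avg_xy - statistics.mean(xs) * statistics.mean(ys)
    return numerator / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Left-hand dataset from Example 2
left_x = [1, 1, 2, 3, 5, 7, 11, 13, 13]
left_y = [2, 3, 6, 5, 9, 8, 8, 4, 7]

# Right-hand dataset built from the left: new x = 2 * (old y), new y = (old x) - 13
right_x = [2 * y for y in left_y]
right_y = [x - 13 for x in left_x]

r_left = r_from_formula(left_x, left_y)
r_right = r_from_formula(right_x, right_y)
r_flipped = r_from_formula([-3 * x for x in left_x], left_y)  # negative multiplier

print(f"{r_left:.3f} {r_right:.3f} {r_flipped:.3f}")  # 0.415 0.415 -0.415
```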

E. Homework Set #6 -- Due 12/4/98


Return to the Fall 1998 Statistics 10/50 Home Page

Last Update: 30 November 1998 by VXL