Lab 3: Summarizing Linear Relations

Purpose: The purpose of this lab is to determine whether the duration of an eruption of Old Faithful can predict the time until the next eruption.

Background: If you are a tourist on a tight schedule and hope to see a Yellowstone Park geyser erupt, you probably want to know what time you should be there. If you are a park ranger continually pestered by tourists wanting to know when the geyser will next erupt, you probably want to know what to tell them. Old Faithful is a geyser that earned its name because of its supposedly reliable eruptions. The idea explored in this lab is that the duration of one eruption can be used to predict the time until the next eruption.

 

Data: The data are the same as for Lab 2 and can be found at: http://www.stat.ucla.edu/~rgould/51f98/GEYSER1.DAT

 

Activity:

1. Make a scatterplot. Choose your x and y variables carefully. Explain your choice.

How would you describe the trend? Based on this plot, do you think you could predict when the next eruption will occur if you knew how long the previous eruption lasted? With what accuracy? Do you feel comfortable summarizing this trend with a correlation coefficient? If so, what is the correlation? If not, why not?

(Hint: If you do choose to compute a correlation, you can select "correlation" from the Data Analysis tools. For the input range, specify both columns, for example B1:C222. It will then output a 2x2 table, and the number you want will be in the lower left-hand cell.)
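If you would rather check the spreadsheet's correlation by hand, the following is a minimal sketch in Python of the usual Pearson correlation formula. The function name pearson_r and the two short lists are assumptions made for illustration only; the numbers are placeholders and are not taken from GEYSER1.DAT.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Placeholder values (NOT the lab data): eruption durations and the
# waiting times that followed them, both in minutes.
duration = [1.8, 2.0, 3.3, 4.1, 4.5, 4.7]
wait = [54, 51, 74, 79, 85, 88]

r = pearson_r(duration, wait)
print("correlation:", round(r, 3))
```

The correlation does not depend on which variable you call x, so pearson_r(duration, wait) and pearson_r(wait, duration) give the same number. That is why the spreadsheet's 2x2 table is symmetric.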

2. Perform a regression. What is the equation for using duration to predict the time until the next eruption? Do you think this is a good summary? What's the most you'll be wrong, based on the data? Can you give some idea of what a "typical" error might be? (Hint: it might help to superimpose the regression line on your scatterplot. To do this, select the scatterplot and under Chart on the menu, choose "Add Trendline." I recommend you save your work before doing this, since it sometimes makes my computer crash.)
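One way to think about the "most you'll be wrong" and the "typical" error is through the residuals, the differences between the observed waiting times and the waiting times the line predicts. The sketch below fits a least-squares line and summarizes its residuals. The function name fit_line is hypothetical, the data are placeholder values rather than the lab data, and using the residual standard error as the "typical" error is only one reasonable choice.

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder values (NOT the lab data), both in minutes.
duration = [1.8, 2.0, 3.3, 4.1, 4.5, 4.7]
wait = [54, 51, 74, 79, 85, 88]

slope, intercept = fit_line(duration, wait)

# Residual = observed wait minus the wait predicted by the line
residuals = [w - (intercept + slope * d) for d, w in zip(duration, wait)]

# Largest miss on the data we have
largest_error = max(abs(e) for e in residuals)

# Residual standard error: a common measure of a "typical" miss
n = len(residuals)
typical_error = math.sqrt(sum(e * e for e in residuals) / (n - 2))

print("predicted wait =", round(intercept, 2), "+", round(slope, 2), "* duration")
print("largest error:", round(largest_error, 2))
print("typical error:", round(typical_error, 2))
```

Note that the largest error describes only the eruptions in the data set. A future eruption could miss by more.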

3. Suppose that you wanted to use the length of the current eruption that you were watching to tell how long it had been since the last eruption. In other words, if you had just witnessed a 2.5 minute eruption, could you tell how long it had been since the last eruption? Do this two ways: first, use the equation you got above. Second, redo the regression after switching the x and y variables and use this new result to tell you how long since the last eruption. Do they agree? (Did you expect them to?) Why or why not?
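To see in general terms what happens when x and y are switched, the sketch below fits both lines to the same placeholder data and compares two ways of going from a y value back to an x value: solving the first equation for x, and using the refitted line directly. The names fit_line, x_from_inverted, and x_from_swapped are hypothetical, and the numbers are not the lab data, so treat this as an illustration of the comparison rather than an answer to the question.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Placeholder values (NOT the lab data).
x = [1.8, 2.0, 3.3, 4.1, 4.5, 4.7]
y = [54, 51, 74, 79, 85, 88]

# Line for predicting y from x, and line for predicting x from y
slope_yx, intercept_yx = fit_line(x, y)
slope_xy, intercept_xy = fit_line(y, x)

y0 = 60  # an arbitrary y value to work backwards from

# Way 1: solve y = intercept_yx + slope_yx * x for x
x_from_inverted = (y0 - intercept_yx) / slope_yx

# Way 2: use the line fitted with the variables switched
x_from_swapped = intercept_xy + slope_xy * y0

# Squared correlation, for comparison with the product of the slopes
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
r_squared = sxy ** 2 / (sxx * syy)

print("inverted equation:", round(x_from_inverted, 3))
print("switched regression:", round(x_from_swapped, 3))
print("product of slopes:", round(slope_yx * slope_xy, 4))
print("r squared:", round(r_squared, 4))
```

The product of the two slopes equals the squared correlation, so the two lines coincide only when the points fall exactly on a line.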