Lab 2: Descriptive Statistics
Due Friday, October 16
Purpose: Describe the eruption cycle of Old Faithful. Just how faithful is it?
Background: Old Faithful is a geyser in Yellowstone National Park that earned its moniker because of the reliability of its eruptions. I took the section below from someone's web page. The general problem with geysers is that they can be unpredictable. Some are quite spectacular to see, but you can predict when they will erupt only with great difficulty. Old Faithful, on the other hand, is supposed to be fairly easy to predict. Here's what one particular geyser-lover's homepage says about Old Faithful:
Old Faithful erupts every 35 - 120 minutes for 1.5 - 5 minutes. The rangers say that 90% of their predictions are within +/- 10 minutes. The time to the next eruption is predicted using the duration of the current eruption. The longer the eruption lasts, the longer the interval until the next eruption. For instance, a 2 minute eruption results in an interval of about 50 minutes and a 4.5 minute eruption results in an interval of about 85 minutes. It is not possible to predict more than one eruption in advance (ref: http://www.yellowstone-natl-park.com/geyser.htm)
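To see how the two examples in the quote could turn into a prediction rule, here is a minimal Python sketch. It assumes (the web page does not say this) that the rule is roughly a straight line through the two quoted points. The name predict_interval is made up for illustration.

```python
# Two (eruption duration, interval until next eruption) pairs quoted above, in minutes.
SHORT = (2.0, 50.0)
LONG = (4.5, 85.0)

# Assumption: the prediction rule is a straight line through the two quoted points.
SLOPE = (LONG[1] - SHORT[1]) / (LONG[0] - SHORT[0])  # extra wait per extra minute of eruption
INTERCEPT = SHORT[1] - SLOPE * SHORT[0]


def predict_interval(duration_min):
    """Rough wait in minutes until the next eruption (hypothetical helper)."""
    return INTERCEPT + SLOPE * duration_min


print(predict_interval(3.0))
```

Under this assumption, each extra minute of eruption adds about 14 minutes to the wait. The real data may not follow a straight line, which is part of what this lab asks you to check.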
For information about visiting Old Faithful, see
http://www.nps.gov/yell/
Data: The data are available at
http://www.stat.ucla.edu/~rgould/51f98/GEYSER1.DAT
You can also access the data through the "data" part of the course homepage.
Topics: Descriptive Statistics, Histograms, Scatterplots.
Activity:
1. The description above, that "Old Faithful erupts every 35 to 120 minutes", is rather vague. Let's see if we can do better.
Make a graphical summary that describes how often Old Faithful erupts. (Be sure to label your graph carefully, and note which variable you used to make it.) Describe the graphic in words. Do you agree with the statement above ("...every 35 to 120...")?
What numerical summaries does your graphic suggest might be useful? Are there any? Could you be more precise than 35 to 120 minutes? Explain.
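If it helps to see what a histogram is doing behind the scenes, here is a minimal Python sketch that counts values into 10-minute bins and prints a text histogram. The intervals below are made-up whole-minute values for illustration only, not taken from GEYSER1.DAT, and bin_counts is a hypothetical helper name.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Made-up times between eruptions, in whole minutes (NOT the lab data).
intervals = [52, 55, 58, 61, 77, 79, 81, 83, 84, 88, 91]

# One row per bin; the labels assume whole-minute values and a width of 10.
for start, n in bin_counts(intervals, 10).items():
    print(f"{start:3d}-{start + 9:3d} | {'*' * n}")
```

Changing the bin width changes the picture, so it is worth trying more than one width before you describe the shape.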
2. Make a numerical summary of the length of the eruptions. (From Excel, look under "Descriptive Statistics", choose the correct input range, identify an output range, and click on "Summary Statistics".) Now make a histogram of the same variable. Do the mean (really the average) and the median seem like good summaries of the center of the distribution to you? Why or why not?
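If you would like to check Excel's output another way, the same kind of summary can be sketched in Python with the standard statistics module. The durations below are made-up values for illustration, not taken from GEYSER1.DAT.

```python
import statistics

# Made-up eruption lengths in minutes (NOT the lab data).
durations = [1.8, 1.9, 2.0, 2.1, 4.3, 4.4, 4.5, 4.6, 4.7]

summary = {
    "count": len(durations),
    "mean": statistics.mean(durations),
    "median": statistics.median(durations),
    "stdev": statistics.stdev(durations),  # sample standard deviation
    "min": min(durations),
    "max": max(durations),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Compare where the mean and median land with where the values actually cluster. That comparison is what this question asks you to make with the real data.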
3. The description above suggests that the length of the eruption and the time until the next eruption are related. Let's look at this:
If you didn't do it in question 1, make a histogram of the time between eruptions. Describe the shape of the histogram.
Does the fact that the time between eruptions is related to the length of the eruptions offer a possible explanation for the shape of the histogram? Explain.
Make a scatterplot of the eruption length versus the time until the next eruption. How would you describe the relation? (Strong or weak? How does the time between eruptions vary with eruption length? For example, does one increase as the other decreases?)
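One common way to put a number on "strong or weak" is the correlation coefficient. It is not required for this lab, but here is a minimal Python sketch of the calculation. The paired values are made up for illustration, not taken from GEYSER1.DAT, and pearson_r is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up pairs (NOT the lab data): eruption length and time until the next eruption, in minutes.
durations = [1.8, 2.0, 2.3, 3.5, 4.0, 4.5, 4.7]
intervals = [48, 52, 55, 70, 78, 84, 88]

r = pearson_r(durations, intervals)
print(round(r, 3))
```

A value near +1 or -1 means the points lie close to a straight line, and a value near 0 means there is little linear relation. The number only summarizes the scatterplot, so look at the plot itself before trusting it.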
4. Pretend you are writing a homepage describing Old Faithful to visitors. Write one paragraph that tells them how to decide when to show up to see an eruption. Use the quoted paragraph at the start of this lab as your model.