Some general techniques that might help you with your homework.
You might also want to read the Announcements
about troubles with ARC in Young Hall.
Entering Data By Hand
Suppose you want to enter a variable called "myvalues",
and the values you recorded were: 6, 15, 12, 2.3. In the Listener
window type:
(def myvalues (list 6 15 12 2.3))
Then, under the Dataset menu select "Add a variate..." In the
space that becomes available, type
myvalues = myvalues
You will then be able to make plots and analyze this variable. Warning: your new variable must have the same number of observations as all of the other variables in the data set.
Another approach is to use a word processor to put the data into a file, and then load that file as we usually do.
Linked Plots
You already know how to make histograms. If you make
histograms for two separate variables X and Y, the histograms
will be "linked." This means that if you select one part of one histogram,
the corresponding observations will be shaded in the other histogram, too.
This works not just for two, but for any number of histograms. And it works
not just for histograms, but for all graphics. You can use this to
help understand unusual observations. For example, if you see
an outlier in one variable, select it, and see whether that person is also
unusual on the other variables. You can also use it to help find trends.
Select the higher values on one variable, and see if they tend to be high
(or low) on the other variables also.
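To make the idea of linking concrete, here is a minimal sketch in Python (not Arc) of what happens behind the scenes: every plot refers to the same case numbers, so selecting cases in one variable picks out the same cases in every other variable. The variable names and data values are made up for illustration.

```python
# Hypothetical data: one entry per person, in the same order in both lists.
height = [62, 70, 65, 74, 68]
weight = [120, 180, 135, 210, 160]

# "Select" the cases with the largest heights (here: taller than 68).
selected = [i for i, h in enumerate(height) if h > 68]

# Linking means the same case numbers are highlighted in every other plot,
# so we look up the weights that belong to the selected cases.
linked_weights = [weight[i] for i in selected]

print("selected cases:", selected)
print("their weights: ", linked_weights)
```

If the selected cases also sit at the high end of the second variable, that suggests the two variables tend to rise together.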
Deleting Cases
You can highlight an observation on any graphic,
and then select "Case Deletions" beside the graph to remove that point
and see what the plot would look like without that observation. (You can
also select several observations.) This will affect all of the graphics
you are currently displaying. But don't worry, the same command will
easily add the deleted observations back. This is one technique for
examining potentially "influential" observations.
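The reason for deleting a case is to see how much a summary changes without it. A rough sketch of that comparison in Python (not Arc), using made-up scores and the mean as a stand-in for whatever summary or fitted line you are looking at:

```python
from statistics import mean

# Hypothetical scores; the last one is far from the rest.
scores = [2, 3, 4, 5, 30]

# "Delete" case 4 (counting from 0) without changing the original list,
# so the case can be restored simply by going back to `scores`.
deleted = {4}
kept = [s for i, s in enumerate(scores) if i not in deleted]

mean_all = mean(scores)   # summary using every case
mean_kept = mean(kept)    # same summary with the case removed

print("mean with all cases:     ", mean_all)
print("mean without the outlier:", mean_kept)
```

A large change between the two numbers is what marks an observation as potentially influential.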
Scatterplot
You can compare two variables by selecting "Graph&Fit:
Plot of..." Put the independent variable (x) on the H axis and the dependent
variable (y) on the V axis. H stands for horizontal, V for vertical. O stands
for out, that is, the z-axis. You can put a third variable on the O axis
and get a three-dimensional plot which you can then spin around and
view from any angle.
Regression
Once you've made a scatterplot, you can view the
regression line by moving the "slider" where it says OLS. (OLS stands
for Ordinary Least Squares.) If it says "NIL" on the right side,
then there is no regression line visible. If you move it one click
to the right, it will say "1" and superimpose the best-fit line (y = a
+ bx). If you move it one more click to the right, it will say "2" and
superimpose the best-fit quadratic curve y = a + bx + cx^2. Further
clicks fit polynomials of higher degree. This doesn't let you see
the values of a and b; it's merely a descriptive way for you to see what
the best-fit line might look like. Once you have a best-fit line,
you can also look at the residuals (although I don't know if we'll have
covered these in class by the time you're reading this). Just select
"Rem Lin Trend" in the upper left corner. (This means REMove
LINear TREND.) This will give you a plot with the residuals on the
vertical axis and x on the horizontal axis.
If you want to see the values of the best-fit intercept
and slope, go to "Graph&Fit: Fit Linear LS". Enter the variable
you plotted on the V axis as the "Response" and
the variable you plotted on the H axis as the "Term/Predictor" and click on OK.
You will see a long table printed out in the "Listener." All you
want from this table (for now; next quarter you'll want more) is the
column that says "Estimates." Under "Estimates" the "Constant"
is the intercept and the estimate beside the x variable
name is the slope. The fitted line is then y = constant + slope*x.
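If you are curious where the "Constant" and slope estimates come from, the sketch below works them out by hand in Python (not Arc) using the standard least-squares formulas, and then computes the residuals that "Rem Lin Trend" plots. The data values and variable names are invented for illustration.

```python
# Hypothetical data: x is the predictor (H axis), y is the response (V axis).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx

# The fitted line passes through (x_bar, y_bar), which fixes the intercept.
constant = y_bar - slope * x_bar

# Residual = observed y minus the value predicted by the line.
residuals = [yi - (constant + slope * xi) for xi, yi in zip(x, y)]

print("constant:", round(constant, 4))
print("slope:   ", round(slope, 4))
print("residuals:", [round(r, 4) for r in residuals])
```

The residuals from a least-squares line with an intercept always add up to zero (apart from rounding), which is a quick check that the fit was computed correctly.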
Descriptive Statistics
You can get means, medians, SDs, etc. from the Dataset
menu. (The menu will not say "Dataset" if you gave the dataset a different
name. For example, if you called it "classdata" when you downloaded the
data, then the menu will say "classdata".) Just select "Display
Summaries". You can also select "Display Data" to get a list of all
of the values side-by-side. This is useful for finding particular
observations and comparing two sets of numbers side by side.
Sub-groups
Suppose you want descriptive statistics of a sub-group.
For example, if you selected "height" under "Dataset: Display Summaries"
it would give you the average of all heights. But suppose you wanted
the averages of the males and females separately? Then, select "Dataset:
Table Data". Under "Variates" put height and under "Condition on"
put "gender". Note that whatever you put under "Condition on" MUST
be a categorical variable, or at least have only a small number of values.
You can then check boxes for whatever summary statistics you need.
(Put .5 under "quantiles" to get the median, .25 to get the first
quartile, etc.) The table that prints out gives you these
summaries broken down by the categories you conditioned on.
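As a rough picture of what "Condition on" does, the following Python sketch (not Arc) splits a made-up height variable by a made-up gender variable and computes the same summaries within each category. The names and numbers are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical data set: one height and one gender per person, same order.
height = [64, 70, 62, 72, 66, 68]
gender = ["F", "M", "F", "M", "F", "M"]

# Group the heights by the category of the conditioning variable.
groups = {}
for h, g in zip(height, gender):
    groups.setdefault(g, []).append(h)

# One row of summaries per category.
summary = {
    g: {"n": len(values), "mean": mean(values), "median": median(values)}
    for g, values in groups.items()
}

for g, row in sorted(summary.items()):
    print(g, row)
```

This also shows why the conditioning variable needs only a few distinct values: each distinct value becomes its own row, so a variable like height would produce a row for nearly every person.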