110A  Some ARC Hints for HW 3

Some general techniques that might help you with your homework.
You might also want to read the Announcements about troubles with ARC in Young Hall.

Entering Data By Hand
    Suppose you want to enter a variable called "myvalues", and the values you recorded were: 6, 15, 12, 2.3.  In the Listener window type the following, separating the values with spaces rather than commas:
(def myvalues (list 6 15 12 2.3))
Then, under the Dataset menu select "Add a variate..."  In the space that becomes available, type
myvalues = myvalues

You will then be able to make plots and analyze this variable.  Warning:  your new variable must have the same number of observations as all of the other variables in the data set.

Another approach is to use a word processor to put the data into a file, and then load that file as we usually do.

Linked Plots
    You already know how to make histograms.  If you make histograms for two separate variables X and Y, the histograms will be "linked."  This means that if you select one part of one histogram, the corresponding observations will be shaded in the other histogram, too.  This works not just for two histograms but for any number, and not just for histograms but for all graphics.  You can use this to help understand unusual observations.  For example, if you see an outlier in one variable, select it and see whether that person is also unusual on other variables.  You can also use it to help find trends.  Select the higher values on one variable, and see if they tend to be high (or low) on the other variables also.
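
The idea behind linking is that every plot refers to the same case numbers.  Here is a minimal sketch of that idea in Python (not Arc); the variable names and the numbers are made up for illustration.

```python
# Made-up data: one entry per person, in the same order in both lists.
height = [62, 65, 70, 68, 81, 66]
weight = [120, 135, 160, 150, 155, 140]

# "Select" the most extreme height, as you would by clicking the outlier.
selected = max(range(len(height)), key=lambda i: height[i])

# Linking means the same case number is picked out in every other variable.
print("case", selected, "height", height[selected], "weight", weight[selected])
```

Here the tallest person is not the heaviest, which is the kind of thing linked plots let you see at a glance.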

Deleting Cases
    You can highlight an observation on any graphic, and then select "Case Deletions" beside the graph to remove that point and see what the graph would look like without that observation.  (You can also select several observations.)  This will affect all of the graphics you are currently displaying.  But don't worry, the same command will easily add the deleted observations back.  This is one technique for examining potentially "influential" observations.
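
To see why deleting a case can matter, here is a small Python sketch (not Arc) that compares a summary with and without one unusual case.  The values and the choice of which case to delete are made up for illustration.

```python
from statistics import mean

# Made-up values; the last case is far from the rest.
values = [6, 15, 12, 2.3, 48]

# Case numbers (counting from 0) currently marked as deleted.
deleted = {4}

# Keep only the cases that are not marked as deleted.
kept = [v for i, v in enumerate(values) if i not in deleted]

mean_all = mean(values)    # every case included
mean_kept = mean(kept)     # unusual case left out
print(mean_all, mean_kept)
```

Emptying the "deleted" set puts the case back, just as the same menu command restores it in Arc.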

Scatterplot
    You can compare two variables by selecting "Graph&Fit:  Plot of..."  Put the independent variable (x) on the H axis and the dependent variable (y) on the V axis.  H stands for horizontal, V for vertical.  O stands for out -- the z-axis.  You can put a third variable on the O axis and get a three-dimensional spin plot, which you can then rotate and view from any angle.

Regression
    Once you've made a scatterplot, you can view the regression line by moving the "slider" where it says OLS.  (OLS stands for Ordinary Least Squares.)  If it says "NIL" on the right side, then there is no regression line visible.  If you move it one click to the right, it will say "1" and superimpose the best-fit line (y = a + bx).  If you move one more click to the right, it will say "2" and superimpose the best-fit quadratic curve y = a + bx + cx^2.  You can continue to higher-degree polynomials; I don't know how far the slider goes.  This doesn't let you see the values of a and b; it's merely a descriptive means for you to see what the best-fit line might look like.  Once you have a best-fit line, you can also look at the residuals (although I don't know if we'll have covered these in class by the time you're reading this).  Just select "Rem Lin Trend" in the upper left corner.  (This means REMove LINear TREND.)  This will give you a plot of the residuals on the vertical axis, and x on the horizontal axis.
    If you want to see the values of the best-fit intercept and slope, go to "Graph&Fit:  Fit Linear LS".  Put the variable on the V axis as the "Response" and the variable on the H axis as the "Term/Predictor" and click on OK.  You will see a long table printed out in the "Listener."  All you want from this table (for now -- next quarter you'll want more) is the column that says "Estimates."  Under "Estimates" the "Constant" is the intercept and the estimate beside the x variable name is the slope, so that y = constant + slope*x.
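
If you want to check the arithmetic behind the intercept and slope, here is a minimal Python sketch (not Arc) using the usual least-squares formulas.  The function name fit_line and the data are made up for illustration; this is not a reproduction of Arc's output table.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Slope: sum of cross-products over sum of squared x deviations.
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the line passes through the point of means.
    a = my - b * mx
    return a, b

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)

# Residual = observed y minus the y predicted by the line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b, residuals)
```

The residuals from a least-squares line with an intercept always add up to zero (apart from rounding), which is a handy check.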

Descriptive Statistics
    You can get means, medians, SDs, etc. from the Dataset menu.  (This might not say "Dataset", if you named this dataset something else.  So for example, if, when you downloaded the data, you called it "classdata", then it will say "classdata".)  Just select "Display Summaries".  You can also select "Display Data" to get a list of all of the values side-by-side.  This is useful for finding particular observations and comparing two sets of numbers side by side.

Sub-groups
    Suppose you want descriptive statistics of a sub-group. For example, if you selected "height" under "Dataset: Display Summaries" it would give you the average of all heights.  But suppose you wanted the averages of the males and females separately?  Then, select "Dataset: Table Data".  Under "Variates" put height and under "Condition on" put "gender".  Note that whatever you put under "Condition on" MUST be a categorical variable, or at least have only a small number of values.  You can then check boxes for whatever summary statistics you need.  (Put .5 under "quantiles" to get the median.  Put .25 to get the first quartile, etc.)  A table will print out that will give you these summaries broken down by the categories you conditioned on.
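
Conditioning on a categorical variable just means splitting the cases into groups and summarizing each group separately.  Here is a minimal Python sketch (not Arc) of that idea; the heights and gender codes are made up for illustration.

```python
from statistics import mean, median

# Made-up heights (inches) and gender codes, one entry per person.
height = [64, 70, 62, 72, 66, 68]
gender = ["F", "M", "F", "M", "F", "M"]

# Collect the heights separately for each category of the conditioning variable.
groups = {}
for h, g in zip(height, gender):
    groups.setdefault(g, []).append(h)

# Summarize each group on its own; the median is the .5 quantile.
for g, hs in sorted(groups.items()):
    print(g, "mean", mean(hs), "median", median(hs))
```

This also shows why the conditioning variable should have only a few distinct values: each distinct value becomes its own group.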