xlispstat glossary

Some examples and commands that we'll use in class.  Check back for updates.

Boxplots

Suppose you have two variables, x and y, and you want to make side-by-side boxplots to compare them:
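A minimal sketch, assuming xlispstat's boxplot function, which takes a list of data sets and draws one box for each in the same window. The numbers below are made up for illustration; use your own x and y. The two lists don't need to be the same length.

```lisp
;; Made-up example data; substitute your own x and y.
(def x (list 12 15 17 18 20 22 25))
(def y (list 10 11 14 19 21 24 30 31))
;; One box per list, drawn side by side.
;; The line inside each box marks that variable's median.
(boxplot (list x y))
```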

Correlation/Covariance

Nothing is ever easy... To compute a correlation, you must first compute the covariance. To
do this, you compute the covariance matrix, which has the variances of
x and y on the diagonal and their covariance off the diagonal.  I'm going to do this the
slow, simple, imprecise way (imprecise because I copy a rounded number in by hand), but you can combine these into one step to get a more precise
answer.  Also, if you want to try your hand at writing functions, you can write your
own correlation function. (I'm not going to have time to teach this until much later, though, so you're
on your own.)

This example assumes you have variables named weight and height, perhaps from the classdata.  But any
two variables with equal lengths will do.

  1. (def ourmat (covariance-matrix weight height))
  2. (print-matrix ourmat)
  3. (def sweight (standard-deviation weight))
  4. (def sheight (standard-deviation height))
  5. (/ -10.94 (* sweight sheight))

Notes:

#1 creates a covariance matrix named "ourmat".  #2 prints it.  The off-diagonal entries are the
covariance of weight and height; the diagonal entries are their variances.  #3 defines the standard deviation of weight as "sweight" (this is just the square
root of the first diagonal entry of the matrix).  #4 does the same for height.  #5 computes the correlation. The
-10.94 is the off-diagonal entry in "ourmat", that is, the sample covariance for my data (yours will differ if your data differ).  I don't know how to extract this directly
from the matrix, so I just typed it in.
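One way to avoid typing the covariance in by hand (and the rounding that comes with it) is to read it out of the matrix with aref. This is a sketch on made-up toy data. The names toy-weight, toy-height and so on are hypothetical and were chosen so that your real weight and height are not overwritten. Taking both variances from the same matrix also keeps every piece on the same divisor.

```lisp
;; Made-up toy data standing in for weight and height.
(def toy-weight (list 1 2 3 4 5))
(def toy-height (list 2 4 5 4 5))
(def toy-mat (covariance-matrix toy-weight toy-height))
;; Matrix indices start at 0: (aref m 0 0) and (aref m 1 1) are the
;; two variances, and (aref m 0 1) is the covariance.
(def toy-cov (aref toy-mat 0 1))
;; correlation = covariance / (sd of weight * sd of height)
(def toy-r (/ toy-cov (sqrt (* (aref toy-mat 0 0) (aref toy-mat 1 1)))))
(print toy-r)   ; about 0.7746 for this toy data
```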
 

A Word of Caution:  the class data contain missing values, which are coded as -9.  In practice, these
need to be removed before computing any statistics.  For now, we assume there are no missing observations.

Regression

Assume you already have two lists of equal length called x and y. We have two things to handle here: performing a regression and viewing the output. Doing both requires learning a new sort of Lisp object. Do the following:

(def myregress (regression-model x y))

This prints simple output: the estimated coefficients, R squared (the square of the correlation coefficient), and an estimate of the error standard deviation ("sigma hat"). To do more complicated displays, type

(send myregress :help)

The first step created a "regression object" called myregress. Now that you have such an object, you can send it messages, and it will respond to them. To send a message, you type (send myregress :message), where the message you wish to send follows the colon. What messages can you send? In the example above we sent the message :help, and in return we got a list of all the messages we could have sent. For example, :coef-estimates returns the estimated intercept and slope, and :plot-residuals plots the residuals.
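A short sketch of sending a few messages to a regression object, using made-up toy data. The names toy-x, toy-y and toy-reg are hypothetical and were chosen so that your own x, y and myregress are not overwritten. The :print nil keyword turns off the printed summary.

```lisp
;; Made-up toy data.
(def toy-x (list 1 2 3 4 5))
(def toy-y (list 2 4 5 4 5))
;; Fit y = a + b x without printing the summary.
(def toy-reg (regression-model toy-x toy-y :print nil))
(send toy-reg :coef-estimates)   ; (intercept slope), here about (2.2 0.6)
(send toy-reg :r-squared)        ; about 0.6 for this data
(send toy-reg :residuals)        ; observed minus fitted values
```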

Generating Data

You can make up your own data from various probability distributions.
(uniform-rand 50)
creates 50 observations from a (continuous) uniform distribution on the interval (0,1)
(binomial-rand 50 5 .3)
generates 50 observations from a binomial distribution with 5 trials and success probability 0.3. Each observation is the number of heads in 5 tosses of a coin whose probability of landing "heads" is 0.30.
(normal-rand (list 10 10 10))
generates a list of three lists, each of which has ten standard normal random observations.
(sample (iseq 1 20) 5)
draws a random sample of size 5, without replacement, from the integers 1 to 20.
(sample (iseq 1 20) 5 t)
does the same sample, but now with replacement.
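The random-number functions above can be combined with ordinary arithmetic, which is vectorized in xlispstat. As an illustration, here is a normal sample with mean 100 and SD 15, built by rescaling standard normal draws. The values 100 and 15 and the variable names are made up for this sketch, and the results change every run.

```lisp
;; 50 draws from a normal with mean 100 and SD 15:
;; multiply standard normal draws by the SD, then add the mean.
(def scores (+ 100 (* 15 (normal-rand 50))))
(mean scores)                ; close to 100, different every run
(standard-deviation scores)  ; close to 15, different every run
;; Number of heads in 5 tosses, repeated 1000 times;
;; the average should be near 5 * 0.3 = 1.5.
(def heads (binomial-rand 1000 5 .3))
(mean heads)
;; 50 uniform draws on (0,1).
(def u (uniform-rand 50))
```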

Saving Work

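If you want to record everything you do (your commands and xlispstat's responses) in a text file, type
(dribble "session.txt")
and type (dribble) with no argument when you want to stop recording. The file name is up to you.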

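To save variables (for example, suppose you have (def height (list 1 4 5)) and (def weight (list 3 5 6))), type
(savevar '(height weight) "myvars")
This writes the variables to a file (the name "myvars" is up to you). In a later session, (load "myvars") defines height and weight again.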

Help

Online help is not great in xlispstat, but there are some choices:
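The choices I know of are help, apropos and help*. Here is a sketch, using "mean" and "norm" only as examples. Note the quote before each name: without it, xlispstat tries to evaluate the name as a variable.

```lisp
;; Documentation for one function.
(help 'mean)
;; List every symbol whose name contains "norm".
(apropos 'norm)
;; Print the documentation for all of those symbols at once.
(help* 'norm)
```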

Binomial Density
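A sketch using xlispstat's binomial-pmf (probability of exactly k successes) and binomial-cdf (probability of at most k successes). Both take the arguments k, n and p. The example uses 5 tosses of the same 0.3 coin as in the Generating Data section.

```lisp
;; P(X = 2) for X ~ Binomial(n = 5, p = 0.3): 10 * 0.3^2 * 0.7^3
(binomial-pmf 2 5 .3)            ; 0.3087
;; P(X <= 2)
(binomial-cdf 2 5 .3)            ; about 0.8369
;; The functions are vectorized, so this gives the whole distribution.
(binomial-pmf (iseq 0 5) 5 .3)
```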

Other densities follow similar commands.

Normal Density

Xlispstat computes probabilities only for the standard normal (mean 0, SD 1) and leaves you to convert other normal distributions yourself. Try (help* 'normal) for a full listing.
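Converting works by standardizing. If X is normal with mean mu and SD sigma, then P(X <= x) is the standard normal cdf at z = (x - mu)/sigma. The density of X at x is the standard normal density at z divided by sigma. The p-th quantile of X is mu + sigma times the standard normal quantile. A sketch, with mean 100 and SD 15 as made-up example values (floats are used so that no integer division occurs):

```lisp
;; Hypothetical example: X is normal with mean 100 and SD 15.
(def mu 100.0)
(def sigma 15.0)
;; P(X <= 130): standardize, then use the standard normal cdf.
(normal-cdf (/ (- 130 mu) sigma))              ; z = 2, about 0.9772
;; Density of X at 130.
(/ (normal-dens (/ (- 130 mu) sigma)) sigma)
;; 90th percentile of X.
(+ mu (* sigma (normal-quant .9)))             ; about 119.2
```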

Missing Data

If your data set has missing data (for example, the "student" data set from HW 8), you'll need to treat the data specially. There are statistical concerns here (sometimes ignoring missing data can introduce bias), but in this case it's safe to just ignore the missing values. You still need a way to tell the computer to ignore them.

The first step is to save the data and then edit it with your text editor. In this case the missing values are indicated by a "." (a period) where the number should be. Because xlispstat has a hard time mixing numbers with characters (as do most statistical packages), a standard technique is to replace the "." with a value that could not possibly be part of the data. Here the data are SAT scores and therefore positive, so any negative number will do. A common choice is -9.

Use the "search and replace" feature of your text editor, but be careful! If you just search for ".", the editor will also replace every decimal point in the file, not just the periods that mark missing values. You can get around this by replacing " . " (space-period-space) with " -9 ", keeping the spaces so that neighboring numbers stay separated. (Check the first and last columns by hand, since a "." there may not have a space on both sides.)

Next, start xlispstat and read the data as usual (using read-data-columns). Define your two SAT variables, say satverb and satmath. Now we need to create new variables that do not have the -9's in them. The command to do this is
(def satverb (select satverb (which (/= -9 satverb))))
Replace satverb with satmath to do the same for the satmath variable. Here is how the command works:
(/= -9 satverb)
This command returns a list of T's and NIL's (NIL is Lisp's "false"): T if the ith element of satverb is not equal to -9, NIL if it is.
(which ...)
"which" returns the indices of the elements that are T. Hence if the first command returns (T T NIL NIL T NIL), then "which" returns (0 1 4). (Remember that the first element of a list has index 0, not index 1.)
(select satverb (which ...))
You've seen this command before. (select list index) selects only those items in "list" whose positions appear in "index". In this case, "list" is satverb and "index" is the result of the "which" command.
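To see each step on its own, try it on a tiny made-up variable, here called v, with two missing values coded as -9:

```lisp
;; Made-up data: -9 marks a missing value.
(def v (list 520 610 -9 480 -9 700))
(/= -9 v)                      ; (T T NIL T NIL T)
(which (/= -9 v))              ; (0 1 3 5)
(select v (which (/= -9 v)))   ; (520 610 480 700)
```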
Be sure to create a new vector for gender that skips the same missing values:
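One caution: once satverb has been redefined without its -9's, (which (/= -9 satverb)) no longer points at the missing entries. At that point it simply returns every index of the shorter list. A safer pattern is to compute the indices to keep once, before overwriting anything, and then select every variable with them. In this sketch the data and the name keep are made up. Skip the first three lines if your variables are already defined.

```lisp
;; Made-up data: -9 marks a missing SAT score.
(def satverb (list 520 -9 610 480 -9))
(def satmath (list 560 -9 590 500 -9))
(def gender  (list 1 0 1 0 1))
;; Compute the indices to keep ONCE, before satverb is overwritten.
(def keep (which (/= -9 satverb)))   ; (0 2 3)
;; Select the same positions from every variable.
(def satverb (select satverb keep))
(def satmath (select satmath keep))
(def gender  (select gender keep))
```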

Selecting Based on Variable Values

The "student" data set for HW8 contains a variable for gender (1 for female, 0 for male) and another for SAT score. To compare the SAT scores of men and women, you need to create two new variables, for example satm and satw, so that satm holds the SAT scores for the men only and satw those for the women only. This assumes that the gender variable and the SAT variable are the same length, so if you removed missing values from the SAT variable, you have to remove the SAME entries from gender. (Gender is not missing any values, so remove from it the entries that are missing from satmath or satverb; in this data set those are the same entries. That way all the variables stay the same length.) The following example creates a list of SAT verbal scores for women.
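The same which/select pattern works with a different test: (= gender 1) is T exactly where the student is female. This is a sketch on made-up data that has already been cleaned of missing values. Skip the first two lines if your variables are already defined.

```lisp
;; Made-up, already-cleaned data (1 = female, 0 = male).
(def satverb (list 520 610 480 700 550))
(def gender  (list 1 0 1 0 1))
;; SAT verbal scores for women only.
(def satw (select satverb (which (= gender 1))))   ; (520 480 550)
;; ...and for men only.
(def satm (select satverb (which (= gender 0))))   ; (610 700)
```

From here, (boxplot (list satw satm)) gives the side-by-side comparison from the Boxplots section.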

Using XLISPSTAT as a Calculator

The trick here is to remember that the operator comes first, so 3+4 is written (+ 3 4).
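A few more calculator examples. Nested parentheses take the place of the usual order of operations, and the innermost expression is evaluated first:

```lisp
(+ 3 4)              ; 3 + 4 = 7
(* 2 (+ 3 4))        ; 2 * (3 + 4) = 14
(- 10 (/ 9 3))       ; 10 - 9/3 = 7
(+ 1 2 3 4)          ; several arguments at once: 10
(expt 2 10)          ; 2 to the 10th power: 1024
(* 2 (list 1 2 3))   ; arithmetic works on whole lists: (2 4 6)
```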