xlispstat glossary

Some examples and commands that we'll use in class. Check back for updates.

Boxplots

Suppose you have two variables, x and y, and you want to make side-by-side boxplots to compare them:

1. (boxplot (list x y))

Correlation/Covariance

Nothing is ever easy... To compute correlations, you must first compute the covariance. To
do this, you need to compute the covariance matrix, which is a matrix that has the variances of
x and y on the diagonal, and the covariance on the off-diagonal. I'm going to do this the
slow, simple, imprecise way, but you can combine these into one step to get a more precise
answer. Also, if you want to try your hand at writing functions, you can write your
own correlation function. (I'm not going to have time to teach this until much later, though. So you're
on your own.)

This example assumes you have variables named weight and height, perhaps from the class data. But any
two variables of equal length will do.

1. (def ourmat (covariance-matrix weight height))
2. (print-matrix ourmat)
3. (def sweight (standard-deviation weight))
4. (def sheight (standard-deviation height))
5. (/ -10.94 (* sweight sheight))

Notes:

#1 creates a covariance matrix named "ourmat". #2 prints it. The off-diagonal entries are the
covariance. #3 defines the standard deviation of weight as "sweight" (this is just the square
root of the first entry in the matrix). #4 does the same for height. #5 computes the correlation. The
-10.94 is the off-diagonal entry in "ourmat". This is the sample covariance. I don't know how to extract this directly
from the matrix, so I just typed it in.
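
One possible way to avoid typing the covariance by hand is to read it out of the matrix with aref. This is only a sketch: it assumes covariance-matrix returns an ordinary two-dimensional array (so that aref works on it), and the weight and height values below are made up for illustration.

```lisp
; Made-up values for illustration only; use your own weight and height.
(def weight (list 150 160 172 181 195))
(def height (list 64 66 69 70 73))

(def ourmat (covariance-matrix weight height))
(def cov-wh (aref ourmat 0 1))   ; row 0, column 1: the sample covariance
(/ cov-wh (* (standard-deviation weight) (standard-deviation height)))

; The same computation combined into one step:
(/ (aref (covariance-matrix weight height) 0 1)
   (* (standard-deviation weight) (standard-deviation height)))
```

Because nothing is rounded and retyped along the way, the one-step version should give the more precise answer.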

A Word of Caution: the class data contain missing values, which are coded as -9. In practice, these
need to be removed before computing any statistics. For now, we assume there are no missing observations.

Regression

Assume you already have two lists of equal length called x and y. We have two things to handle here: performing a regression and viewing the output. To do both things requires learning a new sort of lisp object. Do the following:

(regression-model x y)

This will provide simple output: coefficients, correlation coefficient, and estimate of error standard deviation. To do more complicated displays:

(def myregress (regression-model x y :print nil))
(send myregress :help)

The first step created a "regression object" called myregress. Now that you have such an object, you can send it messages. In return, it will respond to your messages. To send a message, you type "(send myregress :message)" where the message you wish to send follows the colon. What messages can you send? Well, in the example above we sent the message "help". In return, we got a list of all allowable messages that we could have sent. For example, "coef-estimates" would have returned the values of the estimated slope and intercept. "plot-residuals" would have plotted the residuals.
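
As a small sketch of sending messages, the lines below build a regression object from made-up x and y lists (the numbers are for illustration only) and then send it the two messages mentioned above.

```lisp
; Made-up data for illustration only.
(def x (list 1 2 3 4 5))
(def y (list 2.1 3.9 6.2 7.8 10.1))

; Create the regression object without printing the summary.
(def myregress (regression-model x y :print nil))

; Ask for the estimated intercept and slope.
(send myregress :coef-estimates)

; Open a plot of the residuals.
(send myregress :plot-residuals)
```

Note that the colon is part of the message name when you type it.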

Generating Data

You can make up your own data from various probability distributions.

(uniform-rand 50): creates 50 observations from a (continuous) uniform distribution on the interval (0,1)
(binomial-rand 50 5 .3): generates a sample of size 50, where each observation is the number of heads in 5 tosses of a coin whose probability of landing "heads" is 0.30.
(normal-rand (list 10 10 10)): generates a list of three lists, each of which has ten standard normal random observations.
(sample (iseq 1 20) 5): draws a random sample of size 5, without replacement, from the integers 1 to 20.
(sample (iseq 1 20) 5 t): draws the same kind of sample, but now with replacement.

Saving Work

If you want to record everything you do, type

(dribble "myfile")
Everything you type and all output goes into this file. You end the recording by typing
(dribble)

To save variables (for example, suppose you have (def height (list 1 4 5)) and (def weight (list 3 5 6)) ):

(savevar 'height "height") OR for more than one:
(savevar '(height weight) "myvariables")
You can now exit safely. When you return to xlispstat, type
(load "myvariables") to retrieve the information.
(variables) gives you a list of all defined variables.
(undef 'height) removes, or "undefines", the variable height.

Help

Online help is not great in xlispstat, but there are some choices:

If you know the name of a function, for example, "median", then (help 'median) will tell you something about it. (Note the quote before the function name.)
If you know only part of the function's name, for example, you know it has "norm" in it, use (help* 'norm) to get help for ALL functions with "norm" in their names.
If you know part of the function's name, but don't want help for ALL functions with that piece in their names, type (apropos 'norm) to get a list of what these functions are, and then use (help 'function) to get help for the one you want.
The help function uses particular notation to tell you about the functions. For example, you'll see something like Args: (x y z), which means it requires 3 input arguments. You might also see Args: (x &optional y (z t)), which means that x is required, but y and z are optional, and you can call (function-name x), (function-name x y), or (function-name x y z). But NOT (function-name x z). The (z t) means that if you DON'T input z, it will be given the value "t".

Binomial Density

Other densities follow similar commands.

To calculate the probability that X=k, where X is a binomial random variable with parameters n (number of coin flips) and p (probability of a Head), type (binomial-pmf k n p). pmf stands for probability mass function.
To calculate the Cumulative Distribution Function at any value of x, type (binomial-cdf x n p)
For example, a "loaded" coin lands on Heads with probability .4. Suppose we toss the coin 10 times. What's the probability of getting exactly 5 heads? What's the probability of getting 5 or fewer heads? What's the probability of getting more than 8 heads?
1. (binomial-pmf 5 10 .4) returns 0.200658
2. (binomial-cdf 5 10 .4) returns .833761
3. (- 1 (binomial-cdf 8 10 .4)) returns .00167772
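
As a quick check on the third answer, "more than 8 heads" in 10 tosses means exactly 9 or exactly 10 heads, so adding those two probabilities should agree with the complement of the cdf. A sketch using the same coin:

```lisp
; P(X > 8) for n = 10 tosses and p = .4, computed two ways.

; 1. Complement rule: 1 - P(X <= 8)
(- 1 (binomial-cdf 8 10 .4))

; 2. Direct sum: P(X = 9) + P(X = 10)
(+ (binomial-pmf 9 10 .4) (binomial-pmf 10 10 .4))
```

Both lines should return the same value, about .00168.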

Normal Density

Xlispstat handles the standard normal, and leaves you to compute the rest. Try (help* 'normal) for a full listing.

To compute the probability of being less than or equal to x, use (normal-cdf x) (Question: What if your rv X has mean 10 and SD 3?)
To compute the pth percentile, use (normal-quant p)
To generate a list of n numbers from a standard normal distribution, type (normal-rand n). For example, (normal-rand 5) gives 5 numbers.
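
One way to handle a normal random variable that is not standard is to convert to and from z-scores. The sketch below assumes X has mean 10 and SD 3; the names mu and sigma and the cutoff 13 are made up for illustration.

```lisp
; Assumed for illustration: X is normal with mean 10 and SD 3.
(def mu 10)
(def sigma 3)

; P(X <= 13): convert 13 to a z-score, then use the standard normal cdf.
; Here z = (13 - 10) / 3 = 1, so the answer is about 0.8413.
(normal-cdf (/ (- 13 mu) sigma))

; 90th percentile of X: find the standard normal percentile, then rescale.
(+ mu (* sigma (normal-quant .9)))

; Five random draws from X: rescale five standard normal draws.
(+ mu (* sigma (normal-rand 5)))
```

The last line works because arithmetic on a list is applied to each element.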

Missing Data

If your data set has missing data (for example, the "student" data set from HW 8), you'll need to treat the data specially. There are statistical concerns here (sometimes ignoring missing data can introduce bias), but in this case it's safe to just ignore the missing values. You still need a way to tell the computer to ignore them.

The first step is to save the data and then edit it with your text editor. In this case the missing values are indicated by a "." (a period) where the number should be. Because xlispstat has a hard time mixing numbers with characters (as do most statistical packages), a standard technique is to replace the "." with a value that could not possibly be part of the data. Since the data are SAT scores and therefore positive, any negative number will do. A common choice is -9. Use the "search and replace" feature of your text editor, but be careful! If you search for just ".", the editor will also replace every decimal point. You can get around this by replacing " . ", that is, space-period-space, with " -9 " (keep the spaces so the numbers stay separated).

Next, start xlispstat and load the data as usual (using read-data-columns). Define your two sat variables, say

(def satverb (select student.dat 2)) (Remember that xlispstat starts counting with 0, not with 1.)
(def satmath (select student.dat 3))

Now we need to create new variables that do not have the -9's in them. The command to do this is (def satverb (select satverb (which (/= -9 satverb)))). Substitute "satmath" for "satverb" to do the same for the satmath variable. Here is how it works:

(/= -9 satverb): This command returns a list of T's and NIL's: T if the ith element of satverb is not equal to -9, NIL if it is. (NIL is how lisp writes "false".)
(which ...): "which" returns the indices which have T's. Hence if the first command returns (T T NIL NIL T NIL), then "which" returns (0 1 4). (Remember that the first element of a list has index 0, not index 1.)
(select satverb (which ...)): You've seen this command before. (select list index) selects only those items in "list" that are listed in "index". In this case, "list" is "satverb" and "index" is the result of the "which" command.

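Be sure to create a new vector for gender that ignores the same missing values. Do this before you redefine satverb, because the test (/= -9 satverb) has to be run on the original list that still contains the -9's: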

(def gender (select student.dat 0))
(def gender2 (select gender (which (/= -9 satverb))))
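
Here is a small worked sketch of the same steps on made-up lists (the values and the names keep and satverb2 are for illustration only). Computing the list of indices once and reusing it guarantees that the same entries are dropped from both variables.

```lisp
; Made-up data for illustration; -9 marks a missing value.
(def satverb (list 500 -9 620 -9 480))
(def gender (list 1 0 0 1 1))

; T where the score is present, NIL where it is missing.
(/= -9 satverb)                        ; returns (T NIL T NIL T)

; Indices of the entries to keep.
(def keep (which (/= -9 satverb)))     ; keep is now (0 2 4)

; Use the same indices for both variables.
(def satverb2 (select satverb keep))   ; satverb2 is now (500 620 480)
(def gender2 (select gender keep))     ; gender2 is now (1 0 1)
```

Giving the cleaned list a new name (satverb2) leaves the original satverb available in case it is needed again.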

Selecting Based on Variable Values

The "student" data set for HW8 contains a variable for gender (1 for female, 0 for male) and another for sat score. To compare the sat scores for men and women, you need to create two new variables, satm and satw, for example, so that satm has the sat scores only for the men, and satw only for the women. We assume that the gender variable and the sat variable are the same length, so if you removed missing values from sat, you have to remove the SAME entries from gender. (Note that gender is not missing any values, so you have to remove the same ones that are missing from either satmath or satverb (they are the same in this case) to make sure both variables are the same length.) The following example creates a list of sat verbal scores for women.

(def satv (select student.dat 2))
(def gender (select student.dat 0))
(def satv2 (select satv (which (/= -9 satv))))
(def gender2 (select gender (which (/= -9 satv))))
(def satvfemale (select satv2 (which (= 1 gender2))))
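
The men's scores can be picked out the same way by testing for 0 instead of 1. The sketch below uses made-up lists in place of the cleaned homework data, and the name satvmale is only a suggestion.

```lisp
; Made-up data for illustration (already free of missing values).
(def satv2 (list 500 620 480 550 700 530))
(def gender2 (list 1 0 1 0 0 1))

; Scores where gender2 is 1 (women) and where it is 0 (men).
(def satvfemale (select satv2 (which (= 1 gender2))))   ; (500 480 530)
(def satvmale (select satv2 (which (= 0 gender2))))     ; (620 550 700)

; Side-by-side boxplots to compare the two groups.
(boxplot (list satvmale satvfemale))
```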

Using XLISPSTAT as a Calculator

The trick here is to remember that the operator comes first, so 3+4 is (+ 3 4).

Adding two lists: if x and y are each lists of the same length, then (+ x y) results in a new list in which each x_i was added to y_i
Exponents: (^ 4 3), for example, is 4 raised to the 3rd power.
(^ 4 (/ 1 2)) is 4 raised to the 1/2 power (in other words, the square root of 4).
(log 10) gives the natural log of 10.
(exp y) is e raised to the power y, the inverse of the natural log.
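
A few more calculator-style lines, as a sketch of how these pieces combine. The change-of-base line is one way to get a base-10 log from the natural log.

```lisp
(sqrt 4)                   ; square root, same value as (^ 4 (/ 1 2))
(exp (log 10))             ; exp undoes the natural log, giving back 10 up to rounding
(/ (log 100) (log 10))     ; log base 10 of 100, using change of base
(* 2 (list 1 2 3))         ; a number times a list: (2 4 6)
(+ (list 1 2 3) (list 10 20 30))   ; element-by-element sum: (11 22 33)
```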