Introduction to Software
Stats 110A and 110B
Modern statistics is unthinkable without computers, so it's essential that you spend some time working on a computer to really understand what data analysis is all about. There are many software packages available, designed for a variety of needs and price ranges. These packages share similar features, so learning one will help you learn others more quickly. Still, some are easier to learn than others.
For this course, I wanted to find a software package that you could "take home" with you as cheaply as possible. This package, Arc, is free and runs on both PCs and Macs. The drawback is that it implements only a limited set of analysis tools, and so (particularly in the 110B course) it will not always be useful. However, Arc is actually a "front end" to another package, Xlispstat, that is quite flexible. The price of that flexibility is, sadly, a very steep learning curve. So the purpose of this introduction is to get you started on the learning right away.
Two Ways to Compute
There are two ways, or maybe it's more accurate to say two places, that you can meet your computing needs. If you have your own computer, or access to a friend's, you can do all of your work at home. Otherwise, you can use the computers in the Science Learning Center on the fourth floor of Young Hall. Not all of their computers will have the correct software installed, and I'll give you more information about this in class. If you don't have your own computer and you don't have access to someone's, you can skip the next section.
Installing Arc on your own Computer
Before you begin, you must know some things: Do you have a Mac or a PC? You also need an Internet connection (a modem, for example) and a web browser such as Netscape or the Microsoft browser.
If you have a Mac, this only works on 68020 or better processors (roughly translated, this means your computer is not much more than 4 years old). You also need an installer utility called Stuffit, which is available at the same place as Arc. First, check to see if your computer already has Stuffit. If it doesn't, go to
http://www.stat.umn.edu/arc/macintosh.html, click on the link for Stuffit, and follow the instructions.
When you're ready to install on your Mac, go to
http://www.stat.umn.edu/arc/macintosh.html, click on ArcMac.sea.hqx, and follow the instructions.
If you have a PC, you need Windows 3.11, 95, 98, or NT. Go to
http://www.stat.umn.edu/arc/windows.html and follow the directions.
What is it?
Xlispstat is best described as an interactive programming language designed for statisticians. "Interactive" means that whenever you type something in, it responds. And "programming language" means that it sometimes takes a little programming to get it to do what you want it to, but it's designed to make it easier to do statistics. Xlispstat is based on the old programming language LISP, which some claim stands for Lost In Silly Parentheses. You'll see why soon enough.
Because xlispstat is so awkward to use, particularly for beginners, several people have written more user-friendly "front-ends". Arc is one such front end. For many purposes, you need never know a thing about xlispstat. But it does help if you do.
Where can I get help?
You might want to download "A Surfer's Guide to Lisp-Stat", which can be found at
http://samizdat.mines.edu/surfers/. This is a little harder to download, because you need a printer that can handle PostScript. I'll photocopy excerpts in class, so you don't need to download it if you don't want to.
You can also get more documentation than you know what to do with from
http://stat.umn.edu/~luke/xls/xlsinfo/xlsinfo.html.
But for the most part, you will get enough directly from me and don't need outside sources of help. Usually, the Surfer's Guide goes into more detail than we'll need.
Lesson 1: Getting Started
You should read this next section at the computer, working along with the text to see how the computer responds.
Lesson 1a: Basic Computations (See 2.4 of Surfer's Guide)
Start Arc. (On a Mac, this means you double-click on the Arc icon.) A window will come up with a cursor prompt that looks like this: >
Xlispstat commands follow a standard format, although there are wrinkles. To do anything, you type:
(operation element1 element2 ...)
where "operation" is a function like "sum" or "mean" or "+". After the operation, you write the elements that the operation requires. Elements are usually numbers or lists of numbers. Some lists have only one item, some have more.
Here are some examples. Try them out yourself to see what happens. (Don't type the ">".)
>(+ 1 3)
>(- 1 3)
>(sin pi)
>(cos pi)
>(+ (* 3 4) (/ 6 3))
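Nested expressions like the last one are evaluated from the inside out. As a rough sketch of what happens, you can build the same result one piece at a time. (The text after each semicolon is a comment for you, not something you need to type, and the ">" prompt is left out.)

```lisp
(* 3 4)                 ; the first inner expression: 3 times 4
(/ 6 3)                 ; the second inner expression: 6 divided by 3
(+ (* 3 4) (/ 6 3))     ; adds the two inner results: 12 + 2 = 14
```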
You can also create variables that are assigned specific values:
>(def fred 13)
>fred
>(def wilma '(1 2 3 4 5))
>wilma
>(sum wilma)
>(mean wilma)
Wilma might need some explaining. First, you'll notice that there's a quote mark (') before that list of numbers. The reason for this is that xlispstat ALWAYS evaluates the first thing after a parenthesis as if it were an operation. But "1" is not an operation, it is an object. So the quote acts to prevent the evaluation. Think of operations as verbs and objects as nouns. Xlispstat always wants a verb after the left parenthesis. But if there's a noun there instead, then it requires a quote in front to make xlispstat understand that it's really a noun. Try this:
>(1 2 3 4 5)
>'(1 2 3 4 5)
The first case doesn't work because "1" is not a function. The second one works; we've basically just typed in a list, and so xlispstat gives it back to us.
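If the quote feels strange, Lisp also provides a function called "list", which builds a list out of whatever follows it. This is offered only as an alternative sketch; nothing else in this handout relies on it. Because "list" is a verb, no quote is needed. The name "betty" is made up for this example.

```lisp
(list 1 2 3 4 5)               ; "list" is the operation, so nothing needs quoting
(def betty (list 1 2 3 4 5))   ; store the list under the made-up name betty
(sum betty)                    ; same answer as (sum '(1 2 3 4 5)), namely 15
```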
Fred is just a number; a scalar. Wilma is a list of numbers. Usually, since we're working with data, we'll be working with lists. Notice that the function "sum" adds all the numbers in the list, and "mean" gives the average.
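To see that "mean" really is the sum divided by the number of items, you can compute it by hand. This sketch assumes wilma is still defined as above, and it uses the Lisp function "length", which counts the items in a list.

```lisp
(length wilma)                  ; number of items in the list: 5
(sum wilma)                     ; 1 + 2 + 3 + 4 + 5 = 15
(/ (sum wilma) (length wilma))  ; 15 divided by 5, the same value as (mean wilma)
```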
Lesson 1b: Inputting Data
There are three ways you can enter data into Arc and/or xlispstat: (1) you can type each variable into xlispstat yourself, (2) you can type a file using some sort of text editor (such as Word) and then load it into Arc, or (3) you can load someone else's file into Arc.
The first method is not recommended because it can take forever (some data sets have thousands of entries!) and because while xlispstat lets you do it, you can't (easily) use what you've typed for Arc. Here's the preferred method:
1) Edit a data file. For example, suppose we've collected the height and weight of everyone in this class. We'll enter the heights in the first column and the weights in the second. This means that each row of this file will represent a person. The first number will be that person's height, the second their weight. The numbers must be separated by at least one space or a tab. Save this file as text only; use the "Save As..." feature and make sure you choose text only. Give it a name, like myclass.dat.
Try this now. Open up a file and put the following numbers into it:
60 155
72 199
68 120
And call it "test.dat".
2) Go to the menu. Under either File or Arc, select Load. A dialog box will open up, and here you give it the file you want to load. Try it with test.dat.
Arc will show you three dialog boxes in sequence. The first will ask you to name and describe the data set. You can just hit "OK" to accept the defaults, or you can give it a name (like "test"). Next, it will ask you to name "Var0" and will show you an example of some values. Type "height" over the text "Var0". Next, it will ask you to name "Var1". Type "weight".
You'll now see a new menu heading labelled "Test" (or whatever name you gave the data set).
To load other people's data files, you follow the same procedure. You must make sure that the file is in the correct format: entries separated by spaces or tabs, each object in its own row, the same number of entries in each row, and the file saved as a text file.
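If you are curious, xlispstat itself can also read a file in this format from the typing window, using its "read-data-columns" function. The sketch below assumes test.dat sits in the folder where xlispstat looks for files, and the names "cols", "ht" and "wt" are made up for this example. Data read this way is not set up as an Arc data set, so the menu method above remains the one to use for this course.

```lisp
(def cols (read-data-columns "test.dat" 2)) ; read the file as 2 columns; gives a list of two lists
(def ht (select cols 0))                    ; first column (counting starts at 0): the heights
(def wt (select cols 1))                    ; second column: the weights
(mean ht)                                   ; average height
```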
Getting Data
In this class, I'll often make data available over the internet. When this happens, you need to open up the page in your browser, select "Save As" and save it in your own directory. It will then be ready for input. You'll see some examples of this next week.
Lesson 1c: Basic Graphics
This is just to give you a quick demo of some of the graphics functions. Assuming that you've followed the above instructions and loaded "test.dat", go to the xlispstat window and type:
>(plot-points height weight)
>(histogram height)
>(histogram weight)
You'll get some plots. We'll do much more with these next time.
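If xlispstat complains that it doesn't recognize the variable names, you can still check that the graphics commands work by giving them lists you type in yourself. This sketch simply retypes the three rows of test.dat; "ht" and "wt" are made-up names.

```lisp
(def ht '(60 72 68))      ; the heights from test.dat
(def wt '(155 199 120))   ; the weights from test.dat
(plot-points ht wt)       ; scatterplot with height on the horizontal axis
(histogram wt)            ; histogram of the weights
```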
Lesson 1d: Quitting
If you just select "quit" from the file menu, your variables will not be saved. To save them, you must first do the following: go to the menu item that says either "Dataset" or "Test". (It will say whatever name you gave for the dataset when you first loaded it.) Choose "Save data set as..." and give it a name. For example, suppose you called it "test". You can now quit safely.
The next time you begin Arc and want to look at this test data, choose "Load" as before. This time, select test.lsp. Your variable names will have been saved, and you can continue your analysis.
Exercises
1. Define a variable x and give it the value 15.8: (def x 15.8)
2. Calculate (^ x 2) (which is x squared)
3. Calculate (exp x) (which is e to the x)
4. Calculate e to the minus x-squared.
5. Create a file named "coinflips.dat" using a word processor or text editor. Flip a coin 10 times and on the first line of this file write the number of heads. Repeat and put the number on the second line. Do this five times, total.
6. Load your file into Arc.
7. What's the average number of heads you got?
8. Make a histogram of the number of heads.