1. Hw etc.
2. Plotting the sample mean.
3. R Cookbook.

No late hw will be graded after hw1. For hw1, you can hand it in until Thur 10:30am, but there will be deductions if it's late.

Discuss hw2.
-- Hw2 is on the course website http://www.stat.ucla.edu/~frederic/202a/F11 in word and pdf, and is due Tue, Oct 30, 10:30am. Late homeworks will not be accepted!
We will talk about kernel smoothing on the 16th and 18th.

Loading in the housing data. I took out the first line about the overall county, I removed spaces from the city names, and I removed $s and %s.

x = scan("LAhousingprices.txt",skip=2,what="char")
z = matrix(x,ncol=9,byrow=TRUE)
y = as.numeric(z[,3])
x1 = as.numeric(z[,4])
x2 = as.numeric(z[,7])
x3 = as.numeric(z[,9])
## keep only the rows where none of the four variables is missing.
z1 = (!is.na(y) & !is.na(x1) & !is.na(x2) & !is.na(x3))
myy = y[z1]
myx1 = x1[z1]
myx2 = x2[z1]
myx3 = x3[z1]

Read up through ch7 in R Cookbook.

2. Plotting the sample mean.

cumsum() outputs the cumulative sum from i = 1 to k of a vector, as k goes from 1 to n.

## Suppose we want to plot the sample mean of 200,000 iid normals with mean 0.12 and sd 10.
n = 200000
x = rnorm(n, mean=0.12, sd = 10)
y = cumsum(x)/(1:n)
plot(y)
## this is useful to see if the sample mean has converged.
## problems:
## a) the line is so thick you can't see what it's converged to.
## b) the y-axis is too broad to see what it's converged to.
## c) the x and y labels.

plot(c(1,n), c(-0.08, 0.16), type="n", xlab="k", ylab="")
## in plotmath, a subscript is written x[i], not x_i.
mtext(side=2, line=2, cex=0.7, expression(paste(frac(1,k), " ", sum(x[i],i==1,k))))
points(1:n, y, pch=".") ## or lines(1:n,y)
abline(h=mean(x),lty=2)
mean(x)

## By the central limit theorem,
## the standard error is sigma/sqrt(n) = 10/sqrt(200000) ~ 0.02236,
## and a 95% range for mean(x) is 0.12 +/- 1.96*10/sqrt(200000) ~ (0.0762, 0.1638)
se2 = 1.96*10/sqrt(1:n)
lines(1:n,y+se2,lty=2,col="blue")
lines(1:n,y-se2,lty=2,col="blue")

3. R Cookbook.

Use x[[1]] or x[[2]] to get the first or 2nd object in a list.
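As a quick illustration of double brackets on a list, here is a minimal sketch. The list x and its element names a, b, c are made up for this example and are not part of the housing data.

```r
# Hypothetical list for illustration (not from the course data).
x = list(a = 1:3, b = "hello", c = c(2.5, 3.5))

first = x[[1]]    # the first object itself: an integer vector
second = x[[2]]   # the second object itself: a character string
sub = x[2]        # single brackets: a list of length 1 holding the 2nd object

class(first)
class(second)
class(sub)
```

Double brackets pull out the object itself, while single brackets keep the list wrapper around it.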
p91 illustrates the useful function paste(), which merges character strings and numeric objects.

x = 1
y = 3.4
z = paste("The answer to problem",x,"is",y,".")
cat(z)
## Note that paste adds a space between elements by default.
## Set sep="" to remove the space, or set sep to any other separator.
z = paste("The answer to problem",x,"is",y,".",sep="zz")
cat(z)

You can extract part of a character string using substr().

substr(z,5,10)
a = "abcdefghij"
substr(a,3,5)

## Done with Chapter 4.

p 96, x[[2]] = the 2nd element of a list, but strangely, x[2] yields a list containing the 2nd element of x. The single bracket outputs a list. See pp 110-111 for more on this.

p 97, print(x) outputs different things depending on what x is.

p98 gives an example of printing a vector and a matrix.

For 3-d arrays, use dim(). See p99.

D = 1:12
dim(D) = c(2,3,2)
print(D)
D[1,2,1] ## note that the output is 3, not 2, because arrays are filled column by column.

Data frames, p 100. Basically a table of typical regression-type data. Similar to a list.

p103, the recycling rule. When the shorter vector is done, R goes back to its beginning, recycling its elements.

x = 1:6
y = 1:3
x + y

As noted on p105, when x is a vector and y is a scalar, x+y is just a special case of the recycling rule.

p106, examples of factors, for categorical variables.

p107, stack() seems useful, to combine your data into 2 long columns (the values and their group labels).

x = runif(2); y = rnorm(4); z = rexp(2)
w = stack(list(uniform = x, normal = y, exponential = z))
w

p114, use NULL to remove an element of a list.

w = list(x = 1:2, y = rnorm(4), z = 3:4)
w
w$y = NULL
w

p115, unlist() takes a list and makes it into a big vector.

cat(w) ## doesn't work on lists.
w2 = unlist(w)
cat(w2)

p118, matrix()

m = matrix(w2, 2, 2)
m

Note that by default, the elements are read in column by column.

m2 = matrix(w2, 2, 2, byrow=TRUE)
m2

p119-120, some matrix operations.

t(m) ## transpose.
m %*% m2 ## matrix multiplication.
m * m2 ## element by element multiplication.
solve(m) ## inverse
diag(m) ## diagonal
help("%*%")

p120, rownames() and colnames().

## m is 2 x 2, so it takes 2 row names and 2 column names.
rownames(m) = c("ORANGE", "KIWI")
colnames(m) = c("YELLOW", "GREEN")
m
m2

p122, data.frame() to create a new dataframe from vectors and factors, and as.data.frame() to coerce a list or matrix into a dataframe.

p124 describes how to get a list of rows of data into one data frame, using
y = do.call(rbind, x)
This is when x is already a list of dataframes.

Also, on p125 Teetor says how to get a list of lists into one data frame, using Map().
y = do.call(rbind, Map(as.data.frame, x))
Map(as.data.frame, x) takes your list x of lists and makes it a list of data frames, and then do.call(rbind, ...) takes this list of dataframes and binds them into one big dataframe. If this stuff is confusing to you, don't worry about it. In my experience it very rarely comes up.

p127 describes getting a subset of a dataframe. It's similar to lists. x[[n]] gets the nth column, and x[n] returns a dataframe consisting of the nth column.

pp128-129 give a nice example.

names = c("abigail", "barbara", "carl", "david")
location = c("westwood", "santa monica", "venice", "palms")
status = c("undergrad", "phd", "masters", "undergrad")
grade = c(1,2,3,4)
x = data.frame(names, location, status, grade)
x
x$grade ## the 4th column, as a vector
x[[4]] ## the same thing
x[4] ## a dataframe consisting only of the 4th column
x[c(1,2,4)] ## the same as x without the 3rd column.
x[[c(1,2,4)]] ## error
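
Going back to the p125 idiom, do.call(rbind, Map(as.data.frame, x)): the notes never define a list x to try it on, so here is a minimal sketch of the two steps. The list of lists below (with fields name and grade) is made up for illustration and is not from Teetor's example.

```r
# Hypothetical list of lists; each inner list is one row of data.
x = list(
  list(name = "abigail", grade = 1),
  list(name = "barbara", grade = 2),
  list(name = "carl", grade = 3)
)

# Step 1: turn each inner list into a one-row data frame.
rows = Map(as.data.frame, x)

# Step 2: bind the one-row data frames into one big data frame.
y = do.call(rbind, rows)
y
dim(y)
```

Each inner list needs the same field names for rbind to line the columns up, which is an assumption this sketch relies on.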