1. Hw etc.
2. R Cookbook.

1. Hw etc.

Read up through ch8 in R Cookbook.
We will be using Absolute C++ by Walter Savitch in a couple of weeks. I have the 1st ed. We will be using it mainly for reference, so it might not be necessary to buy it.

Plotting the sample mean.

## Suppose we want to plot the sample mean of 200,000 iid N(0.12, 1)s.
n = 200000
x = rnorm(n, mean=0.12, sd=1)
y = cumsum(x)/(1:n) ## y[k] = mean of the first k draws
plot(y)
## this is useful to see if the sample mean has converged.
## problems: a) the line is so thick you can't see what it's converged to.
## b) the y-axis is too broad to see what it's converged to.
## c) the default x and y labels ("Index" and "y") are uninformative.
plot(c(1,n), c(0.08, 0.16), type="n", xlab="k", ylab="")
mtext(expression(paste(frac(1,k), " ", sum(x[i], i==1, k))), side=2, line=2, cex=0.7)
points(1:n, y, pch=".") ## or lines(1:n,y)
abline(h=mean(x), lty=2)
mean(x)
## By the central limit theorem,
## the standard error is sigma/sqrt(n) = 1/sqrt(200000) ~ 0.002236,
## and a 95% range for mean(x) is 0.12 +/- 1.96/sqrt(200000) ~ (0.1156, 0.1244)

2. R Cookbook.

p96, lst[[2]] = the 2nd element of a list, but strangely, lst[c(2,3)] yields a list containing the 2nd and 3rd elements. Single brackets always output a list. See pp 110-111 for more on this.
lst = list(x = rep(1,5), y = rep(2,4), z = 1:7)
lst[[2]]
lst[c(2,3)]

p97, print(x) outputs different things depending on the class of x. p98 gives an example of printing a vector and a matrix.

3-d arrays, p99, using dim().
D = 1:12
dim(D) = c(2,3,2)
print(D)
D[1,2,1] ## note that the output is 3, not 2, because arrays are filled column by column.

Data frames, p100. A data frame is basically a list of equal-length columns, representing a table of typical regression-type data.

p103, the recycling rule. When the shorter vector runs out, R goes back to its beginning, recycling its elements. p104 is useful.
x = 1:6
y = 1:3
x + y ## 2 4 6 5 7 9

p105, interesting to note that x + y, when x is a vector and y is a scalar, is just a special case of the recycling rule.

p106, examples of factors.

p107, stack() seems useful.
x = runif(2); y = rnorm(4); z = rexp(2)
w = stack(list(uniform = x, normal = y, exponential = z))
w

p114, assign NULL to remove an element of a list.
w = list(x = runif(2), y = rnorm(4), z = rexp(2))
w
w$y = NULL
w

p115, unlist().
cat(w) ## doesn't work, since w is a list of vectors
w2 = unlist(w)
cat(w2)

p118, matrix().
m = matrix(w2, 2, 2)
m
Note that by default, the elements are read in column by column.
m2 = matrix(w2, 2, 2, byrow=TRUE)
m2

p119-120, some matrix operations.
t(m) ## transpose.
m %*% m2 ## matrix multiplication.
m * m2 ## element by element multiplication.
solve(m) ## inverse
diag(m) ## diagonal
help("%*%")

p120, rownames() and colnames().
rownames(m2) = c("ORANGE", "KIWI")
colnames(m2) = c("YELLOW", "GREEN")
m
m2

p122, data.frame() and as.data.frame().

p124 describes how to get a list of rows of data into one data frame, using
y = do.call(rbind, x)
This works when x is already a list of dataframes.
Also, on p125 Teetor says how to get a list of lists into one data frame, using Map().
y = do.call(rbind, Map(as.data.frame, x))
Map(as.data.frame, x) takes your list x of lists and makes it a list of data frames, and then do.call(rbind, ...) takes this list of dataframes and binds them into one big dataframe.

p127 describes getting a subset of a dataframe. It's similar to lists. dfrm[[n]] gets the nth column, and dfrm[n] returns a dataframe consisting of the nth column. pp128-129 give a nice example.

p136, na.omit() is interesting. It removes the ROWS with NAs.

rbind() and cbind() are extremely useful, pp138-139.

p140, merge() combines two dataframes that share a common column, matching rows with the same entries in that column. Note that the rows (like "Larry" in the example on p140) that appear in one dataframe but not the other get removed by default.

with() on p141 is kind of silly. You might as well use attach(), p142, to compute something on a subset of a dataframe. But watch out when changing an element of a dataframe after attaching it: the assignment creates a new variable in your workspace and leaves the dataframe itself unchanged. See pp 142-143.

pp144-145 show how to change modes. Kind of useful.

## Done with ch5.
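
A small sketch to try out the p125, p140 and p136 items above. The names and numbers below are made up for illustration (only "Larry" echoes the book's example), and this is just one way to check the behavior, not Teetor's code.

```r
## Hypothetical data: a list of lists, one inner list per row.
x = list(list(name = "Moe", score = 1.5),
         list(name = "Larry", score = 2.5),
         list(name = "Curly", score = 3.5))

## Map() turns each inner list into a one-row data frame;
## do.call(rbind, ...) stacks those rows into one data frame.
y = do.call(rbind, Map(as.data.frame, x))
y

## A second hypothetical data frame with no row for Larry.
g = data.frame(name = c("Moe", "Curly"), grade = c("A", "B"))

## merge() matches on the shared column "name".
inner = merge(y, g)              ## Larry is dropped (the default)
outer = merge(y, g, all = TRUE)  ## Larry is kept, with grade = NA
inner
outer

## na.omit() removes the rows containing NAs, so Larry goes away again.
na.omit(outer)
```

Setting all = TRUE is the way to keep the unmatched rows if you do not want merge() to drop them silently.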