1. Final projects and lecture cancellations. 2. 2-d kernel smoothing. 3. Simulating from a density by rejection sampling. 4. Maps. 5. R Cookbook ch. 8. 6. R Cookbook ch. 9. 7. R Cookbook ch. 10. 1. Final projects and lecture cancellations. Read up through the end of ch11 in R Cookbook for next class. There will be no lecture Thur Nov 1 and Tue Nov 6, election day. Thur, Nov 29 1. MALEKMOHAMMADI, MAHSA . CHUGH, KARAN . XIA, FANGTING . 2. BELOGOLOVA, HELEN . WANG, YUTING . ELLIOTT, PETER WILLIAM . 3. LIANG, MUZHOU . CHEN, YU-CHING . CARLITZ, RUTH DENALI . 4. DANIELSON, AARON . CUNG, BIANCA . CARLEN, JANE . 5. HUANG, MING-CHUN . KARYEKAR, AMIT SUHAS . SHI, HAOLUN . 6. CHANG, HUA-I . TRICKEY, KIRSTEN ALEXANDRA . SKIPPER, PETER TYREL . Tues, Dec 4 7. ZHU, HUA . KIM, BRIAN . LACOUR, MICHAEL JULES . 8. UPPALA, MEDHA . SHEN, ZHIYUAN . LU, YANG . 9. HUANG, YAFEI . CHOI, ICK KYU . LIU, JIA LIN . 10. GLEASON, TANIA MARIE . LUO, YIYANG . JASWANI, DRUMIL PARAS . 11. LI, JINCHAO . CHEN, RUI . WANG, YAN . 12. WANG, CHENHUI . GAURAV, ADITYA . TCHEMODANOV, NATALIA . 13. XIE, JIANWEN . XIAO, QIAN . BUDHRANI, VINEESHA . Thur, Dec 6 14. CAO, RENZHE . WANG, WENJIA . ZHOU, YUXI . 15. VASILYEV, ALEXANDER . LIAN, XIAOCHEN . TSUI, JASON . 16. BAE, JOONBUM . LI, LU . NAYAK, SHILPI . 17. RICHMOND, ROBERT JAMISON . LI, ZHONGBO . ZHOU, YUAN . 18. ARFA, JONATHAN MICHAEL . HO, KING CHUNG . WOLF, JEFFREY ARIEN . 19. KAPADIA, HIRAL . NIE, XIAOHAN . CHEN, XIANJIE . !!! YOU CAN ALSO EMAIL ME AND I WILL PUT YOU IN TOUCH WITH YOUR TEAMMATES!!! 2. 2-d kernel smoothing install.packages("splancs") library(splancs) ## First, input 23 points using the mouse. n = 23 plot(c(0,1),c(0,1),type="n",xlab="longitude",ylab="latitude",main="locations") x1 = rep(0,n) y1 = rep(0,n) for(i in 1:n){ z1 = locator(1) x1[i] = z1$x y1[i] = z1$y points(x1[i],y1[i]) } bdw = sqrt(bw.nrd0(x1)^2+bw.nrd0(y1)^2) ## a possible default bandwidth b1 = as.points(x1,y1) bdry = matrix(c(0,0,1,0,1,1,0,1,0,0),ncol=2,byrow=T) z = kernel2d(b1,bdry,bdw) attributes(z) par(mfrow=c(1,2)) image(z,col=gray((64:20)/64),xlab="km E of origin",ylab="km N of origin") points(b1) x4 = seq(min(z$z),max(z$z),length=100) plot(c(0,10),c(.8*min(x4),1.2*max(x4)),type="n",axes=F,xlab="",ylab="") image(c(-1:1),x4,matrix(rep(x4,2),ncol=100,byrow=T),add=T,col=gray((64:20)/64)) text(2,min(x4),as.character(signif(min(x4),2)),cex=1) text(2,(max(x4)+min(x4))/2,as.character(signif((max(x4)+min(x4))/2,2)),cex=1) text(2,max(x4),as.character(signif(max(x4),2)),cex=1) mtext(s=3,l=-3,at=1,"density (pts/km^2)") 3. Simulating from a density by rejection sampling. To do simulation by rejection sampling, you need the density f(x) you're interested in to have the property f(x) <= b g(x), where b is some constant and g(x) is some density that's easy to sample from. For instance, if f(x) is known to have some known maximum c, and finite support of length d, then you can use g(x) is the uniform distribution, and let b = cd. Basically, the way it works is you simulate x0 from g(x), keep it with probability f(x0)/[bg(x0)] and delete it otherwise, and repeat. First, for simplicity, suppose the density has a nice closed form, like the triangular density on x in (0,2), f(x) = 1 - |x-1|, Here c = 1, and the support is (0,2), so d = 2, and b = cd = 2. ## Plot the density x = seq(-1,3,length=101) y = rep(0,101) y[(x>0) & (x<2)] = 1 - abs(x[(x>0) & (x<2)] - 1) plot(x,y,type="l",ylab="f(x)",lty=2) To simulate from this density using rejection sampling a) Simulate a draw x0 from g(x) = uniform on (0,2) = 1/2 on (0,2). b) Keep x0 with probability f(x0) / [bg(x0)] . c) Repeat. ## 10000 draws from the triangular f(x) b = 2; g = 0.5 n = 10000 x = c() ## these will contain the results. i = 0 while(i < n){ x0 = runif(1)*2 fx0 = 0.5 * (1 - abs(x0 - 1)) if(runif(1) < fx0 / (b*g)){ ## keep x0 i = i+1 x[i] = x0 if(i/100 == floor(i/100)) cat(i, " ") } } hist(x,nclass=40,prob=T,add=T) What if you want to sample from a more complicated density f(x), such as a kernel smoothing? Again, let c = the maximum of f(x), which you can find by looking at a plot of f(x). The support might not be finite, technically, but you can often find a finite region that effectively contains almost all the support of f(x). Let's use the simple example from last lecture. x1 = c(1,2.8,3) grid1 = seq(-2,6,length=101) bw1 = bw.nrd(x1) y = density2(x1,grid1,bw1) plot(grid1,y,type="l",xlab="x",ylab="density") Suppose we want to simulate 30,000 draws from this density f(x). The density clearly does not go higher than 0.5 anywhere, and (-2,6) is plenty wide to contain almost all the support of f(x). So, c = 0.5, d = 8, b = cd = 4. The only hard part is that f(x0) now needs to be computed for each x0. b = 4; g = 1/8 n = 30000 x = c() ## these will contain the results. i = 0 while(i < n){ x0 = runif(1)*8 - 2 ## this simulates the uniform g(x) on (-2,6) fx0 = density2(x1,x0,bw1) ## this computes f(x0) if(runif(1) < fx0 / (b*g)){ ## keep x0 i = i+1 x[i] = x0 if(i/100 == floor(i/100)) cat(i, " ") } } hist(x,nclass=40,prob=T,add=T,density=10) 4. Maps. Map of California counties. install.packages("maps") library(maps) map('county', 'California') plot(seq(-118,-116,length=10),seq(33,35,length=10)) map('county', 'California', add=T) ## Hard to make it look good. 5. R Cookbook ch8. dnorm(), pnorm(), qnorm(), rnorm() are the normal pdf, cdf, quantile function, and simulator functions, respectively. dnorm(1.2) 1/(sqrt(2*pi))*exp(-1.2^2/2) pnorm(1.96) pnorm(0) qnorm(.5) qnorm(0.975) ## see p190 qnorm(0) qnorm(1) qnorm(1.1) ## error rnorm(3) The default is mean = 0, sd = 1, but you can set these. rnorm(3, mean = 50, sd = 10) qnorm(0.975, mean = 50, sd = 10) rnorm(3, 5, 40) ## mean = 5 and sd = 40 rnorm(3, 5, mean=40) ## mean = 40 and sd = 5 You can substitute "norm" for "binom", "geom", etc. See tables on pp177-178. We've seen "unif" already. See the warning about the rexp() on p178. The parameter is the rate 1/beta, not the mean beta. rexp(3, 10) rexp(3, 0.1) The arguments can be vectors instead of scalars, and R will cycle through them one at a time, as described on p182. rexp(3, c(.01, 10, .02)) signif(rexp(3, c(.01, 10, .02)),3) signif(rexp(3, c(.01, 10)) ,2) ## it cycles through, so the 3rd has mean 100. The note about ?Normal and ?TDist on pp 178-179 is silly. You can use help() with a function as usual. For instance, functions for simulating normal or t distributed random variables are rnorm() and rt(), and you can do help(rnorm) or help(rt). You just can't do help(norm) or help(t) to get these help files, because norm() and t() are matrix operator functions [norm and transpose, respectively]. p179, choose(n,k) gives the number of combinations. p180, combn(1:5,3) or combn(5,3) gives the actual combinations. See for instance combn(4,3) combn(c(8,-5,40,20),2) combn(c(8,8,40,20),2) p184, sample(1:10, 5) chooses 5 items randomly from the vector (1:10). sample(1:10,5) By default it is without replacement, but you can do sample(1:10,5, rep=T) sample(1:10,c(5,4,2)) ## doesn't do what you'd think. Only the 5 is read. If you want to generate 3 samples of different sizes from the vector 1:10, you could do y = c(5,4,2) x = list() for(i in 1:3){ x[[i]] = sample(1:10,y[i]) } You can also have sample choose elements with different probabilities, by giving it as an argument a vector of probabilities, as on p185. sample(1:10, 5, rep=T, prob = (1:10)/55) p187, diff() calculates the differences between successive elements of a vector. It returns a vector of length n-1, if your initial vector had length n. Note that when your vector is a list of times, often you want to sort your vector before using diff(). t = rpois(10, lambda=20) diff(t) t1 = sort(t) t1 diff(t1) pp 191-192 show an interesting example of plotting densities. He calculates the density on a vector x of points all at once. dnorm((1:10)/10) We can do this with the exponential distribution. x = seq(0,10,length.out=100) y = dexp(x) plot(x,y, main="Exponential distribution", type="l", ylab="density", xlab="value") Note that he uses the polygon() function to shade in a region. region.x consists of all the x-coordinates of the region. region.y consists of all the y-coordinates of the region. x1 = x[1 <= x & x <= 3] region.x = c(min(x1), x1, max(x1)) ## Teetor uses x1[1] and tail(x1,1) instead of min() and max() y1 = y[1 <= x & x <= 3] region.y = c(0,y1,0) polygon(region.x, region.y, col="red", density = 20)