0. Nov 5, HW3, and final projects.
1. pi.
2. dnorm.
3. Sum of squared differences between observations.
4. Teetor ch. 11.
5. Teetor ch. 13.1 and 13.2.

0. Nov 5, HW3, and final projects.

NO CLASS TUE NOV 5 due to faculty retreat. Group 12 moved to 5.

Final project presentation schedule:

Tue, Nov 26
[1] DAS GUPTA, APARUPA. MAO, JUNHUA. WANG, PENG.
[2] CAI, YUAN. SINGH, VIKRAM VIR. WU, GUANI.
[3] LIU, YUNLEI. KARWA, ROHAN SUNIL. KUHFELD, MEGAN REBECCA.
[4] MURPHY, JOHN LEE. HONG, JI YEON. SHAH, VAIBHAV CHETAN.
[5] FERNANDEZ, KEITH EDWARD. AHLUWALIA, ANSUYA. HAROUNIAN, BENJAMIN.

Tue, Dec 3
[6] CSAPO, MARIKA. LEWIS, BRONWYN ASHLEIGH. ZHANG, QIANG.
[7] NAN, XIAOMENG. ARELLANO, RYAN CHRISTOPHER. SHU, LE.
[8] WANG, XINYUE. CHEN, JIAMIN. HAN, TIAN.
[9] THAKUR, MANOJ RAMESHCHANDRA. POWELL, CHRISTOPHER GRANT.
[10] XIE, MEIHUI. WONG, ALEX KING LAP. XIE, DAN.
[11] TAN, MENGXIN. ROBERTS, SHANE WILLIAM STROUTH. HUANG, PENG JUN.
[12] YU, CHENGCHENG. WU, JOHN SHING. MOUSAVI, SEYED ALI.

Thu, Dec 5
[13] GU, JIAYING. RAMIREZ, ARTURO. DEMIRDJIAN, LEVON.
[14] HUYNH, HIEN TRINH. SHEN, LINLING. CHANDRASEKHARAN, MANJANA.
[15] WISNER, TRISTAN GARDNER. MAGYAR, ZSUZSANNA BLANKA. HE, JIA.
[16] KRUMHOLZ, SAMUEL DAVID. KATTE, SAMIR RAJENDRA. WAINFAN, KATHRYN TANYA.
[17] WANG, TENG. SCHWEIG, JONATHAN DAVID. GUPTA, HITESH.

HW3 and HW4 are on the course website http://www.stat.ucla.edu/~frederic/202a/F13 . Use the computer labs in BH 9413 if your computer doesn't seem to work for running C in R.

From a former student: "Some of us were working at getting R and C working, and I found a solution you may find useful. When we were trying to run R CMD SHLIB, an error came back to the effect of 'gcc-4.2 file not found'. In this case, the executable was just /usr/bin/gcc, so we fixed it with a symlink by running the command 'sudo ln -s /usr/bin/gcc /usr/bin/gcc-4.2' at the terminal."

For the final project, start looking for a dataset and dividing up tasks.
Find a regression-type dataset, where you have many independent observational units (e.g. people), and for each you have 3 or 4 variables recorded, one of which is a sensible response variable. You can do regression and also analyze each variable individually.

1. Approximate pi in C.

In mypi.c,

#include <R.h>
#include <math.h>

void pi2(int *n, double *y){
    int i;
    double x[*n];
    x[0] = 1.0;
    y[0] = sqrt(6.0);
    for(i = 1; i < *n; i++){
        x[i] = x[i-1] + 1.0 / ((i+1.0)*(i+1.0));
        /* or x[i] = x[i-1] + 1.0 / pow(i+1.0, 2.0); */
        y[i] = sqrt(6.0 * x[i]);
    }
}

In R,

system("R CMD SHLIB mypi.c")
dyn.load("mypi.so")
pi3 = function(n){
    .C("pi2", as.integer(n), y = double(n))
}
b = pi3(1000000)
b$y[1000000]

Note that you have to be incredibly careful in C when doing arithmetic between integers and non-integers. If instead of 1.0 / ((i+1.0)*(i+1.0)) you write 1 / ((i+1)*(i+1)), the division is done in integer arithmetic and truncates to 0. And 1.0 / (i+1.0)^2 does not even compile: ^ in C is not exponentiation but the bitwise "XOR" operator (for each bit, X^Y = 1 if X=1 or Y=1 but not both), and it is only defined for integer operands. Use pow() for powers.

2. dnorm.

You can access C versions of many basic R functions, including for instance dnorm(), rnorm(), etc. A nice reference is http://www.biostat.jhsph.edu/~bcaffo/statcomp/files/cprog3_ho.pdf . The C prototype of dnorm is

double dnorm(double x, double mu, double sigma, int give_log)

In mydn.c,

#include <R.h>
#include <Rmath.h>

void norm2(int *n, double *upper, double *bw, double *y){
    int i;
    double x, inc;
    x = -1.0 * *upper;
    inc = 2.0 * *upper / *n;
    for(i = 0; i < *n; i++){
        y[i] = dnorm(x, 0.0, *bw, 0);  /* N(0, bw^2) density at x */
        x += inc;
    }
}

In R,

system("R CMD SHLIB mydn.c")
dyn.load("mydn.so")
norm3 = function(n, u, b){
    d = .C("norm2", as.integer(n), as.double(u), as.double(b), y = double(n))
    d$y
}
b = 100
n = 10000
u = 5*b
a = norm3(n, u, b)
title2 = paste("normal density with sd", as.character(b))
plot(seq(-u, u, length=n), a, type="l", main=title2, xlab="x", ylab="f(x)")

3. Sum of squared differences between observations.
In sumsq.c,

#include <R.h>
#include <math.h>

/* x is the vector of data of length n, and y will be a vector whose ith entry
   is the sum of squared differences from obs i to the other n-1 observations. */
void ss2(double *x, int *n, double *y){
    int i, j;
    double a;
    for(i = 0; i < *n; i++){
        a = 0.0;
        for(j = 0; j < *n; j++){
            a += pow(x[i] - x[j], 2);
        }
        y[i] = a;
    }
}

In R,

system("R CMD SHLIB sumsq.c")
dyn.load("sumsq.so")
sum3 = function(data2){
    n = length(data2)
    a = .C("ss2", as.double(data2), as.integer(n), y = double(n))
    a$y   ## or equivalently a[[3]]
}
b = c(1,3,4)
sum3(b)

n = c(100, 1000, 2000, 3000, 5000, 7000, 8000, 10000)
t2 = rep(0, 8)
for(i in 1:8){
    b = runif(n[i])
    timea = Sys.time()
    d = sum3(b)
    timeb = Sys.time()
    t2[i] = timeb - timea
    cat(n[i], " ")
}
par(mfrow=c(1,2))
plot(n, t2, ylab="time (sec)")

## Now try the same thing in R, without C.
sum4 = function(data2){
    n = length(data2)
    x = rep(0, n)
    for(i in 1:n){
        for(j in 1:n){
            x[i] = x[i] + (data2[i] - data2[j])^2
        }
    }
    x
}
b = c(1,3,4)
sum4(b)

n = c(1:8)*100
t3 = rep(0, 8)
for(i in 1:8){
    b = runif(n[i])
    timea = Sys.time()
    d = sum4(b)
    timeb = Sys.time()
    t3[i] = timeb - timea
    cat(n[i], " ")
}
plot(n, t3, ylab="time (sec)")

4. R Cookbook ch. 13.1 and 13.2.

The main things I want to emphasize are optimize() and optim().

Use optimize(), p. 335, to minimize a function of one variable.

f = function(x){ exp(abs(x) - 2.0) + (x-3)^2 }
x = seq(-10, 10, length=200)
y = f(x)
par(mfrow=c(1,2))
plot(x, y)
plot(x, y, ylim=c(min(y), min(y)+1))
c(1:200)[y == min(y)]
x[124]
x[123:125]
f(x[124])
g = optimize(f)   ## doesn't work: you must supply an interval
g = optimize(f, lower=-10, upper=10)
g

g$minimum is the minimizing value of x, and g$objective = f(g$minimum).

Use optim(), p. 336, for functions of more than one variable. By default it uses the Nelder-Mead simplex method; it doesn't require you to give a range, but you do need an initial guess.

f = function(x) 3 + (x[1]-2.4)^2 + (x[2]-2.5)^2
guess1 = c(10, 10)
g = optim(guess1, f)
g$par