1. Final project and oral report tips. 2. Defining vectors and matrices within C. 3. C functions communicating. 4. Running C from terminal. 5. R Cookbook ch. 11. 6. R Cookbook ch. 13.1 and 13.2. -- Use the computer labs in BH 9413 if your computer doesn't seem to work with regard to running C in R. -- HW3 is now due THURSDAY, Nov 15 (not Tuesday the 13th). 1. Final project and oral report tips. For the oral projects. Rule 1: Do not look at me when you are talking! Rule 2: 8-10 minutes per oral report and everybody in the group must speak for roughly equal amounts. I will cut you off if your group goes over. You can have someone in the audience or on your team helping you with the time. Rule 3: Everyone must be respectful and quiet during other people's talks. No questions because we won't have time. Rule 4: Send me a pdf or powerpoint version of your group's slides by 6pm the night before your talk, to frederic@stat.ucla.edu. That way, I can set up the talks in order ahead of time and we won't have to waste time in class waiting for each person to connect their laptop to the projector. About 7 or 8 slides seems right, though it's fine with me if you want fewer or more. Rule 5: Give us a sense of your data. Assume that the listener knows what the statistical methods you are using are. Tell us what they say about your data. Emphasize the results more than the methods. Go slowly in the beginning so that the listener really understands what your data are. Rule 6: Speculate and generalize but use careful language. Say "It seems" or "appears" rather than "is" when it comes to speculative statements or models. For example, you might say "The residuals appear approximately normal" or "a linear model seems to fit well" but not "The residuals are normal" or "The data come from a linear model". Rule 7: Start with an introduction explaining what your data are, how you got them, and why they are interesting (roughly 2 minutes), then show your results as clearly as possible, with figures preferred (roughly 3 minutes), and then conclude (roughly 2 minutes). In your conclusion, mention the limitations of your analysis and speculate about what might make a future analysis better, if you had infinite time. This might include collecting more data, or getting data on more variables, as well as more sophisticated statistical methods. For your written reports, rules 5-7 apply again. Replace "minutes" with "pages". Have just the text in the beginning, and then the figures, and then your code at the end. Email your pdf document to me, at frederic@stat.ucla.edu , by Sun, Dec 9, 11:59pm. 2. Vectors and matrices in C. You can just say double x[100] or double x[25][4], for example. In mypi2.c, /* Calculate the first 100 terms in the approximation of pi. */ #include #include void pi100 (double *y){ int i; double x[100]; x[0] = 1.0; y[0] = sqrt(6.0); for(i = 1; i < 100; i++) { x[i] = x[i-1] + pow(i+1.0,-2.0); y[i] = sqrt(6.0 * x[i]); } } In R, system("R CMD SHLIB mypi2.c") dyn.load("mypi2.so") a = .C("pi100",y=double(100)) plot(1:100,a$y,type="l",xlab="n",ylab="Approx. of pi") In mypi3.c, /* Calculate 1*2*3 + 4*5*6 + ... + 28*29*30. */ #include #include void pi100 (double *y){ int i,j; double x[10][3]; *y = 0.0; for(i = 0; i < 10; i++) { for(j = 0; j < 3; j++){ x[i][j] = 3*i + j + 1.0; } *y += x[i][0] * x[i][1] * x[i][2]; } } In R, system("R CMD SHLIB mypi3.c") dyn.load("mypi3.so") y = 1.0 a = .C("pi100",as.double(y)) a If you want x to be an n by m matrix, where n and m are inputs, you can just do, for instance, in mypi4.c, /* Calculate 1*2*3*...*m + (m+1)(m+2)...(2m) + ... + [(n-1)m+1][(n-1)m+2]...(nm). */ #include #include void pi103 (double *y, int *n, int *m){ int i,j; double x[*n][*m]; double a; /* First define the matrix. */ for(i = 0; i < *n; i++) { for(j = 0; j < *m; j++){ x[i][j] = *m*i + j + 1.0; } } /* Now multiply each row and add the product to y. */ *y = 0.0; for(i = 0; i < *n; i++){ a = 1.0; for(j = 0; j < *m; j++){ a *= x[i][j]; } *y += a; } } In R, system("R CMD SHLIB mypi4.c") dyn.load("mypi4.so") y = 1.0 a = .C("pi103",as.double(y),as.integer(10),as.integer(3)) a[[1]] ## to check, prod(1:3)+prod(4:6)+prod(7:9)+prod(10:12)+ prod(13:15)+prod(16:18)+prod(19:21)+prod(22:24)+prod(25:27)+prod(28:30) a = .C("pi103",y=as.double(y),as.integer(8),as.integer(4)) a$y prod(1:4)+prod(5:8)+prod(9:12)+prod(13:16)+ prod(17:20)+prod(21:24)+prod(25:28)+prod(29:32) Note that you have to say "y = as.double(y)" rather than just as.double(y) if you want to be able to call the result with a$y. To do this in R, you could do n = 8 m = 4 x = matrix(1:(n*m),byrow=T,ncol=m) y = apply(x,1,prod) sum(y) What if you want to define a vector of length n, or a matrix with a dimension n, and n is not an input or a known integer but instead something that must be calculated inside your function, or something that changes as you go through your function? Then you need to do dynamic memory allocation. We will hopefully get to this later. 3. Calling C functions from C. In mysd1.c, /* Create a 10 x 4 matrix of integers and compute the sample SD of each row. */ /* Row 1 will be (1, 4, 9, 16). Row 2 will be (25, 36, 49, 64), etc. */ #include #include double sd2(double *x, int n){ int i; double xbar, b; xbar = 0.0; for(i = 0; i < n; i++){ xbar += x[i]; } xbar = xbar/n; b = 0.0; // b will be the sum of squared deviations for(i = 0; i < n; i++){ b += pow(x[i] - xbar, 2); } return(pow(b/(n-1),0.5)); } void rowsd (double *y){ int i,j; double a; double x[10][4]; for(i = 0; i < 10; i++) { for(j = 0; j < 4; j++){ x[i][j] = pow(4*i + j + 1.0, 2); } y[i] = sd2(x[i], 4); } } In R, system("R CMD SHLIB mysd1.c") dyn.load("mysd1.so") a = .C("rowsd",y=double(10)) a$y ## to check it, sd(c(1,4,9,16)) sd(c(25,36,49,64)) 4. Running C from terminal. In mypi5.c, #include #include int main ( ){ int i; float x,y,h; printf("\n How many years old are you? "); scanf("%d", &i); printf("\n How long is the first side of your right triangle? "); scanf("%f", &x); printf("\n How long is the second side of your right triangle? "); scanf("%f", &y); h = sqrtf(x*x + y*y); printf("\n You are %d years old and the hypotenuse is %f . \n", i, h); return(i); } In terminal on mac (or powershell on PC), cd /Users/fredericschoenberg/Documents/2012/202aF12/CinR, gcc mypi5.c -o mypi5.out ./mypi5.out ## I ended here. 5. R Cookbook ch. 11. p297, im = influence.measures(k2) outputs all 7 influence measures listed on p298 for each of the 10,000 data points. To get just the Cook's distances, use im2 = im$infmat[,6] Cook's distance = SUM from i= 1 to 10000 of [^Y_j - ^Y_j(-i)]^2 / (p MSE). ^Y_j = prediction of observation j based on the model. ^Y_j(-i) = prediction of observation j using model with observation i omitted, p = number of parameters estimated, MSE = mean squared error. p299, dwtest() in lmtest package, and acf() to test the autocorrelation. The Durbin-Watson test has been criticized, e.g. by Chatfield (1986), on the grounds that it basically just tests the first autocorrelation but users often think it looks at more. p300-1, prediction intervals and predictions using predict(). 6. R Cookbook ch. 13.1 and 13.2. optimize, p 335, to minimize a function of one variable. f = function(x){exp((abs(x)-2.0))+(x-3)^2} x = seq(-10,10,length=200) y = f(x) par(mfrow=c(1,2)) plot(x,y) plot(x,y,ylim=c(min(y),min(y)+1)) c(1:200)[y == min(y)] x[124] x[123:125] f(x[124]) g = optimize(f) ## doesn't work g = optimize(f, lower=-10,upper=10) g g$min is the minimum, and g$obj = f(g$min). optim(), p336, for functions of more than one variable. It uses a Newton-Raphson type method by default, and doesn't require you to give a range but you do need an initial guess. f = function(x) 3+(x[1]-2.4)^2 + (x[2]-2.5)^2 guess1 = c(10,10) g = optim(guess1,f) g$par Coming up. ## Reading in from a file. ## Dynamic memory allocation. ## Making R packages. ## Generalized additive models.