## 1. No lecture Nov5.  
## 2. game. 
## 3. Final project order. 
## 4. Note on hw4. 
## 5. Compiling C. 
## 6. Approximating pi in C. 
## 7. dnorm in C. 
## 8. Sum of squared differences between observations in C. 

1. Faculty retreat. 
  No class Thu Nov5 because of the faculty retreat. 
Also no class Thu Nov26 for Thanksgiving. 

2. game. 
  a. In the breakout room, exchange emails, and pick a leader. 
  b. Decide on a time to meet later, like maybe right after class if you like. 
  c. The leader should download the game from the course webpage, http://www.stat.ucla.edu/~frederic/202a/F20 . Download the whole folder called gamefor202a. This will take a few minutes because of all the pictures. 
  d. The leader should run the first 10 lines of game.txt, which will require installing and loading various libraries. This may also take a minute or two. 
  e. When the time comes to meet, the leader should start a zoom, invite the others, start R, share screen, set the R working directory to the images folder in gamefor202a, and copy and paste everything in game.txt into R. 
  
3. Final projects. For your final projects, you will analyze some data using the methods we have talked about in class, including the methods used for the homeworks and also other methods we have discussed in class. You will write up your analysis in a written report, and will also make an oral presentation. The presentation will be only 5 minutes each in total. No going over! I will cut you off at 5 minutes. 
However, I would like to take 1 quick question from the audience or me afterwards. 
You will use your own computer and share screen for your presentation. 

For the final 3 lectures, attendance is mandatory. 
Please do not interrupt with difficult questions, but 
clarifying questions are fine. 
Deeper questions should be asked after the presentation. 
Your dataset, which you will find yourselves, on the web, can be anything you choose, but it should be:
a) regression style data, i.e. a response variable, and for each observation, a bunch (at least 2 or 3) of explanatory variables. You should have at the very least n=30 observations. One of the variables should be a sensible response variable you can imagine wanting to predict.
b) something of genuine interest to you.

Analyze the data using the methods we have talked about in class, such as linear regression, univariate kernel density estimation, 2-d kernel density estimation, testing, quantile plots, and kernel regression. You can do regression and also analyze each variable individually. At least one component of your data analysis should be done in C.

Your final project should be submitted to me in pdf by email to frederic@stat.ucla.edu by Sun Dec 14, 11:59pm. Note the address. Do not send them via ccle and do not send them to stat202a@stat.ucla.edu. They are all due the same date, regardless of when your oral presentation is.

I will now randomly assign people to presentation times. If you want to change oral presentation dates and times with another person, feel free but let me know. I will use sample().

    setwd("/Users/rickpaikschoenberg/Documents/2020/202a/done")
    y = scan("roster.txt",what="char",sep="\n")
    n = length(y)
    w = sample(y)   ## random permutation of the roster
    for(i in 1:n){
        if(i == 1)  cat("\n\n  Thu, Dec 3\n")
        if(i == 13) cat("\n\n  Tue, Dec 8 \n")
        if(i == 25) cat("\n\n  Thu, Dec 10 \n")
        cat(i,". ",w[i],"\n",sep="")
    }

 Thu, Dec 3
1. FISCHER, ERIC MERCADO
2. SOUDA, NAVIN VARADARAJ
3. BURTON, HENRY
4. HWANGBO, NATHAN MIN
5. HUANG, STELLA HONGYING
6. SHINKRE, TANVI RAHUL
7. ZHAO, YIJIA
8. O'NEILL, ELIZABETH
9. DONG, CHRIS YUANCHAO
10. HOFFMANN, NATHAN ISAAC
11. FAN, HAIBO
12. WANG, KAIXIN

  Tue, Dec 8 
13. CHU, HANQING 
14. LEE, CHRISTY
15. AGRAWAL, SURABHI
16. PARIDAR, MAHSA
17. CHEN, ALEX
18. LI, BILL
19. YAN, GUANAO
20. JACOBSON, THOMAS ABRAM
21. ZHAI, XUFAN
22. MUELLER, SCOTT ALLEN
23. WONG, EMILY FRANCES
24. VINAS, LUCIANO

  Thu, Dec 10 
25. KIM, DOEUN 
26. ZHANG, XINYUAN
27. ZHANG, ZHE
28. RESCH, JOSEPH
29. XU, CHAO
30. ZHOU, CARTLAND
31. NGUEN, CHUNG KYONG
32. TREJO, ALFREDO
33. LIU, VINCENT BOJIE
34. GABRIEL, CHRISTOPHER JOHN
35. VARGAS, SANTIAGO

I put some sample projects and presentation powerpoints from previous students 
on the course website in the folder sampleprojects. 

Give us a sense of your data. Assume that the listener knows what the statistical methods you are using are. Tell us what they say about your data. Emphasize the results more than the methods. Go slowly in the beginning so that the listener really understands what your data are.

Speculate and generalize but use careful language. Say "It seems" or "appears" rather than "is" when it comes to speculative statements or models. For example, you might say "The residuals appear approximately normal" or "a linear model seems to fit well" but not "The residuals are normal" or "The data come from a linear model".

Start with an introduction explaining what your data are, how you got them, and why they are interesting (1-2 minutes), then show your results as clearly as possible, with figures preferred (roughly 2 minutes), and then conclude (1 minute). In your conclusion, mention the limitations of your analysis and speculate about what might make a future analysis better, if you had infinite time. This might include collecting more data, or getting data on more variables, as well as more sophisticated statistical methods.

For your written reports, apply these same rules. 
Your project should be 5 pages or less of text, followed by as many figures or tables as you want. 
Have just the text in the beginning, and then the figures at the end. Do not worry about embedding the figures in the text. 
Email your pdf document to me, at frederic@stat.ucla.edu , by Sun, Dec 14, 11:59pm.

4. Note on hw4. 

When I say in hw4 problem 2d "sample 100 pairs of observations (Xi, Yi) with replacement," I mean, if your dataset has length n, then let
    b = sample(1:n, 100, replace=TRUE)
and for each element i in b, take (Xi, Yi). 

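If you want to do the hw4 resampling step in C rather than in R, here is a rough sketch of the same idea. The names sample_idx and take_pairs are made up for illustration and are not part of the homework, and rand() is only a stand-in random number generator. In code called from R you would more likely use unif_rand() between GetRNGstate() and PutRNGstate(). Indices are 0-based here, so R's observation i is index i-1.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill idx[0..m-1] with indices drawn uniformly, with replacement,
   from 0..n-1. (Hypothetical helper, assuming rand() is good enough
   for a demonstration.) */
void sample_idx(int n, int m, int *idx)
{
    int k;
    assert(n > 0 && m >= 0);
    for (k = 0; k < m; k++) {
        /* rand() / (RAND_MAX + 1.0) lies in [0, 1), so the product is in [0, n) */
        idx[k] = (int) (n * (rand() / (RAND_MAX + 1.0)));
    }
}

/* Copy the sampled pairs (x[idx[k]], y[idx[k]]) into xs and ys.
   The same index is used for x and y, so pairs stay together. */
void take_pairs(const double *x, const double *y, const int *idx, int m,
                double *xs, double *ys)
{
    int k;
    for (k = 0; k < m; k++) {
        xs[k] = x[idx[k]];
        ys[k] = y[idx[k]];
    }
}
```

The point is that you sample the index, not X and Y separately, so each Xi stays attached to its own Yi, and the same index can come up more than once.
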
5. Compiling C. 

For compiling C in Windows, these links might be useful:
http://www.stat.columbia.edu/~gelman/stuff_for_blog/AlanRPackageTutorial.pdf
https://cran.r-project.org/bin/windows/base/rw-FAQ.html#How-do-I-include-compiled-C-code_003f

From a former student:
"Some of us were working at getting R and C working, and I found a solution you may find useful.  When we were trying to run R CMD SHLIB, an error came back to the effect of 'gcc-4.2 file not found'.  In this case, the executable was just /usr/bin/gcc, so we fixed it with a symlink by running the command 'sudo ln -s /usr/bin/gcc /usr/bin/gcc-4.2' at the terminal."

6. Approximate pi in C.In mypi.c,    #include <R.h>    #include <Rmath.h>    void pi2 (int *n, double *y){      int i;      double x[*n];      x[0] = 1.0;      y[0] = sqrt(6.0);      for(i = 1; i < *n; i++) {          x[i] = x[i-1] + 1.0 / ((i+1.0)*(i+1.0));          /* or x[i] = x[i-1] + 1.0 / pow(i+1.0,2.0); */           y[i] = sqrt(6.0 * x[i]);      }    }      In R,    ## set working directory to the one containing mypi.c. 
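    system("R CMD SHLIB mypi.c")
    dyn.load("mypi.so")   ## mypi.dll on Windows
    pi3 = function(n){
        .C("pi2", as.integer(n), y = double(n))
    }
    b = pi3(1000000)
    b$y[1000000]

Note that you have to be incredibly careful in C when doing arithmetic between integers and non-integers.
If instead of
    1.0 / ((i+1.0)*(i+1.0));
you do
    1 / ((i+1)*(i+1));
or
    1.0 / (i+1.0)^2;
crazy stuff happens.
In the first, every operand is an integer, so C does integer division and the result is 0 for every i >= 1, and (i+1)*(i+1) overflows an int when i is large. One double operand is enough to make the division floating point, so 1 / ((i+1.0)*(i+1.0)) would be fine, but writing 1.0 everywhere is the safe habit.
In the second, ^ is not a power. It is a bitwise operator meaning "XOR", i.e. X^Y = 1 if X=1 or Y=1 but not both, applied bit by bit. It is only defined for integers, so with a double operand the line does not even compile.

7. dnorm in C.

You can access C versions of many basic R functions, including for instance dnorm(), rnorm(), etc.
The syntax in C of dnorm is
    double dnorm(double x, double mu, double sigma, int give_log)

In mydn.c,

    #include <R.h>
    #include <Rmath.h>

    /* Evaluate the standard normal density at x / bw on a grid of n points,
       starting at x = -upper with spacing 2*upper/n.
       Note dnorm(x / bw, 0,1,0) is bw times the normal density with sd bw at x. */
    void norm2 (int *n, double *upper, double *bw, double *y){
      int i;
      double x, inc;
      x = -1.0 * *upper;
      inc = 2.0 * *upper / *n;
      for(i = 0; i < *n; i++) {
          y[i] = dnorm(x / *bw, 0,1,0);
          x += inc;
      }
    }

## I stopped here last time. 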
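
The caution from section 6 about mixing integers and non-integers applies to norm2 as well: inc = 2.0 * *upper / *n works because the double comes first and the int is promoted. Here is a small sketch of what goes wrong otherwise. All the function names (term_int, term_dbl, inc_ok, inc_bad, xor_demo) are made up for illustration, and the code uses plain C with no R headers so you can compile it on its own.

```c
#include <assert.h>

/* 1/(k*k) computed entirely in integers: integer division truncates,
   so the result is 1 for k = 1 and 0 for every k >= 2. */
double term_int(int k)
{
    assert(k >= 1 && k <= 46340);   /* keeps k*k within a 32-bit int */
    return 1 / (k * k);
}

/* The same term with double operands: ordinary floating point division. */
double term_dbl(int k)
{
    assert(k >= 1);
    return 1.0 / ((double) k * (double) k);
}

/* Grid spacing written as in norm2: 2.0 * upper is a double, so n is promoted. */
double inc_ok(double upper, int n)
{
    return 2.0 * upper / n;
}

/* The same quantity written as 2 / n * upper: 2 / n is done first,
   in integers, and is 0 whenever n > 2. */
double inc_bad(double upper, int n)
{
    return 2 / n * upper;
}

/* ^ is bitwise XOR, not a power: 3 ^ 2 is 1, not 9. */
int xor_demo(int a, int b)
{
    return a ^ b;
}
```

For example, term_int(2) is 0 while term_dbl(2) is 0.25, and inc_bad(5.0, 10) is 0 while inc_ok(5.0, 10) is 1. The compiler gives no warning about the first three, which is why these bugs are easy to miss.
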
In R,

    system("R CMD SHLIB mydn.c")
    dyn.load("mydn.so")
    norm3 = function(n, u, b){
        d = .C("norm2", as.integer(n), as.double(u),
               as.double(b), y = double(n))
        d$y
    }
    b = 12.4
    n = 100000
    u = 5*b
    a = norm3(n,u,b)
    title2 = paste("normal density with sd ", as.character(b))
    plot(seq(-u,u,length=n), a, type="l", main=title2,xlab="x", ylab="f(x)")

8. Sum of squared differences between observations in C.

In sumsq.c,

    #include <R.h>
    #include <Rmath.h>

    void ss2 (double *x, int *n, double *y)
    /* x will be the vector of data of length n,
       and y will be a vector of squared differences from obs i
       to the other n-1 observations.
    */
    {
        int i,j;
        double a;
        for(i = 0; i < *n; i++){
            a = 0.0;
            for(j=0; j < *n; j++){
                a += pow(x[i] - x[j], 2);
            }
            y[i] = a;
        }
    }

In R,

    system("R CMD SHLIB sumsq.c")
    dyn.load("sumsq.so")
    sum3 = function(data2){
        n = length(data2)
        a = .C("ss2", as.double(data2), as.integer(n),
               y=double(n))
        a$y  ## or equivalently a[[3]]
    }
    b = c(1,3,4)
    sum3(b)   ## should give 13 5 10
    n = c(100, 1000, 2000, 3000, 5000, 7000, 8000, 10000)
    t2 = rep(0,8)
    for(i in 1:8){
        b = runif(n[i])
        timea = Sys.time()
        d = sum3(b)
        timeb = Sys.time()
        ## force seconds, since timeb-timea picks its own units
        t2[i] = as.numeric(difftime(timeb, timea, units="secs"))
        cat(n[i]," ")
    }
    par(mfrow=c(1,2))
    plot(n,t2,ylab="time (sec)")

    ## Now try the same thing in R, without C.

    sum4 = function(data2){
        n = length(data2)
        x = rep(0,n)
        for(i in 1:n){
            for(j in 1:n){
                x[i] = x[i] + (data2[i] - data2[j])^2
            }
        }
        x
    }
    b = c(1,3,4)
    sum4(b)   ## should give 13 5 10
    n = c(1:8)*100
    t3 = rep(0,8)
    for(i in 1:8){
        b = runif(n[i])
        timea = Sys.time()
        d = sum4(b)
        timeb = Sys.time()
        t3[i] = as.numeric(difftime(timeb, timea, units="secs"))
        cat(n[i]," ")
    }
    plot(n,t3,ylab="time (sec)")
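
One way to check the output of ss2 is to expand the square: the sum over j of (x[i] - x[j])^2 equals n*x[i]^2 - 2*x[i]*s1 + s2, where s1 is the sum of the x[j] and s2 is the sum of the x[j]^2. Here is a rough sketch comparing the two. The names ss_fast and ss_slow are made up for illustration, and they take plain arguments rather than pointers, so to call them from R with .C you would need to change n to int *n as in ss2.

```c
#include <assert.h>

/* Direct double loop, same arithmetic as ss2: about n*n operations. */
void ss_slow(const double *x, int n, double *y)
{
    int i, j;
    double a, d;
    for (i = 0; i < n; i++) {
        a = 0.0;
        for (j = 0; j < n; j++) {
            d = x[i] - x[j];
            a += d * d;
        }
        y[i] = a;
    }
}

/* Expanded form: y[i] = n*x[i]^2 - 2*x[i]*s1 + s2, about 2n operations.
   It can lose accuracy when the data have a large mean relative to
   their spread, so treat it as a check rather than a replacement. */
void ss_fast(const double *x, int n, double *y)
{
    int i;
    double s1 = 0.0, s2 = 0.0;
    assert(n > 0);
    for (i = 0; i < n; i++) {
        s1 += x[i];
        s2 += x[i] * x[i];
    }
    for (i = 0; i < n; i++) {
        y[i] = n * x[i] * x[i] - 2.0 * x[i] * s1 + s2;
    }
}
```

With x = (1, 3, 4), both give 13, 5, 10, matching sum3(b) and sum4(b) above. The double loop is still the better example for timing C against R, since the point of the comparison is how each language handles loops.
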