## 1. Final project order.
## 2. Compiling C and calling it from R.

Reminder -- hw2 is due Wed Oct23.
Reminder -- no lecture Thu Oct31.

1. Final projects.

For your final projects, you will analyze some data using the methods we have talked about in class, including the methods used for the homeworks and also other methods we have discussed in class. You will write up your analysis in a written report, and will also make an oral presentation.

The presentation will be only 5 minutes in total. No going over! I will cut you off at 5 minutes. However, I would like to take 1 quick question from the audience or me afterwards. You will use my computer for your presentation. Email me by 10pm the night before your presentation with your slides in pdf or ppt.

For the final 3 lectures, attendance is mandatory on 2 of the 3 days. Please do not interrupt with difficult questions, but clarifying questions are fine. Deeper questions should be asked after the presentation.

Your dataset, which you will find yourselves, on the web, can be anything you choose, but it should be:
a) regression style data, i.e. a response variable and, for each observation, a bunch (at least 2 or 3) of explanatory variables. You should have at the very least n=30 observations. One of the variables should be a sensible response variable you can imagine wanting to predict.
b) something of genuine interest to you and where you have some knowledge about the topic.

Analyze the data using the methods we have talked about in class, such as linear regression, kernel regression, univariate kernel density estimation, 2-d kernel density estimation, classification, quantile plots, or gam. You can do regression and also analyze each variable individually. At least one component of your data analysis should be done in C.

Your final project should be submitted to me in pdf by email to frederic@stat.ucla.edu by Dec10, 11:59pm. Note the address. The written reports are all due on the same date, regardless of when your oral presentation is.
I will now randomly assign people to presentation times. If you want to change oral presentation dates and times with another person, feel free but let me know. I will use sample().

    y = scan("202aroster.txt",what="char",sep="\n")
    n = length(y)
    w = sample(y)
    for(i in 1:n){
        if(i == 1) cat("\n\n Tue, Nov26\n")
        if(i == 14) cat("\n\n Tue, Dec3 \n")
        if(i == 27) cat("\n\n Thu, Dec5 \n")
        cat(i,". ",w[i],"\n",sep="")
    }

Tue, Nov26
1. "NARANG, MANIK"
2. "AVELAR MENENDEZ, ANGEL RODRIGO"
3. "MENG, SILIN"
4. "CHANDY, MATHEW"
5. "ZHAO, WENXIAO"
6. "BHAGWAT, MUGDHA"
7. "AGGARWAL, ARJUN"
8. "KUANG, WENHAO"
9. "SHAH, KRISH HEMANG"
10. "PHAM, DUY MINH"
11. "TRAN, TOMMY VO"
12. "BOLOURANI, ANAHITA"
13. "RIZWAN SHAIKH, MUSKAN"

Tue, Dec3
14. "BIRWATKAR, SAHIL SANJAY"
15. "HUANG, YIZHENG"
16. "REN, RAIN"
17. "GU, CHANGQUAN"
18. "WEI, ZIBU"
19. "VORA, SARTHAK BHAVESH"
20. "ZHANG, ZHIYUAN"
21. "CARNAHAN, DANIEL"
22. "RAMALINGAM, NIKOLAI ANDRES"
23. "DIVECHA, ANIRUDH"
24. "VORA, DHAVAL NITIN"
25. "HSIEH, DIN-YIN DARREN"
26. "WANG, YANJUN"

Thu, Dec5
27. "PARDIKAR, ISHA ATUL"
28. "DONG, YIZHUO"
29. "WANG, SHU"
30. "BOGGARAM RAVISHANKAR, OM AMITESH"
31. "XIA, JINGFAN"
32. "LIU, CUINING"
33. "GOKARN, ARYAMAN RAJESH"
34. "RAJABALLY, TANIA CASSEM"
35. "TATTERSALL, CASEY JAMES"
36. "PARIGI, SHRAVAN"
37. "VEDAM, TIRUMALASRI"
38. "FRYDMAN, CLARA ROSALIE"

-- Attendance is mandatory on your own day plus at least 1 of the 2 other days.
-- Email me your pdf or ppt of your slides by 10pm the night before your presentation, and then you will use my computer for your presentation.

If you would like to switch let me know. Maybe someone will want to switch with you.

I put some sample projects and presentation powerpoints from previous students on the course website in the folder sampleprojects.

Give us a sense of your data. Assume that the listener knows what the statistical methods you are using are. Tell us what they say about your data. Emphasize the results more than the methods.
Go slowly in the beginning so that the listener really understands what your data are. Speculate and generalize but use careful language. Say "It seems" or "appears" rather than "is" when it comes to speculative statements or models. For example, you might say "The residuals appear approximately normal" or "a linear model seems to fit well" but not "The residuals are normal" or "The data come from a linear model".

Start with an introduction explaining what your data are, how you got them, and why they are interesting (1-2 minutes), then show your results as clearly as possible, with figures preferred (roughly 2 minutes), and then conclude (1 minute). In your conclusion, you might mention the main thing you have learned about your data and the limitations of your analysis, and speculate about what might make a future analysis better, if you had infinite time. This might include collecting more data, or getting data on more variables, as well as more sophisticated statistical methods.

For your written reports, apply these same rules. Your project should be 5 pages or less of text, followed by as many figures or tables and code as you want. Have just the text in the beginning, and then the figures and code at the end. Do not worry about embedding the figures in the text. Email your pdf document to me, at frederic@stat.ucla.edu , by Dec10 11:59pm. Don't look at me!

2. Compiling C and calling it from R.

The base R comes with a C compiler if you have version 2.1.3 or later. There are also other C compilers you can use, like XCode, for Mac OSX. It includes a C and C++ compiler, among other tools. Another common one is GCC. There are many free ones for PCs. See for instance https://www.thoughtco.com/list-of-free-c-compilers-958190 . But for most of you, if you download R to your computer, it should be fine.

The first step to writing C code is opening a text editor.
Write your C code, call it something.c, then compile it, to create a shared object file called something.so. Then you can load that into R.

Hello world.

Create a C function to print "Hello world!" and call this function n times in R. In a text editor, create a C file. Say it's called hello.c. The file looks like:

    #include <R.h>
    #include <Rmath.h>

    /* Start a comment.
       Continue your comment. */

    void hello (int *n)
    {
        int i,j;
        double a2;
        a2 = 4.5;
        j = 3;
        for(i = 0; i < *n; i++)
            Rprintf("Hello world number %d . The integral is %f .\n", i, a2);
    }

PUT THE FILE HELLO.C IN YOUR WORKING R DIRECTORY, OR MAKE YOUR CURRENT R DIRECTORY THE FOLDER CONTAINING HELLO.C.

In UNIX, in the same directory where hello.c is, type

    R CMD SHLIB hello.c

or, in R, do

    system("R CMD SHLIB hello.c")

If this does not work on a Mac, from the terminal, try

    xcode-select --install

In R, in the same directory, do

    dyn.load("hello.so")
    hello2 = function(n){
        .C("hello",as.integer(n))
    }
    y = hello2(10)

For compiling C in Windows, these links might be useful:
http://www.stat.columbia.edu/~gelman/stuff_for_blog/AlanRPackageTutorial.pdf .
https://cran.r-project.org/bin/windows/base/rw-FAQ.html#How-do-I-include-compiled-C-code_003f .

From a former student: "Some of us were working at getting R and C working, and I found a solution you may find useful. When we were trying to run R CMD SHLIB, an error came back to the effect of 'gcc-4.2 file not found'. In this case, the executable was just /usr/bin/gcc, so we fixed it with a symlink by running the command 'sudo ln -s /usr/bin/gcc /usr/bin/gcc-4.2' at the terminal."
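
The hello function above only prints. As a rough sketch of how a numerical result might come back to R through .C(), here is a small kernel density estimate written in the same style. The function name kdens, its argument names, and the choice of an Epanechnikov kernel are all assumptions made for illustration, not something from the lecture. The kernel was chosen only because it needs nothing beyond basic arithmetic. As with hello, every argument is a pointer, the function returns void, and the answer is written into one of the arguments (here y).

```c
/* Hypothetical example, not from the lecture.
   Epanechnikov kernel density estimate, K(u) = 0.75 * (1 - u^2) for |u| < 1.
   x: the n observations, g: the m grid points at which to estimate,
   h: the bandwidth, y: output vector of length m.
   All arguments are pointers because .C() passes R objects by address. */
void kdens(double *x, int *n, double *g, int *m, double *h, double *y)
{
    int i, j;
    double u, s;

    for (j = 0; j < *m; j++) {
        s = 0.0;
        for (i = 0; i < *n; i++) {
            u = (g[j] - x[i]) / (*h);
            if (u > -1.0 && u < 1.0)
                s += 0.75 * (1.0 - u * u);
        }
        /* Average the kernel contributions and rescale by the bandwidth. */
        y[j] = s / ((double)(*n) * (*h));
    }
}
```

Assuming this were saved as kdens.c and compiled with R CMD SHLIB kdens.c, the call in R might look like dyn.load("kdens.so") followed by .C("kdens", as.double(x), as.integer(length(x)), as.double(g), as.integer(length(g)), as.double(h), y = double(length(g)))$y . The .C() call returns a list of all the arguments, so naming the last one y lets you pull out the filled-in vector.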