1. dnorm and change of variables. 2. Matrices in C. 3. Python and MySQL references. 4. Bias in the sample SD. 5. Bias in the sample variance. 6. Calling R from C. 7. GAM.

1. Brief review of dnorm and change of variables.

Why does dnorm(x, sd = s) = dnorm(x/s, sd = 1)/s?

dnorm(1.2, 0, 5.7, 0) = normal density with mean 0 and sd 5.7, evaluated at 1.2
                      = dnorm(1.2/5.7, 0, 1, 0) / 5.7,

by the change of variables formula from calculus. Let f be the standard normal density.

Integral from -inf to inf of f(x/b) dx = ?
Let y = x/b. Then dy = dx/b, so dx = b dy. So
Integral from -inf to inf of f(x/b) dx = b * Integral from -inf to inf of f(y) dy = b.

So, divide by b to get a density: g(x) = f(x/b)/b integrates to 1, and g is the normal density with mean 0 and sd b.

2. Matrices in C.

You pretty much have to pass matrices in R to C as vectors. Matrices in R are actually stored as long vectors, with their dimensions as two attributes. So you can pass matrices just as you would vectors, with one caveat: the matrices are stored column by column, not row by row!

a = matrix(1:6, ncol=3, byrow=T)
a
a[1]
a[2]
a[3]
a[1:6]

In mymatrix.c,

#include <R.h>
#include <Rmath.h>

void matmult (double *a, int *n1, int *m1, double *b, int *n2, int *m2, double *y)
{
    int i, j, k;
    double x;
    double a1[*n1][*m1];
    double b1[*n2][*m2];
    for(j = 0; j < *m1; j++) {
        for(i = 0; i < *n1; i++) {
            a1[i][j] = a[j * *n1 + i];   /* unpack a column by column */
        }
    }
    for(j = 0; j < *m2; j++) {
        for(i = 0; i < *n2; i++) {
            b1[i][j] = b[j * *n2 + i];
        }
    }
    for(i = 0; i < *n1; i++) {
        for(j = 0; j < *m2; j++) {
            x = 0.0;
            for(k = 0; k < *m1; k++) {
                x += a1[i][k] * b1[k][j];
            }
            y[i * *m2 + j] = x;  /* y[i][j] = x, stored row by row; hence byrow=T below */
        }
    }
}

In R,

system("R CMD SHLIB mymatrix.c")
dyn.load("mymatrix.so")
a = matrix(1:6, ncol=3, byrow=T)  ## or equivalently a = c(1,4,2,5,3,6)
b = matrix(1:12, ncol=4, byrow=T)
y = rep(0, 8)
x = .C("matmult", as.double(a), as.integer(2), as.integer(3), as.double(b),
       as.integer(3), as.integer(4), as.double(y))
y = matrix(x[[7]], ncol=4, byrow=T)
y
a %*% b

3. Python and MySQL references.

For Python, a nice introduction is http://wiki.python.org/moin/BeginnersGuide .
There are many nearly equivalent online references. For MySQL, a good reference is King, T., Reese, G., Yarger, R., and Williams, H.E. (2002). Managing and Using MySQL: Open Source SQL Databases for Managing Information and Web Sites, 2nd ed. O'Reilly and Associates, Inc., Sebastopol, CA. Or Sams Teach Yourself SQL in 10 Minutes, http://www.amazon.com/dp/0672325675 . Websites that you might like are
http://w3schools.com/php/php_mysql_intro.asp
http://www.tizag.com/mysqlTutorial .

4. Bias in the sample SD.

Given n iid observations from some population with mean µ and variance sigma^2, it is well known (and not hard to show) that the sample variance s^2 = Σ (Xi - Xbar)^2 / (n-1) is an unbiased estimate of sigma^2. But is s an unbiased estimate of sigma? What is its bias? Suppose the population is uniform(0,1). If we were writing code to answer this question in R, we might do the following.

## R code to calculate bias in the sample sd of n uniforms on (0,1),
## where n goes from 2 to nmax = 100, and for each such n,
## we will generate n uniforms,
## calculate the sample sd, and repeat m times, where m = 10000.
## The true variance of U(0,1) = E(X^2) - [E(X)]^2 = 1/3 - 1/4 = 1/12,
## so sigma = 1/sqrt(12).
nmax = 100
n = 2:nmax
m = 10000
bias2 = rep(0,99)  ## to be general, 99 should be changed to nmax-1
for(i in 1:99){
    s = rep(0,m)
    for(j in 1:m){
        x = runif(n[i])
        s[j] = sd(x)
    }
    bias2[i] = mean(s) - 1/sqrt(12)
    cat(i," ")
}
plot(n, bias2, xlab="n", ylab="bias", ylim=c(-.003,.001),
     main="bias in SD of n uniforms", type="l")
abline(h=0, lty=2)
## There's a lot of scatter and noise, because m is so small. It's mostly noise.
In sd2.c,

#include <R.h>
#include <Rmath.h>
#include <math.h>

double sd2(double *x, int j)
{
    int i;
    double a1, xbar;
    xbar = 0.0;
    for(i = 0; i < j; i++) {
        xbar += x[i];
    }
    xbar = xbar / j;
    a1 = 0.0;
    for(i = 0; i < j; i++) {
        a1 += pow(x[i] - xbar, 2);
    }
    a1 = pow(a1/(j-1), 0.5);
    return(a1);
}

double mean2(double *x, int j)
{
    int i;
    double xbar;
    xbar = 0.0;
    for(i = 0; i < j; i++) {
        xbar += x[i];
    }
    xbar = xbar / j;
    return(xbar);
}

void bias3 (int *nmax, int *m, double *b)
{
    /* The output will be b, the bias in the sd for each n, from n=2 up to nmax. */
    int i, j, k;
    double x[*nmax];
    double s[*m];
    GetRNGstate();
    for(i = 2; i < *nmax+1; i++){
        for(j = 0; j < *m; j++){
            for(k = 0; k < i; k++){
                x[k] = runif(0,1);
            }
            s[j] = sd2(x, i);
        }
        b[i-1] = mean2(s, *m) - pow(12, -.5);
        Rprintf("%i ", i);  /* progress report; the guard i/1000 == floor(i/1000)
                               was always true for ints, so it is dropped here */
    }
    PutRNGstate();
}

In R,

system("R CMD SHLIB sd2.c")
dyn.load("sd2.so")
nmax = 100
m = 100000
a = .C("bias3", as.integer(nmax), as.integer(m), b = double(nmax))
bias1 = a$b
plot(c(2:nmax), bias1[2:nmax], xlab="n", ylab="bias", ylim=c(-.003,.001),
     main="bias in SD of n uniforms", type="l")
abline(h=0, lty=2)

5. Bias in the sample variance?

The sample variance s^2 = Σ (Xi - Xbar)^2 / (n-1). s^2 is an unbiased estimate of sigma^2, regardless of n and the distribution of Xi. Last time we investigated the bias in s as an estimate of sigma, supposing the random variables Xi are uniform(0,1). What would it look like if we repeated this, but looked at s^2 instead of s?

In sd3.c,

#include <R.h>
#include <Rmath.h>
#include <math.h>

double s2(double *x, int j)
{
    int i;
    double a1, xbar;
    xbar = 0.0;
    for(i = 0; i < j; i++) {
        xbar += x[i];
    }
    xbar = xbar / j;
    a1 = 0.0;
    for(i = 0; i < j; i++) {
        a1 += pow(x[i] - xbar, 2);
    }
    a1 = a1/(j-1);  // I changed this from a1 = pow(a1/(j-1), 0.5).
    return(a1);
}

double mean2(double *x, int j)
{
    int i;
    double xbar;
    xbar = 0.0;
    for(i = 0; i < j; i++) {
        xbar += x[i];
    }
    xbar = xbar / j;
    return(xbar);
}

void bias4 (int *nmax, int *m, double *b)
{
    /* The output will be b, the bias in s2 for each n, from n=2 up to nmax.
    */
    int i, j, k;
    double x[*nmax];
    double s[*m];
    GetRNGstate();
    for(i = 2; i < *nmax+1; i++){
        for(j = 0; j < *m; j++){
            for(k = 0; k < i; k++){
                x[k] = runif(0,1);
            }
            s[j] = s2(x, i);  // I changed this from sd2 to s2.
        }
        b[i-1] = mean2(s, *m) - 1.0/12.0;  // I changed pow(12,-.5) to 1.0/12.0.
        Rprintf("%i ", i);  /* progress report, as in bias3 */
    }
    PutRNGstate();
}

In R,

system("R CMD SHLIB sd3.c")
dyn.load("sd3.so")
nmax = 100
m = 100000
a = .C("bias4", as.integer(nmax), as.integer(m), b = double(nmax))
bias1 = a$b
plot(c(2:nmax), bias1[2:nmax], xlab="n", ylab="bias", ylim=c(-.003,.001),
     main="bias in sample var of n uniforms", type="l")
abline(h=0, lty=2)

6. Calling R from C.

Notes on this come from Phil Spector's lecture notes at UC Berkeley.

The composite version of Simpson's rule is a rough way to approximate an integral of some function f. It goes back to the English mathematician Thomas Simpson (1710-1761), who approximated the integral from a to b of f(x) dx by

    [(b-a)/6] [f(a) + 4f((a+b)/2) + f(b)].

This is the approximation you'd get by dividing the interval (a,b) into 6 pieces and using the left endpoint for the first piece, the right endpoint for the last piece, and the middle, (a+b)/2, for the other 4 pieces.

For the composite Simpson's rule, you divide the interval not into 6 pieces but into n pieces, where n is even, and get

    integral from a to b of f(x) dx ~ h/3 [f(a) + 4f(a+h) + 2f(a+2h) + 4f(a+3h) + 2f(a+4h) + ... + 4f(b-h) + f(b)],

where h = (b-a)/n. Note that the coefficients 4 and 2 alternate. The error in Simpson's rule is bounded in absolute value by h^4 (b-a) max{|f''''(x)|: x in (a,b)} / 180.

Suppose we want to write this function f in R, but run the composite Simpson's rule in C, and call the C function from R. The key component is a C function called call_R. call_R takes as one of its arguments a pointer to a function, which is passed into the C environment as a list. The number and type of the arguments to f are also passed to call_R.
As usual, when calling C from R, the results of the C function are in the end passed back to R not as a returned value, but through the argument list. It is typically easiest to divide the C program into three parts.

1) simp is the algorithm implementing the composite Simpson's rule; it doesn't deal with issues pertaining to the interface between C and R.
2) sfunc is just a shell that uses call_R to have R evaluate the function f.
3) dosimp is just the interface, which is called by R and which calls simp.

In simp.c,

static char *sfunction;

double simp(double (*func)(), double start, double stop, long n)
{
    double mult, x, t, t1, inc;
    long i;
    inc = (stop - start) / (double)n;
    x = start;
    t = func(x);
    mult = 4.0;
    for(i=1; i