0. Misc. 
1. Sum of squared differences from observations for each gridpoint.
2. Kernel regression. 
3. Vectors and matrices in C. 
4. Calling C functions from C. 
5. Running C from terminal. 
6. Reading in from a file. 
7. Computing an integral in C. 
8. Bias in the sample SD. 
9. Bias in sample variance. 
10. GAM. 

0. A couple miscellaneous notes. 

When I compile a .c file on Windows (with RStudio), a .dll file is generated instead of a .so file. I have to dynamically load in this .dll file instead of a .so file. 

For MySQL, a good reference is
  King, T., Reese, G., Yarger, R., and Williams, H.E. (2002).
  Managing and Using MySQL: Open Source SQL Databases for Managing Information 
  and Web Sites, 2nd ed. O'Reilly and Associates, Inc., Sebastopol, CA.
  Or 
  Sams Teach Yourself SQL in 10 Minutes, http://www.amazon.com/dp/0672325675,
and websites that you might like are
  http://w3schools.com/php/php_mysql_intro.asp
  http://www.tizag.com/mysqlTutorial . 

1. Sum of squared differences from observations for each gridpoint. 

previously we computed the sum of squared differences between observations in C.

In sumsq.c,

	#include <R.h>
	#include <Rmath.h>

	void ss2 (double *x, int *n, double *y)
	/* x will be the vector of data of length n,
	   and y will be a vector of squared differences from obs i
	   to the other n-1 observations.
	*/
	{
		int i,j;
		double a;
		for(i = 0; i < *n; i++){
			a = 0.0;
			for(j=0; j < *n; j++){
				a += pow(x[i] - x[j], 2);
			}
			y[i] = a;
		}
	}

in R,
	system("R CMD SHLIB sumsq.c")
	dyn.load("sumsq.so")
	sum3 = function(data2){
		n = length(data2)
		a = .C("ss2", as.double(data2), as.integer(n),
			y=double(n))
		a$y  ## or equivalently a[[3]]
	}

	b = c(1,3,4)
        sum3(b)

Often in addition to your dataset, you also have a grid of points where you want to compute 
something, like the sum of squared difference. This is what we will do now. 
We will now find the sum of squared differences from observations for each gridpoint.

In sumsgrid.c,

	#include <R.h>
	#include <Rmath.h>

	void ssg2 (double *x, int *n, double *g, int *m, double *y)
	/* x will be the vector of data of length n,
	   g will be the vector of m gridpoints,
	  and y will be a vector of length m, of squared differences 
	   from gridpoint i to the n observations.
	*/
	{
		int i;
		int j;
		double a;
		for(i = 0; i < *m; i++){
			a = 0.0;
			for(j=0; j < *n; j++){
				a += pow(g[i] - x[j], 2);
			}
			y[i] = a;
		}
	}

in R,
	system("R CMD SHLIB sumsgrid.c")
	dyn.load("sumsgrid.so")
	sum4 = function(data2, grid2){
		n = length(data2)
		m = length(grid2)
		a = .C("ssg2", as.double(data2), as.integer(n),
			as.double(grid2), as.integer(m),
			y=double(m))
		a$y  ## or equivalently a[[5]]
	}

	b = c(1,3,4)
	gr = c(0,1,2,3,4,5)
        sum4(b,gr)

This function ssg2 will be similar to your kernel density code, except 
that you will also have a bandwidth as an additional input, and you will 
replace the line a += pow(g[i] - x[j], 2); with the kernel density 
estimate.

2. Kernel regression.

Given vectors X and Y of length n, suppose you want to look at the 
relationship between X and Y, without assuming a linear model or other 
model. You could just plot(x,y), but perhaps you want to look at a smooth 
summary of the relationship between X and Y.

Suppose Y_i = f(X_i) + eps_i, for i = 1 to n,
where f is some smooth function, and where eps_i are iid with mean 0.

The Nadaraya-Watson or kernel regression estimate of f is 
fhat(x) = Sum [Y_i k(x-X_i; b)] / Sum [k(x-X_i; b)],
where k is some kernel function,
b is some bandwidth,
and the sums are from i = 1 to n.

Note that k(x-X_i; b) = k{(x-X_i)/b)}/b.
[for instance, compare dnorm(7.0/33.3)/33.3 with dnorm(7.0, sd=33.3).]

A commonly used kernel is the Epanechnikov kernel, 
k(u) = 3/4 (1-u^2), if |u| â‰¤ 1, and 0 otherwise.

This was proposed in 
  Epanechnikov, V.A. (1969). "Non-parametric estimation of a multivariate 
  probability density". Theory of Probability and its Applications 14: 153â€“158.
and shown to have minimal variance under certain conditions.

in epanech.c, 

#include <R.h>
#include <Rmath.h>

/* x and y will be the data vectors each of length n, 
b is the bandwidth,
g will be the vector of m gridpoints where we want to calculate the 
kernel regression estimates, est, which is thus also of length m. 
*/

void kernreg2 (double *x, double *y, int *n, double *b,
	       double *g, int *m, double *est)
{
    int i,j;
    double a1,a2,c;
    for(i = 0; i < *m; i++){
	a1 = 0.0;
	a2 = 0.0;
	for(j=0; j < *n; j++){
	    if(fabs((x[j] - g[i])/ *b) <= 1.0){
		c = 0.75 * (1.0-pow((x[j]-g[i])/ *b,2.0))/ *b;
		a1 += y[j] * c;	
		a2 += c;
	    }
	}
	if(a2 > 0.0) est[i] = a1/a2; 
	else est[i] = 0.0;
	Rprintf("I'm on gridpoint %f and index %d right now.\n", g[i], i);
    }
}

// Watch out for /* in your code. This is treated as a comment started.
// Use / *b instead of /*b.
// These double slashes can be used in C to comment an individual line, 
// like # in R.

In R,

	system("R CMD SHLIB epanech.c")
	dyn.load("epanech.so")
	n = 1000
	x = runif(n)*4
	y = 0.1*x^5 - 2*x^3 + 10*x + rnorm(n)
	bw = bw.nrd(x)
	m = 100
	g2 = seq(0,4,length=m)
## This time, I'm going to call the C function directly, 
## without making a separate R function to call it.
	a = .C("kernreg2", as.double(x), as.double(y), as.integer(n), 
	as.double(bw), as.double(g2), as.integer(m), est=double(m))
	plot(c(0,4),c(-3,17),type="n",xlab="x",ylab="y",cex.axis=3)
	points(x,y,col="red",cex=3)
	lines(g2,a$est,col="black")
	lines(g2,.1*g2^5 - 2*g2^3 + 10*g2,col="blue")
	legend("topleft",lty=c(1,1),col=c("black","blue"),c("estimate","truth")) 

Here, bw ~ 0.3. Try it again with bw = .01 or bw = 2.
The Nadaraya-Watson estimator tends to be biased at the edges, 
especially when the bandwidth is large.

Play around with a[[1]], a[[2]], etc.
C functions called by R output back to R a list
containing all the arguments passed to them, though here the last argument, 
a$est, is changed after running the function kernreg2.

Note that, when calling your C function, the order of the arguments 
matters, and their type matters, but their names don't matter. For 
instance, here in R I called the bandwidth bw, but in C I called it b.
Same with g2 and g. 

3. Vectors and matrices in C.

You can just say double x[100] or double x[25][4], for example.
In mypi2.c,
  
  /* Calculate the first 100 terms in the approximation of pi. */
  #include <R.h>
  #include <Rmath.h>
  void pi100 (double *y){
    int i;
    double x[100];

    x[0] = 1.0;
    y[0] = sqrt(6.0);
    for(i = 1; i < 100; i++) {
      x[i] = x[i-1] + pow(i+1.0,-2.0);
      y[i] = sqrt(6.0 * x[i]);
    }
  }

In R,
    system("R CMD SHLIB mypi2.c")
    dyn.load("mypi2.so")
    a = .C("pi100",y=double(100))
    plot(1:100,a$y,type="l",xlab="n",ylab="Approx. of pi")

In mypi3.c,
  
  /* Calculate 1*2*3 + 4*5*6 + ... + 28*29*30. */
  #include <R.h>
  #include <Rmath.h>
  void weirdsum (double *y){
    int i,j;
    double x[10][3];
    *y = 0.0;
    for(i = 0; i < 10; i++) {
      for(j = 0; j < 3; j++){
        x[i][j] = 3*i + j + 1.0;
      }
      *y += x[i][0] * x[i][1] * x[i][2];
    }
  }

In R,
    system("R CMD SHLIB mypi3.c")
    dyn.load("mypi3.so")
    y = 1.0
    a = .C("weirdsum",as.double(y))
    a

If you want x to be an n by m matrix, where n and m are inputs, you can 
just do, for instance, in mypi4.c,

  /* Calculate 1*2*3*...*m + (m+1)(m+2)...(2m) + ... + [(n-1)m+1][(n-1)m+2]...(nm). */
  #include <R.h>
  #include <Rmath.h>
  void pi103 (double *y, int *n, int *m){
    int i,j;
    double x[*n][*m];
    double a;
    
    /* First define the matrix. */
    for(i = 0; i < *n; i++) {
	for(j = 0; j < *m; j++){
	    x[i][j] = *m*i + j + 1.0;
	}
    }
    
    /* Now multiply each row and add the product to y. */
    *y = 0.0;
    for(i = 0; i < *n; i++){
	a = 1.0;
	for(j = 0; j < *m; j++){
	    a *= x[i][j];
	}
	*y += a;
      }
    } 

In R,
    system("R CMD SHLIB mypi4.c")
    dyn.load("mypi4.so")
    y = 1.0
    a = .C("pi103",as.double(y),as.integer(10),as.integer(3))
    a[[1]]
    ## to check, 
    prod(1:3)+prod(4:6)+prod(7:9)+prod(10:12)+
    prod(13:15)+prod(16:18)+prod(19:21)+prod(22:24)+prod(25:27)+prod(28:30)
    a = .C("pi103",result=as.double(y),as.integer(8),as.integer(4))
    a$result
    prod(1:4)+prod(5:8)+prod(9:12)+prod(13:16)+
    prod(17:20)+prod(21:24)+prod(25:28)+prod(29:32)

Note that you have to say "y = as.double(y)" rather than just as.double(y) 
if you want to be able to call the result with a$result.

To do this in R, you could do
  n = 8
  m = 4
  x = matrix(1:(n*m),byrow=T,ncol=m)
  y = apply(x,1,prod)
  sum(y)

You pretty much have to pass matrices in R to C as vectors.
Matrices in R are actually stored as long vectors, with their dimensions as 
2 attributes.
However, you can pass the matrices just as you would vectors, with one 
caveat. The matrices are read column by column, not row by row!

a = matrix(1:6,ncol=3,byrow=T)
a
a[1]
a[2]
a[3]
a[1:6]

In mymatrix.c,

#include <R.h>
#include <Rmath.h>

void matmult (double *a, int *n1, int *m1,
	      double *b, int *n2, int *m2,
	      double *y)
{
    int i,j,k;
    double x;
    double a1[*n1][*m1];
    double b1[*n2][*m2];    

    for(j = 0; j < *m1; j++) {
	for(i = 0; i < *n1; i++) {
	    a1[i][j] = a[j * *n1 + i];
	}
    }
    
    for(j = 0; j < *m2; j++) {
	for(i = 0; i < *n2; i++) {
	    b1[i][j] = b[j * *n2 + i];
	}
    }
	    
    for(i = 0; i < *n1; i++) {
	for(j = 0; j < *m2; j++) {
	    x = 0.0;
	    for(k = 0; k < *m1; k++) {
		x += a1[i][k] * b1[k][j];
	    }
	    y[i* *m2 + j] = x;  // y[i][j] = x;
	}
    }
}

In R,
    system("R CMD SHLIB mymatrix.c")
    dyn.load("mymatrix.so")
    a = matrix(1:6,ncol=3,byrow=T) ## or equivalently a = c(1,4,2,5,3,6)
    b = matrix(1:12,ncol=4,byrow=T)
    y = rep(0,8)
    x = .C("matmult",as.double(a),as.integer(2),as.integer(3),as.double(b),as.integer(3),
           as.integer(4),as.double(y))
    y = matrix(x[[7]], ncol=4, byrow=T)
    y
    a %*% b 

What if you want to define a vector of length n, or a matrix with 
a dimension n, and n is not an input or a known integer but instead 
something that must be calculated inside your function, or something that 
changes as you go through your function? Then you need to do dynamic 
memory allocation. We might get to this later.

4. Calling C functions from C.

In mysd1.c,
  
  /* Create a 10 x 4 matrix of integers and compute the sample SD of each row. */
  /* Row 1 will be (1, 4, 9, 16). Row 2 will be (25, 36, 49, 64), etc. */

  #include <R.h>
  #include <Rmath.h>
  
  double sd2(double *x, int n){
    int i;
    double xbar, b;
    xbar = 0.0;
    for(i = 0; i < n; i++){
      xbar += x[i];
    }
    xbar = xbar/n;
    b = 0.0; // b will be the sum of squared deviations
    for(i = 0; i < n; i++){
      b += pow(x[i] - xbar, 2);
    }
    return(pow(b/(n-1),0.5));
    }

  void rowsd (double *y){
    int i,j;
    double a;
    double x[10][4];
    for(i = 0; i < 10; i++) {
      for(j = 0; j < 4; j++){
        x[i][j] = pow(4*i + j + 1.0, 2);
      }
      y[i] = sd2(x[i], 4);
    }
  }
 
In R,
    system("R CMD SHLIB mysd1.c")
    dyn.load("mysd1.so")
    a = .C("rowsd",y=double(10))
    a$y
    ## to check it,
    sd(c(1,4,9,16))
    sd(c(25,36,49,64))

5. Running C from terminal.

In mypi5.c,
  
  #include <stdio.h>
  #include <math.h>

  int main ( ){
    int i;
    float x,y,h;
    printf("\n How many years old are you? ");
    scanf("%d", &i);
    printf("\n How long is the first side of your right triangle? ");
    scanf("%f", &x);
    printf("\n How long is the second side of your right triangle? ");
    scanf("%f", &y);
    h = sqrtf(x*x + y*y);
    printf("\n You are %d years old and the hypotenuse is %f . \n", i, h);
    return(i);
  }

In terminal on mac (or powershell on PC),
  cd Desktop 
  gcc mypi5.c -o mypi5.out  
  ./mypi5.out

6. Reading in from a file.

You first need a file pointer
  FILE *fp;

FILE is a structure that holds information about the file. 
Use FILE * to pass the information around by reference.

  fp = fopen(filename, mode);

filename is a string that holds the name of the file on disk 
(including a path like /cs/course if necessary).
mode is a string representing how you want to open the file. 
Most often you'll open a file for reading ("r") or writing ("w").

In data.txt,
  1
  14
  5
  37
  7
  2
  1
  100

In readin1.c,
  
  #include <stdio.h>
  #include <R.h>
  #include <Rmath.h>

  void readssq (int *n){
    /* Read in the first n terms in "data.txt"
    and compute the sum of squares.
    Write the output to "output.txt" */
    FILE *myfile1, *myfile2;
    int i;
    float a1, a2;
    
    a2 = 0.0;
    myfile1 = fopen("data.txt", "r");
    myfile2 = fopen("output.txt", "w");
    for(i = 0; i < *n; i++){
	fscanf(myfile1, "%f", &a1);
	a2 += pow(a1, 2);
    }
    fclose(myfile1);
    Rprintf("The sum of squares is %f \n", a2);
    fprintf(myfile2, "%f \n", a2);
    fclose(myfile2);
  }

In R,
    system("R CMD SHLIB readin1.c")
    dyn.load("readin1.so")
    a = .C("readssq",as.integer(5))

The input file that we are opening for reading ("r") 
must already exist, 
but the output file we are opening for writing ("w") 
does not have to exist. 
If it doesn't, it will be created. If this output file 
does already exist, 
its previous contents will be thrown away and lost.
Use "a" for append to write something to the end of a 
file without losing 
its contents.

In fprintf,
Integer values use %d,
Floating point values, i.e. decimals, use %f,
%lf for double precision numbers,
Single characters use %c,
Character strings use %s.

7. Computing an integral in C. 

A rough way to approximate an integral of some function, is to 
divide the interval along the x-axis into n pieces, and for each little piece, 
compute the area = width of the piece times f(x) at the 
left endpoint (or center, or right endpoint) 
of the piece, and add up those areas. 

## In expofile.c 

#include <R.h>
#include <Rmath.h>
 
double expo(double a, double b, double x){ 
    // Computes a exp(bx) 
	return(a * exp(b*x)); 
} 

void myintegral(double *start,double *stop,int *n, double *a, double *b, double *ans){
    /* integrates aexp(bx) from start to stop numerically, 
	using n rectangles. 
    */ 
    double x,t,inc; 
    long i; 
    inc = (*stop - *start) / (double) *n; 
    x = *start;
    t = *a * exp(*b * x);           
    for(i=1; i< *n; i++){
	x += inc; 
	t += *a * exp(*b * x); 
    }
    *ans = t * inc; 
}

  system("R CMD SHLIB expofile.c")
  dyn.load("expofile.so")
  a = 2.5 
  b = 3.5 
  c = 1.0 
  d = 4.0 
  n = 100000000
  y = .C("myintegral",as.double(c), as.double(d), as.integer(n), as.double(a), as.double(b),as.double(0.0)) 
  ## compare with the integral of a exp(bx) from x = c to d 
  ## = a/b * (exp(b*d) - exp(b*c)) 
  ## = 858979.4. 
  y[[6]] 

8. Bias in the sample sd. 

Given n iid observations from some population with mean Âµ and variance 
sigma^2, it is well known (and not hard to show) 
that the sample variance s^2 = âˆ‘ (Xi - Xbar)^2 / (n-1) 
is an unbiased estimate of sigma^2.
But is s an unbiased estimate of sigma? What is its bias?
Suppose the population is uniform (0,1).
If we were writing code to answer this question in R, we might do the 
following.

  ## R code to calculate bias in the sample sd of n uniforms on (0,1), 
  ## where n goes from 2 to nmax=100, and for each such n,
  ## we will generate n uniforms, 
  ## calculate the sample sd, and repeat m times, where m = 10000.
  ## The true variance of U(0,1) = E(X^2) - [E(X)]^2 = 1/3 - 1/4 = 1/12,
  ## so sigma = 1/sqrt(12). 
  nmax = 100
  n = 2:nmax
  m = 10000
  bias2 = rep(0,99) ## to be general 99 should be changed to nmax-1
  for(i in 1:99){
    s = rep(0,m)
    for(j in 1:m){
      x = runif(n[i])
      s[j] = sd(x)
      }
    bias2[i] = mean(s) - 1/sqrt(12)
  cat(i," ")
  }
  
  plot(n,bias2,xlab="n",ylab="bias", ylim=c(-.003,.001),
         main="bias in SD of n uniforms", type="l")
  abline(h=0,lty=2)
  ## There's a lot of scatter and noise, because m is so small. It's mostly noise.

In sd2.c,

#include <stdlib.h>
#include <R.h>
#include <Rmath.h>

double sd2(double *x, int j) {
    int i;
    double a1, xbar;
    
    xbar = 0.0;
    for(i = 0; i < j; i++) {
	xbar += x[i];
    }
    xbar = xbar / j;
    a1 = 0.0;
    for(i = 0; i < j; i++) {
	a1 += pow(x[i] - xbar,2);
    }
    a1 = pow(a1/(j-1), 0.5);
    return(a1);
}

double mean2(double *x, int j) {
    int i;
    double xbar;
    
    xbar = 0.0;
    for(i = 0; i < j; i++) {
	xbar += x[i];
    }
    xbar = xbar / j;
    return(xbar);
}

void bias3 (int *nmax, int *m, double *b){
    /* The output will be b, the bias in the sd for each n,
         from n=2 up to nmax. */
    int i,j,k;
    double x[*nmax];
    double s[*m];
    
    GetRNGstate();
    for(i = 2; i < *nmax+1; i++){   
	for(j = 0; j < *m; j++){
	    for(k = 0; k < i; k++){
		x[k] = runif(0,1);
	    }
	    s[j] = sd2(x, i);
	}
	b[i-1] = mean2(s, *m) - pow(12, -.5);
        if(i/1000 == floor(i/1000)) Rprintf("%i ",i);
    }
    PutRNGstate();
}

In R,

  system("R CMD SHLIB sd2.c")
  dyn.load("sd2.so")
  nmax = 100
  m = 100000
  a = .C("bias3",as.integer(nmax), as.integer(m),b = double(nmax))
  bias1 = a$b
  plot(c(2:nmax),bias1[2:nmax],xlab="n",ylab="bias",ylim=c(-.003,.001),
         main="bias in SD of n uniforms", type="l")
  abline(h=0,lty=2)

9. Bias in sample variance?

The sample variance s^2 = âˆ‘ (Xi - Xbar)^2 / (n-1).
s^2 is an unbiased estimate of sigma^2, regardless of n and the 
distribution of Xi. 
Last time we investigated the bias in s as an estimate of sigma,
supposing the random variables Xi are uniform (0,1).
What would it look like if we repeated this, 
but looked at s^2 instead of s?

In sd3.c,

#include <R.h>
#include <Rmath.h>

double s2(double *x, int j) {
    int i;
    double a1, xbar;
    
    xbar = 0.0;
    for(i = 0; i < j; i++) {
	xbar += x[i];
    }
    xbar = xbar / j;
    a1 = 0.0;
    for(i = 0; i < j; i++) {
	a1 += pow(x[i] - xbar,2);
    }
    a1 = a1/(j-1); // I changed this from a1 = pow(a1/(j-1), 0.5).
    return(a1);
}

double mean2(double *x, int j) {
    int i;
    double xbar;
    
    xbar = 0.0;
    for(i = 0; i < j; i++) {
	xbar += x[i];
    }
    xbar = xbar / j;
    return(xbar);
}

void bias4 (int *nmax, int *m, double *b){
    /* The output will be b, the bias in s2 for each n,
         from n=2 up to nmax. */
    int i,j,k;
    double x[*nmax];
    double s[*m];
    
    GetRNGstate();
    for(i = 2; i < *nmax+1; i++){   
	for(j = 0; j < *m; j++){
	    for(k = 0; k < i; k++){
		x[k] = runif(0,1);
	    }
	    s[j] = s2(x, i); // I changed this from sd2 to s2.
	}
	b[i-1] = mean2(s, *m) - 1.0/12.0; // I changed pow(12,-.5) to 1.0/12.0.
	if(i/1000 == floor(i/1000)) Rprintf("%i ",i);
    }
    PutRNGstate();
}

In R, 

  system("R CMD SHLIB sd3.c")
  dyn.load("sd3.so")
  nmax = 100
  m = 100000
  a = .C("bias4",as.integer(nmax), as.integer(m),b = double(nmax))
  bias1 = a$b
  plot(c(2:nmax),bias1[2:nmax],xlab="n",ylab="bias", ylim=c(-.003,.001),
         main="bias in sample var of n uniforms", type="l")
  abline(h=0,lty=2)

10. Generalized additive models with kernel regression.

Suppose you have a response Y, and 3 explanatory variables, X1, X2, and X3.
A generalized additive model is of the form

  Y = f1(X1) + f2(X2) + f3(X3) + eps, 

where eps = noise (iid, mean 0, variance sigma^2).
The idea is that the functions f1, f2, and f3 don't need to be linear or 
some particular parameteric form. You can estimate these functions and 
then see if they look linear, quadratic, exponential, or whatever.
R has a function gam() in the mgcv package, which fits these functions 
using penalized regression splines. However, one could also fit these 
functions by kernel regression, using your C kernel regression function, 
or more simply using ksmooth(), which does basically the same thing. 

Consider the following example. We will generate some fake data, and then 
fit Y = f1(x1) using kernel regression.
Then we will take the residuals from that, 
where residual[i] = Y[i] - f1(x1[i]),
and fit the residuals as a function of x2 by kernel regression, and 
keep going. You might want to do something like this for your projects.

x1 = runif(100)*10
x2 = runif(100)*20
x3 = runif(100)*30
y = x1^3 - 10*x1^2 + 5*x1 - .1*x2^3 + 2*x2^2 + 4*x2 + 12 + 7 * x3 + 14*rnorm(100)
plot(x1,y)
c1 = ksmooth(x1,y, bandwidth = bw.nrd(x1),x.points=x1)
lines(c1)
fit1 = rep(0,100)
fit1[order(x1)] = c1$y
y1 = y - fit1 ## these are the residuals from this first fit
## note that x1 is not ordered, so if you do lines(x1,y1), it won't look good
## because lines joins the first point to the second point to the third, etc.
## but you can do points(x1,fit1,pch=2) to check that things are going ok.
plot(x2,y1,ylab="residuals after first kernel smoothing")
c2 = ksmooth(x2,y1, bandwidth = bw.nrd(x2),x.points=x2)
lines(c2)
fit2 = fit1
fit2[order(x2)] = c2$y
## to check, try points(x2,fit2,pch=2)
y2 = y1 - fit2 ## residuals after 2nd fit
plot(x3, y2,xlab="x3",ylab="residuals after 2 kernel smoothings")
## Now suppose we think these residuals look like a linear function of x3, 
## so we decide to fit a line for this term.
c3 = lsfit(x3, y2)
abline(c3)
c3$coef[2] ## compare with 7
plot(x3,c3$resid,ylab="residuals after 2 kernels and a line")
par(mfrow=c(1,2))
hist(y2,nclass=20,xlab="residuals after 2 kernels",main="", xlim=range(c(y2,c3$resid)))
hist(c3$resid, nclass=20, xlab="residuals after 2 kernels and a line",main="", 
xlim=range(c(y2,c3$resid)))
par(mfrow=c(1,1))
## or, we could keep going and fit the function of x3 by kernel regression.
plot(x3, y2,xlab="x3",ylab="residuals after 2 kernel smoothings")
c4 = ksmooth(x3, y2, bandwidth = bw.nrd(x3),x.points=x3)
lines(c4)
fit3 = fit1
fit3[order(x3)] = c4$y
y3 = y2 - fit3
plot(x3, y3,ylab="residuals after 3 kernel smoothings")
par(mfrow=c(1,2))
x5 = hist(c3$resid, nclass=20, xlab="residuals after 2 kernels and a line",main="", 
xlim=range(c(y3,c3$resid)))
hist(y3, breaks=20, xlab="residuals after 3 kernels",main="", xlim=range(c(y3,c3$resid)))
## It's interesting to look at, and interpret, the estimates of f1, f2, 
## and f3, too.
par(mfrow=c(1,3))
plot(c1,type="l",xlab="x1",ylab="estimate of f1(x1)")
plot(c2,type="l",xlab="x2",ylab="estimate of f2(x2)")
plot(c4,type="l",xlab="x3",ylab="estimate of f3(x3)")

par(mfrow=c(1,1))

What might be a good idea for your project is to take some regression 
style data, fit a generalized additive model by kernel regression (and/or 
perhaps fitting terms that look linear or quadratic by linear or quadratic 
functions), and using C to get confidence intervals for your fitted curves. 

Alternatively, if you have a natural "factor" style response variable and would like to do classification, using several explanatory variables, using the methods in Irizarry's book, 
that would also be good.