#Hypothesis testing for population proportion p - simulated-based inference


#Suppose we want to test H0: p=1/2 Vs. Ha: p!=1/2, e.g. whether a coin is fair.
#Flip the coin n=16 times and obtain the sample proportion phat=x/n, where x=# of tails among the n tosses.

#Suppose we see 10 tails in these 16 tosses.  So phat=10/16=0.625.

#Under the null H0 hypothesis p=1/2.  Use R to generate many samples of size 16.  Say, we generate k=1000 sample.  For each sample compute the sample proportion.  Here are the commands:

x <- c("heads", "tails")

k <- 1000
n <- 16

#Create a vector of 16000 outcomes and then collapse the vector into a matrix of k=1000 columns and n=16 rows.  Each column represents a random sample of 16 tosses.
toss <- matrix(sample(x, n*k, replace=TRUE), n,k)

#Compute the sample proportion for each sample (column by column):
r <- colSums(toss == "heads") / n

#Compute the histogram using these 1000 sample proportions.
hist(r)

#Place phat from the original data on the histogram and count how many of the 1000 simulated phat values are larger than the original phat.
segments(.625,0,.625,200, col="green", lwd=5)

sum(r>0.625)

#What is your conclusion?
============================================================================
============================================================================
#Another example where the two outcomes are not equally likely:
An experimenter has prepared a drug dosage level that he claims will induce 
sleep for at least 80 % of those people suffering from insomnia.  After 
examining the dosage, we feel that his claims regarding the effectiveness 
of the dosage are inflated.  In an attempt to disprove his claim, we 
administer his prescribed dosage to 20 insomniacs, and we observe X=12, the 
number having sleep induced by the drug dose.  We wish to test the hypothesis 
H_0: p=0.8 against the alternative H_a: p<0.8.  

x <- c("yes", "no")

k <- 1000
n <- 20

#Create a vector of 20000 outcomes and then collapse the vector into a matrix of k=1000 columns and n=20 rows.  Each column represents a random sample of 20 patients.  Note:  The  outcomes in this example are not equally likely.  Under the null hypothesis the "yes" has probability 80 % while the "no" has probability 20%.  Here are the R commands:
simu <- matrix(sample(x, n*k, replace=TRUE, prob=c(0.80,0.20)), n,k)

#Compute the sample proportion for each sample (column by column):
r <- colSums(simu == "yes") / n

#Compute the histogram using these 1000 sample proportions.
hist(r)

#Place phat from the original data on the histogram and count how many of the 1000 simulated phat values are less than the original phat.
segments(.6,0,.6,200, col="green", lwd=5)

sum(r < 0.60)

#What is your conclusion?

#More questions:
#Suppose we decided beforehand that RR={X <= 13).  Find Type I error (alpha).

#Suppose we decided beforehand that RR={X <= 13).  Find Type II error (beta) if the actual p=0.45.