*Introduction to Statistical Methods for the Life and Health Sciences*

http://www.stat.ucla.edu/~dinov/


**Objectives:**

**(1)** summarizing datasets graphically and numerically

**(2)** exploring relationships between two variables

**(3)** reading a description of data

**(4)** learning some new STATA commands

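STATA commands that you will find helpful today and in the future (STATA commands are case-sensitive, so type them in lowercase):

**graph** *varname1*, **box** produces a boxplot.

**tabulate** *varname1* produces a one-way frequency table; **tabulate** *varname1* *varname2* produces a two-way frequency table.

**graph** *varname1* *varname2* produces a scatterplot, with *varname1* on the y-axis and *varname2* on the x-axis.

**tabulate** *varname1*, **plot** produces a bar chart of the relative frequencies in a one-way table.

**graph** *varname1* *varname2*, **symbol()** changes the symbols used to mark the data points in a scatterplot; **symbol(**[*varname3*]**)** marks each point with its value of *varname3*.

**graph** *varname1* *varname2*, **pen()** changes the colors of the graph and its data points.

**graph** *varname1*, **title()** adds a title to a graph.

**summarize** *varname1*, **detail** reports detailed summary statistics, including percentiles, the mean, the standard deviation, and the variance.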

Go to http://www.stat.ucla.edu/projects/datasets and read the description of the “Ant Study,” a research project carried out by a UCLA faculty member. Review the codebook and identify each variable as quantitative or qualitative (categorical). In this lab, we will focus on the “Thatch Ant” data.

1. Which variables are quantitative?

The Thatch Ant data file is available at http://www.stat.ucla.edu/projects/datasets/thatch-ant.dta

Let’s explore the distributions of “mass” and “headwidth” graphically and numerically. We’ll examine each variable separately first and then look at their relationship.

For each variable, do the following.

2. Create a boxplot, give it a title using the STATA **title()** option, and print it.

To title a plot, in this case a boxplot, type

**graph** *varname1*, **box** **title**(*title*)

3. Determine the median, the first quartile, and the third quartile, and write them on the boxplot. Identify the outliers in each boxplot, if there are any, and identify which colony each outlier belongs to. Is there anything systematic about which colonies the outliers belong to?

To identify observations in a graph (in this case, the outliers), type

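**graph** *varname1*, **box** **symbol(**[*varname2*]**)**

Here *varname2* is the variable whose values should label the points, such as the colony identifier from the codebook.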
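To check the numbers you write on the boxplot, here is a minimal Python sketch of one common textbook way to find quartiles: take the median of the lower half and of the upper half of the sorted data. The values are made up for illustration (they are not the Thatch Ant data), and the function name `quartiles` is hypothetical. STATA may use a slightly different quartile rule, so its numbers can differ a little from a hand calculation.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations below the middle position
    upper = xs[half + n % 2:]     # skip the middle observation when n is odd
    return median(lower), median(xs), median(upper)

# Made-up illustrative values (NOT the Thatch Ant data)
mass = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 15.0]
q1, med, q3 = quartiles(mass)
print(q1, med, q3)  # 5.5 6.75 8.0
```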

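4. Use the 1.5 × IQR rule to decide whether there are outliers: an observation is flagged if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 − Q1. Do your calculations agree with the boxplot?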
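The 1.5 × IQR rule can be sketched in Python as follows. This sketch assumes you already have Q1 and Q3, for example from question 3. The data and the helper names `iqr_fences` and `flag_outliers` are hypothetical and serve only as an illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return [x for x in values if x < lower or x > upper]

# Made-up values; here Q1 = 5.5 and Q3 = 8.0 are taken as already known
mass = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 15.0]
print(iqr_fences(5.5, 8.0))            # (1.75, 11.75)
print(flag_outliers(mass, 5.5, 8.0))   # [15.0]
```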

5. Now determine the sample average and standard deviation.

6. If the distribution is roughly bell-shaped, approximately 68% of the data should fall between ______________ and ____________.

7. What is the actual percentage of data that falls between those two values? You probably want to use the tabulate command here.
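The following minimal Python sketch covers questions 5 through 7, using made-up values rather than the Thatch Ant data. It computes the sample mean and the sample standard deviation, dividing by n − 1 as STATA's `summarize` does. It then counts the observations that lie within one standard deviation of the mean.

```python
from statistics import mean, stdev

# Made-up illustrative values (NOT the Thatch Ant data)
mass = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 15.0]

xbar = mean(mass)
s = stdev(mass)                  # sample standard deviation (divides by n - 1)
lo, hi = xbar - s, xbar + s      # one standard deviation either side of the mean

inside = sum(lo <= x <= hi for x in mass)   # number of values in [lo, hi]
pct = 100 * inside / len(mass)
print(f"mean = {xbar:.2f}, sd = {s:.2f}")
print(f"interval = ({lo:.2f}, {hi:.2f}), actual percentage inside = {pct:.0f}%")
```

In this made-up sample, 90% of the values fall in the interval rather than about 68%. The single large value inflates the standard deviation and widens the interval. The 68% figure is only a guide for roughly bell-shaped distributions.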

8. If you square the standard deviation, what value do you get? What is the square of the standard deviation called?

9. For each distribution, find the value below which 10% of the data fall, so that 90% of the data lie above it.
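Below is a Python sketch of one way to compute such a value. It uses one common percentile convention. If n × p/100 is a whole number i, it averages the i-th and (i+1)-th smallest values. Otherwise, it takes the ⌈n × p/100⌉-th smallest value. We believe this is the rule STATA documents for `summarize, detail`, but treat that as an assumption and compare the result with the 10% line in STATA's output. The data and the function name `percentile_avg_ecdf` are hypothetical.

```python
import math

def percentile_avg_ecdf(values, p):
    """p-th percentile (0 < p <= 100): if n*p/100 is a whole number i,
    average the i-th and (i+1)-th smallest values; otherwise take the
    ceil(n*p/100)-th smallest value."""
    xs = sorted(values)
    n = len(xs)
    pos = n * p / 100
    if pos == math.floor(pos):
        i = int(pos)
        return (xs[i - 1] + xs[min(i, n - 1)]) / 2
    return xs[math.ceil(pos) - 1]

# Made-up illustrative values (NOT the Thatch Ant data)
mass = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 15.0]
q10 = percentile_avg_ecdf(mass, 10)
share_above = sum(x > q10 for x in mass) / len(mass)
print(q10, share_above)  # 4.75 0.9
```

Here 9 of the 10 made-up values lie above 4.75, which matches the 90% target.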

10. Now create a scatterplot of mass and headwidth, give it a title using the STATA **title()** option, and print it. Treat mass as the explanatory variable and headwidth as the response variable (although the choice is probably arbitrary in this case). Remember that the explanatory variable goes on the x-axis and the response variable on the y-axis. Describe their relationship. Does it look linear?

You may want to try some additional examples.

Compare your work with the template solution (it is not the only correct solution, just a template).