# Inside the black box, part 1: the one-sample t-test

Figure 1. A figure produced using grid library view ports.

In a typical introductory statistics course, you’re given some assumptions and provided with formulas for churning out p-values and confidence intervals. But statistics can be left feeling rather like a black box: we turn input into output, yet it can be difficult to keep track of how all those assumptions and formulas work (or don’t).

One of the simplest tests you’ll see in a first statistics course is the one-sample t-test, which tests the null hypothesis that the population mean of a normally distributed random variable is equal to zero. To run this test in R, all you need is the t.test() function. But what are its inner workings? The R script given below generates a pseudorandom sample and then puts it through both the standard R function t.test() and a user-defined version, T.TEST(), which should give you the same t statistic and p-value, up to floating-point rounding. But wait… inside the t-test “box” other mysteries are to be found. For example: Where did the t-distribution come from in the first place? How should you interpret that p-value? And what’s up with that null hypothesis, anyway? Sometimes asking questions leads to more questions…
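For reference, here is what the test computes, written in symbols. The notation is my own shorthand rather than anything R prints: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $T_{n-1}$ a random variable following a t-distribution with $n-1$ degrees of freedom.

$$
t = \frac{\bar{x}}{s/\sqrt{n}} = \frac{\sqrt{n}\,\bar{x}}{s},
\qquad
p = 2\,P\left(T_{n-1} \le -|t|\right)
$$

The factor of 2 makes the test two-sided: a sample mean far from zero in either direction counts as evidence against the null hypothesis.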

You can run the script provided below by copying and pasting it into the R console and pressing <enter>.

R Script

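# A pseudorandom sample of 20 draws from a standard normal distribution.
data <- rnorm(20, 0, 1)

T.TEST <- function(data) {
  # A one-sample, two-sided t-test against the null hypothesis
  # that the population mean is zero.
  n <- length(data)
  df <- n - 1                                # degrees of freedom
  t.stat <- sqrt(n) * mean(data) / sd(data)  # sample mean over its standard error
  p.value <- 2 * pt(-abs(t.stat), df)        # two-sided tail probability
  list(t = t.stat, p.value = p.value)
}

t.test(data)  # the standard R function
T.TEST(data)  # the user-defined version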
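One informal way to poke at the questions about the t-distribution and the p-value is simulation. The sketch below is only an illustration: the seed, the number of simulated samples, and the variable names are arbitrary choices of mine. It draws many samples for which the null hypothesis is true, computes the t statistic for each, and compares the results with what the t-distribution predicts.

```r
# Simulate many samples under the null hypothesis (population mean zero)
# and compare the resulting t statistics with the t-distribution.
set.seed(1)      # arbitrary seed, only so that the run is reproducible
n     <- 20      # sample size, matching the script above
n.sim <- 10000   # number of simulated samples (arbitrary choice)

# One t statistic per simulated sample.
t.stats <- replicate(n.sim, {
  x <- rnorm(n, 0, 1)
  sqrt(n) * mean(x) / sd(x)
})

# Two-sided p-values, computed the same way as in T.TEST.
p.values <- 2 * pt(-abs(t.stats), df = n - 1)

# Empirical quantiles of the simulated statistics next to the
# theoretical quantiles of a t-distribution with n - 1 degrees of freedom.
probs <- c(0.025, 0.5, 0.975)
rbind(simulated   = quantile(t.stats, probs),
      theoretical = qt(probs, df = n - 1))

# Fraction of simulated p-values below 0.05.
mean(p.values < 0.05)
```

If the assumptions hold, the two rows of quantiles should be close, and roughly 5% of the p-values should fall below 0.05. That is one way to read a p-value: when the null hypothesis is true, a value that small turns up only about one time in twenty.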

References

R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/