-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path06_student_t-tests.qmd
118 lines (72 loc) · 3.5 KB
/
06_student_t-tests.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
# Student t-tests
A t-test is a statistical hypothesis test in which the test statistic
follows a t-distribution. Three common applications of the t-test are:
- A one-sample t-test of whether the mean of a normally distributed
population is a particular value ($H_0: \mu = 70$)
- An independent samples t-test to determine whether the means of two
normally distributed populations are equal ($H_0: \mu_1 = \mu_2$)
- A paired samples t-test to determine whether the means of two
normally distributed populations are equal ($H_0: \mu_1 = \mu_2$)
Another t-test arises when testing whether a regression parameter equals
0 ($H_0: \beta_1 = 0$).
## One-sample t-test
Clean up the workspace and import `height.csv` into
**[R]{.sans-serif}**.
`> rm(list=ls()) # removes all objects`
`> dat = read.csv("height.csv", header=TRUE)`
`> head(dat)`
Let's start with a one-sample t-test. Test the hypothesis that the
population mean height for men is 70 inches ($H_0:\mu_{men} = 70$)
First we need to get the subset where gender $==$ male.
`> dat$gender == "male"`
`> men.ht = dat$height[dat$gender == "male"]`
Before doing any formal analyses, it is always a good idea to summarize
the data both numerically and visually.
`> summary(men.ht)`
`> boxplot(men.ht)`
`> hist(men.ht, main="My First Histogram in R!")`
The `t.test` function will perform a one sample t-test for the height of
men. Suppose we wish to test the hypothesis $H_0: \mu_{men} = 70$ versus
the alternative $H_a: \mu_{men} \neq 70$. We must give the `t.test`
function 1) the vector of male height values, and 2) the value of the
mean under the null hypothesis. The default alternative is two-sided.
`> t.test(men.ht, mu=70)`
What if you want to calculate a 90% confidence interval?
`> help(t.test)`
The *conf.level* argument will be used to change the confidence level.
`> t.test(men.ht, mu=70, conf.level= 0.90)`
`> t.out = t.test(men.ht, mu=70, conf.level= 0.90)`
`> names(t.out)`
`> t.out$conf.int`
## Independent Samples t-tests
Now suppose we wish to test whether there is a difference between the
population mean heights for men and women ($H_0: \mu_m = \mu_w$). Before
running the t-test, let's make side-by-side boxplots to visually compare
the heights of men and women.
`> boxplot(men.ht, women.ht)`
### The formula operator ($\sim$)
It is possible to avoid creating separate vectors for two groups that
you would like to compare. This is done with the formula operator and
many R functions can accept a formula as the first argument. The second
argument must then nearly always be the data.frame where the formula
should be evaluated.
`> boxplot(height `$\sim$` gender, data=dat)`
If you don't include the data argument, you'll likely get an error
message:
`> boxplot(height `$\sim$` gender)` \# error: data not found?
The variables in the `dat` data.frame can be made available temporarily
by wrapping the command inside a call to `with`:
`> with(dat, boxplot(height `$\sim$` gender))`
If you need access to the variables in a data.frame for an extended
session and you aren't going to change the variables in the data.frame,
you can use the `attach(dat)` function to access height and gender
directly. However, using `attach` is discouraged by some analysts. It
puts a second copy of the data.frame on the "search path" and can make
updating your data tricky. **Remember to `detach` when you are done.**
`> height`
`> attach(dat)`
`> height`
`> detach(dat)`
`> height`
Now perform the independent samples t-test.
`> t.test(height `$\sim$` gender, data=dat)`