forked from rdpeng/RepData_PeerAssessment1
Reproducible Research: Peer Assessment 1
========================================
```{r, echo= FALSE}
setwd("C:/Users/benjamin/R_Coursera/RepData_PeerAssessment1")
```
## Loading and preprocessing the data
```{r}
dataFile <- unzip( 'activity.zip' )
activity <- read.csv( dataFile, colClasses= c( 'numeric', 'Date', 'numeric') )
```
## What is mean total number of steps taken per day?
### Group the steps by day:
```{r, fig.width= 10}
stepsPerDay <- aggregate( steps ~ date, data= activity, FUN= sum)
```
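
The formula interface of `aggregate` drops rows with NA values by default (`na.action = na.omit`), so a day with no recorded steps does not appear in `stepsPerDay` at all rather than showing up with a total of 0. A minimal sketch on a made-up data frame (`toyActivity` and `toyStepsPerDay` are hypothetical names, not part of the assignment data):

```{r}
## Three days of made-up data; 2012-10-02 has only NA steps.
toyActivity <- data.frame(
    steps= c( 10, 20, NA, NA, 5, NA ),
    date= as.Date( c( '2012-10-01', '2012-10-01',
                      '2012-10-02', '2012-10-02',
                      '2012-10-03', '2012-10-03' ) ) )
## Rows with NA steps are removed before summing, so 2012-10-02 is
## absent from the result instead of being reported as 0.
toyStepsPerDay <- aggregate( steps ~ date, data= toyActivity, FUN= sum )
toyStepsPerDay
```

This matters when comparing against the imputed data later on: days that were dropped here are counted there.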
### A basic analysis of steps grouped by day:
```{r, fig.width= 10}
library( ggplot2 )
## Histogram of the daily totals; the x axis shows steps, not dates.
qplot( stepsPerDay$steps, xlab= 'Total steps per day', binwidth= 2000 )
meanSteps <- mean( stepsPerDay$steps )
medianSteps <- median( stepsPerDay$steps )
```
- Mean: `r meanSteps`
- Median: `r medianSteps`
## What is the average daily activity pattern?
```{r, fig.width= 10}
aveStepsPerInterval <- aggregate( steps ~ interval,
                                  data= activity,
                                  FUN= mean)
qplot( interval, steps,
       data= aveStepsPerInterval,
       geom= 'line',
       xlab= 'Interval',
       ylab= 'Average Steps')
maxInterval <- aveStepsPerInterval$interval[ which.max( aveStepsPerInterval$steps ) ]
```
- The interval with the maximum average number of steps is `r maxInterval`.
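
The interval identifier is easier to read as a time of day. The sketch below assumes that the identifier encodes hours and minutes (for example, 835 for 08:35), which the layout of the data suggests but which is not stated above. `formatInterval` is a hypothetical helper, not part of the analysis:

```{r}
## Hypothetical helper: format an interval identifier as "HH:MM",
## assuming the last two digits are minutes and the rest are hours.
formatInterval <- function( interval ) {
    sprintf( '%02d:%02d', interval %/% 100, interval %% 100 )
}
formatInterval( c( 0, 55, 835, 2355 ) )
```

Under that assumption, `formatInterval( maxInterval )` would give the time of day of the most active interval.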
## Imputing missing values
```{r}
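## Inspect only the steps column so that naRows holds row numbers rather
## than positions in the whole data frame.
naRows <- which( is.na( activity$steps ) )
naCount <- length( naRows )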
```
There are `r naCount` rows with a missing (NA) value for steps. Let's impute each one as the mean of the value in the previous row and the next non-NA value, and store the result in a new data frame.
```{r}
impActivity <- activity
## Row numbers of the observed (non-NA) step values, used to look up the
## next non-NA value after each NA row.
observedRows <- which( !is.na( activity$steps ) )
for ( row in naRows ) {
    ## If the first row in the dataset is NA, set previousSteps to 0.
    ## Otherwise, set previousSteps to the steps of the previous row
    ## in impActivity, which may itself be an imputed value.
    if ( row == 1 ) {
        previousSteps <- 0
    } else {
        previousSteps <- impActivity$steps[row - 1]
    }
    ## Find the next non-NA step value. It is looked up for every row, so
    ## a value left over from an earlier series of NA rows is never reused.
    ## If the dataset ends with a series of NA values, there is no such
    ## value, so set nextSteps to previousSteps.
    laterRows <- observedRows[observedRows > row]
    if ( length( laterRows ) == 0 ) {
        nextSteps <- previousSteps
    } else {
        nextSteps <- activity$steps[laterRows[1]]
    }
    ## Change this row's step value from NA to the mean of the step values
    ## of the previous row and the next non-NA row.
    impActivity$steps[row] <- mean( c(previousSteps, nextSteps) )
}
```
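
The imputation rule is easier to check on a short vector than on the full dataset. The sketch below applies the same rule (mean of the previous value and the next non-NA value, with 0 before the start and the previous value after the end) to made-up numbers. `imputeNeighbourMean` is a hypothetical helper written for this illustration only:

```{r}
## Hypothetical helper illustrating the rule on a plain numeric vector.
imputeNeighbourMean <- function( steps ) {
    observed <- which( !is.na( steps ) )
    for ( row in which( is.na( steps ) ) ) {
        ## Previous value (possibly already imputed), or 0 at the start.
        previousSteps <- if ( row == 1 ) 0 else steps[row - 1]
        ## Next observed value, or the previous value at the end.
        later <- observed[observed > row]
        nextSteps <- if ( length( later ) == 0 ) previousSteps else steps[later[1]]
        steps[row] <- mean( c( previousSteps, nextSteps ) )
    }
    steps
}
toyImputed <- imputeNeighbourMean( c( NA, 4, NA, NA, 8, NA ) )
toyImputed
```

Because each imputed value feeds into the next one, a series of NA values moves toward the next observed value in halving steps (4, 6, 7, then 8 here) rather than along a straight line.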
### Group the steps by day for the imputed data set:
```{r, fig.width= 10}
impStepsPerDay <- aggregate( steps ~ date, data= impActivity, FUN= sum)
```
### A basic analysis of steps grouped by day with imputed steps:
```{r, fig.width= 10}
library( ggplot2 )
## Histogram of the daily totals; the x axis shows steps, not dates.
qplot( impStepsPerDay$steps, xlab= 'Total steps per day', binwidth= 2000 )
meanImpSteps <- mean( impStepsPerDay$steps )
medianImpSteps <- median( impStepsPerDay$steps )
```
- Mean: `r meanImpSteps`
- Median: `r medianImpSteps`
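All of the NA values were imputed as 0, which lowers the mean and median relative to the estimates calculated in the first part of the assignment, because days that were previously excluded now count as days with zero steps. The total number of steps for days that already had data remains the same.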
## Are there differences in activity patterns between weekdays and weekends?
### Create a new factor variable in the imputed dataset with the levels "weekday" and "weekend".
```{r}
## A function to determine if the given dates occur in the weekend or work
## week. as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, so the result
## does not depend on the day names of the current locale.
findDayType <- function( date ) {
    ifelse( as.POSIXlt( date )$wday %in% c(0, 6), 'weekend', 'weekday' )
}
## Add the factor dayType using the findDayType function.
impActivity$dayType <- factor( findDayType( impActivity$date ),
                               levels= c('weekday', 'weekend') )
```
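
`weekdays()` returns day names in the current locale, so a comparison against English names depends on the system settings. One locale-independent way to check the classification is the `wday` field of `as.POSIXlt()`, sketched here on four made-up dates (`toyDates`, `toyIsWeekend` and `toyDayType` are hypothetical names):

```{r}
## 2012-10-05 to 2012-10-08 run from Friday to Monday.
toyDates <- as.Date( c( '2012-10-05', '2012-10-06',
                        '2012-10-07', '2012-10-08' ) )
## wday is 0 for Sunday through 6 for Saturday, whatever the locale.
toyIsWeekend <- as.POSIXlt( toyDates )$wday %in% c( 0, 6 )
toyDayType <- factor( ifelse( toyIsWeekend, 'weekend', 'weekday' ),
                      levels= c( 'weekday', 'weekend' ) )
table( toyDayType )
```

Fixing the factor levels explicitly keeps both levels present even if a subset of the data happens to contain only one kind of day.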
### Compare the average number of steps per five-minute interval on weekdays and weekend days.
```{r, fig.width= 10}
aveStepsPerInterval <- aggregate( steps ~ interval + dayType,
                                  data= impActivity,
                                  FUN= mean)
qplot( interval, steps,
       data= aveStepsPerInterval,
       facets= dayType ~ .,
       geom= 'line',
       xlab= 'Interval',
       ylab= 'Average Steps')
```