diff --git a/Lab 3.qmd b/Lab 3.qmd new file mode 100644 index 0000000..35da416 --- /dev/null +++ b/Lab 3.qmd @@ -0,0 +1,222 @@ +--- +title: "Lab 3" +format: html +editor: visual +--- + +## Lab 3 - Isabella Villanueva + +##1. Read in the data + +```{r} +download.file( + "https://raw.githubusercontent.com/USCbiostats/data-science-data/master/02_met/met_all.gz", + destfile = file.path("~", "Downloads", "met_all.gz"), + method = "libcurl", + timeout = 60 +) + +met <- data.table::fread(file.path("~", "Downloads", "met_all.gz")) +met <- as.data.frame(met) +``` + +##2. Check the dimensions, headers, footers. + +**How many columns, rows are there?** + +```{r} +dim(met) +``` + +Columns: 2,377,343 + +Rows: 30 + +##3. Take a look at the variables. + +**What are the names of the key variables related to our question of interest?** + +Based on this lab's objective of finding the weather station with the highest elevation, and looking at patterns in the time series of its wind speed and temperature: the key variables of this lab would be + +*variables relating to time: "year", "day", "hour"* + +*variables relating to location: "lat", "long"* + +*variables relating to temperature: "temp"* + +*variables relating to elevation: "elev"* + +*variables relating to wind speed: "wind.sp"* + +```{r} +table(str(met)) +``` + +##4. Take a closer look at the key variables. + +```{r} +table(met$year) +``` + +```{r} +table(met$day) +``` + +```{r} +table(met$hour) +``` + +```{r} +barplot(table(met$hour)) +``` + +```{r} +summary(met$temp) +``` + +```{r} +hist(met$temp) +``` + +*At what elevation is the highest weather station?* + +```{r} +met <- met[met$temp > -40, ] +head(met[order(met$temp), ]) +``` + +Using the maximum value of this table and removing the 9999 values to be NA, the highest weather station is the 4113. + +```{r} +met$elev[met$elev==9999.0] <- NA +summary(met$elev) +``` + +```{r} +summary(met$wind.sp) +``` +```{r} +met$wind.sp[met$wind.sp==9999.0] <- NA +summary(met$wind.sp) +``` + +**How many missing values are there in the wind.sp variable?** +Using the maximum value of this table and removing the 9999 values to be NA, the highest weather station is the 36. + +##5. Check the data against an external data source. +**Where was the location for the coldest temperature readings (-17.2C)? Do these seem reasonable in context?** + +Checking the latitude and longitude coordinates found from this table: + +```{r} +met <- met[met$temp > -40, ] +head(met[order(met$temp), ]) +``` +The coordinates: (38.767, -104.3) pinpointed a location in Yoder, Colorado, just East of Colorado Springs. + +**Does the range of values for elevation make sense? Why or why not?** +The data was collected in August, which would not be reasonable in the context of Colorado in late summer. + +##6. Calculate summary statistics +```{r} +elev <- met[which(met$elev == max(met$elev, na.rm = TRUE)), ] +summary(elev) +``` + +```{r} +cor(elev$temp, elev$wind.sp, use="complete") +``` + +```{r} +## [1] -0.09373843 +``` + +```{r} +cor(elev$temp, elev$hour, use="complete") +``` + +```{r} +## [1] 0.4397261 +``` + +```{r} +cor(elev$wind.sp, elev$day, use="complete") +``` + +```{r} +## [1] 0.3643079 +``` + +```{r} +cor(elev$wind.sp, elev$hour, use="complete") +``` + +```{r} +## [1] 0.08807315 +``` + +```{r} +cor(elev$temp, elev$day, use="complete") +``` + +```{r} +## [1] -0.003857766 +``` + +##7. Exploratory graphs + +**Use the hist function to make histograms of the elevation, temperature, and wind speed variables for the whole dataset** + +```{r} +hist(met$elev) +``` + +```{r} +hist(met$temp) +``` + +```{r} +hist(met$wind.sp) +``` + +```{r} +library(leaflet) +leaflet(elev) |> + addProviderTiles('OpenStreetMap') |> + addCircles(lat=~lat,lng=~lon, opacity=1, fillOpacity=1, radius=100) +``` + +```{r} +library(lubridate) +elev$date <- with(elev, ymd_h(paste(year, month, day, hour, sep= ' '))) +summary(elev$date) +``` + +```{r} +elev <- elev[order(elev$date), ] +head(elev) +``` + +```{r} +library(lubridate) +elev$date <- with(elev, ymd_h(paste(year, month, day, hour, sep= ' '))) +head(elev) +``` + +```{r} +plot(elev$date, elev$temp, type="l", cex=0.5) +``` +**Summary** +This correlation between temperature and date shows a very active set of temperatures ranging by day, where the peaks of temperature and lowest temperatures occur each day. This plot shows higher peaks in the later days of August, and the peaks become lower as September approaches. + + +```{r} +plot(elev$date, elev$wind.sp, type="l", cex=0.5) +``` + +**Summary** +This correlation shows the wind speed over a period of time between August to the beginning of September. This plot shows a general increase of wind speed over time then shows the lowest wind speed measurements found in the middle of August (after August 19's peak). The wind speed then peaks again nearer to August 26. + +##8. Ask questions +One question I do have concerns the errors in data where the temperature found to be -40 degrees C to be consistently measured when that level of cold temperature is near impossible in the United States. What could have caused this reading, was it due to user/ human error? +