Skip to content

Commit

Permalink
Isabella Villanueva - Final Lab 3 Draft
Browse files Browse the repository at this point in the history
This is my finalized draft of my Lab 3!
  • Loading branch information
bellavillan committed Sep 13, 2024
0 parents commit cc15367
Showing 1 changed file with 222 additions and 0 deletions.
222 changes: 222 additions & 0 deletions Lab 3.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,222 @@
---
title: "Lab 3"
format: html
editor: visual
---

## Lab 3 - Isabella Villanueva

##1. Read in the data

```{r}
download.file(
"https://raw.githubusercontent.com/USCbiostats/data-science-data/master/02_met/met_all.gz",
destfile = file.path("~", "Downloads", "met_all.gz"),
method = "libcurl",
timeout = 60
)
met <- data.table::fread(file.path("~", "Downloads", "met_all.gz"))
met <- as.data.frame(met)
```

##2. Check the dimensions, headers, footers.

**How many columns, rows are there?**

```{r}
dim(met)
```

Columns: 2,377,343

Rows: 30

##3. Take a look at the variables.

**What are the names of the key variables related to our question of interest?**

Based on this lab's objective of finding the weather station with the highest elevation, and looking at patterns in the time series of its wind speed and temperature: the key variables of this lab would be

*variables relating to time: "year", "day", "hour"*

*variables relating to location: "lat", "long"*

*variables relating to temperature: "temp"*

*variables relating to elevation: "elev"*

*variables relating to wind speed: "wind.sp"*

```{r}
table(str(met))
```

##4. Take a closer look at the key variables.

```{r}
table(met$year)
```

```{r}
table(met$day)
```

```{r}
table(met$hour)
```

```{r}
barplot(table(met$hour))
```

```{r}
summary(met$temp)
```

```{r}
hist(met$temp)
```

*At what elevation is the highest weather station?*

```{r}
met <- met[met$temp > -40, ]
head(met[order(met$temp), ])
```

Using the maximum value of this table and removing the 9999 values to be NA, the highest weather station is the 4113.

```{r}
met$elev[met$elev==9999.0] <- NA
summary(met$elev)
```

```{r}
summary(met$wind.sp)
```
```{r}
met$wind.sp[met$wind.sp==9999.0] <- NA
summary(met$wind.sp)
```

**How many missing values are there in the wind.sp variable?**
Using the maximum value of this table and removing the 9999 values to be NA, the highest weather station is the 36.

##5. Check the data against an external data source.
**Where was the location for the coldest temperature readings (-17.2C)? Do these seem reasonable in context?**

Checking the latitude and longitude coordinates found from this table:

```{r}
met <- met[met$temp > -40, ]
head(met[order(met$temp), ])
```
The coordinates: (38.767, -104.3) pinpointed a location in Yoder, Colorado, just East of Colorado Springs.

**Does the range of values for elevation make sense? Why or why not?**
The data was collected in August, which would not be reasonable in the context of Colorado in late summer.

##6. Calculate summary statistics
```{r}
elev <- met[which(met$elev == max(met$elev, na.rm = TRUE)), ]
summary(elev)
```

```{r}
cor(elev$temp, elev$wind.sp, use="complete")
```

```{r}
## [1] -0.09373843
```

```{r}
cor(elev$temp, elev$hour, use="complete")
```

```{r}
## [1] 0.4397261
```

```{r}
cor(elev$wind.sp, elev$day, use="complete")
```

```{r}
## [1] 0.3643079
```

```{r}
cor(elev$wind.sp, elev$hour, use="complete")
```

```{r}
## [1] 0.08807315
```

```{r}
cor(elev$temp, elev$day, use="complete")
```

```{r}
## [1] -0.003857766
```

##7. Exploratory graphs

**Use the hist function to make histograms of the elevation, temperature, and wind speed variables for the whole dataset**

```{r}
hist(met$elev)
```

```{r}
hist(met$temp)
```

```{r}
hist(met$wind.sp)
```

```{r}
library(leaflet)
leaflet(elev) |>
addProviderTiles('OpenStreetMap') |>
addCircles(lat=~lat,lng=~lon, opacity=1, fillOpacity=1, radius=100)
```

```{r}
library(lubridate)
elev$date <- with(elev, ymd_h(paste(year, month, day, hour, sep= ' ')))
summary(elev$date)
```

```{r}
elev <- elev[order(elev$date), ]
head(elev)
```

```{r}
library(lubridate)
elev$date <- with(elev, ymd_h(paste(year, month, day, hour, sep= ' ')))
head(elev)
```

```{r}
plot(elev$date, elev$temp, type="l", cex=0.5)
```
**Summary**
This correlation between temperature and date shows a very active set of temperatures ranging by day, where the peaks of temperature and lowest temperatures occur each day. This plot shows higher peaks in the later days of August, and the peaks become lower as September approaches.


```{r}
plot(elev$date, elev$wind.sp, type="l", cex=0.5)
```

**Summary**
This correlation shows the wind speed over a period of time between August to the beginning of September. This plot shows a general increase of wind speed over time then shows the lowest wind speed measurements found in the middle of August (after August 19's peak). The wind speed then peaks again nearer to August 26.

##8. Ask questions
One question I do have concerns the errors in data where the temperature found to be -40 degrees C to be consistently measured when that level of cold temperature is near impossible in the United States. What could have caused this reading, was it due to user/ human error?

0 comments on commit cc15367

Please sign in to comment.