Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downscale Function created for for hourly data in nc #3322

Merged
merged 23 commits into from
Aug 1, 2024
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
50cca2c
v1 hourly downscale
Snafkin547 Jul 2, 2024
bf7adcb
Merge branch 'develop' into downscale/hourly
Snafkin547 Jul 15, 2024
95e9691
changelog updated
Snafkin547 Jul 15, 2024
32115a4
Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca…
Snafkin547 Jul 15, 2024
a214027
man file created created
Snafkin547 Jul 15, 2024
502bf89
import package list updated
Snafkin547 Jul 15, 2024
29272cf
Merge branch 'develop' into downscale/hourly
Snafkin547 Jul 17, 2024
df6de03
Merge branch 'develop' into downscale/hourly
Snafkin547 Jul 24, 2024
e143b38
Name added to CITATION
Snafkin547 Jul 24, 2024
c418c86
Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca…
Snafkin547 Jul 24, 2024
9c840cc
Merge branch 'develop' into downscale/hourly
Snafkin547 Jul 25, 2024
85ed698
Suggested change in namespace and file input style modified
Snafkin547 Jul 29, 2024
c6bb0d0
Time units uses lubridate
Snafkin547 Jul 29, 2024
16b84a4
Merge branch 'develop' into downscale/hourly
Snafkin547 Jul 29, 2024
900acee
downscale func takes time series
Snafkin547 Jul 29, 2024
8c4234a
downscale func takes time series
Snafkin547 Jul 29, 2024
a23e600
Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca…
Snafkin547 Jul 29, 2024
bfeb8e5
Time Zone Checked
Snafkin547 Jul 29, 2024
8cad31f
Merge branch 'develop' into downscale/hourly
Snafkin547 Jul 29, 2024
f56fd9c
Updated downscale based on comment
Snafkin547 Jul 31, 2024
30de7b9
Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca…
Snafkin547 Jul 31, 2024
6ca6b65
name space added
Snafkin547 Jul 31, 2024
84780a7
Merge branch 'develop' into downscale/hourly
mdietze Aug 1, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ For more information about this file see also [Keep a Changelog](http://keepacha
- Added new feature of preparing initial conditions for MODIS LAI, AGB, ISCN SOC, and soil moisture across NA anchor sites.
- Added GEDI AGB preparation workflow.
- Added new feature of downloading datasets from the NASA DAAC ORNL database.
- Extended downscale function and created 'downscale_hrly' so that it handles more frequent data
- Added 'aggregate' as a new feature for downscaled data

### Fixed
Expand Down
2 changes: 2 additions & 0 deletions modules/assim.sequential/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ export(Prep_OBS_SDA)
export(Remote_Sync_launcher)
export(SDA_OBS_Assembler)
export(SDA_control)
export(SDA_downscale_hrly)
export(SDA_remote_launcher)
export(SDA_timeseries_plot)
export(adj.ens)
Expand Down Expand Up @@ -60,6 +61,7 @@ export(tobit_model_censored)
export(y_star_create)
import(furrr)
import(lubridate)
import(ncdf4)
import(nimble)
importFrom(dplyr,"%>%")
importFrom(lubridate,"%m+%")
Expand Down
103 changes: 103 additions & 0 deletions modules/assim.sequential/R/downscale_function_hrly.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
#' SDA Downscale Function for Hourly Data
#'
#' This function uses the randomForest model to downscale forecast data (hourly) to unmodeled locations using covariates and site locations
#'
#' @author Harunobu Ishii
#' @param nc_file In quotes, file path for .nc containing ensemble data.
#' @param coords In quotes, file path for .csv file containing the site coordinates, columns named "lon" and "lat".
#' @param yyyy In string, format is yyyy(year of interest)
#' @param covariates SpatRaster stack, used as predictors in randomForest. Layers within stack should be named. Recommended that this stack be generated using 'covariates' instructions in assim.sequential/inst folder
#' @return It returns the `downscale_output` list containing lists for the training and testing data sets, models, and predicted maps for each ensemble member.
#' @export

SDA_downscale_hrly <- function(nc_file, coords, yyyy, covariates){

# Read the input data and site coordinates
nc_data <- ncdf4::nc_open(nc_file)
on.exit(ncdf4::nc_close(nc_data))
input_data <- ncvar_get(nc_data, "NEE")
mdietze marked this conversation as resolved.
Show resolved Hide resolved
Snafkin547 marked this conversation as resolved.
Show resolved Hide resolved
covariate_names <- names(covariates)


# Extract time and units
time <- nc_data$dim$time$vals
time_units <- nc_data$dim$time$units
time_origin_str <- substr(time_units, 12, 31)

# Check if timezone is specified in the time units string
if (grepl("UTC|GMT", time_units)) {
time_origin <- lubridate::ymd_hm(time_origin_str, tz = "UTC")
} else if (grepl("EST", time_units)) {
time_origin <- lubridate::ymd_hm(time_origin_str, tz = "EST")
} else {
time_origin <- lubridate::ymd_hm(time_origin_str, tz = "UTC") # Default to UTC if not specified
}

# Timereadable
if (grepl("hours", time_units)) {
time_readable <- time_origin + dhours(time)
Snafkin547 marked this conversation as resolved.
Show resolved Hide resolved
} else if (grepl("seconds", time_units)) {
time_readable <- time_origin + dseconds(time)
} else {
stop("Unsupported time units")
}

# Extract predictors from covariates raster using site coordinates
site_coordinates <- terra::vect(readr::read_csv(coords), geom=c("lon", "lat"), crs="EPSG:4326")
predictors <- as.data.frame(terra::extract(covariates, site_coordinates,ID = FALSE))

downscale_output<- list()

# Train & Test split
sample <- sample(1:nrow(predictors), size = round(0.75*nrow(predictors)))

# Predict for each time stamp of the year selected
time_indices <- which(year(time_readable) == yyyy)
for (index in time_indices) {
data <- input_data[index, , ]
carbon_data <- as.data.frame(data)
names(carbon_data) <- paste0("ensemble",seq(1:ncol(carbon_data)))

# Combine carbon data and covariates/predictors and split into training/test
full_data <- cbind(carbon_data, predictors)
train_data <- full_data[sample, ]
test_data <- full_data[-sample, ]

# Combine each ensemble member with all predictors
models <- list()
maps <- list()
predictions <- list()
ensembles <- list()
for (i in seq_along(carbon_data)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need any change for this PR, but making a note for those doing future development that this would be a great place to either parallelize this work or to modify the ML model to take time into account explicitly.

ensemble_col <- paste0("ensemble", i)
formula <- stats::as.formula(paste(ensemble_col, "~", paste(covariate_names, collapse = " + ")))
models[[i]] <- randomForest::randomForest(formula,
data = train_data,
ntree = 1000,
na.action = stats::na.omit,
keep.forest = TRUE,
importance = TRUE)

maps[[i]] <- terra::predict(covariates, model = models[[i]], na.rm = TRUE)
predictions[[i]] <- stats::predict(models[[i]], test_data)
}

# Organize the results into a single output list
curr_downscaled <- list( data = list(training = train_data, testing = test_data),
models = models,
maps = maps,
predictions = predictions
)

# Rename each element of the output list with appropriate ensemble numbers
for (i in 1:length(curr_downscaled$data)) {
names(curr_downscaled$data[[i]]) <- paste0("ensemble", seq(1:ncol(carbon_data)))
}
names(curr_downscaled$models) <- paste0("ensemble", seq(1:ncol(carbon_data)))
names(curr_downscaled$maps) <- paste0("ensemble", seq(1:ncol(carbon_data)))
names(curr_downscaled$predictions) <- paste0("ensemble", seq(1:ncol(carbon_data)))

downscale_output[[as.character(time_readable[index])]]<-curr_downscaled
}
return(downscale_output)
}
26 changes: 26 additions & 0 deletions modules/assim.sequential/man/SDA_downscale_hrly.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading