-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor convert_input
to Perform tasks via helper function
#3338
Open
Sweetdevil144
wants to merge
29
commits into
PecanProject:develop
Choose a base branch
from
Sweetdevil144:gsoc/convert-input
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
29 commits
Select commit
Hold shift + click to select a range
614b8f9
Shift functions to check for missing files
Sweetdevil144 838af61
Update CHANGELOG
Sweetdevil144 d5e8d24
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 f22b962
Remove unutilized variables from convert_input
Sweetdevil144 d884203
Update logger statements in convert_input
Sweetdevil144 68d9516
Added seperate function to check machine info
Sweetdevil144 5208b02
Update input args to get machine info
Sweetdevil144 f570646
Correct roxygen documentations
Sweetdevil144 e479c46
Update tests
Sweetdevil144 d9e911b
Merge branch 'PecanProject:develop' into gsoc/convert-input
Sweetdevil144 ed581f7
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 63ac964
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 0f9ac13
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 b98617f
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 4b771d3
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 63f270f
Refactor extra variables in `run.meta.anbalysis`
Sweetdevil144 dbb7a6d
Merge branch 'PecanProject:develop' into gsoc/convert-input
Sweetdevil144 74003d9
get existing machine info using helper function
Sweetdevil144 95fb810
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 2bcb7c4
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 fcae9bd
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 c8e8a02
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 766174f
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 d9074df
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 94c92f0
Merge branch 'develop' into gsoc/convert-input
Sweetdevil144 a578be2
Applied changes as suggested by @infotroph
Sweetdevil144 293a68b
Minor review changes
Sweetdevil144 f7f6926
Update base/db/R/get.machine.info.R
Sweetdevil144 8f820b0
Apply suggestions from code review
Sweetdevil144 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
#' Return new arrangement of database while adding code to deal with ensembles | ||
#' | ||
#' @param result list of results from the download function | ||
#' @param con database connection | ||
#' @param start_date start date of the data | ||
#' @param end_date end date of the data | ||
#' @param write whether to write to the database | ||
#' @param overwrite Logical: If a file already exists, create a fresh copy? | ||
#' @param insert.new.file whether to insert a new file | ||
#' @param input.args input arguments obtained from the convert_input function | ||
#' @param machine machine information | ||
#' @param mimetype data product specific file format | ||
#' @param formatname format name of the data | ||
#' @param allow.conflicting.dates whether to allow conflicting dates | ||
#' @param ensemble ensemble id | ||
#' @param ensemble_name ensemble name | ||
#' @param existing.input existing input records | ||
#' @param existing.dbfile existing dbfile records | ||
#' @param input input records | ||
#' @return list of input and dbfile ids | ||
#' | ||
#' @author Betsy Cowdery, Michael Dietze, Ankur Desai, Tony Gardella, Luke Dramko | ||
|
||
add.database.entries <- function( | ||
result, con, start_date, | ||
end_date, overwrite, | ||
insert.new.file, input.args, | ||
machine, mimetype, formatname, | ||
allow.conflicting.dates, ensemble, | ||
ensemble_name, existing.input, | ||
existing.dbfile, input) { | ||
# Setup newinput. This list will contain two variables: a vector of input IDs and a vector of DB IDs for each entry in result. | ||
# This list will be returned. | ||
newinput <- list(input.id = NULL, dbfile.id = NULL) # Blank vectors are null. | ||
|
||
for (i in 1:length(result)) { # Master for loop | ||
id_not_added <- TRUE | ||
|
||
if (!is.null(existing.input) && nrow(existing.input[[i]]) > 0 && | ||
(existing.input[[i]]$start_date != start_date || existing.input[[i]]$end_date != end_date)) { | ||
# Updating record with new dates | ||
db.query( | ||
paste0( | ||
"UPDATE inputs SET start_date='", start_date, "', end_date='", end_date, | ||
"' WHERE id=", existing.input[[i]]$id | ||
), | ||
con | ||
) | ||
id_not_added <- FALSE | ||
|
||
# The overall structure of this loop has been set up so that exactly one input.id and one dbfile.id will be written to newinput every iteration. | ||
newinput$input.id <- c(newinput$input.id, existing.input[[i]]$id) | ||
newinput$dbfile.id <- c(newinput$dbfile.id, existing.dbfile[[i]]$id) | ||
} | ||
|
||
if (overwrite) { | ||
# A bit hacky, but need to make sure that all fields are updated to expected values (i.e., what they'd be if convert_input was creating a new record) | ||
if (!is.null(existing.input) && nrow(existing.input[[i]]) > 0) { | ||
db.query( | ||
paste0( | ||
"UPDATE dbfiles SET file_path='", dirname(result[[i]]$file[1]), | ||
"', file_name='", result[[i]]$dbfile.name[1], | ||
"' WHERE id=", existing.dbfile[[i]]$id | ||
), | ||
con | ||
) | ||
} | ||
|
||
if (!is.null(existing.dbfile) && nrow(existing.dbfile[[i]]) > 0) { | ||
db.query(paste0( | ||
"UPDATE dbfiles SET file_path='", dirname(result[[i]]$file[1]), | ||
"', file_name='", result[[i]]$dbfile.name[1], | ||
"' WHERE id=", existing.dbfile[[i]]$id | ||
), con) | ||
} | ||
} | ||
|
||
# If there is no ensemble then for each record there should be one parent | ||
# But when you have ensembles, all of the members have one parent !! | ||
parent.id <- if (is.numeric(ensemble)) { | ||
ifelse(is.null(input[[i]]), NA, input[[1]]$id) | ||
} else { | ||
ifelse(is.null(input[[i]]), NA, input[[i]]$id) | ||
} | ||
|
||
|
||
if ("newsite" %in% names(input.args) && !is.null(input.args[["newsite"]])) { | ||
site.id <- input.args$newsite | ||
} | ||
|
||
if (insert.new.file && id_not_added) { | ||
dbfile.id <- dbfile.insert( | ||
in.path = dirname(result[[i]]$file[1]), | ||
in.prefix = result[[i]]$dbfile.name[1], | ||
"Input", | ||
existing.input[[i]]$id, | ||
con, | ||
reuse = TRUE, | ||
hostname = machine$hostname | ||
) | ||
|
||
newinput$input.id <- c(newinput$input.id, existing.input[[i]]$id) | ||
newinput$dbfile.id <- c(newinput$dbfile.id, dbfile.id) | ||
} else if (id_not_added) { | ||
# This is to tell input.insert if we are writing ensembles | ||
# Why does it need it? Because it checks for inputs with the same time period, site, and machine | ||
# and if it returns something it does not insert anymore, but for ensembles, it needs to bypass this condition | ||
ens.flag <- if (!is.null(ensemble) || is.null(ensemble_name)) TRUE else FALSE | ||
|
||
new_entry <- dbfile.input.insert( | ||
in.path = dirname(result[[i]]$file[1]), | ||
in.prefix = result[[i]]$dbfile.name[1], | ||
siteid = site.id, | ||
startdate = start_date, | ||
enddate = end_date, | ||
mimetype = mimetype, | ||
formatname = formatname, | ||
parentid = parent.id, | ||
con = con, | ||
hostname = machine$hostname, | ||
allow.conflicting.dates = allow.conflicting.dates, | ||
ens = ens.flag | ||
) | ||
|
||
newinput$input.id <- c(newinput$input.id, new_entry$input.id) | ||
newinput$dbfile.id <- c(newinput$dbfile.id, new_entry$dbfile.id) | ||
} | ||
} # End for loop | ||
return(newinput) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
#' Function to check if result has empty or missing files | ||
#' | ||
#' @param result A list of dataframes with file paths | ||
#' @param existing.input Existing input records | ||
#' @param existing.dbfile Existing dbfile records | ||
#' @return A list of dataframes with file paths, a list of strings with the output file name, a list of existing input records, and a list of existing dbfile records | ||
#' | ||
#' @author Betsy Cowdery, Michael Dietze, Ankur Desai, Tony Gardella, Luke Dramko | ||
|
||
check_missing_files <- function(result, existing.input = NULL, existing.dbfile = NULL) { | ||
result_sizes <- purrr::map_dfr( | ||
result, | ||
~ dplyr::mutate( | ||
., | ||
file_size = purrr::map_dbl(file, file.size), | ||
missing = is.na(file_size), | ||
empty = file_size == 0 | ||
) | ||
) | ||
|
||
if (any(result_sizes$missing) || any(result_sizes$empty)) { | ||
log_format_df <- function(df) { | ||
formatted_df <- rbind(colnames(df), format(df)) | ||
formatted_text <- purrr::reduce(formatted_df, paste, sep = " ") | ||
paste(formatted_text, collapse = "\n") | ||
} | ||
|
||
PEcAn.logger::logger.severe( | ||
"Requested Processing produced empty files or Nonexistent files:\n", | ||
log_format_df(result_sizes[, c(1, 8, 9, 10)]), | ||
"\n Table of results printed above.", | ||
wrap = FALSE | ||
) | ||
} | ||
|
||
|
||
# Wrap in a list for consistent processing later | ||
if (is.data.frame(existing.input)) { | ||
existing.input <- list(existing.input) | ||
} | ||
|
||
if (is.data.frame(existing.dbfile)) { | ||
existing.dbfile <- list(existing.dbfile) | ||
} | ||
return(list(existing.input, existing.dbfile)) | ||
} |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's pull this function definition out to the end of the file -- I think the runtime overhead of nested function definitions is small, but I like to avoid them for clarity anyway.