improve pb_read and pb_write documentation
tanho63 committed Dec 30, 2023
1 parent 6ff7768 commit 21a4737
Showing 7 changed files with 184 additions and 22 deletions.
33 changes: 27 additions & 6 deletions R/pb_read.R
@@ -3,19 +3,23 @@
#' A convenience wrapper around downloading a file from a specified
#' repo/release and then reading it into memory. This convenience comes at a cost to
#' performance efficiency, since it first downloads the data to disk and then
#' reads the data from disk into memory. See `vignette("duckdb_arrow")` for
#' reads the data from disk into memory. See `vignette("cloud_native")` for
#' alternative ways to bypass this flow and work with the data directly.
#'
#' @param file string: file name
#' @param repo string: GH repository name in format "owner/repo". Default
#' `guess_repo()` tries to guess based on current working directory's git repo
#' @param tag string: tag for the GH release, defaults to "latest"
#' @param read_function function: specifies how to read in the data. Default
#' tries to guess a function based on file extension (csv, rds, parquet, txt, json)
#' @param ... additional arguments passed to `read_function`
#' @param read_function function: used to read in the data, where the file is
#' passed as the first argument and any additional arguments are subsequently
#' passed in via `...`. Default `guess_read_function(file)` will check the file
#' extension and try to find an appropriate read function if the extension is one
#' of rds, csv, tsv, parquet, txt, or json, and will abort if not found.
#' @param ... additional arguments passed to `read_function` after file
#' @param .token GitHub authentication token, see [gh::gh_token()]
#'
#' @export
#' @family pb_rw
#'
#' @return Result of reading in the file in question.
#' @examples \donttest{
@@ -50,15 +54,32 @@ pb_read <- function(file,
read_function(file.path(tempdir(), file), ...)
}

#' Guess read function from file extension
#'
#' This function accepts a filename and tries to return a valid function for
#' reading it.
#'
#' `guess_read_function` understands the following file extensions:
#' - rds with `readRDS`
#' - csv, csv.gz, csv.xz with `utils::read.csv`
#' - tsv, tsv.gz, tsv.xz with `utils::read.delim`
#' - parquet with `arrow::read_parquet`
#' - txt, txt.gz, txt.xz with `readLines`
#' - json, json.gz, json.xz with `jsonlite::fromJSON`
#'
#' @family pb_rw
#' @param file filename to parse
#' @return function for reading the file, if found
#' @keywords internal
guess_read_function <- function(file){
file_ext <- tools::file_ext(gsub(x = file, pattern = ".gz$|.xz$", replacement = ""))
if (file_ext == "parquet") rlang::check_installed("arrow")

read_fn <- switch(
file_ext,
"rds" = readRDS,
"csv" = read.csv,
"tsv" = read.delim,
"csv" = utils::read.csv,
"tsv" = utils::read.delim,
"parquet" = arrow::read_parquet,
"txt" = readLines,
"json" = jsonlite::fromJSON,
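
The extension lookup documented for `guess_read_function` can be illustrated with a small base-R sketch. This is not the package's code: `sketch_guess_reader` is a hypothetical name, the parquet and json branches are left out so that no extra packages are needed, and the error uses `stop()` rather than `cli::cli_abort()`. It only shows the idea of stripping a trailing `.gz` or `.xz` before dispatching on the remaining extension.

```r
# Hypothetical helper (not part of piggyback): base-R-only sketch of
# looking up a reader from a file extension.
sketch_guess_reader <- function(file) {
  # Drop a trailing ".gz" or ".xz" so "data.csv.gz" is treated as "csv".
  # The dot is escaped so that only a literal "." matches.
  stem <- sub("\\.(gz|xz)$", "", file)
  ext <- tools::file_ext(stem)

  switch(
    ext,
    "rds" = readRDS,
    "csv" = utils::read.csv,
    "tsv" = utils::read.delim,
    "txt" = readLines,
    # The unnamed last argument is the fallback for unmatched extensions
    stop("File type '", ext, "' is not recognized; supply a read function")
  )
}

reader <- sketch_guess_reader("mtcars.csv.gz")
identical(reader, utils::read.csv)
```

Because the lookup returns a function rather than calling it, the caller can still forward extra arguments, in the same way `pb_read()` passes `...` on to `read_function`.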
35 changes: 29 additions & 6 deletions R/pb_write.R
@@ -8,21 +8,26 @@
#' @param repo string: GH repository name in format "owner/repo". Default
#' `guess_repo()` tries to guess based on current working directory's git repo
#' @param tag string: tag for the GH release, defaults to "latest"
#' @param write_function function: specifies how to read in the data. Default
#' tries to guess a function based on file extension (csv, rds, txt, parquet, json)
#' @param write_function function: used to write an R object to file, where the
#' object is passed as the first argument, the filename as the second argument,
#' and any additional arguments are subsequently passed in via `...`. Default
#' `guess_write_function(file)` will check the file extension and try to find an
#' appropriate write function if the extension is one of rds, csv, tsv, parquet,
#' txt, or json, and will abort if not found.
#' @param ... additional arguments passed to `write_function`
#' @param .token GitHub authentication token, see [gh::gh_token()]
#'
#' @export
#' @family pb_rw
#'
#' @return Writes file to release and returns github API response
#' @examples \donttest{
#' if (interactive()) {
#' \dontshow{if (interactive()) \{}
#' pb_write(mtcars, "mtcars.rds", repo = "tanho63/piggyback-tests")
#' #> ℹ Uploading to latest release: "v0.0.2".
#' #> ℹ Uploading mtcars.rds ...
#' #> |===============================================================| 100%
#' }
#' \dontshow{\}}
#'}
pb_write <- function(x,
file,
Expand All @@ -43,17 +48,35 @@ pb_write <- function(x,
pb_upload(destfile, repo = repo, tag = tag, .token = .token)
}

#' Guess write function from file extension
#'
#' This function accepts a filename and tries to return a valid function for
#' writing to it.
#'
#' `guess_write_function` understands the following file extensions:
#' - rds with `saveRDS`
#' - csv, csv.gz, csv.xz with `utils::write.csv`
#' - tsv, tsv.gz, tsv.xz with a modified `utils::write.csv` where sep is set to `"\t"`
#' - parquet with `arrow::write_parquet`
#' - txt, txt.gz, txt.xz with `writeLines`
#' - json, json.gz, json.xz with `jsonlite::write_json`
#'
#' @family pb_rw
#' @param file filename to parse
#' @return function for writing the file, if found
#' @keywords internal
guess_write_function <- function(file){
file_ext <- tools::file_ext(gsub(x = file, pattern = ".gz$|.xz$", replacement = ""))
if (file_ext == "parquet") rlang::check_installed("arrow")

write_fn <- switch(
file_ext,
"rds" = saveRDS,
"csv" = write.csv,
"csv" = utils::write.csv,
"tsv" = function(x, file, ..., sep = "\t") utils::write.csv(x = x, file = file, sep = sep, ...),
"txt" = writeLines,
"parquet" = arrow::write_parquet,
"json" = jsonlite::toJSON,
"json" = jsonlite::write_json,
cli::cli_abort("File type {.val {file_ext}} is not recognized, please provide a {.arg write_function}")
)

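
One detail worth checking when writing tab-separated files: the base R documentation for `utils::write.csv` states that attempts to change `sep` are ignored with a warning, so a tab-delimited writer is usually built on `utils::write.table` instead. The sketch below is only an illustration under that assumption; `write_tsv_sketch` is a hypothetical name and is not part of piggyback.

```r
# Hypothetical helper (not part of piggyback): a tab-separated writer
# built on utils::write.table, which honours `sep`.
write_tsv_sketch <- function(x, file, ...) {
  utils::write.table(x, file = file, sep = "\t", ...)
}

df <- data.frame(a = 1:2, b = c("x", "y"))
path <- tempfile(fileext = ".tsv")
write_tsv_sketch(df, path, row.names = FALSE)

# Read the file back with the matching base reader
back <- utils::read.delim(path)

# For comparison: utils::write.csv warns when `sep` is supplied and
# still writes commas. Capture the warning message instead of printing it.
csv_warning <- tryCatch(
  utils::write.csv(df, tempfile(fileext = ".csv"), sep = "\t"),
  warning = function(w) conditionMessage(w)
)
```

Note that `utils::write.table` has different defaults from `utils::write.csv` (for example in how the header is written when row names are kept), so `row.names = FALSE` is passed here to keep the round trip simple.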
37 changes: 37 additions & 0 deletions man/guess_read_function.Rd
37 changes: 37 additions & 0 deletions man/guess_write_function.Rd
29 changes: 26 additions & 3 deletions man/pb_download_url.Rd
18 changes: 14 additions & 4 deletions man/pb_read.Rd
17 changes: 14 additions & 3 deletions man/pb_write.Rd