apacheGH-18487: [R] Read Text (CSV/JSON) from character vector (apache#33968)

### Rationale for this change

Allows literal strings to be read directly by wrapping them in `I()`, in the same way as `readr::read_csv()`. This is useful for checking behavior without having to create temporary files.

```r
> read_csv_arrow(I("x,y\n1,2\n3,4"))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c(
"x,y
1,2
3,4"
)))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

```r
> read_csv_arrow(I(c("x,y", "1,2", "3,4")))
# A tibble: 2 × 2
      x     y
  <int> <int>
1     1     2
2     3     4
```

### What changes are included in this PR?

In `read_csv_arrow()` and `read_json_arrow()`, if the first argument `file` inherits the `AsIs` class, `file` is now interpreted as literal data. This is consistent with the behavior of `readr::read_csv()`, which is widely used to read text files as data frames.

This is a breaking change: a file path wrapped in `I()` is no longer opened as a file but is read as literal data. For example:

#### readr::read_csv

```r
> readr::read_csv(I(readr::readr_example("mtcars.csv")))
Rows: 0 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): /usr/local/lib/R/site-library/readr/extdata/mtcars.csv

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 0 × 1
# … with 1 variable:
#   /usr/local/lib/R/site-library/readr/extdata/mtcars.csv <chr>
# ℹ Use `colnames()` to see all variable names
```

#### arrow 10.0.1's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
```

#### This PR's arrow::read_csv_arrow

```r
> arrow::read_csv_arrow(I(readr::readr_example("mtcars.csv")))
Error:
! Invalid: CSV parse error: Empty CSV file or block: cannot infer number of columns
Run `rlang::last_error()` to see where the error occurred.
```

* Closes: apache#18487

Lead-authored-by: SHIMA Tatsuya <[email protected]>
Co-authored-by: eitsupi <[email protected]>
Co-authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
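The `AsIs` dispatch described for `read_csv_arrow()` and `read_json_arrow()` can be sketched in base R as below. This is a minimal sketch, not arrow's internal code: `resolve_text_input()` is a hypothetical helper, and joining vector elements with `"\n"` is an assumption chosen so that the three literal-input examples above all resolve to the same text.

```r
# Hypothetical helper (not part of arrow's API): sketches how a reader could
# decide whether `file` is literal data or a path, based on the "AsIs" class.
resolve_text_input <- function(file) {
  if (inherits(file, "AsIs")) {
    # Literal data: drop the AsIs class and join elements with newlines,
    # so I(c("x,y", "1,2")) and I("x,y\n1,2") yield the same text.
    text <- paste(unclass(file), collapse = "\n")
    list(kind = "literal", text = text)
  } else {
    # Anything without I() is treated as a path to be opened.
    list(kind = "path", path = file)
  }
}

# Single-string and one-line-per-element inputs resolve to the same text.
a <- resolve_text_input(I("x,y\n1,2\n3,4"))
b <- resolve_text_input(I(c("x,y", "1,2", "3,4")))
identical(a$text, b$text)
#> [1] TRUE

# The breaking change: a path wrapped in I() is now literal text.
resolve_text_input(I("/tmp/data.csv"))$kind
#> [1] "literal"
resolve_text_input("/tmp/data.csv")$kind
#> [1] "path"
```

Under this behavior, code that previously wrapped a file path in `I()` should pass the path directly instead, for example `read_csv_arrow(readr::readr_example("mtcars.csv"))`.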