text-based serializations / data exchange formats for fables? #353
😍 Glad the interface is working well for you!
This is something really important that I haven't added to
library(fable)
#> Loading required package: fabletools
library(distributional)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x <- fable(
date = Sys.Date() + 1:10,
dist = dist_normal(1:10),
index = date,
response = "test",
distribution = dist
) %>%
mutate(parameters(dist))
#> Warning: The dimnames of the fable's distribution are missing and have been set
#> to match the response variables.
x
#> # A fable: 10 x 4 [1D]
#> date dist mu sigma
#> <date> <dist> <dbl> <dbl>
#> 1 2021-11-07 N(1, 1) 1 1
#> 2 2021-11-08 N(2, 1) 2 1
#> 3 2021-11-09 N(3, 1) 3 1
#> 4 2021-11-10 N(4, 1) 4 1
#> 5 2021-11-11 N(5, 1) 5 1
#> 6 2021-11-12 N(6, 1) 6 1
#> 7 2021-11-13 N(7, 1) 7 1
#> 8 2021-11-14 N(8, 1) 8 1
#> 9 2021-11-15 N(9, 1) 9 1
#> 10 2021-11-16 N(10, 1) 10 1
library(readr)
file <- tempfile(fileext = ".csv")
write_csv(x, file)
read_csv(file) %>%
mutate(dist = dist_normal(mu, sigma)) %>%
as_fable(index = date, response = "test", distribution = dist)
#> Rows: 10 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): dist
#> dbl (2): mu, sigma
#> date (1): date
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: The dimnames of the fable's distribution are missing and have been set
#> to match the response variables.
#> # A fable: 10 x 4 [1D]
#> date dist mu sigma
#> <date> <dist> <dbl> <dbl>
#> 1 2021-11-07 N(1, 1) 1 1
#> 2 2021-11-08 N(2, 1) 2 1
#> 3 2021-11-09 N(3, 1) 3 1
#> 4 2021-11-10 N(4, 1) 4 1
#> 5 2021-11-11 N(5, 1) 5 1
#> 6 2021-11-12 N(6, 1) 6 1
#> 7 2021-11-13 N(7, 1) 7 1
#> 8 2021-11-14 N(8, 1) 8 1
#> 9 2021-11-15 N(9, 1) 9 1
#> 10 2021-11-16 N(10, 1) 10 1
Created on 2021-11-06 by the reprex package (v2.0.0)
Having
I don't think we need specific read/write functionality for fable, but rather better support for reading/writing distributions. If you want to read/write a fable object specifically, I would suggest using rds.
That said, writing some distributions to character can be extra tricky when they may depend on user environments.
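For reference, the rds route mentioned above could look roughly like the sketch below. It reuses the toy fable from the earlier reprex and relies only on base R's saveRDS() and readRDS(); the file name is a temporary placeholder.

```r
library(fabletools)
library(distributional)

# Toy fable, built the same way as in the reprex above
# (this emits the same dimnames warning as in the thread).
x <- fable(
  date = Sys.Date() + 1:10,
  dist = dist_normal(1:10),
  index = date,
  response = "test",
  distribution = dist
)

# rds stores the R object as-is, so the distribution list-column survives.
file <- tempfile(fileext = ".rds")
saveRDS(x, file)

# Reading it back restores the fable, including its distribution column.
y <- readRDS(file)
```

The trade-off is that rds is an R-specific binary format, so it does not help with exchanging forecasts with other tools and languages.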
@mitchelloharawild thanks for this! Yes, definitely agree these issues are specific to The
library(fable)
#> Loading required package: fabletools
library(distributional)
library(dplyr, quietly = TRUE)
x <- fable(
date = Sys.Date() + 1:10,
dist = dist_normal(1:10),
index = date,
response = "test",
distribution = dist
) %>%
mutate(parameters(dist))
#> Warning: The dimnames of the fable's distribution are missing and have been set
#> to match the response variables.
## probably a better way to do this??
dist_class <- function(y) class(vctrs::vec_data(y)[[1]])[[1]]
x %>% mutate(dist_class = dist_class(dist))
#> # A fable: 10 x 5 [1D]
#> date dist mu sigma dist_class
#> <date> <dist> <dbl> <dbl> <chr>
#> 1 2021-11-06 N(1, 1) 1 1 dist_normal
#> 2 2021-11-07 N(2, 1) 2 1 dist_normal
#> 3 2021-11-08 N(3, 1) 3 1 dist_normal
#> 4 2021-11-09 N(4, 1) 4 1 dist_normal
#> 5 2021-11-10 N(5, 1) 5 1 dist_normal
#> 6 2021-11-11 N(6, 1) 6 1 dist_normal
#> 7 2021-11-12 N(7, 1) 7 1 dist_normal
#> 8 2021-11-13 N(8, 1) 8 1 dist_normal
#> 9 2021-11-14 N(9, 1) 9 1 dist_normal
#> 10 2021-11-15 N(10, 1) 10 1 dist_normal
Created on 2021-11-05 by the reprex package (v2.0.1)
Like you say, we'd want to do something similar for transformed. Also it's still not clear to me if that would be sufficient, sounds like due to
Even so, merely having parameters and distribution names is helpful to me in terms of serializing files that are more compatible with other formats, even if we cannot fully automate going back to the
Just a note that
Yes, <1 month new. Due for the next release soon. As a first pass at any rate, I would imagine similar methods to extract the other components of the distribution; i.e. Yes for distribution names. Probably no for transformations - I would consider the transformation functions as parameters of a transformed distribution. That said, data exchange of R functions is problematic - so the
The distribution suffix ('norm' in '(p/d/q/r)norm') is a parameter of the 'wrapped distribution'. So identifying the specific distribution here is actually easier.
library(distributional)
dist <- dist_wrap("norm", mean = 1:10, sd = 1)
dist
#> <distribution[10]>
#> [1] norm(1, 1) norm(2, 1) norm(3, 1) norm(4, 1) norm(5, 1) norm(6, 1)
#> [7] norm(7, 1) norm(8, 1) norm(9, 1) norm(10, 1)
parameters(dist)
#> dist mean sd
#> 1 norm 1 1
#> 2 norm 2 1
#> 3 norm 3 1
#> 4 norm 4 1
#> 5 norm 5 1
#> 6 norm 6 1
#> 7 norm 7 1
#> 8 norm 8 1
#> 9 norm 9 1
#> 10 norm 10 1
Created on 2021-11-06 by the reprex package (v2.0.0)
(requires latest dev of distributional, I added a custom method to remove the
Yes, also useful for tidyverts/fabletools#333 where you may want to run different code based on the distribution name.
Completely agree. I suggest
Any suggested function names for obtaining the name of a distribution? Comparable to
@mitchelloharawild good question, I agree
I had also considered
Here's a quick go at adding a
library(distributional)
dist <- c(
dist_normal(1:2),
dist_poisson(3),
dist_multinomial(size = c(4, 3),
prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
)
family(dist)
#> [1] "normal" "normal" "poisson" "multinomial" "multinomial"
Created on 2021-11-06 by the reprex package (v2.0.0)
I like this! I think it would cover most cases adequately. I wonder if
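To illustrate how family names plus parameters could support a plain-text round trip, here is a minimal sketch. It assumes a distributional version that provides family() for distribution vectors. The rebuild_one() helper and its family-to-constructor lookup are hypothetical and not part of any package, and this only handles families whose parameters are plain numbers.

```r
library(distributional)

dist <- c(dist_normal(mu = 1:2, sigma = 1), dist_poisson(lambda = 3))

# Flatten to one row per distribution: family name plus parameter columns.
# Parameters that do not apply to a given family come out as NA.
flat <- data.frame(family = family(dist), parameters(dist))

file <- tempfile(fileext = ".csv")
write.csv(flat, file, row.names = FALSE)

back <- read.csv(file, stringsAsFactors = FALSE)

# Hypothetical helper: map a family name back to its constructor.
rebuild_one <- function(family, row) {
  switch(family,
    normal = dist_normal(row$mu, row$sigma),
    poisson = dist_poisson(row$lambda),
    stop("no constructor registered for family: ", family)
  )
}

# Rebuild each row separately, then combine into one distribution vector.
rebuilt <- do.call(c, lapply(seq_len(nrow(back)), function(i) {
  rebuild_one(back$family[i], back[i, ])
}))
```

Families with non-scalar parameters (multinomial probabilities, for instance) or parameters that are R functions (transformed distributions) would need a different encoding, which is the fragile part discussed above.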
Closing as serialisation of fables is mostly a question of the vctrs classes they contain. |
Hi @robjhyndman and fable team,
I have been using and teaching your Tidyverts approach to forecasting and wanted to say thanks for providing this amazingly well thought-out and implemented resource. In particular, I very much appreciate the support for distributional and grouped forecasts; both very important features for us that are rarely handled well in alternative approaches.
fable makes very clever use of list-columns to provide a powerful abstraction of distribution. As a long-time tidyverse user I completely appreciate this approach, but I did want to get your opinion on how a fable ought to be represented in a generic file-based format that would be most compatible with other tools and languages. Providing file-based serializations may make it easier to share and distribute forecasts, including generic forecasting competitions. (I am currently part of a team of folks hosting an "ecological forecasting challenge" described in https://projects.ecoforecast.org/neon4cast-docs/).
The standard strategy of text-based files like csv (and some related standard serializations like parquet, netcdf, hdf5) seems a natural choice (happy to leave out grib, even if it is the standard way NOAA distributes forecast ensembles), but does not play nicely with list-columns. In our current iteration of the challenge, we have relied on participants submitting text-based serializations of distributional forecasts in which uncertainty is described either as mean + standard-deviation columns or as a set of ensemble draws. This is obviously not ideal -- one would like to express non-normal distributions without resorting to drawing an ensemble of samples from the distribution. Alternately, one could define a convention such as using the fable distribution print method to summarize a distribution as parsable text, but that may be too fragile or unreliable.
Given your experience with fable, I just wanted to see if you had any recommendations for file-based serializations of fables. In my ideal world, we would have methods like read_fable() / write_fable() to serialize and parse files into and out of the fable format (which would also allow users to more easily leverage fable for comparing across forecasts, etc). Do you see a good way to go about this? Would you consider such functionality as being in scope for fable?
Thanks for considering!
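For concreteness, the two text-friendly conventions described above (mean + standard deviation columns, or ensemble draws) could be produced from a distribution vector roughly as follows. This is only a sketch: the column names time, ensemble, and value are illustrative assumptions, not a format defined by the challenge.

```r
library(distributional)

dist <- c(dist_normal(mu = 1:3, sigma = 1), dist_poisson(lambda = 4))

# Convention 1: one row per forecast, summarised by mean and standard deviation.
summary_tbl <- data.frame(
  time = seq_along(dist),
  mean = mean(dist),
  sd = sqrt(variance(dist))
)

# Convention 2: a long table with one row per ensemble draw.
set.seed(1)
n_draws <- 5
draws <- generate(dist, times = n_draws)  # list with one numeric vector per distribution
ensemble_tbl <- data.frame(
  time = rep(seq_along(dist), each = n_draws),
  ensemble = rep(seq_len(n_draws), times = length(dist)),
  value = unlist(draws)
)
```

Both tables write cleanly to csv, but the first loses the shape of non-normal distributions and the second only approximates it through sampling.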