I often need to produce histograms where the x axis uses a date scale, typically binned by day, week, or month. The only sensible end result is one where the scale's breaks align with the bins, but the existing methods I'm aware of for getting there are a bit fragile:
Use geom_bar() and a binned scale:
library(ggplot2)

# df is assumed to be a data frame with a Date column named `date`
ggplot(df, aes(date)) +
  geom_bar() +
  scale_x_binned(
    transform = scales::transform_date(),
    breaks = scales::breaks_width("1 week")
  )
#> Warning in scale_x_binned(transform = scales::transform_date(), breaks = scales::breaks_width("1 week")): Ignoring `n.breaks`. Use a breaks function that supports setting number of
#> breaks.
This is nice because the binning is specified only once, but now the whole scale is binned, so I can't, for example, add a geom_vline() to mark a specific date on the axis: the vertical line would be snapped into a bin by the scale transform.
Use stat_bin(). The naive approach leaves the scale breaks and the bins unaligned (offset by 0.5 days here). Of course this can be improved by specifying a bin boundary or manually passing breaks, but this gets a bit fiddly and fragile.
Since #5963 there's a better workaround, which gives the result I want. However, there's duplication of the breaks and transforms between the scale and the stat. Ideally I'd like a way to request stat_bin() to just use the scale's breaks.
It's technically possible, since StatBin::compute_group (where the bins are computed) already has access to the scale object, but I'm not sure if it violates any sort of ggplot API encapsulation principles to have the scale directly affecting the stat's output in the way I'm proposing.
The same situation applies for stat_bin_2d() and stat_summary_bin(). I'd be happy to open a PR if there's agreement about the idea. I'm imagining either new value/s for breaks or a new param mutually exclusive with breaks that lets users choose to use the corresponding scale's major or minor breaks for the stat's binning breaks.
On the one hand, I like the idea. On the other hand, I don't think it can be implemented cleanly.
The issue is that scales recompute their ranges, which form the basis for the breaks, in between when the stats are calculated and when the graphics are drawn. It means that another layer can invalidate the breaks that are used for binning, and the scale ends up displaying different breaks. However, this should not be an issue if fixed breaks are used.
To demonstrate the principle, we can make a quick and dirty extension that takes breaks from the scale. We see that it doesn't really work well, because the computed bins span a narrower range than the full data and the final breaks end up different from the intermediate breaks.
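The code for that extension did not survive in this copy of the thread, so the following is only a minimal sketch of the idea and not the commenter's actual code. StatScaleBin is a hypothetical name, not part of ggplot2. It reads the x scale's breaks inside compute_group() and never writes to the scale. The usage pins the breaks and limits on the scale, which is the fixed-breaks case described above as unproblematic.

```r
library(ggplot2)

# Hypothetical stat (not part of ggplot2): bins x using whatever breaks the
# x scale reports at the time the stat is computed.
StatScaleBin <- ggproto(
  "StatScaleBin", Stat,
  required_aes = "x",
  default_aes = aes(y = after_stat(count)),
  compute_group = function(data, scales) {
    # Read-only access to the scale. Breaks come back in transformed space
    # (numeric days for a date scale); breaks outside the limits are NA.
    breaks <- scales$x$get_breaks()
    breaks <- sort(breaks[is.finite(breaks)])
    if (length(breaks) < 2) return(data.frame())

    # Left-closed bins; data outside the outermost breaks is dropped
    bin <- cut(data$x, breaks, right = FALSE, include.lowest = TRUE, labels = FALSE)
    lower <- head(breaks, -1)
    upper <- tail(breaks, -1)
    data.frame(
      count = tabulate(bin[!is.na(bin)], nbins = length(lower)),
      x = (lower + upper) / 2,
      width = upper - lower
    )
  }
)

# Made-up example data: 200 dates spread over four weeks
set.seed(1)
df <- data.frame(date = as.Date("2024-01-01") + sample(0:27, 200, replace = TRUE))
wk <- seq(as.Date("2024-01-01"), as.Date("2024-01-29"), by = "1 week")

# Fixed breaks and limits, so the breaks cannot change after the stat has run
p <- ggplot(df, aes(date)) +
  geom_bar(stat = StatScaleBin) +
  scale_x_date(breaks = wk, limits = range(wk))
```

If the breaks are left for the scale to compute, the same stat bins on the breaks derived from the data range at compute time, and the scale may later settle on different breaks once the bar extents and any other layers have been taken into account.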
> I'm not sure if it violates any sort of ggplot API encapsulation principles to have the scale directly affecting the stat's output in the way I'm proposing.
I think the principle ggplot2 tries to adhere to is that scales and layers only communicate through the data and not directly with one another. On a personal level, I think it is fine to read out scale settings at the Stat$compute_group() stage, but not fine to write scale settings. Pre-computing breaks and setting these as the scale's breaks should not happen.
Created on 2024-10-25 with reprex v2.1.1