span_length is the wrong abstraction #694
Comments
Interesting idea. A couple of thoughts:
I have to read it again with a fresh mind, but even with this version the source should guarantee that a whole frame (samples for all channels) is received; otherwise it invites some complicated logic again.
This may need some optimization, but I thought of having a struct that includes all the stream parameters that can be queried from a source. Besides our existing ones like channel count and sample rate, that struct could include a channel mapping (say, channel 4 is "rear left").
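(Roughly something like the sketch below, perhaps; `StreamParameters` and `ChannelPosition` are hypothetical names, not existing rodio types.)

```rust
/// Hypothetical bundle of stream parameters a source could expose.
#[derive(Clone, Debug, PartialEq)]
pub struct StreamParameters {
    pub channel_count: u16,
    pub sample_rate: u32,
    /// Optional mapping from channel index to speaker position,
    /// e.g. channel 4 -> RearLeft.
    pub channel_mapping: Option<Vec<ChannelPosition>>,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ChannelPosition {
    FrontLeft,
    FrontRight,
    Center,
    RearLeft,
    RearRight,
    LowFrequency,
}
```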
They currently already may, but in practice they do not change in between frames.
It makes no sense to change sample rate or channel count mid-frame. I think that is a fine demand to add to the source API. Related: do we demand sources only emit whole frames? It's a balance.
My idea was to use the boolean as a flag. The consumer switches it back as soon as it reads it. There is no risk of race conditions since it is all serial anyway. Furthermore, the consumer does not need to know if it missed a change; it just needs to know that one or more have happened.
@dvdsk What do you mean by “stricter” there? Usually, better-defined APIs are easier to integrate with. Did you mean those in-frame changes? I think I like the idea, as long as reading that flag is cheap. And since it is all in the same thread, no additional sync is needed. I thought of a scenario where the source is handed over somewhere else, but it looks like a boolean should still be sufficient there. I would have the read and reset of that boolean in the same function, to avoid a situation where a sink forgets to clear it. But I cannot think of a good name for it, since normally names like...
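(A minimal sketch of such a read-and-reset getter; `MySource` and `take_parameters_changed` are placeholder names, since no name was settled on above.)

```rust
struct MySource {
    parameters_changed: bool,
    // ... other fields such as the sample iterator, sample rate, channels ...
}

impl MySource {
    /// Returns whether the parameters changed since the last call, and
    /// clears the flag in the same step so the consumer cannot forget to
    /// reset it.
    fn take_parameters_changed(&mut self) -> bool {
        std::mem::take(&mut self.parameters_changed)
    }
}
```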
I don't understand the first point about "it opens up a new source of bugs". Can you clarify please? In my mind, adding a strictly enforced requirement that sources can only emit full frames removes the class of bugs that is being solved at the moment, where channels are switching around.
I think we have just scratched the surface of the issues of channels switching/frames becoming mixed up (e.g. it also needs solving in the resampler, #252). IMO enforcing only emitting full frames would remove a class of bugs, along with making the code easier to maintain by always being aware of which channel/frame you are at. Perhaps that is me being naive, but that is my understanding.
I do not count on end users to remember every detail of the Source documentation; a mistake is easily made. So any requirement not encoded by the type system is easily broken without being noticed, especially since the result of breaking this requirement will emerge far away from where it was broken (for example in ChannelVolume).
I can tell you the pile of issues related to span length is huge; take a look at the new queue code in that PR... It scares me how many special cases need to be checked. That code is not refactored and probably won't be merged since switching to...
No, you're right there; it's just me worrying about users messing up that requirement. I would like to see how hard it is to take care of it for them.
Sorry, I thought the previous comment was lost so I reposted it. @dvdsk What do you mean by “stricter”? Avoiding those in-frame changes? The problem is that I would agree to relax rules for sources if the complexity were only on the source side; say, if it were a lot harder to emit whole frames than to check for changes on the consuming side. But to me it seems that this would simplify a little on one side and make it a lot more complicated on the other. The queue is not the only consumer we or users may have; all the filters/sinks would have to handle it. Do we have input sources/decoders that may be difficult to implement with a whole-frame requirement? I think I like the idea of having a flag for format changes. Since this check is in the same thread, it should be cheap. I would have the getter reset the flag immediately, with a method like...
The current API is stricter than the proposed one because the current API guarantees you no changes in parameters for the next N frames. The proposed API makes no such guarantee and only lets you know whether the parameters just changed. Missing that time guarantee, it is less strict in my eyes. You could build the new API on top of the old. You cannot, however, guarantee the parameters will not change for N samples knowing only whether they just changed. (Well, you could: you could buffer N samples; we will probably do that in the hifi resampler. That's not a problem though, it needs a buffer anyway.)
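(As a rough illustration of building the new API on top of the old: the flag can be derived by counting samples against the reported span length, much like the index comparison rodio already does. The reverse, recovering an N-sample guarantee, indeed needs buffering. `SpanTracker` is a hypothetical name, not rodio code.)

```rust
// Sketch: deriving a `parameters_changed` flag from the existing
// span-based API by counting samples against `current_span_len`.
struct SpanTracker {
    samples_left_in_span: usize,
}

impl SpanTracker {
    /// Call once per sample, passing the span length reported at the start
    /// of the new span. Returns true on the first sample of each span.
    fn parameters_changed(&mut self, current_span_len: usize) -> bool {
        if self.samples_left_in_span == 0 {
            // First sample of a new span: parameters may have changed.
            self.samples_left_in_span = current_span_len.saturating_sub(1);
            true
        } else {
            self.samples_left_in_span -= 1;
            false
        }
    }
}
```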
I like that idea!
It was not :)
Well, yes, usually it is a good idea to encode important constraints in a type system, but with this one can only go so far: the stricter it is, the more complicated it becomes. In this case, I imagine, a good compromise could be sending slices, one per frame, but then each channel count will require a different type... Maybe we can have separate types for aligned and unaligned sources. I think it is possible to have a wrapper that aligns frames. It can detect an incomplete frame on format change and fill the remainder with some kind of default (like the frame average or the most recent known values). It could also emit a tracing event for diagnostics. An alternative implementation could collect and pass whole frames and discard incomplete ones. With this filter the alignment logic can be shared.
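(A rough sketch of such an aligning wrapper; it only handles the case of a source ending mid-frame by repeating the last known sample, but a mid-stream format change could be handled the same way, keyed off the change flag instead of the end of the iterator. `AlignFrames` is a hypothetical name, not rodio code, and `channels` is assumed to be at least 1.)

```rust
/// Wraps a sample iterator and guarantees that samples are only handed
/// out in whole frames: if the inner source ends mid-frame, the rest of
/// the frame is padded by repeating the last known sample value.
struct AlignFrames<S> {
    inner: S,
    channels: u16,
    /// Position inside the current frame (0..channels).
    position_in_frame: u16,
    last_sample: f32,
    /// How many padding samples still need to be emitted.
    padding_left: u16,
}

impl<S: Iterator<Item = f32>> Iterator for AlignFrames<S> {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        // First finish padding out an incomplete frame, if any.
        if self.padding_left > 0 {
            self.padding_left -= 1;
            return Some(self.last_sample);
        }
        match self.inner.next() {
            Some(sample) => {
                self.last_sample = sample;
                self.position_in_frame = (self.position_in_frame + 1) % self.channels;
                Some(sample)
            }
            None if self.position_in_frame != 0 => {
                // The source ended mid-frame: pad the remainder of the frame.
                self.padding_left = self.channels - self.position_in_frame - 1;
                self.position_in_frame = 0;
                Some(self.last_sample)
            }
            None => None,
        }
    }
}
```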
Reading the comments above I think you may already be aware, but just to make it explicit: we should not rely on sources providing full frames. Frames could be variable length, contain malformed data to skip (if not assumed as silence or zero-hold), or be shorter at the end of a track. For robustness, I advocate not making any assumptions about it. The Symphonia decoders deal with this by allocating a full-frame buffer, but only returning a slice with the samples successfully decoded (…). Working with frames instead of iterators can be a useful optimization, particularly for resampling. Now for a wild idea: we could offer a choice:
...all of them consuming from the same buffer maintained by the source. Another wild idea to work with frames is to have an...
I think you meant arrays? Slices are the same type, but then we get into borrow hell.
It might be worthwhile to define what we are talking about; you seem to be talking about external sources? The discussion above is about the rodio … We have to ensure frames are complete somewhere, given that the OS needs a constant stream of interleaved samples. We can do that at the border between external sources and rodio's … What seems undefined is whether a source may end mid-frame. It would be more consistent to disallow that (note I flipped on this; I was in favor of allowing mid-frame endings). Though disallowing that makes extending rodio harder, that could be solved with some sort of converter along the lines of your suggestion in #678.
Yes, maybe I was on a different train of thought. If so, just tell me to take it elsewhere 😉
Does Rubato require it, or the linear resampler too? With Rubato I noticed that you opted for...
When decoding there's simply no guarantee that the frame will be as full as indicated or expected. One reason indeed is ending mid-frame at the end of the stream (particularly when trimming in gapless mode). Another is malformed data. If Rodio makes strict assumptions about that, then this could already be an issue today. Of course most sources are well-formed, and frames can be zero-padded. So while it may seldom rear its ugly head, for robustness I would advocate handling such cases.
They both do, but it's not really a problem. Rubato ideally gets a large bunch of frames, so we will have to buffer/chunk it anyway.
We'll see what fits best; I'm leaving that PR until we get the major work done (saves me a lot of rebase effort).
I agree, handling such cases is the best way forward. Zero-padding incurs very little overhead. But are we sure the decoders do not do this for us already?
When decoding audio you get chunks with a constant sample rate and channel count. It seemed natural to extend that into rodio's audio pipeline. The consumer of audio gets the guarantee that the sample rate and channel count will not change for the next n samples. The consumer can then deal with that like this:
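(A rough sketch of such a span-based consumer, with simplified types; not the original snippet from the issue.)

```rust
// Sketch of today's span-based consumer: the parameters are read once per
// span and then assumed constant for the next `span_len` samples.
fn process_span(
    samples: &mut impl Iterator<Item = f32>,
    span_len: usize,
    sample_rate: u32,
    channels: u16,
) {
    for sample in samples.take(span_len) {
        // ... per-sample work that uses `sample_rate` and `channels`
        // without re-checking them ...
        let _ = (sample, sample_rate, channels);
    }
}
```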
This seems very efficient, right? It is however no different than:
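(Again a sketch with a hypothetical flag-based source; the type, field, and method names are illustrative only.)

```rust
// A hypothetical flag-based source for illustration.
struct FlagSource {
    samples: std::vec::IntoIter<f32>,
    sample_rate: u32,
    channels: u16,
    parameters_changed: bool,
}

// Instead of counting down a span, the consumer checks a cheap boolean
// before using the cached parameters.
fn process_with_flag(src: &mut FlagSource) {
    let (mut rate, mut channels) = (src.sample_rate, src.channels);
    while let Some(sample) = src.samples.next() {
        if std::mem::take(&mut src.parameters_changed) {
            // Rare path: re-read the (seldom changing) parameters.
            rate = src.sample_rate;
            channels = src.channels;
        }
        // ... per-sample work using `rate` and `channels` ...
        let _ = (sample, rate, channels);
    }
}
```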
Especially if we let the compiler know that the parameters usually do not change, using:
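(One guess at the kind of hint meant, not the original snippet: marking the rare path `#[cold]`, reusing the hypothetical `FlagSource` from the sketch above.)

```rust
// Marking the rare path as #[cold] tells the optimizer that the common
// case is "no change", keeping the hot loop tight.
#[cold]
fn reload_parameters(src: &FlagSource) -> (u32, u16) {
    (src.sample_rate, src.channels)
}

fn process_hinted(src: &mut FlagSource) {
    let (mut rate, mut channels) = (src.sample_rate, src.channels);
    while let Some(sample) = src.samples.next() {
        if std::mem::take(&mut src.parameters_changed) {
            (rate, channels) = reload_parameters(src);
        }
        // ... per-sample work using `rate` and `channels` ...
        let _ = (sample, rate, channels);
    }
}
```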
Currently at least skipping, seeking and queue are non-trivial in rodio since we may not break our promised span_len. That span length is however never used except to create the boolean `parameters_changed` via a comparison to an increasing index. Replacing `current_span_len` by `parameters_changed` makes the rodio code far simpler. For example seek: …

All the complex code in queue can just be removed: it can simply forward `parameters_changed` until a source ends, and then switch it to true. Right now it has to play silence to fill the current span if a source ends early. So this change will not only make the code far simpler, it will also improve audio latency!
Regarding breaking the Source API, an argument along the lines of #690 (comment) can be made. This would also replace #690.
I think this is a wonderful fix/improvement. I might be wrong. Looking forward to your input.