Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

span_length is the wrong abstraction #694

Open
dvdsk opened this issue Jan 26, 2025 · 17 comments
Open

span_length is the wrong abstraction #694

dvdsk opened this issue Jan 26, 2025 · 17 comments
Labels
breaking Proposed change that would break the public API enhancement

Comments

@dvdsk
Copy link
Collaborator

dvdsk commented Jan 26, 2025

When decoding audio you get chunks with constant sample rate and channel count. It seemed natural to extend that into rodio's audio pipeline. The consumer of audio gets the guarantee that the sample rate and channel count will not change for the next n-samples. The consumer can then deal with that like this:

let mut resampler = Resampler::new();
loop {
  resampler.set_sample_rate(source.sample_rate());
  for _ in 0..source.current_span_len() {
    resample.resample(source.next()?);
  }
}

This seems very efficient right? It is however no different then:

let mut resampler = Resampler::new();
loop {
  resampler.set_sample_rate(source.sample_rate());
  while !source.parameters_changed() {
    resample.resample(source.next()?);
  }
}

Especially if we let the compiler know that the parameters usually do not change using:

impl Source {
  #[cold]
  fn parameters_changed() -> bool;

  ..
}

Currently at least skipping, seeking and queue are non-trivial in rodio since we may not break our promised span_len. That span len is however never used except to create the boolean parameters_changed via a comparison to an increasing index. Replacing current_span_len by parameters_changed makes rodio code far simpeler. For example seek:

impl Source {
  fn seek(&mut self, some_params: some_type) {
    self.parameters_changed = true
    // and some code to guarantee frame alignment
   //  but nothing for span alignment since spans do not exist anymore!
    ..
  }

  fn parameters_changed() -> bool {
    self.parameters_changed
  }
}

All the complex code in queue can just be removed, it can simply forward parameters_changed until a source ends: then it just switches it to true. Right now it is has to play silence to fill the current span if a source ends early. So this change will not only make the code far far simpler it will improve audio latency!

Regarding breaking the Source API, an argument along the line of #690 (comment) can be made. This would also replace #690.

I think this is a wonderful fix/improvement. I might be wrong. Looking forward to your input.

@dvdsk dvdsk added breaking Proposed change that would break the public API enhancement labels Jan 26, 2025
@will3942
Copy link
Contributor

Interesting idea.

Couple thoughts:

  • When would you set parameters_changed back to false? On the Sample after the Sample that changed parameters?
  • ChannelVolume / ChannelCountConverter / UniformSourceIterator may be hard to implement if there is no guarantee the parameters will not change between each 0..num_channels so perhaps the parameters should not change until all Samples have been emitted for the current num_channels (e.g. for a 6 channel source the parameters can only change every 6 channels). Or maybe we should move to typed streams like @PetrGlad suggested to avoid this (Fix(484) - ChannelVolume misbehaving #493 (comment))

@PetrGlad
Copy link
Collaborator

I have to read it again with the fresh mind but even with this version the source should guarantee that whole frame (samples for all channels) is received, otherwise it invites some complicated logic again.
Instead of boolean I'd consider a revision counter that is increased every time parameters change, this is usually my favorite in such situations. Or if it is a boolean it should be be reset every time target sink sees it's value.
Even if spans are not visible, I guess the our decoder wrappers will still need to handle a lot of cases?

@PetrGlad
Copy link
Collaborator

// This may need some optimizations, but I thought of having a struct that includes all the stream parameters that can be queried from a source. Besides our existing ones lie channel count, and sample rate that struct can include channel mapping (say channel 4 is "rear left").

@dvdsk
Copy link
Collaborator Author

dvdsk commented Jan 27, 2025

ChannelVolume / ChannelCountConverter / UniformSourceIterator may be hard to implement if there is no guarantee the parameters will not change between each 0..num_channels

They currently already may, but they practically do not change in between frames.

perhaps the parameters should not change until all Samples have been emitted for the current num_channels (e.g. for a 6 channel source the parameters can only change every 6 channels).

It makes no sense to change sample-rate or channel count mid frame. I think that is a fine demand to add to the source API.

related

Do we demand sources only emit whole frames?

Its a balance,

  • if we do that the API becomes stricter, adding new sources will be harder. It also opens up a new source of bugs. emitting a half frame at the end will be noticeable through subtle bugs in other parts of rodio (channels switching around).
  • Leaving the API as is puts all the work on us, though I think only queue needs to handle this right? And it is relatively doable.

@dvdsk
Copy link
Collaborator Author

dvdsk commented Jan 27, 2025

Instead of boolean I'd consider a revision counter that is increased every time parameters change, this is usually my favorite in such situations. Or if it is a boolean it should be be reset every time target sink sees it's value.

my idea was to use the boolean as a flag. The consumer switches it back as soon as it reads it. There is no risk of race conditions since its all serial anyway. The consumer further more does not need to know if it missed a change, it just needs to know one or more have happened.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 28, 2025

@dvdsk What you mean by "stricter" there? Usually, better defined APIs are easier to integrate with. Did you mean those in-frame changes?

I think I like the idea, as long as that flag reading is cheap. And since it is all in the same thread, no additional sync is needed.

I thought of a scenario when the source is handed over somewhere else, but it looks like a boolean should still be sufficient there.

I would have read and reset of that boolean in the same function to avoid situation when a sink forgets to clear it. But I cannot think of a good name for it, since normally names like get_format_changed() imply that it is read only. It could be take_format_has_changed() but that looks verbose.

@will3942
Copy link
Contributor

Do we demand sources only emit whole frames?

Its a balance,

  • if we do that the API becomes stricter, adding new sources will be harder. It also opens up a new source of bugs. emitting a half frame at the end will be noticeable through subtle bugs in other parts of rodio (channels switching around).
  • Leaving the API as is puts all the work on us, though I think only queue needs to handle this right? And it is relatively doable.

I don't understand the first point around "it opens up a new source of bugs"? Can you clarify please? In my mind adding a strict enforced requirement that sources can only emit full frames removed the class of bugs that are being solved at the moment where channels are switching around?

I think only queue needs to handle this right?

I think we have just scratched the surface of the issues of channels switching/frames becoming mixed up. (e.g. it also needs solving in the resampler #252)

IMO enforcing only emitting full frames would remove a class of bugs along with making the code easier to maintain by always being aware of which channel/frame you are at. Perhaps that is me being naive but that is my understanding.

@dvdsk
Copy link
Collaborator Author

dvdsk commented Jan 28, 2025

In my mind adding a strict enforced requirement that sources can only emit full frames removed the class of bugs that are being solved at the moment where channels are switching around?

I do not count on end users to remember every detail of the Source documentation, a mistake is easily made. So any requirement not encoded by the type system is easily broken without being noticed. Especially since the result of breaking this requirement will emerge far away from where it was broken (for example in ChannelVolume).

I think we have just scratched the surface of the issues of channels switching/frames becoming mixed up. (e.g. it also needs solving in the resampler #252)

I can tell you the pile of issues related to span length is huge, take a look at the new queue code in that PR... Its scares me how many special cases need to be checked. That code is not refactored and probably wont be merged since switching to span_changed will remove all those edge cases.

IMO enforcing only emitting full frames would remove a class of bugs along with making the code easier to maintain by always being aware of which channel/frame you are at. Perhaps that is me being naive but that is my understanding.

No your right there, its just me worrying about users messing up that requirement, I would like to see how hard it is to take care of it for them.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 28, 2025

/// Sorry, I thought the previous comment was lost so reposted it.

@dvdsk What do you mean by "stricter"? Avoiding those in-frame changes? The problem is that I would agree to relax rules for sources if the complexity would be only on the source side. Say, it would be a lot harder to emit whole frames than to check for changes on the consuming side. But to me it seems that this would simplify a little on one side and make it a lot more complicated on the other. Queue is not the only consumer we or users may have. All the filters/sinks would have to handle it. Do we have input sources/decoders that may be difficult to implement with whole-frame requirement?

I think I like the idea of having a flag for format changes. Since this check is in the same thread, it should be cheap.
I thought of scenarios when a source is handed over somewhere else, but it looks like just a boolean is sufficient also in that case.

I would consider the getter to reset the flag immediately, with a method like take_format_has_changed(), or poll_format_has_changed() ("get" normally implies that it is read-only), so the name looks like it is reading from a message queue.

@dvdsk
Copy link
Collaborator Author

dvdsk commented Jan 28, 2025

@dvdsk What you mean by "stricter" there? Usually, better defined APIs are easier to integrate with.

The current API is stricter then the proposed because the current API guarantees you no changes in parameters for the next N frames. The proposed API makes no such guarantee and only lets you know if the parameters just changed. Missing that time guarantee its less strict in my eyes. You could build the new API on top of the old. You can not however guarantee the parameters will not change for N-samples knowing only if they just changed.

(well you could, you could buffer N samples we will probably do that in the hifi resampler. Thats not a problem though, it needs a buffer anyway.)

@dvdsk
Copy link
Collaborator Author

dvdsk commented Jan 28, 2025

I would have read and reset of that boolean in the same function to avoid situation when a sink forgets to clear it.

take_format_has_changed()

I like that idea!

@dvdsk
Copy link
Collaborator Author

dvdsk commented Jan 28, 2025

/// Sorry, I thought the previous comment was lost..

it was not :)

@PetrGlad
Copy link
Collaborator

PetrGlad commented Feb 1, 2025

So any requirement not encoded by the type system is easily broken without being noticed. Especially since the result of breaking this requirement will emerge far away from where it was broken (for example in ChannelVolume).

Well, yes usually it is a good idea to encode important constraints in a type system, but with this one can go only so far. The stricter it is the more complicated it becomes. In this case, I imagine, a good compromise could be is sending slices, one for frame, but then each channel count will require a different type... Maybe we can have separate types for aligned or unaligned sources. I think it is possible to have a wrapper that aligns frames. It can detect incomplete frame on format change and fill the remainder with some kind of default (like frame average or most recent known values). It could also emit a tracing event for diagnostics. An alternative implementation could collect and pass whole frames and discard incomplete ones. With this filter the alignment logic can be shared.

@roderickvd
Copy link
Contributor

Reading the comments above I think you guys may already be aware, but just to make it explicit: we should not rely on sources providing full frames. Frames could be variable length, contain malformed data to skip (if not assumed as silence or zero-hold), or be shorter at the end of a track. For robustness, I advocate to not make any assumptions about it.

The Symphonia decoders deal with this by allocating a full-frame buffer, but only returning a slice with the samples successfully decoded (buffer[..n_written]).

Working with frames instead of iterators can be a useful optimization, particularly to resample.

Now for a wild idea, we could offer choice:

  • work with iterators like today (and suggest to check for audio specification changes while iterating);
  • "take" the next sample one-by-one;
  • read the next buffer up to a number of bytes (like Read::read).

...all of them consuming from the same buffer maintained by the source.

Another wild idea to work with frames is to have an enum with variants like FullSize and Deviant. I have no idea this would actually be useful, but just to throw it out there.

@dvdsk
Copy link
Collaborator Author

dvdsk commented Feb 4, 2025

a good compromise could be is sending slices, one for frame, but then each channel count will require a different type

I think you meant arrays? slices are the same type but then we get into borrow hell.

Reading the comments above I think you guys may already be aware, but just to make it explicit: we should not rely on sources providing full frames.

It might be worthwhile to define what we are talking about, you seem to be talking about external sources? The discussion above is about the rodio Source API contract. There is a balance between a strict contract that makes it easy rodio parts to communicate and fit together and a looser one that makes it easy to add new external sources by implementing Source for them.

We have to ensure frames are complete somewhere given the OS needs a constant stream of interleaved samples. We can do that at the border between external sources and rodio's Source and uphold that within rodio or we can do it at the edge to the OS. I think its easier and more performant to do it at the incoming edge then within rodio. That is also the current behavior, I am strongly against changing that.

What seems undefined is if a source may end mid frame. It would be more consistent to disallow that (note I flipped on this, I was in favor of allowing mid frame ending). Though disallowing that makes extending rodio harder, that could be solved with some sort of convertor along the lines of your suggestion in #678.

@roderickvd
Copy link
Contributor

It might be worthwhile to define what we are talking about, you seem to be talking about external sources? The discussion above is about the rodio Source API contract. There is a balance between a strict contract that makes it easy rodio parts to communicate and fit together and a looser one that makes it easy to add new external sources by implementing Source for them.

Yes, maybe I was on a different train of thought. If so just tell me to take it elsewhere 😉

We have to ensure frames are complete somewhere given the OS needs a constant stream of interleaved samples. We can do that at the border between external sources and rodio's Source and uphold that within rodio or we can do it at the edge to the OS. I think its easier and more performant to do it at the incoming edge then within rodio. That is also the current behavior, I am strongly against changing that.

Does Rubato require it or the linear resampler too? With Rubato I noticed that you opted for SincFixedIn and I was wondering why not change to SincFixedOut to alleviate that problem.

What seems undefined is if a source may end mid frame. It would be more consistent to disallow that (note I flipped on this, I was in favor of allowing mid frame ending). Though disallowing that makes extending rodio harder, that could be solved with some sort of convertor along the lines of your suggestion in #678.

When decoding there's simply no guarantee that the frame will be as full as indicated or expected. One reason indeed is ending mid-frame at the end of the stream (particularly when trimming in gapless mode). Another is malformed data. If Rodio makes strict assumptions about that today, then this could already be an issue today. Of course most sources are well-formed, and frames can be zero-padded. So while it may seldom rear its ugly head, for robustness I would advocate handling such cases.

@dvdsk
Copy link
Collaborator Author

dvdsk commented Feb 5, 2025

Does Rubato require it or the linear resampler too?

they both do, but its not really a problem. rubato ideally gets a large bunch of frames anyway so we will have to buffer/chunk it anyway.

With Rubato I noticed that you opted for SincFixedIn and I was wondering why not change to SincFixedOut to alleviate that problem.

We'll see what fits best, I'm leaving that PR until we get the major work done (saves me a lot of rebase effort).

Another is malformed data. If Rodio makes strict assumptions about that today, then this could already be an issue today. Of course most sources are well-formed, and frames can be zero-padded. So while it may seldom rear its ugly head, for robustness I would advocate handling such cases.

I agree, handling such cases is the best way forward. Zero padding incurs very little overhead. But are we sure the decoders do not do this for us already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Proposed change that would break the public API enhancement
Projects
None yet
Development

No branches or pull requests

4 participants