
None Native Media Codec Stream Support #274

Open
amerghazal7 opened this issue Aug 23, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@amerghazal7

Hello,
I was wondering if we could add the ability to stream different media types, not just the native ones that the camera provides. Since you are relying on GStreamer, isn't that possible? For example, I could take any input format that my camera supports and output a stream in any codec, like H.264.
If that's possible but not implemented, I would really appreciate a guide so I can add it on my own.

Thanks

@joaoantoniocardoso
Collaborator

joaoantoniocardoso commented Aug 23, 2023

Welcome back @amerghazal7! First, thanks for the suggestion; second, it is definitely possible, we just need to decide on a good path to do so.

Pipeline

Before anything, let's see how it is currently working:

When a user creates a UDP Sink from a v4l2 USB camera, a "v4l pipeline" is created with a Tee at its end. Two sink pipelines are then connected to the Tee through queues: the "UDP Sink" and the "Image Sink". After that, whenever a WebRTC client requests a stream, a new "WebRTC Sink" is also connected to the Tee, ending with a pipeline like the one in the following image for one WebRTC client:

[Image: the current pipeline, with the v4l source feeding a Tee that serves the UDP Sink, the Image Sink, and one WebRTC Sink]
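For reference, a minimal sketch of that shape using gstreamer-rs (the device path, caps, host, and port are illustrative assumptions, and fakesink stands in for the Image Sink; this is not the project's actual code):

use gstreamer as gst;
use gst::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    gst::init()?;
    // One v4l source feeds a Tee; each sink hangs off the Tee behind its own
    // queue. The WebRTC branch is the one attached dynamically per client.
    let pipeline = gst::parse_launch(
        "v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080 ! tee name=t \
         t. ! queue ! udpsink host=192.168.2.1 port=5600 \
         t. ! queue ! fakesink sync=false",
    )?;
    pipeline.set_state(gst::State::Playing)?;
    Ok(())
}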

Now, if we don't want to keep the original format (from the v4l source) available to everything, a simple version could be implemented like so: we add three optional elements (in red) to do the transcoding when transcoding is enabled, exposing three new fields for the user to configure: the decoder, the encoder, and the final format, as shown in the image below:

[Image: the first option, with the three optional transcoding elements (in red) between the source and the Tee]
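A minimal sketch of that option, assuming an MJPEG camera and software elements (jpegdec/x264enc here are only stand-ins for whatever the user selects in the new decoder/encoder fields):

use gstreamer as gst;

/// The three optional elements (decoder, encoder, and the final-format
/// capsfilter) sit between the source and the Tee when transcoding is enabled.
fn build_transcoding_pipeline() -> Result<gst::Element, gst::glib::Error> {
    gst::parse_launch(
        "v4l2src device=/dev/video0 ! image/jpeg \
         ! jpegdec ! videoconvert ! x264enc tune=zerolatency ! video/x-h264 \
         ! tee name=t \
         t. ! queue ! udpsink host=192.168.2.1 port=5600 \
         t. ! queue ! fakesink sync=false",
    )
}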

If more performance is needed, we could get rid of some heavy elements by adding other Tee elements in strategic places; however, the complexity of managing it can grow considerably:

[Image: the second option, with additional Tee elements placed so the sinks can share decoded/encoded buffers]
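A sketch of one such arrangement, under the assumption that a raw-side Tee keeps feeding the Image Sink while a second Tee after the encoder lets all encoded sinks share a single encode (again, not the project's code):

use gstreamer as gst;

/// Two Tees: `raw` serves sinks that want the decoded frames (e.g. the Image
/// Sink), and `encoded` lets the UDP and WebRTC sinks share one encoder
/// instead of paying for one encode per sink.
fn build_multi_tee_pipeline() -> Result<gst::Element, gst::glib::Error> {
    gst::parse_launch(
        "v4l2src device=/dev/video0 ! image/jpeg ! jpegdec ! videoconvert ! tee name=raw \
         raw. ! queue ! fakesink sync=false \
         raw. ! queue ! x264enc tune=zerolatency ! video/x-h264 ! tee name=encoded \
         encoded. ! queue ! udpsink host=192.168.2.1 port=5600",
    )
}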

API

I prefer to let the user choose the specific decoder and encoder elements they want to use, because I don't want to let GStreamer choose arbitrarily (using the videodecoder / videoencoder elements). For this, we can use GStreamer to list all possible decoders and encoders able to transcode from the original format (the v4l format) to the desired format (the sink format). Also, we can expose all the encoder/decoder properties so we don't need to manually add support for each possible encoder/decoder. Optionally, these properties could be exposed as Video/Stream Controls so they could be changed at runtime.
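For example, a minimal sketch (assuming gstreamer-rs; not existing project code) of enumerating the installed video decoders and dumping their properties:

use gstreamer as gst;
use gst::prelude::*;

fn main() -> Result<(), gst::glib::Error> {
    gst::init()?;

    // List every installed video decoder; the same call with
    // ElementFactoryType::ENCODER lists the encoders.
    let decoders = gst::ElementFactory::factories_with_type(
        gst::ElementFactoryType::DECODER | gst::ElementFactoryType::MEDIA_VIDEO,
        gst::Rank::Marginal,
    );

    for factory in decoders.iter() {
        println!("decoder: {}", factory.name());

        // Instantiating the element exposes all its properties, so each
        // encoder/decoder can be configured generically, without per-element
        // support code.
        if let Ok(element) = factory.create().build() {
            for pspec in element.list_properties().iter() {
                println!("  property: {} ({})", pspec.name(), pspec.value_type());
            }
        }
    }
    Ok(())
}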


@amerghazal7 and @patrickelectric Let me know what you think about it, especially the API.

Thanks

@amerghazal7
Author

@joaoantoniocardoso First of all, thank you very much for the in-depth details and explanation.
I was trying to trace your pipelines and create them using the GStreamer CLI (due to my poor Rust skills) so I could figure out where I should insert the decoder/encoder components, and I came to the same conclusion you showed in the second plot: I was able to output an H.264 UDP stream from my MJPEG input format!
Now, as you mentioned for the third plot, for more performance I can see how it becomes more complex to manage, but it's a trade-off, and I may find it necessary in order to maintain good performance.
For the API, I strongly agree with what you proposed, because this way it becomes more generic, user friendly, and free of HW restrictions!

As I am a rookie in Rust, I'll work on getting to know the code better first, then I'll keep up with the updates here and jump in to contribute the components I feel I can be helpful with.

@joaoantoniocardoso
Collaborator

Great, as soon as @patrickelectric gives his opinion on the API, I'll write down where the important parts lie in the code, like a quick draft as a starting point, so you don't need to understand the whole project to contribute.

I'll make it clear that there's no rush from our side, but I'd also like to say that you don't need to be a pro at Rust: you can create a PR with a lot of todo!()s, and then we figure out how to go forward together.

@patrickelectric
Member

patrickelectric commented Aug 24, 2023

Sorry for taking so long to reply about the software architecture, @joaoantoniocardoso. Btw, thanks for presenting https://excalidraw.com.

After thinking more and discussing the matter, I'm happy with a solution that brings the best of the current approach and the first option provided here (I call it the third option). The second option, where there is a tee/queue between the decoder/encoder for the imagesink, appears to be a bit of over-engineering for performance. Code-wise it's tricky, and the trade-offs don't appear to be worth it compared with the first option.

Now, I present the pipeline for the third option:

[Image: the third option, combining the current Classic path with the first option's optional transcoding block]

The problem with the second option compared with the current code is the latency and the resources it would use, since it requires decoding/encoding at the source.

My suggestion is the creation of a smart component, or function block, that creates a different pipeline based on user input.
Let's call this configuration input block SourceConfiguration:

/// `A` and `M` definition are trivial and left as an exercise for the reader
pub enum SourceConfiguration<A, M> {
    Classic(String),
    AutoTranscoding(A),
    ManualTranscoding(M),
}

...
/// We could create a more gst-like element that can be configured, but we don't have time for that; a free function should do the job
fn create_source_configuration_block<A, M>(configuration: SourceConfiguration<A, M>) -> Result<gst::Element, Error> {
    match configuration {
        SourceConfiguration::Classic(encoding) => {
            todo!("Return a simple capsfilter")
        },
        SourceConfiguration::AutoTranscoding(configuration) => {
            // The user should have control of the capsfilter, and also the possibility of changing
            // the gst element ranking of the elements used by transcodebin
            todo!("Return a transcodebin with the necessary capsfilter configuration to work")
        },
        SourceConfiguration::ManualTranscoding(configuration) => {
            todo!("Return the decoder->encoder->capsfilter that the user chooses to use")
        },
    }
}
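For illustration, a hypothetical call site (pipeline, source, and tee here are assumptions about the surrounding code, not names from the project):

use gstreamer as gst;
use gst::prelude::*;

// Hypothetical usage: the configuration block sits between the v4l source and
// the Tee that feeds every sink, so all sinks receive the configured format.
fn insert_configuration_block(
    pipeline: &gst::Pipeline,
    source: &gst::Element,
    tee: &gst::Element,
) {
    let block = create_source_configuration_block(
        SourceConfiguration::<(), ()>::Classic("video/x-h264".to_string()),
    )
    .expect("failed to build the source configuration block");

    pipeline.add(&block).expect("failed to add the block to the pipeline");
    gst::Element::link_many(&[source, &block, tee]).expect("failed to link");
}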

As the code points out:

  • Classic will contain the encoding that is natively supported by the camera.
  • AutoTranscoding will use the magic transcoding element to do the job. The element should use decodebin3 or similar, and it should also be a passthrough for the encodings the camera natively supports, but I believe @joaoantoniocardoso doesn't trust these elements enough to do the job that the Classic approach should do (see the ranking sketch after this list).
  • ManualTranscoding gives the user full power to do whatever he wants, with access to all encoders and decoders and custom configuration for each element in the list. That will make the API a bit more complex, but we can encapsulate everything without affecting the project in general.
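As a sketch of the ranking control mentioned for AutoTranscoding (the factory names are only examples and may not exist on a given system):

use gstreamer as gst;
use gst::prelude::*;

/// Raising a factory's rank makes auto-pluggers such as decodebin3 and
/// transcodebin prefer it when several elements can handle the same caps.
fn prefer_factory(name: &str, rank: gst::Rank) {
    if let Some(factory) = gst::ElementFactory::find(name) {
        factory.set_rank(rank);
    }
}

fn main() {
    gst::init().expect("failed to initialize GStreamer");
    // E.g., prefer a hardware H.264 encoder over the software one.
    prefer_factory("v4l2h264enc", gst::Rank::Primary);
    prefer_factory("x264enc", gst::Rank::Marginal);
}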

With this approach we have the possibility of the lowest latency through the Classic approach, plus custom or simpler transcoding, where the user can choose what to use and how.

One important thing is to make the API clear when providing all this information, and to help the frontend developers create an interface that empowers users to deal with such custom configuration.
