
libass support #439
Open · rmtjokar opened this issue Nov 9, 2024 · 175 comments

@rmtjokar commented Nov 9, 2024

Hi,
I've been using FFmpegInteropX for a long time, thank you for your great work.

It seems ShiftMediaProject updated all the libraries to the latest versions around 3 weeks ago.
I read in #384 that @lukasf said:

it is difficult to keep it updated

Since it's updated, can we at least use the libass version and make it work in FFmpegInteropX without touching the FFmpeg builds? I tried to compile ShiftMediaProject's libass version and all its dependencies, and I managed to build them all, but there is no winmd file in the output folder.

Can you please help me in this?

Thanks in advance.

@brabebhin (Collaborator)

Hi

May I ask why you need libass?

I don't think you can use libass easily without modifying ffmpeg builds because all of the addon libraries need to be linked into ffmpeg, otherwise ffmpeg wouldn't know it can use it.

You technically can use libass in FFmpegInteropX, but that would require a code change that's not trivial.

@softworkz (Collaborator)

Since it's updated, can we at least use the libass version and make it work in FFmpegInteropX without touching the FFmpeg builds? I tried to compile ShiftMediaProject's libass version and all its dependencies, and I managed to build them all, but there is no winmd file in the output folder.

That's because ffmpeg is a C library and does not provide .NET interfaces, which is what .winmd files are for.

@softworkz (Collaborator)

I don't think you can use libass easily without modifying ffmpeg builds because all of the addon libraries need to be linked into ffmpeg, otherwise ffmpeg wouldn't know it can use it.

The av-libs built by Shift-Media-Project include all this.

You technically can use libass in FFmpegInteropX but that would require a code change that's not trivial

It would be rather easy, but there's a little caveat to that: you'll lose hardware acceleration, because the subtitle "burn-in" can only happen in a sw filter. Even "hw decoding to cpu mem" is pointless to do, because experience has shown that software decoding is less resource intensive than "hw-decode, hwdownload, hwupload" in most cases. The hwupload alone is already killing for high-res videos like 4k.

That little caveat is a KO for the idea, unfortunately.

@brabebhin (Collaborator) commented Nov 9, 2024

I might have misunderstood, but I was under the impression that OP wanted libass from SMP with our own ffmpeg builds, which is of course not possible.

Filter burn-in isn't the only possible way to work with libass. We can expose ssa/srt as image cues and render the images ourselves to feed into the sub stream. Then the media sink will handle rendering/burn-in/whatever.

@lukasf (Member) commented Nov 10, 2024

Hi @rmtjokar, long time no see!

I think libass has recently added a meson build system. It should be pretty easy now to directly integrate it, without having to resort to an SMP fork. The SMP build system has some disadvantages for us: not all of its libs have UWP targets, and if they do, they usually don't have ARM targets. And their project files are horribly messy, which makes it hard to maintain a fork with added WinRT+ARM configs. So using SMP is always a last resort for me.

As others have noted, the question is, what are you trying to accomplish with adding libass? You won't be able to use it without bigger changes in our lib. Subtitles are not normal streams, and our effects system does not work with them. We would need explicit code to post-process subs with libass (through ffmpeg filters).

A big problem is that libass does not support GPU rendering, and copying frames from GPU to CPU memory for rendering is very expensive (and afterwards they need to be copied back!). Which means that we cannot really use libass to directly render the subs into video frames. And the subtitle rendering system in windows is rather poorly implemented. The bitmap subtitle rendering is intended for static bitmap (text) frames, not animated live frames. I am pretty sure that it would be horribly jaggy if we'd try to feed animated subs into it. We had to use quite elaborate workarounds, only to get clean flicker-free static subs.

The only use I could currently see is for transcoding static (non-animated) ssa/ass subs into bitmap subtitles, using the libass rendering engine, which is sure better than the Windows rendering system. Downside is that we don't know which target size to render to, which might lead to more or less noticeable scaling artifacts. And I am not sure how flexible the libass filter in ffmpeg is - we would somehow have to disable animations.

Oh and keep in mind that bitmap subtitle rendering is still broken in WinUI. Or has this bug "already" been resolved? I guess not, but I did not check for a long time. Has anyone tried with a recent version @brabebhin @softworkz?

@brabebhin (Collaborator)

Hi @lukasf

I haven't heard anything on the MPE front, but I would guess the bug is still there as all the winui 3 effort seems to have gone into supporting AOT and designers. I have since developed my own MPE based on directx and frame server mode, as well as custom sub rendering with win2D.

But this shouldn't be a show stopper for us. We can use UWP as a benchmark, since both UWP and winUI use the same MF interfaces. As an ugly workaround for size rendering we could simply ask the user to provide a size for us to render against and have the user update it on resize.

@lukasf (Member) commented Nov 10, 2024

Sure we could pass in a size, but at least when using the Windows rendering, it is not even clear at which exact position and size a sub is rendered. Of course this is not a problem when using custom subtitle rendering (which is probably the better approach anyways).

@brabebhin (Collaborator) commented Nov 10, 2024

For sub animations, custom rendering would be the only way to do it.

Windows rendering is just 50% arcane. The regions containing a cue will direct where subtitles are rendered. The region itself has coordinates that determine where it will be rendered on screen (these may be absolute positions or percentages). IIRC, images cues are always rendered in their own region and will have pretty much absolute positioning and size.

For text cues it gets more arcane because Windows groups them by regions, and then inside the region you have some sort of flow directions. For whatever ungodly reason they also use XAML composition for rendering, which I would guess is why we observe flickering.

The MF interfaces provide A LOT of customization and allow applications fine-grained control over subtitles, but it seems MPE chooses to implement only a few of the combinations. Which is why it seems arcane, as it only implements whatever suits them. It's almost as if the MF team had nothing to do with MPE.

At this point I think the conversation moves towards whether we want to also become a rendering library and not just demux+decoding. In the end I think creating our own MPE isn't that hard. If we do want to create our own MPE, we could completely forget about MF's way of doing subtitles and just render them directly inside FFmpegInteropX. We'd still have to support Windows rendering too.

@softworkz (Collaborator)

My Subtitle Filtering patchset includes a text2graphicsub filter, which allows converting text subtitles (including ASS) to graphical subtitles like dvd, dvb or x-subs, so all you need to do is add some filters and at the end you get graphical subs like from any other file. It also has an option to strip animations.
I just haven't gotten around to updating the patchset to the latest FFmpeg version.

Yet, my general view on this is: most of those who have ASS subtitles are expecting animations to work. For non-animated subtitles we are not 100% accurate but pretty close to libass. Starting any work to integrate libass without animation support is rather pointless as it won't make anybody happy. So either go for making libass fully work, including animations, or just leave it. IMO, putting effort into this is only justified when it allows getting it working in full effect.

@rmtjokar (Author)

Hi, sorry I had to go out of the city for the past week.
@lukasf Indeed, it's been a long time. I've been out of programming for quite some time.

@brabebhin Using libass is mainly for supporting ASS effects and animations. Its renderer is quite fast and smooth, and other player apps like PotPlayer, KMPlayer, and even MXPlayer on Android use this library. Maybe we can do the same.
I'm targeting x64 (only Xbox version), so it's UWP only.

@softworkz Thanks for the information.

As I saw in PotPlayer, there are two ways of showing subtitles: "Vector Text Renderer" and "Image Text Renderer," both of which have better quality than displaying text in a UWP app (using SubtitleCue or even Win2D Canvas with the same font/style). It's strange that they have better text rendering quality.

I was thinking maybe I can use libass.dll directly in my app (just giving the whole ASS text to libass for external subtitles only), so I created a wrapper around libass.dll x64 version with P/Invoke in a WPF app (because I couldn't with UWP), but I got stuck.

Another option I've been using for the past eight years is Win2D and its CanvasControl. I created a new renderer using ChatGPT, and it seems you can manipulate the fading effect via the color's alpha channel without touching anything else. (Take a look at this:)

Rec.0002.mp4

@rmtjokar (Author)

So I found a project which uses libass:
https://github.com/hozuki/assassin/

And I could use it in UWP with an Image control, and this is the result:
UWP 1>
https://1drv.ms/v/s!AjZLOqQJPNFqfOb53jwXZS4RITw?e=DbdygW

Pot Player >
https://1drv.ms/v/s!AjZLOqQJPNFqfagCs3E4oyehgxo?e=x2NCca

UWP 2 with sound>
https://1drv.ms/v/s!AjZLOqQJPNFqf5OHo1X5i1OD_WA?e=bm7fle

It’s actually quite good. It’s not as smooth as PotPlayer, but it’s still very good.
The font family feature doesn’t seem to work at all, but it’s a good start.

@brabebhin (Collaborator) commented Nov 15, 2024

I think this ultimately comes down to whether we want to venture into the land of rendering subtitles. So far we've been strictly a demux+decoding library. MPE is quite limited when it comes to subtitles, and not much can be done about it.

I could devote some time to this once a decision is made.

@lukasf (Member) commented Nov 18, 2024

@softworkz Your changeset is exactly what would be needed for rendering static subtitle frames with libass. But I think we all agree that static subs are not the intention here. If we'd add libass, it is for the animated subtitles.

I am not such a big fan of writing our own renderers. While it is very flexible, it makes it more difficult to use our lib. Currently, our output can just be put in a MPE and it all works. If some of our features require custom renderers, we would break with that concept, and require devs to migrate their apps from MPE to our custom rendering. Also, I think it is difficult to synchronize the subtitle renderer with the video renderer. Decoding is decoupled from rendering, and we do not get much information about the actual playback position. Text subs rendering is easier, since we get events when a new sub is to be shown.

It would be great if we could find a way to use libass and somehow integrate it in our decoding chain - without killing performance. I am trying to brainstorm in that direction.

Idea 1: ffmpeg recently added vulkan as a new cross-platform hw decoder type, and included a bunch of filters which can run directly on vulkan frames. There is a filter called "hwmap", which can be used to map frames from one hw decoder type to a different hw decoder type (or a different gpu device of same type). It has an option to "derive" its output hw context from the device which was used in the input frames. It seems that if the underlying device on input and output hw context is the same and compatible, then hwmap can directly map the frame, without copying. Setting the mode to "direct" can enforce this. If this would indeed work, then we could achieve hw accelerated gpu rendering: We could render the frames from libass onto a transparent hw texture and overlay this on the video using hwmap(vulkan) -> overlay_vulkan -> hwmap(d3d11). Of course, I don't even know if the hwmap stuff really works like that, and how easy it is to get a ffmpeg build with vulkan support. A downside of the approach is that we would be locked to the video output resolution.

Idea 2: We could create a second MediaSource, which just contains the rendered subtitle as a video stream with alpha channel. Users of the lib would have to add a second MPE layered above the first one, and link them using a MediaTimelineController. I never tried MediaTimelineController, so don't know how well it does actually. But at least theoretically, it should take care of the syncing. Frankly, this also requires quite some modification on app side.

Not sure if any of this makes sense, just trying to explore some alternative approaches...
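
For Idea 2, a minimal C++/WinRT sketch of how the two players could be linked, assuming two already-configured MediaPlayer instances (one for the video, one for the rendered subtitle stream); this only illustrates the MediaTimelineController wiring, it is not a tested implementation:

```cpp
// Sketch: drive two MediaPlayers (video + subtitle overlay) from one shared
// MediaTimelineController so that play/pause/seek stays in sync between them.
#include <winrt/Windows.Media.h>
#include <winrt/Windows.Media.Playback.h>
using winrt::Windows::Media::MediaTimelineController;
using winrt::Windows::Media::Playback::MediaPlayer;

void LinkPlayers(MediaPlayer const& videoPlayer, MediaPlayer const& subtitlePlayer)
{
    MediaTimelineController controller;

    // The built-in command manager must be disabled before a timeline
    // controller is attached to a MediaPlayer.
    videoPlayer.CommandManager().IsEnabled(false);
    subtitlePlayer.CommandManager().IsEnabled(false);

    videoPlayer.TimelineController(controller);
    subtitlePlayer.TimelineController(controller);

    controller.Start();   // both players now follow the shared clock
}
```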

@softworkz (Collaborator)

Here's the full tree of options from my point of view:

```mermaid
mindmap
  root((Subtitle<br>Overlay))
    Burn into video
      **B1**<br>hw decode<br>hw download<br>sw burn-in<br>hw upload
      **B2**<br>sw decode<br>sw burn-in<br>hw upload
      **B3**<br>sw render blank frame<br>hw upload<br>++++++++++<br>hw decode<br>hw overlay
      **B4**<br>sw render half-size frame<br>hw upload<br>hw upscale<br>++++++++++<br>hw decode<br>hw overlay
      **B5**<br>sw render partial sprites<br>hw upload<br>hw upscale<br>++++++++++<br>hw decode<br>hw overlay
    Presentation Layering
      **L1**<br>render full frames<br>Copy to D3D surface<br>overlay manually
      **L2**<br>render partial sprites<br>Copy to D3D surfaces<br>overlay manually
```

Burn into video options

B1

This is the worst of all options, because you need to copy every single uncompressed frame from GPU to CPU memory and then again from CPU memory to GPU memory.
An uncompressed 4k HDR frame is about 120 MB. At a framerate of 60 fps, this means a memory-transfer bandwidth of 14.4 GB/s (7.2 GB/s in each direction).

B2

The advantage of hw decoding is often over-estimated. Other than in case of encoding, video decoding can be easily done by CPUs as well. The big advantage of using hw decoding is that the large amounts of data never need to be copied between system and gpu memory, because gpu memory is always the eventual target. HW decoding followed by immediate downloading to cpu memory almost never makes sense.
(Though, this can turn quickly, when you're doing at least one more thing at the GPU side, like hw tone mapping, deinterlacing or hw downscaling. Then it's usually justified to start in hardware, and download for sw processing then.)

But for this case of doing sw burn-in of subtitles, B2 is significantly better than B1, because the memory transfer hits much harder than sw (instead of hw) decoding does.

B3

This is another scenario which requires my subtitles patchset and which will go into our server transcoding process shortly (it's there already, just not unlocked for the public).
It works by rendering subtitles onto transparent frames, then doing hwupload to the same hw context as the video and using the hw overlay filter of that context to burn-in the subtitles into the video.

While it still involves uploading the overlay frames from cpu to gpu memory, like in case of B2, there's still a massive advantage: You don't need to upload at the same rate as the video fps:

  • Many times, there are no changes to the subtitle overlay content. libass tells you that, and in turn, you don't need to do hwupload - you can rather designate the previously uploaded frame to be re-used for overlaying
  • You can also choose to use a lower fps for the subtitles - for example it typically suffices to render ASS animations at 30fps, even when the video is 60 fps

Unfortunately, in the context of FFmpegInteropX it's not straightforward to go that way, because you cannot use this with the D3D11VA decoders. Instead, you would need to use the vendor-specific hw contexts and filters (like overlay_qsv or overlay_cuda).

AFAIK, Vulkan is not stable enough on Windows yet. At least it doesn't work reliably with MPV player, even though it supports it.

The one other (stable) option is to use OpenCL. There's also an overlay_opencl filter in ffmpeg, I'm just not sure whether you can hwmap from a d3d11va context to OpenCL. I know that d3d11va-to-opencl works for AMD, I know that it works from qsv-to-opencl and from cuda-to-opencl. But I'm not sure whether d3d11va-to-opencl works with Intel and Nvidia gpus.

B4

This is a low-profile variant of B3. By using only half-size frames for the subtitle overlay, you save 75% of the memory bandwidth. You wouldn't do that for 720p or lower-res videos, but for 4k, it's a good way to optimize for performance.

B5

That would be the "holy grail": Instead of full frames, you would use one or more smaller surfaces, to cover only the regions with subtitle content. It's difficult to implement though, because you are typically working with a pool of D3D surfaces and that becomes difficult to manage when you have sizes which are changing dynamically. This would require modificationsn to the overlay filters in ffmpeg.

Presentation Layering options

Basic

The idea of not touching the video frames at all has a lot of appeal, as the nature of subtitles is significantly different from the actual video:

  • Subtitles rarely cover the full video area, and even when they do, it's just for short moments
  • Subtitles rarely change at the same rate (fps) as the video, and when they do, it's just for short moments and limited to certain areas

This means that while the full-screen+full-fps case needs to be accounted for, the implementation doesn't need to permanently create full-size overlays at the same rate as the video.

ffmpeg Side

I don't think it's required to have a totally separate ffmpeg instance for subtitles. This would cause a lot of problems and a lot of work. ffmpeg can have more than a single video output, so for this case, one output would be the video frames as D3D surfaces and the other the rendered subtitles as software frames (L1) or a collection of multiple areas per frame (L2).

How these eventually get on screen is an open question though, at this point. Maybe it's feasible to expose a secondary MediaSource from the main session?

Another thought I had at some point is whether the 3d stereo capability of Windows.Media could be "misused" for displaying a secondary layer on top of the main video...

"Rendering"

It's not clear to me what has been referred to as a "custom renderer". The renderer is always libass: You give it a range of memory representing the video frame, and then it renders the subtitles into that memory - pixel by pixel.

The "only" thing that's left to do is to bring this rendered image on the screen.

Canvas2d

I've never used it and don't know about its abilities. Maybe it's worth trying out whether the images generated by libass can be copied onto the canvas, but it's not clear to me how to do the switch from one image to the next at the right moment in time. That's rather the domain of a swapchain.

SwapChain

Having a second swapchain on top of the video swapchain seems to be the most natural approach.
The big question though - for which I don't know the answer: Is it even possible in WinUI3 to have a swapchain panel on top of another swapchain panel with transparency overlay and composition?

This would need to be found out, because if not, then the only way would be to create a XAML island hwnd window on a separate thread, or any other non-WinUI3 technique, to render the subs in a win32 window on top of the video so that the DWM (desktop window manager) does the composition (typically using gpu overlay).

L1, L2

Same as with B5, a perfect implementation would work with individual areas rather than full-size frames, but it also adds a lot of complication.

@brabebhin (Collaborator) commented Nov 19, 2024

Here's another idea.

If we keep this feature only when using DirectX decoders, then we can use compute shaders to burn the image into the HW AVFrame before we send it to the MF pipeline. This will include an additional step which entails a CPU->GPU memory copy of the subtitle, and another GPU->GPU operation. Might kill a frame or two.

Compute shaders should be available on all Windows devices that we target, since their feature level is a hard requirement for windows support. This is similar to @softworkz 's L1 (nice graph btw), except it is done on our side.

@lukasf your first idea is technically possible. We can share memory between Vulkan and DirectX, there's something called VK_NV_external_memory in Vulkan which allows this kind of thing.
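
For illustration, a rough sketch of the CPU->GPU copy step mentioned above, assuming the libass output has already been flattened into one BGRA bitmap on the CPU; the compute-shader blend itself is not shown and all names here are hypothetical:

```cpp
// Hypothetical sketch: upload a flattened BGRA subtitle bitmap to a GPU
// texture so a compute shader (not shown) can blend it into the decoded frame.
#include <cstdint>
#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D11Texture2D> UploadSubtitleBitmap(
    ID3D11Device* device, ID3D11DeviceContext* context,
    const uint8_t* bgraPixels, UINT width, UINT height, UINT strideBytes)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;   // read by the compute shader

    ComPtr<ID3D11Texture2D> texture;
    if (FAILED(device->CreateTexture2D(&desc, nullptr, &texture)))
        return nullptr;

    // The actual CPU->GPU copy. This only needs to happen when libass reports
    // that the subtitle content changed, not once per video frame.
    context->UpdateSubresource(texture.Get(), 0, nullptr, bgraPixels, strideBytes, 0);
    return texture;
}
```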

@softworkz (Collaborator)

If we keep this feature only when using DirectX decoders, then we can use compute shaders to burn the image into the HW AVFrame before we send it to the MF pipeline. This will include an additional step which entails a CPU->GPU memory copy of the subtitle, and another GPU->GPU operation.

Then you'll have to deal with all the HW formats that are being used for video frames on d3d surfaces when overlaying the subs.

Compute shaders should be available on all Windows devices that we target, since their feature level is a hard requirement for windows support. This is similar to @softworkz 's L1 (nice graph btw), except it is done on our side.

It's similar to B3 because it would be applied to the video frame. L1/L2 means that video and subs remain separate and are blended only during presentation (like when you have one semitransparent window on top of another).

(nice graph btw),

It's not a bitmap. Use the three-dot menu and choose edit to see it 😄

@lukasf your first idea is technically possible. We can share memory between Vulkan and DirectX, there's something called VK_NV_external_memory in Vulkan which allows this kind of thing.

ffmpeg has hw mapping to Vulkan currently only for VAAPI and CUDA.
OpenCL hw mapping is implemented for many hw contexts, including D3D11Va (as said, I just don't know whether it's working with all vendors).

@softworkz (Collaborator)

Compute shaders should be available on all Windows devices that we target, since their feature level is a hard requirement for windows support.

I'm not sure whether shaders are even needed for a trivial overlay.

@brabebhin (Collaborator)

The elephant mama in the room here is a GPU->CPU memory copy, which is what's going to kill performance no matter how you spin it.
The baby elephant is that most CPUs aren't capable of decoding 4K at acceptable performance, so that kind of endangers a software-only approach.

I guess we wouldn't need compute shaders, but the compute shader has the advantage that you sort of know when it will run.

For a custom MPE, we can stick to the same interface of the official MPE and only add new stuff, this will allow a drop-in replacement and ease adoption. We can use frame server mode to detect video position and render subtitles accordingly.

@softworkz (Collaborator)

The elephant mama in the room here is a GPU->CPU memory copy, which is what's going to kill performance no matter how you spin it.

Yes, it's this PLUS re-uploading again (B1). In case of B2, it's just one direction, so half of B1.

The baby elephant is that most CPUs aren't capable of decoding 4K at acceptable performance, so that kind of endangers a software-only approach.

Like I said above:

The advantage of hw decoding is often over-estimated. Other than in case of encoding, video decoding can be easily done by CPUs as well.

and this includes 4k videos. SW decoding alone is not an elephant (of any age ;-).

You can easily verify this by yourself. Just call ffmpeg like this:

ffmpeg -i "Your4kVideo.mkv" -f null -

Then you need to watch the "Speed" value in the output and also your CPU usage (because it's often not going to 100%). So for example, when you see 50% CPU usage and Speed of 6.0x, this means that your CPU is 12 times faster than needed for decoding the video in realtime (presentation at 1.0x).

@softworkz (Collaborator)

You might want to take a look at these two videos, demonstrating dozens of ways of doing subtitle burn-in:

https://github.com/softworkz/SubtitleFilteringDemos/tree/master/TestRun1

@brabebhin (Collaborator)

Sure, a desktop CPU or a plugged in laptop CPU will deal with 4K just fine in software mode. However, as soon as you factor in mobile devices that are not always plugged in and older CPUs, things get complicated.
Even my i7 14th gen gaming laptop will not handle a 4K video smoothly on battery. It will eat through it plugged in though.

Ideally we should support both software and hardware anyways (we can skip over the system decoders as these are black boxes and only a MF filter will help us there).

@softworkz (Collaborator)

Even my i7 14th gen gaming laptop will not handle a 4K video smoothly on battery

Yes, but that's because of the memory transfer. I'm sure it will run the pure decoding (the ffmpeg command above) of 4k video comfortably above 1.0x speed, even on batteries.
It's the mem movement which is the tough part, that's the point I want to make.

Sure, a desktop CPU or a plugged in laptop CPU will deal with 4K just fine in software mode. However, as soon as you factor in mobile devices that are not always plugged in and older CPUs, things get complicated.

A laptop on batteries will hardly ever be used to drive a 4k display. Even full HD can be considered a bit too much for a typical laptop screen. But still, FFmpegInteropX is moving around 4k frames when the source video is 4k, which is pretty bad, obviously.

But there exists another trick for those cases, which isn't even specific to subtitle overlay but can generally improve performance for 4k playback when the output display isn't 4k anyway:

Above I said that there are no filtering capabilities available for the D3D11VA hw context, which applies to ffmpeg, but it's not the full truth. In fact there exists an API for video hw processing for D3D11VA, and it's supported by all major GPU vendors (probably they wouldn't get certified for Windows without it). It's just that ffmpeg doesn't have an implementation for it.
Yet there exists an implementation from which you might be able to take inspiration for implementing a similar filter. I won't name or link it, because the license is incompatible with ours, and anyway, there might be other implementations as well. You can just look up the corresponding class/interface names in the Microsoft docs for D3D11VA video processors, and then do a GitHub search for implementations.

The two most important filtering capabilities that you get from this are hw deinterlacing and hw scaling, but let's forget about deinterlacing for now and focus on scaling.
The scaling you get from it is quite special, because the computation cost is close to zero. It's implemented in fixed-function blocks (ASICs) on the GPU die (like hw decoders and, partially, encoders).

Having such a filter would allow optimizing performance significantly in all cases where the source video resolution is larger than the presentation (or the maximum presentation) resolution. The detailed conditions need to be decided upon by every developer individually, but examples would be:

  • scale down to be no larger than the largest display (in pixels)
  • scale down to not exceed the resolution of the display on which it is displayed currently
  • provide an option like "power saving" or "high efficiency", and scale down to half or 2/3 of the output display size
  • etc.

Like I mentioned above for B2: As soon as you are doing something with the data in hardware before downloading, the cost balance changes, which means: B1 with hw downscaling before hw download becomes better than B2.

And even outside of the subtitles subject, this would massively improve performance and reduce energy consumption for 4k playback on non-4k screens.
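
For illustration, a minimal C++ sketch of such a fixed-function downscale using the D3D11 video processor API; all variable names and the hard-coded 4k-to-FHD sizes are just assumptions, and a real filter would negotiate formats and reuse the processor and views from a pool:

```cpp
// Hypothetical sketch: downscale a decoded NV12 texture (e.g. 4k -> FHD) with
// the fixed-function D3D11 video processor, before the frame leaves the GPU.
#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

HRESULT ScaleFrame(ID3D11Device* device, ID3D11DeviceContext* context,
                   ID3D11Texture2D* decoded4k, ID3D11Texture2D* targetFhd)
{
    ComPtr<ID3D11VideoDevice> videoDevice;
    ComPtr<ID3D11VideoContext> videoContext;
    device->QueryInterface(IID_PPV_ARGS(&videoDevice));
    context->QueryInterface(IID_PPV_ARGS(&videoContext));

    D3D11_VIDEO_PROCESSOR_CONTENT_DESC desc = {};
    desc.InputFrameFormat = D3D11_VIDEO_FRAME_FORMAT_PROGRESSIVE;
    desc.InputWidth = 3840;  desc.InputHeight = 2160;
    desc.OutputWidth = 1920; desc.OutputHeight = 1080;
    desc.Usage = D3D11_VIDEO_USAGE_PLAYBACK_NORMAL;

    ComPtr<ID3D11VideoProcessorEnumerator> enumerator;
    ComPtr<ID3D11VideoProcessor> processor;
    videoDevice->CreateVideoProcessorEnumerator(&desc, &enumerator);
    videoDevice->CreateVideoProcessor(enumerator.Get(), 0, &processor);

    D3D11_VIDEO_PROCESSOR_INPUT_VIEW_DESC inDesc = {};
    inDesc.ViewDimension = D3D11_VPIV_DIMENSION_TEXTURE2D;
    ComPtr<ID3D11VideoProcessorInputView> inputView;
    videoDevice->CreateVideoProcessorInputView(decoded4k, enumerator.Get(), &inDesc, &inputView);

    D3D11_VIDEO_PROCESSOR_OUTPUT_VIEW_DESC outDesc = {};
    outDesc.ViewDimension = D3D11_VPOV_DIMENSION_TEXTURE2D;
    ComPtr<ID3D11VideoProcessorOutputView> outputView;
    videoDevice->CreateVideoProcessorOutputView(targetFhd, enumerator.Get(), &outDesc, &outputView);

    D3D11_VIDEO_PROCESSOR_STREAM stream = {};
    stream.Enable = TRUE;
    stream.pInputSurface = inputView.Get();

    // The blit runs on the GPU's fixed-function video engine, not the 3D units.
    return videoContext->VideoProcessorBlt(processor.Get(), outputView.Get(), 0, 1, &stream);
}
```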

@brabebhin (Collaborator)

It's the mem movement which is the tough part, that's the point I want to make.

And I completely agree. However, any real world scenario of decoding involves memory transfer at some point. And the CPU does bear some responsibility for it. Caching, memory controllers etc will all play a part in it.

A laptop on batteries will hardly ever be used to drive a 4k display. Even full HD can be considered a bit too much for a typical laptop screen. But still, FFmpegInteropX is moving around 4k frames when the source video is 4k, which is pretty bad, obviously.

You can always find those ridiculously specced "business" laptops that rock a 4k display with an iGPU that can barely handle Windows animations smoothly at that resolution xD

We could technically implement that scaling optimization at our level. We know we can dynamically change the resolution of the video stream descriptors and MF will obey. However, wouldn't this downscaling already happen anyway? We are basically zero memory copy throughout all the decoding loops, and MF will do the downscaling as it has to. I am not sure if we would actually win anything from this? It would just be us doing the downscaling instead of MF. I am speaking about the general implementation of this, not specifically for sub animations (that part is pretty clear).

@softworkz (Collaborator) commented Nov 19, 2024

MF will do the downscaling as it has to

Media Foundation? How does that come into play?

AFAIU, FFmpegInteropX is decoding via ffmpeg using D3D11VA hw decoders, and the output from ffmpeg is D3D surfaces. Each time the media player element fires its event, we give it one of the D3D surfaces.

Not right?

We could technically implement that scaling optimization at our level. We know we can dynamically change the resolution of the video stream descriptors and MF will obey. However, wouldn't this downscaling already happen anyway? We are basically zero memory copy throughout all the decoding loops,

Yes, that's a really good question. Using the hw scaling right after decoding has two advantages:

1. It reduces GPU memory consumption

There's always a pool of hw frames involved in decoding. The decoder needs to have a certain number of full-size frames to resolve references (forward and backward). These frames are a fixed requirement. The decoder doesn't produce exactly one frame right at the moment when it needs to be displayed. So there's another number of frames which are needed for queuing up between the decoder (between possible filters) and the final output of ffmpeg before they are actually provided for display.

And this second number of hw frames is where GPU memory is reduced when scaling down each frame immediately after it gets out of the decoder. Scaling down 4k to FHD reduces the amount of memory by 75%.

2. Fixed-function block scaling: you can't get it any cheaper

Zero-copy sounds great, because copying is expensive, but what's even more expensive is scaling. When you supply the D3D surfaces to the media player element for display, and those are 4k while the display is just 1920, these surfaces need to be downscaled to the exact size of the element's panel. And who performs that scaling => the GPU.
Downscaling 60 4k surfaces per second to roughly half the size is not a small thing. For most recent GPUs it's still rather moderate, but for those mobile GPU versions found in laptops, it's quite a bit of effort to do that kind of scaling, and you'll easily see 20-80% (or more) GPU load - for the scaling and the overlay (desktop composition).
That GPU percentage which you see in task manager doesn't include decoding. And it doesn't include fixed-function scaling. This is a capacity which is separate from the GPU compute units. It is always there and works independently from everything else. The GPU itself cannot use it. It's just for video scaling, and we can use it at almost zero cost!

We probably cannot prevent the GPU scaling from happening at all (or maybe there's a property in the MP element?).
But anyway - even when we cannot prevent that scaling from happening, we are still achieving a major difference: When we do the "free" fixed-function scaling, like from 4k to FHD, then we reduce the amount of data by 75% and the GPU needs to scale just from FHD to something smaller instead of having 4k frames at the input.
From a rough and simple calculation, this changes the projected gpu activity range of 20-80% down to 5-20%.

@brabebhin (Collaborator) commented Nov 19, 2024

Media Foundation? How does that come into play?

AFAIU, FFmpegInteropX is decoding via ffmpeg using D3D11VA hw decoders, and the output from ffmpeg is D3D surfaces. Each time the media player element fires its event, we give it one of the D3D surfaces.

Not right?

Without access to the MS source code it is impossible to know, but I believe the inner working is something similar to this:

FFmpegInteropx--> MediaPlayer --> MediaPlayerElement.

A MediaElement would basically be a MediaPlayerElement with an abstracted MediaPlayer attached to it.
Neither FFmpegInteropX nor MediaPlayerElement belongs to MF.
However, the magic happens inside MediaPlayer. My suspicion is that MediaPlayer is a combination of several MF functionalities, but most importantly, I believe it wraps the IMFMediaEngineEx API of MF.

MediaPlaybackItem will match MediaTopology.
MediaStreamSource will match MediaSource.
MediaPlaybackList is a MediaTopology with playlists, forgot the name.

I am pretty sure MediaPlayer will do the scaling you are referring to.
When you use it in frame server mode, you do not need to do scaling manually. It will detect the size of the direct3d surface you're trying to render to and it will scale accordingly.
I also believe the frame server mode is actually the natural way these things work, and the swap chain is something specifically done to allow easy building of things like a MPE.
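
For reference, a minimal C++/WinRT sketch of frame server mode as described above; the surface handling and callback wiring are simplified assumptions:

```cpp
// Sketch: MediaPlayer frame server mode. MediaPlayer decodes and scales each
// frame into whatever Direct3D surface you hand it, so the app controls the
// render size (and could blend libass output on top before presenting).
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Graphics.DirectX.Direct3D11.h>
#include <winrt/Windows.Media.Playback.h>
using winrt::Windows::Graphics::DirectX::Direct3D11::IDirect3DSurface;
using winrt::Windows::Media::Playback::MediaPlayer;

void EnableFrameServer(MediaPlayer const& player, IDirect3DSurface const& targetSurface)
{
    player.IsVideoFrameServerEnabled(true);

    player.VideoFrameAvailable([targetSurface](MediaPlayer const& sender, auto const&)
    {
        // Copies (and scales) the current video frame into targetSurface,
        // whatever its size is - e.g. an FHD surface for a 4k source.
        sender.CopyFrameToVideoSurface(targetSurface);
    });
}
```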

@softworkz (Collaborator)

@brabebhin - I believe there are a number of inaccuracies in your post. Let's just wait for @lukasf to clear things up. 😄

@brabebhin (Collaborator)

There is a mistake, which I have since corrected ^^

@lukasf (Member) commented Nov 19, 2024

It is absolutely clear that MediaPlayer is based on MF. MF is the way how media is done in Windows, it is the replacement of DirectShow. All the error messages you get from MediaPlayer have MF error codes, you can register IMFByteStreamHandlers and they will be automatically pulled in by the MF engine. You can even obtain some of the MF services from the MediaSource, which is how we get the D3D device. I also assume that internally the MediaPlayer is a wrapper around IMFMediaEngine, which has a very similar API surface and was introduced in a similar time frame, as a replacement of the older MFPlay apis.

Sure the GPU can do scaling in no time. But of course, the same super fast scaling is used when rendering the HW frames on the same HW device. You won't gain any performance benefit by forcing a downscale after decode. In fact you will lose a (tiny) bit, because that means there will be two scale operations, one after decode, and a second scale to the actual target size (unless you exactly know it upfront). If you don't know the exact size, the double scaling will not only cost performance, but it will also introduce scaling artifacts which you don't have if you only scale once directly to the final target size (that would be the bigger concern for me here). VRAM is really not an issue, it is only a bunch of frames that are decoded upfront, so even 4K video is easily handled on iGPUs without any issues.

I totally disagree that HW decoding is overrated. Sure my high power dev machine can easily do it. But a vast majority of devices out there are old and rather poorly powered and will never be able to decode a high bitrate 4K HEVC on the CPU. A lot of devices are sold even with Celeron CPUs. HW decoding is the only way to bring smooth high res video to those devices. And even if a device can SW decode, it will use at least 10x more CPU power compared to the dedicated HW decoder engines. They are so much more efficient. That means, a laptop that has enough battery to easily play 2h of video on HW decoder will be probably out of battery after half an hour SW decoding. And it will make a lot more noise. I would never use a player which cannot do HW decoding on my laptop, because of noise and battery lifetime concerns.

@softworkz (Collaborator) commented Nov 19, 2024

It is absolutely clear that MediaPlayer is based on MF. MF is the way how media is done in Windows, it is the replacement of DirectShow.

There's no doubt about that. But @brabebhin wrote that MF would downscale the video which can be understood in two ways:

  1. The downscaling being an additional filter in the MediaFoundation graph (at the end)
    this is not possible because there's no MF graph. What FFmpegInteropX does is to provide a custom processing chain which is a replacement for what's normally provided by an MF graph
  2. MF would handle the scaling at the presentation level
    this is not possible because MF has nothing to do with the presentation layer

.

Sure the GPU can do scaling in no time. But of course, the same super fast scaling is used when rendering the HW frames on the same HW device.

Of course not. Incorrect.

You won't gain any performance benefit by forcing a downscale after decode.

Incorrect. You do.

it will use at least 10x more CPU power compared to the dedicated HW decoder engines.

These are impossible to compare, and that factor is pure fantasy.

It appears that you have mistakenly assumed that I was just spilling out some opinions and assumptions above.
You should know that I've spent a significant amount of my life on this subject during the past 8 years.

You can pick any of the details I stated above and I'll take you into that subject as deeply as necessary until you'll acknowledge that I'm right about it.

My intention was to share some of the knowledge I have gained over time, especially on things that are not like you would normally think they would be. Don't know how I seemingly created the impression of doing some gossip talk.

@brabebhin (Collaborator)

The downscaling being an additional filter in the MediaFoundation graph (at the end)
this is not possible because there's no MF graph. What FFmpegInteropX does is to provide a custom processing chain which is a replacement for what's normally provided by an MF graph

This is not what I mean, but the claim that there's no graph is likely incorrect.
I am saying likely because we have no way of knowing.

MF would handle the scaling at the presentation level
this is not possible because MF has nothing to do with the presentation layer

This is also a claim that you cannot really make unless you have access to MS's source control, in which case I will likely bombard you with more questions haha. MF does have something to do with the presentation layer.
Since I have created my own MPE using the frame server, I am pretty sure that the scaling you are referring to happens inside the MediaPlayer, since I can provide any size for the direct3d surface and the MediaPlayer will scale the video accordingly. Effortlessly :)

For non frame server implementation, MediaPlayer likely uses something like this:

https://learn.microsoft.com/en-us/windows/win32/api/mfmediaengine/nf-mfmediaengine-imfmediaengineex-updatevideostream

Just because MediaPlayer isn't by itself an UI element, it doesn't mean it doesn't have anything to do with the presentation layer. Taking in some parameters to render to, as opposed to encapsulating them, is simply a separation of concerns thing.

@brabebhin (Collaborator)

Like said above, I don't think it makes sense to send all individual bitmaps to the GPU, there needs to be some preprocessing on the CPU side. Look at these figures from overlaying some more complex animated ass subtitles onto a 4k video:

The amount of CPU processing that will actually be needed remains to be seen. We should off-load as much as possible to the GPU.

Thanks to @arch1t3cht I have a fairly good idea how this will work but until I see the actual outputs and I can play with libass and render some of these frames myself, it really is hard to just imagine it and figure out the best approach :)

@softworkz (Collaborator) commented Nov 29, 2024

It's also worth noting that this very much depends on what "complex" means here.

http://streams.videolan.org/samples/sub/SSA/subtitle_testing_complex.mkv

and a 4k version created like this:

ffmpeg -c:v:0 h264 -i http://streams.videolan.org/samples/sub/SSA/subtitle_testing_complex.mkv -filter_complex "scale=w=3840:h=2160" -map v:0 -map s:0 -an -c:s copy -c:v:0 libx264 -g:v:0 72 -preset:v:0 slow -profile:v:0 high -crf:v:0 23 subtitle_testing_complex_4k.mkv

Benchmarking subtitle rendering is hard, especially when comparing different renderers, but one option is to play the subtitles on MPC-HC (clsid's fork) with the internal subtitle renderer (a VSFilter variant) and libass, and compare the number of dropped frames.

It looks like MPC-HC is rendering the subtitles at a lower resolution and upscales them for display.
Should it have a libass option for subtitle rendering, I don't see any in the latest version.

Aegisub uses VSFilter via CSRI, but I'm afraid I can't tell you much more than that.

Can you tell what CSRI is? Never heard of that..

Finally, the question of all questions: What are you working on? The most recent version of Aegisub is 3.2.2 from 2014... Or is there any newer one somewhere? 😆

@astiob commented Nov 30, 2024

VSFilter has many interfaces.

In particular [libass] outsources the blending to the user, who can then decide to blend on the GPU if they want - to my knowledge this is not possible with VSFilter.

XySubFilter uses SubRenderIntf, originally designed for madVR but nowadays also supported by MPC-HC, which outputs a list of RGBA bitmaps. I don’t know how/whether XySubFilter actually combines small bitmaps into bigger RGBA ones, but at any rate, the final blending onto video is done by the consumer.

MPC-HC’s internal VSFilter may have something similar of its own.

It looks like MPC-HC is rendering the subtitles at a lower resolution and upscales them for display.

When using the internal renderer as arch1t3cht suggested (or when using XySubFilter), it doesn’t. You may be using an external VSFilter/DirectVobSub: check your settings in Options → Playback → Output (or in older versions, directly in Options → Playback).

Should it have a libass option for subtitle rendering, I don't see any in the latest version.

Assuming you’re using the latest version from clsid2, the libass checkbox is tucked away in Options → Subtitles → Default style.

@softworkz (Collaborator) commented Nov 30, 2024

I made some comparisons:

image

  1. Burn-In via ffmpeg
    This was too slow to play, you see the execution times above being 50ms per frame even without libass' rendering time, so the ffmpeg implementation is not something for us to look at
  2. This is from the Aegisub preview window
    Quality is good but it didn't play fluently (explained by @arch1t3cht)
  3. From our app using MPV
    Good quality, low CPU
  4. VLC Player
    Good quality, low CPU
  5. MPC-HC (internal renderer, VSFilter based)
    Works, but uses high CPU
    5b. MPC-HC with libass enabled
    Smaller fonts than in all other cases, uses the same high CPU as 5
  6. Well, that's FFmpegInteropX currently

So, the places to look at are MPV and VLC. MPV does a lot of things with shaders, which puts significant load on GPUs; VLC is the most efficient player among all. Their use of libass might be more straightforward, but that's just a guess. In terms of what ASS rendering adds to the CPU and GPU load, they appear to be similar.

@astiob commented Nov 30, 2024

[VLC’s] use of libass might be more straightforward

Performance aside, it may also be less correct. It certainly has been in the past. Exercise caution.

mpv is the exemplary existing user of libass that’s known to configure and use everything correctly.

@softworkz (Collaborator)

@astiob - Thanks a lot for the comment!

You were right, I needed to block loading of external VSFilter implementations, then it played fluently with the internal renderer and also with libass enabled (I've updated my post above accordingly). In both cases I've seen very high CPU load, very different from VLC and MPV.

Assuming you’re using the latest version from clsid2, the libass checkbox is tucked away in Options → Subtitles → Default style.

Yup, latest from clsid2. Found it, thanks, awkward placement indeed.

mpv is the exemplary existing user of libass that’s known to configure and use everything correctly.

Then it's definitely worth looking at it. I'm only familiar with ffmpeg's way of using it.

@arch1t3cht commented Nov 30, 2024

In particular, VLC always renders its subtitles at the video's storage resolution and blends them to a single RGBA image, which is then scaled to the display resolution. This can cause artifacts, in particular when the display resolution is lower than the storage resolution (edit: I mean VLC's scaling specifically here. In general there can be good reasons for rendering at storage resolution, in particular for typesetting). This may also be the reason why VLC appears faster than mpv to you: If you're watching subtitles on a 1080p video in fullscreen on a 4k display, VLC will render subtitles at 1080p while mpv will render at 4k, which is slower. (You can make mpv render subtitles at the video's storage resolution using blend-subtitles=video, though this only works on vo=gpu and not yet on vo=gpu-next.)

@softworkz (Collaborator) commented Nov 30, 2024

This may also be the reason why VLC appears faster than mpv to you:

It didn't. I said it seems equal.
What I said about VLC being most efficient in general is not about subtitles.
(edit: while MPV may have better quality)

This can cause artifacts, in particular when the display resolution is lower than the storage resolution.

Right, I've seen that before. It's a bad scaling algorithm in place.

This may also be the reason why VLC appears

From the screenshot images, you can see that what you said doesn't apply to my test - in case you know that video: I had created a version upscaled to 4k, to avoid players rendering the subs at the original video resolution 😄

@brabebhin (Collaborator)

So despite libass supporting meson, it seems some of its dependencies don't support UWP, namely fribidi and fontconfig.

@astiob commented Dec 16, 2024

You don’t need Fontconfig for Windows (including UWP) libass. For FriBidi, I’m surprised to hear UWP matters. Surely it doesn’t access any system APIs and is subsystem-agnostic?

@brabebhin (Collaborator)

Yes, we can do without Fontconfig.
For FriBidi I will need to do more research. UWP API restrictions aren't just system calls, some C APIs will not be supported as they are considered unsafe. Normally requires some small code patches to make it compatible (stuff like replacing scanf with scanf_s).

@lukasf (Member) commented Dec 18, 2024

I also noticed that for adding libass, we'd have to add 4-5 other libs first, most of which are not easy to build using MSVC. This was kind of a bummer, I did not expect it to have so many dependencies. Adding a single lib is usually quite some work already, and success is not guaranteed.

@brabebhin (Collaborator)

Worse, some of its dependencies also have dependencies, like freetype2.
This is going to be a very long journey.

@softworkz (Collaborator)

It's not just that. Things are breaking regularly and you need to determine which versions of all those libs are needed so that everything will be working together properly.

This can quickly become an insane task to keep up with.
SMP or MABS - both ways will allow you to maintain a happy life 😆

@wang-bin

https://github.com/wang-bin/devpkgs can build libass by cmake, also provides prebuilt dlls

@lukasf (Member) commented Dec 19, 2024

Interesting, thank you @wang-bin!

Is there a specific reason why the UWP builds do not contain a x86 target?

@wang-bin

Interesting, thank you @wang-bin!

Is there a specific reason why the UWP builds do not contain a x86 target?

x86 is included in the latest build

@lukasf (Member) commented Dec 22, 2024

Thank you @wang-bin, I will definitely check this out!

@lukasf (Member) commented Dec 23, 2024

Small update on this: I got an experimental build script ready, which pulls the latest bins from @wang-bin and extracts them to the corresponding folders, allowing ffmpeg to build with libass enabled. Generally this seems to work - at least there are no build or linker errors.

Some observations:

  1. While there are some build variants available, there does not seem to be a static lib build. We usually statically link all dependencies into the ffmpeg dlls to keep the dll count low. While a dll would work as well, a static libass lib would be preferred from my side.
  2. It seems that something is wrong with the uwp libass.dll. It references VCRUNTIME_140.dll, but as UWP dll it should reference VCRUNTIME_140_APP.dll instead. I think it wouldn't be possible to submit an app to the Windows Store using this dll. The CMake parameters in the github actions look correct to me, so I am surprised that the dll carries a wrong dependency. But I never actually tried to build UWP dlls using CMake. As stated in 1. we always build static libs and link them into ffmpeg dlls using ffmpeg build scripts (which do produce correct _app.dll dependencies). Static UWP libs from CMake work fine for us.

We can still use this for experimenting with libass integration. But sooner or later we need to make sure we have a proper setup.

The next few days will be busy, but I will try to push out a libass enabled preview build before new year's.

@softworkz (Collaborator) commented Dec 23, 2024

2. I think it wouldn't be possible to submit an app to the Windows Store using this dll

MSI/Exe: yes
WinUI3 MSIX: yes
UWP: no

@wang-bin commented Dec 24, 2024

  1. While there are some build variants available, there does not seem to be a static lib build. We usually statically link all dependencies into the ffmpeg dlls to keep the dll count low. While a dll would work as well, a static libass lib would be preferred from my side.

main branch builds both static lib and dll

  1. It seems that something is wrong with the uwp libass.dll. It references VCRUNTIME_140.dll, but as UWP dll it should reference VCRUNTIME_140_APP.dll instead.

wrong vc lib dir is used in my ci, fixed in the latest build. change lib\<arch> to lib\<arch>\store

@rmtjokar (Author)

I checked @wang-bin's static builds with C# P/Invoke and was messing around with libass again. I found that the fonts issue I mentioned before can be solved by calling ass_set_extract_fonts(IntPtr, 1) before calling ass_renderer_init.

Additionally, I have successfully used a Win2D CanvasControl instead of an Image control + WriteableBitmap, and now it renders smoothly, even with @softworkz's example.

Source Code:
LibAss UWP2024.zip

File example:
http://streams.videolan.org/samples/sub/SSA/subtitle_testing_complex.mkv
Extracted Subtitle:
subtitle_testing_complex sub file.zip

Notes:

  • This sample only works with external subtitles, loading an MKV and ASS subtitle file.
  • Attached is the extracted subtitle for subtitle_testing_complex.mp4.
  • I didn't use FFmpegInteropX since this is just a sample for external subtitles.

Extras:
I've also discovered the following after researching for the past few hours:
For embedded subtitles, you can call ass_process_codec_private(ASS_TRACK, header, headerSize) instead of calling ParseHeaders().

Additionally, use ass_process_chunk(ASS_TRACK, dialogue, size, start, duration) to add subtitle dialogues (MKV subtitle chunks from FFmpeg), in:

find_and_replace(str, L"\\n", L"\r\n");
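
Putting the calls mentioned above together, a rough C++ sketch of the libass lifecycle for an embedded track (the wrapper struct, the function names and the millisecond-based timing are just illustrative assumptions; error handling is omitted):

```cpp
// Rough sketch of the libass calls discussed above, for an embedded (Matroska)
// ASS track fed from FFmpeg packets.
#include <ass/ass.h>

struct LibassContext
{
    ASS_Library*  library  = nullptr;
    ASS_Renderer* renderer = nullptr;
    ASS_Track*    track    = nullptr;
};

void InitLibass(LibassContext& ctx, int frameWidth, int frameHeight,
                const char* codecPrivate, int codecPrivateSize)
{
    ctx.library = ass_library_init();
    ass_set_extract_fonts(ctx.library, 1);      // use fonts embedded in the file

    ctx.renderer = ass_renderer_init(ctx.library);
    ass_set_frame_size(ctx.renderer, frameWidth, frameHeight);
    ass_set_fonts(ctx.renderer, nullptr, "Arial", ASS_FONTPROVIDER_AUTODETECT, nullptr, 1);

    ctx.track = ass_new_track(ctx.library);
    // Codec extradata ([Script Info] / [V4+ Styles] header) instead of ParseHeaders():
    ass_process_codec_private(ctx.track, const_cast<char*>(codecPrivate), codecPrivateSize);
}

// One packet payload = one Dialogue event; timestamps are in milliseconds.
void FeedDialogue(LibassContext& ctx, const char* dialogue, int size,
                  long long startMs, long long durationMs)
{
    ass_process_chunk(ctx.track, const_cast<char*>(dialogue), size, startMs, durationMs);
}

// Returns a linked list of ASS_Image bitmaps (one alpha bitmap plus an RGBA
// color each) to blend onto the video frame; 'changed' reports whether the
// output differs from the previous call, so unchanged frames can be reused.
ASS_Image* RenderAt(LibassContext& ctx, long long nowMs, int* changed)
{
    return ass_render_frame(ctx.renderer, ctx.track, nowMs, changed);
}
```

Cleanup would go through ass_free_track, ass_renderer_done and ass_library_done, in that order.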

@lukasf (Member) commented Dec 27, 2024

  1. While there are some build variants available, there does not seem to be a static lib build. We usually statically link all dependencies into the ffmpeg dlls to keep the dll count low. While a dll would work as well, a static libass lib would be preferred from my side.

main branch builds both static lib and dll

  1. It seems that something is wrong with the uwp libass.dll. It references VCRUNTIME_140.dll, but as UWP dll it should reference VCRUNTIME_140_APP.dll instead.

wrong vc lib dir is used in my ci, fixed in the latest build. change lib\<arch> to lib\<arch>\store

@wang-bin I see in the github actions that a static libass.lib is supposed to be generated, but the output zip does not contain it. Maybe an install step must be added to the libass CMake files?

And also in the latest uwp builds, libass.dll still references the desktop VCRuntime dlls instead of the _app.dll. It's interesting that the libdav1d.dll and zlib.dll do reference the _app.dll files as expected. It is only the libass.dll which still does not reference the right dlls.

@wang-bin commented Dec 28, 2024

@lukasf

@wang-bin I see in the github actions that a static libass.lib is supposed to be generated, but the output zip does not contain it. Maybe an install step must be added to the libass CMake files?

added in the latest build

And also in the latest uwp builds, libass.dll still references the desktop VCRuntime dlls instead of the _app.dll. It's interesting that the libdav1d.dll and zlib.dll do reference the _app.dll files as expected. It is only the libass.dll which still does not reference the right dlls.

VCRuntime dll is correct. which download link?

@lukasf (Member) commented Dec 28, 2024

VCRuntime dll is correct. which download link?

@wang-bin You are absolutely right. I just checked the same files from December 24 again, and the VC references are indeed correct. Maybe I was too tired and got confused at some point, sorry about that. Thank you for adding the static lib!

@brabebhin @softworkz @rmtjokar I have uploaded experimental nuget packages containing FFmpeg 7.1 with libass (currently dll version, not yet static linked). Please check out the branch libass-wang. It already references the new nuget packages, and ass headers are available directly in FFmpegInteropX (#include "ass/ass.h"). So now a base is available for further experiments to get libass into the game.

Side note: I ran into some ffmpeg build issues. The latest VS comes with a VC compiler version 14.4x, while the platform toolset it uses is still v143. So the old rule that the platform toolset version can be inferred from the compiler version is not true anymore. I had to split up and add a separate parameter, to get the build environment working again. But then the next issues started to appear: For unknown reasons, when using the 14.4 compiler, the ffmpeg build script does not correctly detect MSVC or Windows target anymore. It tries to link unix headers or use unix functions at different places, causing the build to fail. I had to manually install and use an older VC compiler (I used 14.38), to get the build working again. The reason is still unclear to me. This is not specific to libass, even without it, the error occurs. So in case you run into issues doing custom ffmpeg builds, be warned, you could be hitting this one as well. Updating to ffmpeg 7.1 did not change anything.

@brabebhin (Collaborator)

Excellent work @lukasf . Thanks for your help @wang-bin

I'll get to it ASAP.

I'll try to have everything as an image sub and rely on MediaPlayerElement to do the rendering. This should be the simplest goal to reach.

@brabebhin (Collaborator)

@lukasf I am getting some linker errors with the latest commit on origin/libass-wang

FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_library_init referenced in function "public: long __cdecl LibassSubtitleProviderSsaAss::InitLibass(void)" (?InitLibass@LibassSubtitleProviderSsaAss@@QEAAJXZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_library_done referenced in function "public: virtual __cdecl LibassSubtitleProviderSsaAss::~LibassSubtitleProviderSsaAss(void)" (??1LibassSubtitleProviderSsaAss@@UEAA@XZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_renderer_init referenced in function "public: long __cdecl LibassSubtitleProviderSsaAss::InitLibass(void)" (?InitLibass@LibassSubtitleProviderSsaAss@@QEAAJXZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_set_frame_size referenced in function "public: long __cdecl LibassSubtitleProviderSsaAss::InitLibass(void)" (?InitLibass@LibassSubtitleProviderSsaAss@@QEAAJXZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_set_use_margins referenced in function "public: long __cdecl LibassSubtitleProviderSsaAss::InitLibass(void)" (?InitLibass@LibassSubtitleProviderSsaAss@@QEAAJXZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_set_font_scale referenced in function "public: long __cdecl LibassSubtitleProviderSsaAss::InitLibass(void)" (?InitLibass@LibassSubtitleProviderSsaAss@@QEAAJXZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_set_fonts referenced in function "public: long __cdecl LibassSubtitleProviderSsaAss::InitLibass(void)" (?InitLibass@LibassSubtitleProviderSsaAss@@QEAAJXZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_render_frame referenced in function "public: virtual struct winrt::Windows::Media::Core::IMediaCue __cdecl LibassSubtitleProviderSsaAss::CreateCue(struct AVPacket *,class std::chrono::duration<__int64,struct std::ratio<1,10000000> > *,class std::chrono::duration<__int64,struct std::ratio<1,10000000> > *)" (?CreateCue@LibassSubtitleProviderSsaAss@@UEAA?AUIMediaCue@Core@Media@Windows@winrt@@PEAUAVPacket@@PEAV?$duration@_JU?$ratio@$00$0JIJGIA@@std@@@chrono@std@@1@Z)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_free_track referenced in function "public: __cdecl LibassSubtitleProviderSsaAss::AssTrackWraper::~AssTrackWraper(void)" (??1AssTrackWraper@LibassSubtitleProviderSsaAss@@QEAA@XZ)
1>FFmpegMediaSource.obj : error LNK2019: unresolved external symbol ass_read_memory referenced in function "public: virtual struct winrt::Windows::Media::Core::IMediaCue __cdecl LibassSubtitleProviderSsaAss::CreateCue(struct AVPacket *,class std::chrono::duration<__int64,struct std::ratio<1,10000000> > *,class std::chrono::duration<__int64,struct std::ratio<1,10000000> > *)" (?CreateCue@LibassSubtitleProviderSsaAss@@UEAA?AUIMediaCue@Core@Media@Windows@winrt@@PEAUAVPacket@@PEAV?$duration@_JU?$ratio@$00$0JIJGIA@@std@@@chrono@std@@1@Z)
1>..\Output\FFmpegInteropX\x64\Debug_UWP\FFmpegInteropX.dll : fatal error LNK1120: 10 unresolved externals

I see ass headers, ass.lib and ass.dll are all present in the nuget package, I am not quite sure what is missing.

@lukasf (Member) commented Dec 29, 2024

Oh I forgot to add ass.lib as reference in the Nuget target file. As a workaround, can you try to add ass.lib in the linker AdditionalDependencies in line 184 of FFmpegInteropX.vcxproj?

I hope it works that way, otherwise I need to build a new nuget package.

@brabebhin (Collaborator)

Yep. That was it. Didn't cross my mind to check the nuget target file haha.
