libass support #439
Comments
Hi. May I ask why you need libass? I don't think you can use libass easily without modifying ffmpeg builds, because all of the add-on libraries need to be linked into ffmpeg; otherwise ffmpeg wouldn't know it can use them. You technically can use libass in FFmpegInteropX, but that would require a non-trivial code change. |
That's because ffmpeg is a C library and does not provide .NET interfaces, which is what .winmd files are for. |
The av-libs built by Shift-Media-Project include all this.
It would be rather easy, but there's a little caveat to that: you'll lose hardware acceleration, because the subtitle "burn-in" can only happen in a sw filter. Even "hw decoding to cpu mem" is pointless to do, because experience has shown that software decoding is less resource intensive than "hw-decode, hwdownload, hwupload" in most cases. The hwupload alone is already a killer for high-res videos like 4k. That little caveat is a KO for the idea, unfortunately. |
I might have misunderstood, but I was under the impression that OP wanted libass from SMP with our own ffmpeg builds, which is of course not possible. Filter burn-in isn't the only possible way to work with libass. We can expose ssa/srt as image cues and render the images ourselves to feed into the sub stream. Then the media sink will handle rendering/burn-in/whatever. |
Hi @rmtjokar, long time no see! I think libass has recently added a meson build system. It should be pretty easy now to directly integrate it, without having to resort to a SMP fork. The SMP build system has some disadvantages for us, like, not all its libs do have UWP targets, and if they do, they usually don't have ARM targets. And their project files are horribly messy, which makes it hard to maintain a fork with added WinRT+ARM configs. So using SMP is always last resort for me. As others have noted, the question is, what are you trying to accomplish with adding libass? You won't be able to use it without bigger changes in our lib. Subtitles are not normal streams, and our effects system does not work with them. We would need explicit code to post-process subs with libass (through ffmpeg filters). A big problem is that libass does not support GPU rendering, and copying frames from GPU to CPU memory for rendering is very expensive (and afterwards they need to be copied back!). Which means that we cannot really use libass to directly render the subs into video frames. And the subtitle rendering system in windows is rather poorly implemented. The bitmap subtitle rendering is intended for static bitmap (text) frames, not animated live frames. I am pretty sure that it would be horribly jaggy if we'd try to feed animated subs into it. We had to use quite elaborate workarounds, only to get clean flicker-free static subs. The only use I could currently see is for transcoding static (non-animated) ssa/ass subs into bitmap subtitles, using the libass rendering engine, which is sure better than the Windows rendering system. Downside is that we don't know which target size to render to, which might lead to more or less noticeable scaling artifacts. And I am not sure how flexible the libass filter in ffmpeg is - we would somehow have to disable animations. Oh and keep in mind that bitmap subtitle rendering is still broken in WinUI. Or has this bug "already" been resolved? 
I guess not, but I did not check for a long time. Has anyone tried with a recent version @brabebhin @softworkz? |
Hi @lukasf I haven't heard anything on the MPE front, but I would guess the bug is still there as all the winui 3 effort seems to have gone into supporting AOT and designers. I have since developed my own MPE based on directx and frame server mode, as well as custom sub rendering with win2D. But this shouldn't be a show stopper for us. We can use UWP as a benchmark, since both UWP and winUI use the same MF interfaces. As an ugly workaround for size rendering we could simply ask the user to provide a size for us to render against and have the user update it on resize. |
Sure we could pass in a size, but at least when using the Windows rendering, it is not even clear at which exact position and size a sub is rendered. Of course this is not a problem when using custom subtitle rendering (which is probably the better approach anyways). |
For sub animations, custom rendering would be the only way to do it. Windows rendering is just 50% arcane. The regions containing a cue direct where subtitles are rendered. The region itself has coordinates that determine where it will be rendered on screen (these may be absolute positions or percentages). IIRC, image cues are always rendered in their own region and have pretty much absolute positioning and size. For text cues it gets more arcane, because Windows groups them by regions, and then inside the region you have some sort of flow directions. For whatever ungodly reason they also use XAML composition for rendering, which I would guess is why we observe flickering. The MF interfaces provide A LOT of customization and allow applications fine-grained control over subtitles, but it seems MPE chooses to implement only a few of the combinations. Which is why it seems arcane, as it only implements whatever suits them. It's almost as if the MF team had nothing to do with MPE. At this point I think the conversation moves towards whether we want to also become a rendering library and not just demux+decoding. In the end I think creating our own MPE isn't that hard. If we do want to create our own MPE, we could completely forget about MF's way of doing subtitles and just render them directly inside FFmpegInteropX. We'd still have to support Windows rendering too. |
My Subtitle Filtering patchset includes a text2graphicsub filter, which converts text subtitles (including ASS) to graphical subtitles like dvd, dvb or x-subs, so all you need to do is add some filters and at the end you get graphical subs like from any other file. It also has an option to strip animations. Yet my general view on this is: most of those who have ASS subtitles expect animations to work. For non-animated subtitles we are not 100% accurate but pretty close to libass. Starting any work to integrate libass without animation support is rather pointless, as it won't make anybody happy. So either go for making libass fully work, including animations, or just leave it. IMO, putting effort into this is only justified when it allows getting it working in full effect. |
Hi, sorry I had to go out of the city for the past week. @brabebhin Using libass is mainly for supporting ASS effects and animations. Its renderer is quite fast and smooth, and other player apps like PotPlayer, KMPlayer, and even MXPlayer on Android use this library. Maybe we can do the same. @softworkz Thanks for the information. As I saw in PotPlayer, there are two ways of showing subtitles: "Vector Text Renderer" and "Image Text Renderer," both of which have better quality than displaying text in a UWP app (using SubtitleCue or even Win2D Canvas with the same font/style). It's strange that they have better text rendering quality. I was thinking maybe I can use libass.dll directly in my app (just giving the whole ASS text to libass for external subtitles only), so I created a wrapper around libass.dll x64 version with P/Invoke in a WPF app (because I couldn't with UWP), but I got stuck. Another option I've been using for the past eight years is Win2D and its CanvasControl. I created a new renderer using ChatGPT, and it seems you can manipulate the fading effect via the color's alpha channel without touching anything else. (Take a look at this:) Rec.0002.mp4 |
So I found a project which uses libass: And I was able to use it in UWP with an Image control, and this is the result: Pot Player > UWP 2 with sound> It’s actually quite good. It’s not as smooth as PotPlayer, but it’s still very good. |
I think this ultimately comes down to whether we want to venture into the land of rendering subtitles. So far we've been strictly a demux+decoding library. MPE is quite limited when it comes to subtitles, and not much can be done about it. I could devote some time to this once a decision is made. |
@softworkz Your changeset is exactly what would be needed for rendering static subtitle frames with libass. But I think we all agree that static subs are not the intention here. If we'd add libass, it is for the animated subtitles. I am not such a big fan of writing our own renderers. While it is very flexible, it makes it more difficult to use our lib. Currently, our output can just be put in a MPE and it all works. If some of our features require custom renderers, we would break with that concept, and require devs to migrate their apps from MPE to our custom rendering. Also, I think it is difficult to synchronize the subtitle renderer with the video renderer. Decoding is decoupled from rendering, and we do not get much information about the actual playback position. Text subs rendering is easier, since we get events when a new sub is to be shown. It would be great if we could find a way to use libass and somehow integrate it in our decoding chain - without killing performance. I am trying to brainstorm in that direction. Idea 1: ffmpeg recently added vulkan as a new cross-platform hw decoder type, and included a bunch of filters which can run directly on vulkan frames. There is a filter called "hwmap", which can be used to map frames from one hw decoder type to a different hw decoder type (or a different gpu device of same type). It has an option to "derive" its output hw context from the device which was used in the input frames. It seems that if the underlying device on input and output hw context is the same and compatible, then hwmap can directly map the frame, without copying. Setting the mode to "direct" can enforce this. If this would indeed work, then we could achieve hw accelerated gpu rendering: We could render the frames from libass onto a transparent hw texture and overlay this on the video using hwmap(vulkan) -> overlay_vulkan -> hwmap(d3d11). 
Of course, I don't even know if the hwmap stuff really works like that, and how easy it is to get a ffmpeg build with vulkan support. A downside of the approach is that we would be locked to the video output resolution. Idea 2: We could create a second MediaSource, which just contains the rendered subtitle as a video stream with alpha channel. Users of the lib would have to add a second MPE layered above the first one, and link them using a MediaTimelineController. I never tried MediaTimelineController, so don't know how well it does actually. But at least theoretically, it should take care of the syncing. Frankly, this also requires quite some modification on app side. Not sure if any of this makes sense, just trying to explore some alternative approaches... |
Here's the full tree of options from my point of view: mindmap
root((Subtitle<br>Overlay))
Burn into video
**B1**<br>hw decode<br>hw download<br>sw burn-in<br>hw upload
**B2**<br>sw decode<br>sw burn-in<br>hw upload
**B3**<br>sw render blank frame<br>hw upload<br>++++++++++<br>hw decode<br>hw overlay
**B4**<br>sw render half-size frame<br>hw upload<br>hw upscale<br>++++++++++<br>hw decode<br>hw overlay
**B5**<br>sw render partial sprites<br>hw upload<br>hw upscale<br>++++++++++<br>hw decode<br>hw overlay
Presentation Layering
**L1**<br>render full frames<br>Copy to D3D surface<br>overlay manually
**L2**<br>render partial sprites<br>Copy to D3D surfaces<br>overlay manually
Burn into video options

B1: This is the worst of all options, because you need to copy every single uncompressed frame from GPU to CPU memory and then again from CPU memory to GPU memory.

B2: The advantage of hw decoding is often over-estimated. Unlike encoding, video decoding can easily be done by CPUs as well. The big advantage of using hw decoding is that the large amounts of data never need to be copied between system and gpu memory, because gpu memory is always the eventual target. HW decoding followed by immediate downloading to cpu memory almost never makes sense. But for this case of doing sw burn-in of subtitles, B2 is significantly better than B1, because the memory transfer hits much harder than the sw (instead of hw) decoding.

B3: This is another scenario which requires my subtitles patchset and which will go into our server transcoding process shortly (it's there already, just not unlocked for the public). While it still involves uploading the overlay frames from cpu to gpu memory, like in case of B2, there's still a massive advantage: You don't need to upload at the same rate as the video fps:
Unfortunately, in the context of FFmpegInteropX it's not straightforward to go that way, because you cannot use this with the D3D11VA decoders. Instead, you would need to use the vendor-specific hw contexts and filters (like overlay_qsv or overlay_cuda). AFAIK, Vulkan is not stable enough on Windows yet. At least it doesn't work reliably with MPV player, even though it supports it. The one other (stable) option is to use OpenCL. There's also an overlay_opencl filter in ffmpeg, I'm just not sure whether you can hwmap from a d3d11va context to OpenCL. I know that d3d11va-to-opencl works for AMD, I know that it works from qsv-to-opencl and from cuda-to-opencl. But I'm not sure whether d3d11va-to-opencl works with Intel and Nvidia gpus.

B4: This is a low-profile variant of B3. By using only half-size frames for the subtitle overlay, you save 75% of the memory bandwidth. You wouldn't do that for 720p or lower-res videos, but for 4k, it's a good way to optimize for performance.

B5: That would be the "holy grail": Instead of full frames, you would use one or more smaller surfaces, to cover only the regions with subtitle content. It's difficult to implement though, because you are typically working with a pool of D3D surfaces and that becomes difficult to manage when you have sizes which are changing dynamically. This would require modifications to the overlay filters in ffmpeg.

Presentation Layering options

Basic: The idea of not touching the video frames at all has a lot of appeal, as the nature of subtitles is significantly different from the actual video:
This means that, while the case of full-screen+full-fps needs to be accounted for, the implementation doesn't need to permanently create full-size overlays at the same rate as the video.

ffmpeg Side: I don't think it's required to have a totally separate ffmpeg instance for subtitles. This would cause a lot of problems and a lot of work. ffmpeg can have more than a single video output, so for this case, one output would be the video frames as D3D surfaces and the other the rendered subtitles as software frames (L1) or a collection of multiple areas per frame (L2). How these eventually get on screen is an open question though, at this point. Maybe it's feasible to expose a secondary MediaSource from the main session? Another thought I had at some point is whether the 3d stereo capability of Windows.Media could be "misused" for displaying a secondary layer on top of the main video...

"Rendering": It's not clear to me what has been referred to as "custom renderer". The renderer is always libass: You give it a range of memory, representing the video frame, and then it renders the subtitles into that memory - pixel by pixel. The "only" thing that's left to do is to bring this rendered image on the screen.

Canvas2d: I've never used it and don't know about its abilities. Maybe it's worth trying out whether the images generated by libass can be copied onto the canvas, but it's not clear to me how to do the switch from one image to the next one at the right moment in time. That's rather the domain of a SwapChain.

Having a second swapchain on top of the video swapchain seems to be the most natural approach. This would need to be found out, because if not, then the only way would be to create a XAML island hwnd window on a separate thread or any other non-WinUI3 technique to render the subs in a win32 window on top of the video so that the DWM (desktop window manager) does the composition (typically using gpu overlay). 
L1, L2: Same as with B5, a perfect implementation would work with individual areas rather than full-size frames, but it also adds a lot of complication. |
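To put the "75%" figure for B4 into numbers, here is a minimal sketch of the overlay upload arithmetic. It assumes an RGBA overlay at 4 bytes per pixel and a hypothetical refresh rate of 24 updates per second; neither value comes from the thread, and real overlays may use other formats or update far less often.

```python
def upload_bytes_per_second(width: int, height: int, updates_per_second: float,
                            bytes_per_pixel: int = 4) -> float:
    """Bytes copied from CPU to GPU memory per second for one overlay stream.

    bytes_per_pixel=4 assumes an RGBA overlay (an assumption, not a given).
    """
    return width * height * bytes_per_pixel * updates_per_second


# Hypothetical numbers: a 4k video whose overlay is refreshed 24 times per second.
full_size = upload_bytes_per_second(3840, 2160, 24)  # B3: full-size overlay
half_size = upload_bytes_per_second(1920, 1080, 24)  # B4: half width, half height

# Halving both dimensions leaves a quarter of the pixels to upload.
saving = 1 - half_size / full_size
print(f"B3: {full_size / 1e6:.1f} MB/s, B4: {half_size / 1e6:.1f} MB/s, "
      f"saving: {saving:.0%}")
```

The same function also shows why a lower overlay update rate helps: the cost scales linearly with `updates_per_second`, so an overlay that only changes a few times per second needs a correspondingly small share of the bandwidth.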
Here's another idea. If we keep this feature only when using DirectX decoders, then we can use compute shaders to burn the image into the HW AVFrame before we send it to the MF pipeline. This will include an additional step which entails a CPU->GPU memory copy of the subtitle, and another GPU->GPU operation. Might kill a frame or two. Compute shaders should be available on all Windows devices that we target, since their feature level is a hard requirement for Windows support. This is similar to @softworkz 's L1 (nice graph btw), except it is done on our side. @lukasf your first idea is technically possible. We can share memory between Vulkan and DirectX, there's something called VK_NV_external_memory in Vulkan which allows this kind of thing. |
Then you'll have to deal with all the HW formats that are being used for video frames on d3d surfaces when overlaying the subs.
It's similar to B3 because it would be applied into the video frame. L1/L2 means that video and subs remain separate and are blended only during presentation (like when you have one semitransparent window on top of another).
It's not a bitmap. Use the three-dot menu and choose edit to see it 😄
ffmpeg has hw mapping to Vulkan currently only for VAAPI and CUDA. |
I'm not sure whether shaders are even needed for a trivial overlay. |
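For reference, a "trivial overlay" boils down to a per-pixel source-over blend. The following is a minimal CPU-side sketch, assuming a libass-style output where each image is an 8-bit coverage mask plus one colour and a destination offset. The function names and the list-of-rows frame layout are hypothetical; a real implementation would work on D3D surfaces and also has to handle the video pixel formats.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def blend_pixel(dst: RGB, src: RGB, coverage: int, transparency: int = 0) -> RGB:
    """Source-over blend of one pixel with integer math.

    coverage: 0..255 value from the subtitle mask.
    transparency: 0 = opaque, 255 = invisible (assumed colour alpha convention).
    """
    opacity = coverage * (255 - transparency) // 255
    return tuple((s * opacity + d * (255 - opacity)) // 255
                 for s, d in zip(src, dst))


def blend_bitmap(frame: List[List[RGB]], mask: List[List[int]], color: RGB,
                 dst_x: int, dst_y: int, transparency: int = 0) -> None:
    """Blend a single-colour coverage mask into the frame in place, clipped to its bounds."""
    for row_index, mask_row in enumerate(mask):
        y = dst_y + row_index
        if not 0 <= y < len(frame):
            continue
        for col_index, coverage in enumerate(mask_row):
            x = dst_x + col_index
            if 0 <= x < len(frame[y]):
                frame[y][x] = blend_pixel(frame[y][x], color, coverage, transparency)


# A 2x2 black frame with a 1x2 mask placed on the second row.
frame = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
blend_bitmap(frame, [[255, 0]], (255, 255, 255), dst_x=0, dst_y=1)
```

On a GPU the same arithmetic is what fixed-function alpha blending or a small shader would perform; the sketch only illustrates how little work each pixel needs.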
The elephant mama in the room here is a GPU->CPU memory copy, which is what's going to kill performance no matter how you spin it. I guess we wouldn't need compute shaders, but the compute shader has the advantage that you sort of know when it will run. For a custom MPE, we can stick to the same interface of the official MPE and only add new stuff, this will allow a drop-in replacement and ease adoption. We can use frame server mode to detect video position and render subtitles accordingly. |
Yes, it's this PLUS re-uploading again (B1). In case of B2, it's just one direction, so half of B1.
Like I said above:
and this includes 4k videos. SW decoding alone is not an elephant (of any age ;-). You can easily verify this by yourself. Just call ffmpeg like this:
Then you need to watch the "Speed" value in the output and also your CPU usage (because it's often not going to 100%). So for example, when you see 50% CPU usage and Speed of 6.0x, this means that your CPU is 12 times faster than needed for decoding the video in realtime (presentation at 1.0x). |
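The arithmetic behind that example can be written down as a small helper. This is only a rough sketch: it assumes decoding throughput scales linearly with CPU usage, which is a simplification, and the function name is made up for illustration.

```python
def realtime_headroom(speed: float, cpu_usage: float) -> float:
    """Estimate how many times faster than realtime the CPU could decode.

    speed: the speed factor reported by ffmpeg (1.0 = realtime).
    cpu_usage: observed CPU usage as a fraction in (0, 1].
    Assumes throughput scales linearly with CPU usage (a simplification).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not 0 < cpu_usage <= 1:
        raise ValueError("cpu_usage must be in (0, 1]")
    return speed / cpu_usage


# The example from the comment above: 50% CPU usage at a speed of 6.0x.
print(realtime_headroom(6.0, 0.5))  # prints 12.0
```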
You might want to take a look at these two videos, demonstrating dozens of ways for doing subtitle burn-in: https://github.com/softworkz/SubtitleFilteringDemos/tree/master/TestRun1 |
Sure, a desktop CPU or a plugged in laptop CPU will deal with 4K just fine in software mode. However, as soon as you factor in mobile devices that are not always plugged in and older CPUs, things get complicated. Ideally we should support both software and hardware anyways (we can skip over the system decoders as these are black boxes and only a MF filter will help us there). |
Yes, but that's because of the memory transfer. I'm sure it will run the pure decoding (the ffmpeg command above) of 4k video comfortably above 1.0x speed, even on batteries.
A laptop on batteries will hardly ever be used to drive a 4k display. Even full HD can be considered a bit too much for a typical laptop screen. But still, FFmpegInteropX is moving around 4k frames when the source video is 4k, which is pretty bad, obviously. But there exists another trick for those cases, which isn't even specific to subtitle overlay but can generally improve performance in case of 4k playback when the output display isn't 4k anyway: Above I said that there are no filtering capabilities available for the D3D11VA hw context, which applies to ffmpeg, but it's not the full truth. In fact there exists an API for video hw processing for D3D11VA and it's supported by all major GPU vendors (probably they wouldn't get certified for Windows without it). It's just that ffmpeg doesn't have an implementation for it. The two most important filtering capabilities that you get from this are hw deinterlacing and hw scaling, but let's forget about deinterlacing for now, and focus on scaling. Having such a filter would allow optimizing performance significantly in all cases where the source video resolution is larger than the presentation (or the max presentation) resolution. The detailed conditions need to be decided upon by every developer individually, but examples would be like:
Like I mentioned above for B2: As soon as you are doing something with the data in hardware before downloading, the cost balance changes, which means: B1 with hw downscaling before hw download becomes better than B2. And even outside of the subtitles subject, this would massively improve performance and reduce energy consumption for 4k playback on non-4k screens. |
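One possible policy for such a downscale step can be sketched as follows. This is purely illustrative: the function name and the rule "fit into the maximum presentation size, keep the aspect ratio, round down to even dimensions" are assumptions, not something the thread or ffmpeg prescribes.

```python
from typing import Tuple

Size = Tuple[int, int]


def downscale_target(source: Size, max_display: Size) -> Size:
    """Return the size to scale decoded frames to, or the source size if no
    downscale is needed.

    Hypothetical policy: only ever scale down, preserve the aspect ratio, and
    round down to even dimensions (chroma-subsampled formats need even sizes).
    """
    src_w, src_h = source
    disp_w, disp_h = max_display
    scale = min(disp_w / src_w, disp_h / src_h)
    if scale >= 1.0:
        return source  # source already fits, leave it untouched
    width = max(2, int(src_w * scale) & ~1)
    height = max(2, int(src_h * scale) & ~1)
    return (width, height)


print(downscale_target((3840, 2160), (1920, 1080)))
print(downscale_target((1280, 720), (1920, 1080)))
```

Whether an app would apply this at all, and against which "max presentation" size, is exactly the per-developer decision mentioned above.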
And I completely agree. However, any real world scenario of decoding involves memory transfer at some point. And the CPU does bear some responsibility for it. Caching, memory controllers etc will all play a part in it.
You can always find those ridiculously speced "business" laptops that rock a 4k display with iGPU that can barely handle windows animations smoothly at that resolution xD We could technically implement that scaling optimization at our level. We know we can dynamically change the resolution of the video stream descriptors and MF will obey. However, wouldn't this downscaling already happen anyway? We are basically zero memory copy throughout all the decoding loops, MF will do the downscaling as it has to. I am not sure if we would actually win anything from this? It would just be us doing the down scaling instead of MF. I am speaking about the general implementation of this, not specifically for sub animations (that part is pretty clear). |
Media Foundation? How does that come into play? AFAIU, FFmpegInteropX is decoding via ffmpeg using D3D11VA hw decoders and the output from ffmpeg are D3D surfaces. Each time, when the media player element fires its event, we give them one of the D3D surfaces. Not right?
Yes, that's a really good question. Using the hw scaling right after decoding has two advantages:

1. It reduces GPU memory consumption. There's always a pool of hw frames involved in decoding. The decoder needs to have a certain number of full-size frames to resolve references (forward and backward). These frames are a fixed requirement. The decoder doesn't produce exactly one frame right at the moment when it needs to be displayed. So there's another number of frames which are needed for queuing up between the decoder (between possible filters) and the final output of ffmpeg before they are actually provided for display. And this second number of hw frames is where GPU memory is reduced when scaling down each frame immediately after it gets out of the decoder. Scaling down 4k to FHD reduces the amount of memory by 75%.

2. Fixed-function block scaling: you can't get it any cheaper. Zero-copy sounds great, because copying is expensive, but what's even more expensive is scaling. When you supply the D3D surfaces to the media player element for display, and those are 4k while the display is just 1920, these surfaces need to be downscaled to the exact size of the element's panel. And who performs that scaling => the gpu. We probably cannot prevent the GPU scaling from happening at all (or maybe there's a property in the mp element? |
Without access to the MS source code it is impossible to know, but I believe the inner working is something similar to this: FFmpegInteropX --> MediaPlayer --> MediaPlayerElement. A MediaElement would basically be a MediaPlayerElement with an abstracted MediaPlayer attached to it. MediaPlaybackItem will match MediaTopology. I am pretty sure MediaPlayer will do the scaling you are referring to. |
@brabebhin - I believe there are a number of inaccuracies in your post. Let's just wait for @lukasf to clear things up. 😄 |
There is a mistake, which I have since corrected ^^ |
It is absolutely clear that MediaPlayer is based on MF. MF is the way how media is done in Windows, it is the replacement of DirectShow. All the error messages you get from MediaPlayer have MF error codes, you can register IMFByteStreamHandlers and they will be automatically pulled in by the MF engine. You can even obtain some of the MF services from the MediaSource, which is how we get the D3D device. I also assume that internally the MediaPlayer is a wrapper around IMFMediaEngine, which has a very similar API surface and was introduced in a similar time frame, as a replacement of the older MFPlay APIs. Sure the GPU can do scaling in no time. But of course, the same super fast scaling is used when rendering the HW frames on the same HW device. You won't gain any performance benefit by forcing a downscale after decode. In fact you will lose a (tiny) bit, because that means there will be two scale operations, one after decode, and a second scale to the actual target size (unless you exactly know it upfront). If you don't know the exact size, the double scaling will not only cost performance, but it will also introduce scaling artifacts which you don't have if you only scale once directly to the final target size (that would be the bigger concern for me here). VRAM is really not an issue, it is only a bunch of frames that are decoded upfront, so even 4K video is easily handled on iGPUs without any issues. I totally disagree that HW decoding is overrated. Sure my high power dev machine can easily do it. But a vast majority of devices out there are old and rather poorly powered and will never be able to decode a high bitrate 4K HEVC on the CPU. A lot of devices are sold even with Celeron CPUs. HW decoding is the only way to bring smooth high res video to those devices. And even if a device can SW decode, it will use at least 10x more CPU power compared to the dedicated HW decoder engines. They are so much more efficient. 
That means, a laptop that has enough battery to easily play 2h of video on HW decoder will be probably out of battery after half an hour SW decoding. And it will make a lot more noise. I would never use a player which cannot do HW decoding on my laptop, because of noise and battery lifetime concerns. |
There's no doubt about that. But @brabebhin wrote that MF would downscale the video which can be understood in two ways:
.
Of course not. Incorrect.
Incorrect. You do.
These are impossible to compare, and that factor is pure fantasy. It appears that you have mistakenly assumed that I'd have been spilling out some opinions and assumptions above. You can pick any of the details I stated above and I'll take you into that subject as deeply as necessary until you'll acknowledge that I'm right about it. My intention was to share some of the knowledge I have gained over time, especially on things that are not like you would normally think they would be. Don't know how I seemingly created the impression of doing some gossip talk. |
This is not what I mean, but the claim that there's no graph is likely incorrect.
This is also a claim that you cannot really make unless you have access to MS's source control, in which case I will likely bombard you with more questions haha. MF does have something to do with the presentation layer. For non frame server implementation, MediaPlayer likely uses something like this: Just because MediaPlayer isn't by itself an UI element, it doesn't mean it doesn't have anything to do with the presentation layer. Taking in some parameters to render to, as opposed to encapsulating them, is simply a separation of concerns thing. |
The amount of CPU processing that will actually be needed remains to be seen. We should off-load as much as possible to the GPU. Thanks to @arch1t3cht I have a fairly good idea how this will work but until I see the actual outputs and I can play with libass and render some of these frames myself, it really is hard to just imagine it and figure out the best approach :) |
http://streams.videolan.org/samples/sub/SSA/subtitle_testing_complex.mkv and a 4k version created like this:
It looks like MPC-HC is rendering the subtitles at a lower resolution and upscales them for display.
Can you tell what CSRI is? Never heard of that.. Finally, the question of all questions: What are you working on, the most recent version of Aegisub is 3.2.2 from 2014... Or is there any newer somewhere? 😆 |
VSFilter has many interfaces.
XySubFilter uses SubRenderIntf, originally designed for madVR but nowadays also supported by MPC-HC, which outputs a list of RGBA bitmaps. I don’t know how/whether XySubFilter actually combines small bitmaps into bigger RGBA ones, but at any rate, the final blending onto video is done by the consumer. MPC-HC’s internal VSFilter may have something similar of its own.
When using the internal renderer as arch1t3cht suggested (or when using XySubFilter), it doesn’t. You may be using an external VSFilter/DirectVobSub: check your settings in Options → Playback → Output (or in older versions, directly in Options → Playback).
Assuming you’re using the latest version from clsid2, the libass checkbox is tucked away in Options → Subtitles → Default style. |
I made some comparisons:
So, the places to look at are MPV and VLC. MPV does a lot of things with shaders, which puts significant load on GPUs; VLC is the most efficient player among all. Their use of libass might be more straightforward, but it's just a guess. What ASS rendering adds to the CPU and GPU loads appears to be similar in both. |
Performance aside, it may also be less correct. It certainly has been in the past. Exercise caution. mpv is the exemplary existing user of libass that’s known to configure and use everything correctly. |
@astiob - Thanks a lot for the comment! You were right, I needed to block loading of external VSFilter implementations, then it played fluently with the internal renderer and also with libass enabled (I've updated my post above accordingly). In both cases I've seen very high CPU load, very different from VLC and MPV.
Yup, latest from clsid2. Found it, thanks, awkward placement indeed.
Then it's definitely worth looking at it. I'm only familiar with ffmpeg's way of using it. |
In particular, VLC always renders its subtitles at the video's storage resolution and blends them to a single RGBA image, which is then scaled to the display resolution. This can cause artifacts, in particular when the display resolution is lower than the storage resolution (edit: I mean VLC's scaling specifically here. In general there can be good reasons for rendering at storage resolution, in particular for typesetting). This may also be the reason why VLC appears faster than mpv to you: If you're watching subtitles on a 1080p video in fullscreen on a 4k display, VLC will render subtitles at 1080p while mpv will render at 4k, which is slower. (You can make mpv render subtitles at the video's storage resolution using |
It didn't. I said it seems equal.
Right, I've seen that before. There's a bad scaling algorithm in place.
From the screenshot images, you can see that what you said doesn't apply to my test - in case you know that video: I had created a version upscaled to 4k, to avoid players rendering the subs at the original video resolution 😄 |
So despite libass supporting meson, it seems some of its dependencies don't support UWP, namely fribidi and fontconfig. |
You don’t need Fontconfig for Windows (including UWP) libass. For FriBidi, I’m surprised to hear UWP matters. Surely it doesn’t access any system APIs and is subsystem-agnostic? |
Yes, we can do without Fontconfig. |
I also noticed that for adding libass, we'd have to add 4-5 other libs first, most of which are not easy to build using MSVC. This was kind of a bummer, I did not expect it to have so many dependencies. Adding a single lib is usually quite some work already, and success is not guaranteed. |
Worse, some of its dependencies also have dependencies, like freetype2. |
It's not just that. Things are breaking regularly and you need to determine which versions of all those libs are needed so that everything will be working together properly. This can quickly become an insane task to keep up with. |
https://github.com/wang-bin/devpkgs can build libass by cmake, also provides prebuilt dlls |
Interesting, thank you @wang-bin! Is there a specific reason why the UWP builds do not contain an x86 target? |
x86 is included in the latest build |
Thank you @wang-bin, I will definitely check this out! |
Small update on this: I got an experimental build script ready, which pulls the latest bins from @wang-bin and extracts them to the corresponding folders, allowing ffmpeg to build with libass enabled. Generally this seems to work - at least there are no build or linker errors. Some observations:
We can still use this for experimenting with libass integration. But sooner or later we need to make sure we have a proper setup. The next few days will be busy, but I will try to push out a libass enabled preview build before new year's. |
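For readers who want to try something similar, the "pull and extract into the corresponding folders" step could look roughly like the sketch below. This is not the actual build script from this thread. The folder layout (`platform/arch/bin|lib|include`), the function name, and the archive contents are all assumptions made for illustration, and the download step is left out so the example runs offline against a synthetic zip.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical mapping from file extension to output subfolder.
SUBFOLDER_BY_SUFFIX = {".dll": "bin", ".lib": "lib", ".h": "include"}


def extract_prebuilt(archive: Path, dest_root: Path, platform: str, arch: str) -> list:
    """Copy dlls, libs and headers from `archive` into
    dest_root/platform/arch/<subfolder>, flattening archive-internal folders."""
    target = dest_root / platform / arch
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = Path(info.filename).name
            subfolder = SUBFOLDER_BY_SUFFIX.get(Path(name).suffix.lower())
            if subfolder is None:
                continue  # skip anything that is not a dll, lib or header
            out_path = target / subfolder / name
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_bytes(zf.read(info))
            written.append(out_path.relative_to(dest_root).as_posix())
    return sorted(written)


# Demo with a synthetic archive (file names are made up for the example).
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive = tmp_path / "prebuilt.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("pkg/bin/ass.dll", b"dll")
        zf.writestr("pkg/lib/ass.lib", b"lib")
        zf.writestr("pkg/include/ass/ass.h", b"header")
        zf.writestr("pkg/README.txt", b"ignored")
    extracted = extract_prebuilt(archive, tmp_path / "out", "uwp", "x64")
    print(extracted)
```

Flattening the header path is a simplification; a real script would likely need to keep the `ass/` include folder so that `#include <ass/ass.h>` still resolves.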
MSI/Exe: yes |
main branch builds both static lib and dll
wrong vc lib dir is used in my ci, fixed in the latest build. change |
I checked @wang-bin static builds in C# P/Invoke and was messing around with libass again. I found that the fonts issue I mentioned before can be solved by calling Additionally, I have successfully used Source Code: File example: Notes:
Extras:
Additionally, use
|
@wang-bin I see in the github actions that a static libass.lib is supposed to be generated, but the output zip does not contain it. Maybe an install step must be added to the libass CMake files? And also in the latest uwp builds, libass.dll still references the desktop VCRuntime dlls instead of the _app.dll. It's interesting that the libdav1d.dll and zlib.dll do reference the _app.dll files as expected. It is only the libass.dll which still does not reference the right dlls. |
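A quick way to sanity-check which VC runtime flavor a binary references is to scan it for the runtime DLL names. The sketch below is a heuristic byte scan, not a real PE import-table parser, and the name pattern (`vcruntime…`/`msvcp…` with an optional `_app` suffix) is an assumption about how the runtime DLLs are named. A tool such as `dumpbin /dependents` gives the authoritative list.

```python
import re

# Assumed naming: UWP builds reference the "_app" variants of the VC runtime.
RUNTIME_DLL = re.compile(rb"(vcruntime|msvcp)\d+(_\d+)?(_app)?\.dll", re.IGNORECASE)


def classify_runtime_refs(binary: bytes) -> dict:
    """Group VC runtime DLL names found in the raw bytes into
    'app' (UWP) and 'desktop' references. Heuristic string scan only."""
    refs = {"app": set(), "desktop": set()}
    for match in RUNTIME_DLL.finditer(binary):
        name = match.group(0).decode("ascii").lower()
        kind = "app" if match.group(3) else "desktop"
        refs[kind].add(name)
    return refs


# Synthetic stand-in for file contents; for a real file one would pass
# something like Path("libass.dll").read_bytes() instead.
sample = b"\x00VCRUNTIME140.dll\x00MSVCP140_APP.dll\x00KERNEL32.dll\x00"
result = classify_runtime_refs(sample)
print(result)
```

A UWP-targeted DLL would be expected to show entries only under `app`; any `desktop` hit is a hint to look closer with a proper tool.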
added in the latest build
VCRuntime dll is correct. which download link? |
@wang-bin You are absolutely right. I just checked the same files from December 24 again, and the VC references are indeed correct. Maybe I was too tired and got confused at some point, sorry about that. Thank you for adding the static lib!

@brabebhin @softworkz @rmtjokar I have uploaded experimental nuget packages containing FFmpeg 7.1 with libass (currently dll version, not yet static linked). Please check out the branch libass-wang. It already references the new nuget packages, and ass headers are available directly in FFmpegInteropX (

Side note: I ran into some ffmpeg build issues. The latest VS comes with a VC compiler version 14.4x, while the platform toolset it uses is still v143. So the old rule that the platform toolset version can be inferred from the compiler version is not true anymore. I had to split up and add a separate parameter, to get the build environment working again.

But then the next issues started to appear: For unknown reasons, when using the 14.4 compiler, the ffmpeg build script does not correctly detect MSVC or Windows target anymore. It tries to link unix headers or use unix functions at different places, causing the build to fail. I had to manually install and use an older VC compiler (I used 14.38), to get the build working again. The reason is still unclear to me. This is not specific to libass, even without it, the error occurs. So in case you run into issues doing custom ffmpeg builds, be warned, you could be hitting this one as well. Updating to ffmpeg 7.1 did not change anything. |
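The toolset mismatch described in the side note can be illustrated with a small sketch. The function and parameter names are hypothetical and do not come from the FFmpegInteropX build scripts, and `14.41` is just an example of a "14.4x" version. The point is only that a toolset derived from the compiler version goes wrong for 14.4x, so an explicit parameter is needed.

```python
from typing import Optional


def toolset_from_compiler_old_rule(vc_version: str) -> str:
    """Old heuristic: major version plus first digit of the minor version,
    e.g. '14.38' -> 'v143'. This breaks for 14.4x compilers."""
    major, minor = vc_version.split(".")[:2]
    return f"v{major}{minor[0]}"


def resolve_toolset(vc_version: str, platform_toolset: Optional[str] = None) -> str:
    """Prefer an explicitly passed toolset (hypothetical separate parameter);
    fall back to the old heuristic only when none is given."""
    if platform_toolset:
        return platform_toolset
    return toolset_from_compiler_old_rule(vc_version)


print(toolset_from_compiler_old_rule("14.38"))  # matches the v143 toolset
print(toolset_from_compiler_old_rule("14.41"))  # heuristic result, not v143
print(resolve_toolset("14.41", "v143"))         # explicit parameter wins
```

Keeping the compiler version and the platform toolset as two separate inputs avoids baking the broken inference into the build environment.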
@lukasf I am getting some linker errors with the latest commit on origin/libass-wang
I see ass headers, ass.lib and ass.dll are all present in the nuget package, I am not quite sure what is missing. |
Oh I forgot to add ass.lib as reference in the Nuget target file. As a workaround, can you try to add ass.lib in the linker AdditionalDependencies in line 184 of FFmpegInteropX.vcxproj? I hope it works that way, otherwise I need to build a new nuget package. |
Yep. That was it. Didn't cross my mind to check the nuget target file haha. |
Hi,
I've been using FFmpegInteropX for a long time, thank you for your great work.
It seems ShiftMediaProject updated all the libraries to the latest versions around 3 weeks ago.
I read in #384 that @lukasf said:
Since it's updated, can we at least use its libass version and make it work in FFmpegInteropX without touching the FFmpeg builds? I tried to compile ShiftMediaProject's libass version and all its dependencies, and I managed to build them all, but there is no winmd file in the output folder.
Can you please help me with this?
Thanks in advance.