status of clang config files in wrappers #469

Open
jeremyd2019 opened this issue Nov 13, 2024 · 21 comments

Comments

@jeremyd2019

jeremyd2019 commented Nov 13, 2024

I was just doing my usual post-llvm-update update of my cross-compiler wrappers (msys2/MINGW-packages#8762) (which are basically ripped off from yours), which includes looking to see if you made any relevant changes to the wrappers here. I saw a somewhat confusing commit history, where you switched to using cfg files, tweaked them, then reverted all of that due to several issues, and then seemingly reapplied the commits without change. I decided for this go-around to use the version of the wrappers at the revert (which was just a comment update different from what I had before), but I wanted to check in to see how the issues referenced in the revert were addressed here before I consider also switching to cfg files.

@mstorsjo
Owner

Thanks for checking in!

See #253 for more discussion around the use of config files. The issues I ran into, which I mentioned in the revert, e0964ce, have been fixed.

So the config files themselves work fine overall, but it does happen now and then that some projects build with --no-default-config (to combat cases where, afaik, Gentoo has a global default that sets options that aren't really suitable for some cases of cross compilation and/or testing).

Note that while the config files are neat for many things, the mingw32uwp target (which admittedly sees very little use) doesn't quite work neatly with config files, at least not without changing what triples it uses. Also, for this configuration, see https://github.com/mstorsjo/llvm-mingw/blob/master/wrappers/clang-target-wrapper.sh#L82-L83 - we intentionally pass these default linker options after the user specified ones, rather than before, to make sure the user specified libs to link have precedence. I'm not sure if there's a neat way to do that with config files...

If there are no more surprises, the next point release of llvm-mingw will use the config files, and once that's out, I guess we'll get more widespread testing of it, and see if people stumble on other issues with it.
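The ordering concern described above (default linker options placed after the user's own, so that user-specified libraries take precedence) can be sketched roughly as follows. This is a minimal Python illustration, not the actual clang-target-wrapper.sh; the function name `build_clang_argv` and the flag `-Wl,-lexample_runtime` are hypothetical placeholders, not the options the real wrapper passes.

```python
# Hypothetical placeholder for the wrapper's default linker options;
# the real options in clang-target-wrapper.sh are not reproduced here.
DEFAULT_LINK_FLAGS = ("-Wl,-lexample_runtime",)


def build_clang_argv(clang, target, user_args, default_link_flags=DEFAULT_LINK_FLAGS):
    """Build the argv a wrapper might exec.

    The target comes first, then the user's arguments, and the default
    linker options come last so that user-specified libraries are seen first.
    """
    return [clang, "-target", target, *user_args, *default_link_flags]


argv = build_clang_argv("clang", "x86_64-w64-mingw32uwp", ["main.c", "-lmylib"])
print(" ".join(argv))
```

A config file that only prepends its options cannot reproduce this ordering, which is the limitation the comment points at.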

@invertego

One apparent side effect of this change is a reduction in the set of target triples that will successfully compile.

For example, all the following -target options work in the 20241030 release: x86_64-w64-windows-gnu, x86_64-pc-windows-gnu, x86_64-windows-gnu.

In the 20241119 release, of that list, only x86_64-w64-windows-gnu still works.

I noticed this because llvm-mingw stopped working with vcpkg. In their mingw toolchain script they define the compiler target like so:
set(CMAKE_${lang}_COMPILER_TARGET "${CMAKE_SYSTEM_PROCESSOR}-windows-gnu" CACHE STRING "")

https://github.com/microsoft/vcpkg/blob/b545373a9a536dc559dac8583467a21497a0e897/scripts/toolchains/mingw.cmake#L42

Would it be more correct for them to use "${CMAKE_SYSTEM_PROCESSOR}-w64-windows-gnu" instead? Making that change locally, at least, seems to get things working again.
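To illustrate why only one of the three spellings still finds a config file, here is a toy Python sketch. It assumes the driver looks up a file named after the normalized triple and that only `x86_64-w64-windows-gnu.cfg` is present. The `normalize_triple` function is a deliberately simplified stand-in, not LLVM's actual normalization, and `find_config` is a hypothetical helper.

```python
def normalize_triple(triple):
    """Toy normalization covering only the rewrites discussed in this thread.

    This is NOT LLVM's real algorithm; it is an assumption-laden sketch.
    """
    parts = triple.split("-")
    arch, rest = parts[0], parts[1:]
    # A trailing "mingw32" OS is treated as "windows-gnu".
    if rest and rest[-1] == "mingw32":
        rest = rest[:-1] + ["windows", "gnu"]
    # A missing vendor field (arch-os-env) is assumed to become "unknown".
    if len(rest) == 2:
        rest = ["unknown"] + rest
    return "-".join([arch] + rest)


def find_config(triple, available):
    """Return the matching config file name, or None if there is none."""
    name = normalize_triple(triple) + ".cfg"
    return name if name in available else None


# Assumed set of shipped config files (only the w64 vendor variant).
AVAILABLE = {"x86_64-w64-windows-gnu.cfg"}

for t in ("x86_64-w64-windows-gnu", "x86_64-pc-windows-gnu", "x86_64-windows-gnu"):
    print(t, "->", find_config(t, AVAILABLE))
```

Under these assumptions, the vendor field decides whether a config file is found, which matches the behaviour reported above.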

@mstorsjo
Owner

Thanks for the report!

Yes, I think that probably would be more correct to do, in general. I think most distributions of mingw-w64 based toolchains use the <arch>-w64-mingw32 form of the triples (which gets normalized into <arch>-w64-windows-gnu within LLVM).

Alternatively, if there are strong reasons not to do it, we could add more copies of the config files, to pick up other forms of the triples, but I would say that overall, mingw-w64 toolchains do tend to use w64 as the vendor field in triples, quite consistently.

@Andarwinux
Contributor

It would be nice to have *-pc-windows-gnu config files; that triple does not cause lld target mismatch warnings when doing cross-language LTO with Rust gnullvm.

@mati865
Contributor

mati865 commented Jan 7, 2025

@Andarwinux can you provide more details about the Rust issue?

@Andarwinux
Contributor

Use -Clinker-plugin-lto=yes -Clto to build a rust project such as libdovi as a static library, then use LTO to build a C/C++ project such as svtav1-psy and link it with libdovi

ld.lld: warning: Linking two modules of different target triples: '/build/install/x86_64-w64-mingw32/lib/libdovi.adovi-68488083f3cf9169.74zkeaa4r2r0o3wwtfd1bgc69.rcgu.o824272' is 'x86_64-pc-windows-gnu' whereas 'Source/App/CMakeFiles/SvtAv1EncApp.dir/app_config.c.obj' is 'x86_64-w64-windows-gnu'

rustc apparently uses pc-windows-gnu as the internal triple for the gnullvm target, whereas LLVM normalizes w64-mingw32 to w64-windows-gnu.

@mstorsjo
Owner

mstorsjo commented Jan 7, 2025

So to recap here - this issue isn't something that appeared when llvm-mingw switched to using config files, but a preexisting issue that you hope to fix with the config files?

Just adding <arch>-pc-windows-gnu config files wouldn't fix this on its own, though - you would need to do all the compilation of the C/C++ code with explicit -target <arch>-pc-mingw32 (or <arch>-pc-windows-gnu) as well. (Or we'd need to add separate <arch>-pc-mingw32-<tool> frontend wrapper links as well, but I feel that's probably out of scope.)

@Andarwinux
Contributor

For wrapper-based llvm-mingw, I just append -target x86_64-pc-windows-gnu to the wrapper, while config-based llvm-mingw overrides -target via global CFLAGS, but can't build anything at all without a config for pc-windows-gnu. Of course, I can rename the existing w64 config, but it would be more convenient if llvm-mingw also included a pc-windows-gnu config.

Overall, I prefer the old wrapper way, at least on Linux it's not as slow as on Windows, and $CCACHE is very convenient.

> Note that while the config files are neat for many things, the mingw32uwp target (which admittedly sees very little use) doesn't quite work neatly with config files, at least not without changing what triples it uses. Also, for this configuration, see https://github.com/mstorsjo/llvm-mingw/blob/master/wrappers/clang-target-wrapper.sh#L82-L83 - we intentionally pass these default linker options after the user specified ones, rather than before, to make sure the user specified libs to link have precedence. I'm not sure if there's a neat way to do that with config files...

llvm/llvm-project#117573
It seems to be able to do this with LLVM 20

@mati865
Contributor

mati865 commented Jan 7, 2025

Can we normalize the triples to the same output string somehow?
That's probably a topic for a different issue, however.

@mstorsjo
Owner

mstorsjo commented Jan 7, 2025

> For wrapper-based llvm-mingw, I just append -target x86_64-pc-windows-gnu to the wrapper, while config-based llvm-mingw overrides -target via global CFLAGS, but can't build anything at all without a config for pc-windows-gnu. Of course, I can rename the existing w64 config, but it would be more convenient if llvm-mingw also included a pc-windows-gnu config.

Ah, right, I see. Yes, with the config file based setup, your use case does become more complicated indeed.

> Overall, I prefer the old wrapper way, at least on Linux it's not as slow as on Windows, and $CCACHE is very convenient.

It's not so much about performance (I doubt the wrappers cause that much extra overhead anyway), as it is about making various tool invocation cases work better (where e.g. clangd can figure out that the <arch>-w64-mingw32 target is supposed to use e.g. libc++) - and we still have the wrappers in place (even if they don't do much any longer) so you can still use $CCACHE if you find it convenient :-)

> llvm/llvm-project#117573 It seems to be able to do this with LLVM 20

Oh, neat, thanks for the pointer! (We'd still need to munge the -uwp suffix somewhere in some inconvenient way, as mingw32uwp gets normalized into windows-gnu anyway though, so I'm not really sure if we want to spend a lot of fuss on that as I don't think many people use that configuration anyway.)

@Andarwinux
Contributor

> It's not so much about performance (I doubt the wrappers cause that much extra overhead anyway), as it is about making various tool invocation cases work better

I've heard that creating new processes is more expensive on Windows than Linux. If the wrapper is fast enough, then maybe it would be possible to build llvm as a busybox-style single executable with LLVM_TOOL_LLVM_DRIVER_BUILD without requiring users to enable symlink support, allowing statically linking libLLVM to speed up compilation significantly without inflating the size or even reducing the total size.

> Ah, right, I see. Yes, with the config file based setup, your use case does become more complicated indeed.

It occurs to me that I could add -Xclang -triple -Xclang x86_64-pc-windows-gnu to the w64 config to override internal triple by default without affecting driver behavior, is this a clean approach? Perhaps llvm-mingw could include this by default?

@mstorsjo
Owner

mstorsjo commented Jan 7, 2025

> allowing statically linking libLLVM to speed up compilation significantly without inflating the size or even reducing the total size.

Are you referring to the performance penalty of the dynamically linked libLLVM/libclang-cpp (LLVM_LINK_LLVM_DYLIB)? I think the cost of that generally is overstated; I recently measured it to be 0.7% of total compilation time on Linux, for the test case of compiling Clang. Perhaps it is more costly on Windows - I haven't measured that - but I doubt it's very significant (more than a couple of percent, tops) anyway compared to the actual work of doing compilation. And yes, process creation generally is more expensive on Windows, so the wrapper executable setup does cost us a little bit there.

> It occurs to me that I could add -Xclang -triple -Xclang x86_64-pc-windows-gnu to the w64 config to override internal triple by default without affecting driver behavior, is this a clean approach? Perhaps llvm-mingw could include this by default?

I don't think I'd include that by default, that sounds quite odd and specific for your case.

In the current x86_64-w64-windows-gnu.cfg, where we have an explicit -target x86_64-w64-mingw32 - what happens if you change that to -target x86_64-pc-windows-gnu? I presume it wouldn't stop processing the current config files, so the config files would be read. Wouldn't that option have the desired effect for your case? Or would it end up in some other case where the mismatch breaks something (sounds plausible)?

Do note that mixing different variants of triples can be prone to other breakage as well. We currently don't build any of the runtimes with the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR flag enabled, but if we do, e.g. the compiler-rt files would be placed in <root>/lib/clang/<version>/lib/<triple> rather than in <root>/lib/clang/<version>/lib/windows, and the libc++ files would be in <root>/lib/<triple>. That layout makes this kind of subtle triple mismatch entirely fatal - so I haven't opted to switch to that setup yet, and probably will avoid it as long as possible.
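The layout difference described above can be sketched in a few lines of Python. This is only an illustration of the two directory schemes named in the comment; the root `/opt/llvm-mingw`, the version `19`, and the helper name `compiler_rt_dir` are hypothetical.

```python
import posixpath


def compiler_rt_dir(root, clang_version, triple, per_target_runtime_dir):
    """Return where compiler-rt would be looked up under each layout.

    With the per-target layout the directory is named after the triple;
    otherwise a shared "windows" directory is used.
    """
    base = posixpath.join(root, "lib", "clang", clang_version, "lib")
    return posixpath.join(base, triple if per_target_runtime_dir else "windows")


ROOT = "/opt/llvm-mingw"  # hypothetical install root
VERSION = "19"            # hypothetical Clang version

for per_target in (False, True):
    w64 = compiler_rt_dir(ROOT, VERSION, "x86_64-w64-windows-gnu", per_target)
    pc = compiler_rt_dir(ROOT, VERSION, "x86_64-pc-windows-gnu", per_target)
    print(per_target, w64, pc, "same" if w64 == pc else "different")
```

With the shared layout both vendor spellings resolve to the same directory, while with the per-target layout they diverge, which is why a vendor mismatch would stop being harmless.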

@Andarwinux
Contributor

Andarwinux commented Jan 7, 2025

> Are you referring to the performance penalty of the dynamically linked libLLVM/libclang-cpp (LLVM_LINK_LLVM_DYLIB)? I think the cost of that generally is overstated; I recently measured it to be 0.7% of total compilation time on Linux, for the test case of compiling Clang. Perhaps it is more costly on Windows - I haven't measured that - but I doubt it's very significant (more than a couple of percent, tops) anyway compared to the actual work of doing compilation. And yes, process creation generally is more expensive on Windows, so the wrapper executable setup does cost us a little bit there.

The difference is small enough for most small source files, but is noticeable for some huge sources, such as sqlite3.c and Qt6 with UNITY_BUILD enabled. On my machine, clang statically linked to libLLVM takes 14% less time to compile sqlite3.c with -O3 -march=znver5. You can experiment with this using Fuchsia Clang (LLVM_TOOL_LLVM_DRIVER_BUILD static without PGO)

> I don't think I'd include that by default, that sounds quite odd and specific for your case.
>
> In the current x86_64-w64-windows-gnu.cfg, where we have an explicit -target x86_64-w64-mingw32 - what happens if you change that to -target x86_64-pc-windows-gnu? I presume it wouldn't stop processing the current config files, so the config files would be read. Wouldn't that option have the desired effect for your case? Or would it end up in some other case where the mismatch breaks something (sounds plausible)?
>
> Do note that mixing different variants of triples can be prone to other breakage as well. We currently don't build any of the runtimes with the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR flag enabled, but if we do, e.g. the compiler-rt files would be placed in <root>/lib/clang/<version>/lib/<triple> rather than in <root>/lib/clang/<version>/lib/windows, and the libc++ files would be in <root>/lib/<triple>. That layout makes this kind of subtle triple mismatch entirely fatal - so I haven't opted to switch to that setup yet, and probably will avoid it as long as possible.

-Xclang is passed directly to cc1, so it doesn't change the behavior of the driver searching for config and resource-dir

@mstorsjo
Owner

mstorsjo commented Jan 8, 2025

> The difference is small enough for most small source files, but is noticeable for some huge sources, such as sqlite3.c and Qt6 with UNITY_BUILD enabled. On my machine, clang statically linked to libLLVM takes 14% less time to compile sqlite3.c with -O3 -march=znver5. You can experiment with this using Fuchsia Clang (LLVM_TOOL_LLVM_DRIVER_BUILD static without PGO)

14 % for that sounds like a lot - that's not what I'm seeing, so I would think there's another factor playing in here too.

I just (re)tested this; on Ubuntu 24.04 x86_64, a plain 1 stage build of Clang using the host compiler (GCC 13) and linker with -DCMAKE_BUILD_TYPE=Release, one build with -DLLVM_LINK_LLVM_DYLIB=OFF and one with -DLLVM_LINK_LLVM_DYLIB=ON. Compiling sqlite3.c with time clang -c sqlite3.c -O3 -march=znver5 gives 10.403 seconds for dylib=off and 10.513 seconds for dylib=on (lowest user time out of 3 compiles), i.e. a 1% difference.

I guess it's possible that there's a bigger difference if the build is more tuned (PGO, or the linker attempting to sort things in a clever way, etc).

> -Xclang is passed directly to cc1, so it doesn't change the behavior of the driver searching for config and resource-dir

Yes, but that's not what I asked - I wanted to explore what effects it has if that existing -target option in the .cfg is changed.

@Andarwinux
Contributor

> 14 % for that sounds like a lot - that's not what I'm seeing, so I would think there's another factor playing in here too.
>
> I just (re)tested this; on Ubuntu 24.04 x86_64, a plain 1 stage build of Clang using the host compiler (GCC 13) and linker with -DCMAKE_BUILD_TYPE=Release, one build with -DLLVM_LINK_LLVM_DYLIB=OFF and one with -DLLVM_LINK_LLVM_DYLIB=ON. Compiling sqlite3.c with time clang -c sqlite3.c -O3 -march=znver5 gives 10.403 seconds for dylib=off and 10.513 seconds for dylib=on (lowest user time out of 3 compiles), i.e. a 1% difference.

Building LLVM with GCC seems strange, could you try Fuchsia Clang?
My Clang is built with Clang+ThinLTO (which is almost free for LLVM_TOOL_LLVM_DRIVER_BUILD, but may still need to disable some non-driverized component builds) and statically linked to libc++ and compiler-rt, very similar to Fuchsia Clang.

> I guess it's possible that there's a bigger difference if the build is more tuned (PGO, or the linker attempting to sort things in a clever way, etc).

Yes, LTO+PGO+BOLT for static linking are so powerful that they can actually reduce compilation time by 36%.
PGO is too expensive for llvm-mingw, but once switched to LLVM_LINK_LLVM_DYLIB=OFF+LLVM_TOOL_LLVM_DRIVER_BUILD=ON, we can do instrumentation-based BOLT for llvm-mingw in minutes on GitHub Actions.

> Yes, but that's not what I asked - I wanted to explore what effects it has if that existing -target option in the .cfg is changed.

After some experimentation, I realized that clang just ignores -target or --target in config and infers it from argv[0].

@mstorsjo
Owner

mstorsjo commented Jan 8, 2025

> Building LLVM with GCC seems strange

Not sure what's strange with that? That's a fairly reasonable default stage1 on Linux using whatever the host toolchain is.

> could you try Fuchsia Clang? My Clang is built with Clang+ThinLTO (which is almost free for LLVM_TOOL_LLVM_DRIVER_BUILD, but may still need to disable some non-driverized component builds) and statically linked to libc++ and compiler-rt, very similar to Fuchsia Clang.

I can try building with Clang and ThinLTO - it is plausible that there is a notable difference with ThinLTO.

> Yes, LTO+PGO+BOLT for static linking are so powerful that they can actually reduce compilation time by 36%. PGO is too expensive for llvm-mingw, but once switched to LLVM_LINK_LLVM_DYLIB=OFF+LLVM_TOOL_LLVM_DRIVER_BUILD=ON, we can do instrumentation-based BOLT for llvm-mingw in minutes on GitHub Actions.

One of the main issues I have with anything profile/instrumentation based is that it's problematic to apply when Clang is cross compiled. (E.g. all my Windows toolchains are cross built from Linux, and built for architectures which aren't even runnable on GitHub Actions yet, like Windows/aarch64. I also build for Linux/aarch64 this way on Linux/x86_64.)

> After some experimentation, I realized that clang just ignores -target or --target in config and infers it from argv[0].

Hmm, that's odd. In an earlier stage of the cfg file support (see e2e9216), I explicitly selected a config file with --config, and the config file then selected the -target - and this worked just fine (if not, it wouldn't have gotten any correct cross target at all). Since 866d47c the config file is picked implicitly from -target, but I wouldn't see why it wouldn't react to -target in the config files still - or perhaps it's a difference when the command line already contains an explicit -target option, if the config file options are parsed before the command line ones. That's probably it.
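The guess at the end of the comment above (config file options are parsed before the command-line ones, so an explicit command-line -target wins) can be modelled with a small Python sketch. It only illustrates that hypothesis under the assumption that the last target option seen takes effect; it has not been checked against Clang's actual driver code, and `effective_target` is a hypothetical helper.

```python
def effective_target(config_args, cli_args):
    """Return the target that wins if config options come before CLI options.

    Assumption: options are scanned in order and the last target option
    seen takes effect.
    """
    target = None
    args = list(config_args) + list(cli_args)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--target="):
            target = arg[len("--target="):]
        elif arg == "-target" and i + 1 < len(args):
            target = args[i + 1]
            i += 1  # skip the value that belongs to -target
        i += 1
    return target


cfg = ["-target", "x86_64-pc-windows-gnu"]
# An invocation that already passes an explicit -target overrides the config.
print(effective_target(cfg, ["-target", "x86_64-w64-mingw32", "-c", "a.c"]))
# Without a command-line target, the config file's target is used.
print(effective_target(cfg, ["-c", "a.c"]))
```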

@mstorsjo
Owner

mstorsjo commented Jan 8, 2025

> Back on topic, since the config triple hack was rejected, I'm not sure how hard it would be to fix this on Rust or LLVM side. While the target mismatch warning is harmless, I'm concerned that it's still enough to make people think that gnullvm+llvm-mingw+LTO isn't a tested and robust combination. It would be nice if there was a way to solve this problem out-of-the-box without hacking.

I wonder if it could be reasonable to make LLVM skip that warning for LTO, if the triples only differ in the vendor field (either in general, or for specific OSes where there are known multiple vendor fields used in the wild).

In order for LTO to work in this combination, isn't there still a requirement that both Rust and llvm-mingw use pretty much the same version of LLVM? (IIRC Rust uses a patched LLVM - hopefully those patches don't affect IR and LTO interop. And based on https://www.npopov.com/2025/01/05/This-year-in-LLVM-2024.html#rust, the number of patches these days is down to only one.)
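The idea floated above, skipping the warning when two triples differ only in the vendor field, could look roughly like this. This is a Python sketch of the comparison only and makes no claim about how LLVM implements it; it assumes both triples are already in the normalized four-part arch-vendor-os-environment form, and the helper names are hypothetical.

```python
def split_triple(triple):
    """Split a normalized four-part triple into (arch, vendor, os, env)."""
    parts = triple.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected arch-vendor-os-env, got {triple!r}")
    return tuple(parts)


def differ_only_in_vendor(a, b):
    """True if the two triples match in every field except the vendor."""
    arch_a, vendor_a, os_a, env_a = split_triple(a)
    arch_b, vendor_b, os_b, env_b = split_triple(b)
    same_rest = (arch_a, os_a, env_a) == (arch_b, os_b, env_b)
    return same_rest and vendor_a != vendor_b


# The pair from the lld warning quoted earlier in this thread.
print(differ_only_in_vendor("x86_64-pc-windows-gnu", "x86_64-w64-windows-gnu"))
```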

@mstorsjo
Owner

> I wonder if it could be reasonable to make LLVM skip that warning for LTO, if the triples only differ in the vendor field (either in general, or for specific OSes where there are known multiple vendor fields used in the wild).

See llvm/llvm-project#122801 for an implementation of this - it should be ready to land after rerunning the CI for it.

@mstorsjo
Owner

mstorsjo commented Jan 14, 2025

> I can try building with Clang and ThinLTO - it is plausible that there is a notable difference with ThinLTO.

This is off topic for this discussion here, but just for reference - I tried to dig into the actual performance for your testcase, compiling sqlite3.c (I dropped the -march=znver5, I doubt it makes much of a difference). I set up builds where Clang is built by both the host GCC and with a host Clang (18, so fairly new), and the Clang hosted builds with and without LTO, with and without PGO (trained specifically on compiling sqlite3.c, so it should be the best case) - all these configurations both with and without dylib builds.

I set up the whole build matrix for doing this on github actions, so that it should be reproducible if someone wants to tweak the test build setup: mstorsjo/llvm-project@gha-clang-perf
The measurements of the built toolchains are here: https://github.com/mstorsjo/llvm-project/actions/runs/12773172943/job/35610404093

I'll summarize it in a table (picking the minimum execution time for each benchmarked case):

| Build | nodylib (s) | dylib (s) |
| --- | --- | --- |
| GCC hosted | 20.092 | 20.287 |
| Clang hosted | 20.083 | 19.888 |
| Clang hosted, LTO | 18.516 | 18.604 |
| Clang hosted, PGO | 14.981 | 14.946 |
| Clang hosted, PGO+LTO | 13.846 | 13.949 |

Overall, the slowdown due to using dylibs seems to be <1%, and in two of the build cases it even seems to be marginally faster. Nowhere near the mentioned 14% in any case.
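The relative dylib overhead for each row can be recomputed from the table with a short Python sketch. The numbers are copied from the table above; the helper name `dylib_overhead_percent` is made up for this illustration.

```python
# (nodylib, dylib) minimum execution times from the table above.
TIMES = {
    "GCC hosted": (20.092, 20.287),
    "Clang hosted": (20.083, 19.888),
    "Clang hosted, LTO": (18.516, 18.604),
    "Clang hosted, PGO": (14.981, 14.946),
    "Clang hosted, PGO+LTO": (13.846, 13.949),
}


def dylib_overhead_percent(nodylib, dylib):
    """Relative change of the dylib build versus the nodylib build, in percent.

    Positive means the dylib build was slower, negative means it was faster.
    """
    return (dylib - nodylib) / nodylib * 100.0


for name, (nodylib, dylib) in TIMES.items():
    print(f"{name}: {dylib_overhead_percent(nodylib, dylib):+.2f}%")
```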

@Andarwinux
Contributor

I modified this workflow to add a fuchsia-clang benchmark, and got similar results to "Clang hosted, LTO, nodylib". But on my local machine the difference is indeed very close to 14% (fuchsia clang 14.094s vs. llvm-mingw nightly clang 16.296s). It seems that the improvements mainly come from LTO and are affected by hardware.
(The distro I'm using, CachyOS, has a clang that is dynamically linked and built with LTO+PGO, but its performance is close to fuchsia-clang, which is built only with LTO but is statically linked, so I thought dynamic linking was the main factor.)

I also modified the workflow to match my local build (building with fuchsia-clang, enabling LLVM_TOOL_LLVM_DRIVER_BUILD and disabling unnecessary components) and now the LTO build takes only 4 minutes longer.

https://github.com/Andarwinux/llvm-project/actions/runs/12792083533

@mstorsjo
Owner

> I modified this workflow to add a fuchsia-clang benchmark, and got similar results to "Clang hosted, LTO, nodylib". But on my local machine the difference is indeed very close to 14% (fuchsia clang 14.094s vs. llvm-mingw nightly clang 16.296s). It seems that the improvements mainly come from LTO and are affected by hardware. (The distro I'm using, CachyOS, has a clang that is dynamically linked and built with LTO+PGO, but its performance is close to fuchsia-clang, which is built only with LTO but is statically linked, so I thought dynamic linking was the main factor.)

Right, that explains the confusion. Yes, LTO and PGO give very large, indisputable speedups, while the dylib configuration slowdown is in the range of <1% (if it even is a slowdown at all).
