Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLD LTO Failing to optimize and Crashing Compiler #313

Open
DanielDeLayo opened this issue Feb 3, 2025 · 4 comments
Open

LLD LTO Failing to optimize and Crashing Compiler #313

DanielDeLayo opened this issue Feb 3, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@DanielDeLayo
Copy link

Describe the bug

Compiling and Linking with LLD's LTO causes either 1. Debug (or just unexpected) output with ineffective optimization or 2. a compiler crash.

Expected behavior

  1. Compilation and Linking should print no extra debug information and optimize away function calls to empty functions.
  2. The Cilksan version to be compiled, linked, and optimized successfully.

Note: cilksort.out (instrumented with cilkprace) asserting almost immediately is correct behavior, at least as far as the compiler is concerned.

OpenCilk version

Built from source (with modifications):

  • opencilk-project: cilkprace af4ec7f
  • cheetah: 5a80d8a534e8d56a5e354997bbfea23257c1c36b
  • productivity-tools: cilkprace af31596

System information

  • OS: Linux Mint 20.1
  • CPU: Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz

Steps to reproduce (include relevant output)

  1. Clone with infrastructure to build from source
  2. Set git urls to the above projects and branches and pull
  3. in the build directoy, enable LLD as a subproject. We also build clang with LLD and split-dwarf, but it's unlikely to be load bearing. Recompile
  4. Compile with -fuse-ld=lld and -flto

Build command:
../build/bin/clang -fopencilk -fcilktool=cilkprace -g -O3 -DTIMING_COUNT=3 -fuse-ld=lld -flto -lm cilksort.c ktiming.c getoptions.c -o cilksort.out

Sample unexpected output:

<unknown>:0:0: 5 instructions in function
<unknown>:0:0: 1 instructions in function
cilksort.c:189:0: 272 instructions in function
cilksort.c:305:0: 554 instructions in function
cilksort.c:367:0: 391 instructions in function
cilksort.c:442:0: 1877 instructions in function
<unknown>:0:0: 5 instructions in function
<unknown>:0:0: 1 instructions in function
ktiming.c:41:0: 66 instructions in function
<unknown>:0:0: 5 instructions in function
<unknown>:0:0: 1 instructions in function
getoptions.c:9:0: 1523 instructions in function
cilksort.c:367:0: 112 instructions in function
cilksort.c:367:0: 109 instructions in function
cilksort.c:367:0: 109 instructions in function
cilksort.c:367:0: 109 instructions in function
cilksort.c:305:0: 112 instructions in function

Calling objdump to check for LTO
objdump -D cilksort.out

The code is littered with calls to __csi_after_store
15438: e8 93 6f ff ff callq c3d0 <__csi_after_store>

Which are prime targets for LTO to opimize away:

17388 000000000000c3d0 <__csi_after_store>:
17389     c3d0: c3                    retq
17390     c3d1: 66 66 66 66 66 66 2e  data16 data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
17391     c3d8: 0f 1f 84 00 00 00 00
17392     c3df: 00
17393

Instrumenting the same code with cilksan instead gives us a compiler crash:

../build/bin/clang -fopencilk -fsanitize=cilk -mllvm -enable-static-race-detection=false -g -O3 -DTIMING_COUNT=3 -lm -g -O3 -DTIMING_COUNT=3 -fuse-ld=lld -flto cilksort.c ktiming.c getoptions.c -o cilksort.csan

cilksort backtrace.txt

Requested attached (bonus .txt extensions to appease github)
cilksort-d7b45f.sh.txt
getoptions-ff4d8b.c.txt
ktiming-2759e6.c.txt
cilksort-d7b45f.c.txt

Additionally, using LTO to compile and link cilksort without either cilktool gives the unexpected output and does not crash. I haven't verified if LTO is being applied properly.

Working example code

cilksort.c, ktiming.c, and getoptions.c can be found in the cilkprace subdirectory of cilktools cilkprace/bench_cheetah
There's also a makefile there, but the all target doesn't work. I'd recommend using make cilksort.out and make cilksort.csan instead of make.

Additional comments

Add any other comments about the issue here.

@DanielDeLayo DanielDeLayo added the bug Something isn't working label Feb 3, 2025
@neboat
Copy link
Collaborator

neboat commented Feb 5, 2025

Hey @DanielDeLayo, thanks for the issue report.

Here's what I found from looking into the issues you mentioned.

First, the way you're building the cilkpracer tool does not support LTO. The build system in the productivity-tools repo builds each Cilk tool to an ordinary static library, that is, an archive or ordinary object files. The -fcilktool= flag will link this static library with the Cilk program in the standard way. That standard linking does not support LTO, which is why you don't see any of the calls to trivial hooks get optimized away.

If you would like to use LTO, you will need to compile the cilkpracer tool, at least in part, to LLVM bitcode and then link that LLVM bitcode with your program.

  • You can try using the add_cilktools_bitcode CMake function to build part or all of the cilkpracer tool to bitcode. That function should produce a .bc file for your tool in a subdirectory of the build directory. I recommend using find on the build directory to find the exact path to that .bc file.
  • Once you have that .bc file, you can try linking it with your program. I recommend trying to use -mllvm -csi-tool-bitcode=/path/to/libcilkpracer.bc. You should not need to use LLD in order to use that flag. Alternatively, you can try linking your tool's LLVM bitcode file with your program using LLD.

Second, whatever causes the crash on your version of the compiler is fixed in the latest development branch, dev/19.x. I rebased your changes onto this version of the compiler and the corresponding version of cilktools. On that version, the command that was causing a crash succeeds without any issue.

Note that, if you choose to rebase your changes onto the dev/19.x branch, you will need to add your tool to the list of Cilk tools that LLD recognizes. See the changes to LTOBackend.cpp in this commit for an idea of the changes you might need to make: c53cd12#diff-f409934ba27ad86494f3012324e9a3995b56e0743609ded7a387ba62bbf5edb0. You should not need to make these changes if you use the flags -mllvm -csi-tool-bitcode= to link your tool's bitcode file.

I hope that helps. Let us know if you have more questions or run into more issues.

@DanielDeLayo
Copy link
Author

DanielDeLayo commented Feb 5, 2025

Thanks!

Good to know-- I guess I assumed the LTO flags would be passed through, especially since the original CSI paper stresses the importance of LTO. Would it be possible to add a warning if a cilktool and LTO are both enabled? It'll probably save someone time in the future.

We've swapped over to using add_cilktools_bitcode, basing our CMakeLists.txt on the one from cilksan. I also tried porting the other cilktool I'm working on, cilkiaf, with these same changes. For both, I'm generating and linking .bc files as described and the executable files have the empty functions elided. However, the bitcode optimization seems a bit overzealous and it's introducing crashes at runtime in both cases. I'll focus on the cilkprace ones and briefly mention the cilkiaf ones for completion. For cilkiaf, accessing the fed tables causes a crash when csirt.cpp is included in CILKIAF_BITCODE_SOURCE. Removing csirt.cpp fixes the fed table crash, but a static object's vector member gets an address of 0xe0, causing a segfault. Without the above bitcode argument, neither of these issues happen.

For Cilkprace, we get this: [Edit: the difference in build directories is cosmetic, ../build is symlinked to ../../../../build]

daniel@CSE236-bigboy:~/cilk/opencilk/cilktools/cilkprace/bench_cheetah$ make -B -j cilksort.out
../build/bin/clang -fopencilk -fcilktool=cilkprace -g -O3 -DTIMING_COUNT=3 -lm -march=native  -mllvm -csi-tool-bitcode=../../../../build/lib/clang/17/lib/x86_64-unknown-linux-gnu/libcilkprace.bc cilksort.c ktiming.c getoptions.c -o cilksort.out
daniel@CSE236-bigboy:~/cilk/opencilk/cilktools/cilkprace/bench_cheetah$ ./cilksort.out
cilksort.out: /home/daniel/cilk/opencilk/cheetah/runtime/local-hypertable.c:214: _Bool insert_hyperobject(hyper_table *, struct bucket): Assertion `b.key != KEY_EMPTY && b.key != KEY_DELETED' failed.
cilksort.out: /home/daniel/cilk/opencilk/cheetah/runtime/local-hypertable.c:214: _Bool insert_hyperobject(hyper_table *, struct bucket): Assertion `b.key != KEY_EMPTY && b.key != KEY_DELETED' failed.
cilksort.out: /home/daniel/cilk/opencilk/cheetah/runtime/local-hypertable.c:214: _Bool insert_hyperobject(hyper_table *, struct bucket): Assertion `b.key != KEY_EMPTY && b.key != KEY_DELETED' failed.
cilksort.out: /home/daniel/cilk/opencilk/cheetah/runtime/local-hypertable.c:214: _Bool insert_hyperobject(hyper_table *, struct bucket): Assertion `b.key != KEY_EMPTY && b.key != KEY_DELETED' failed.
Aborted (core dumped)

The cilkprace commit is 35406a

Is there something we're doing wrong with bitcode, or should we rebase to dev/19.x?

Thanks again for all the help,
Daniel

@neboat
Copy link
Collaborator

neboat commented Feb 11, 2025

Compared to the CSI paper, we removed the requirement to use LTO for the Cilktools because LTO isn't well supported across all the systems and use cases where we want OpenCilk to run. To work around the performance issues from lacking LTO, we added Cilktool-specific compiler support to avoid inserting instrumentation hooks that specific Cilktools don't need. Specifically, we added the getCSIOptionsForCilkscale() and related functions you've seen in the compiler for this purpose.

We have been working on an alternative to LTO for CSI and Cilktools. That alternative is essentially the -csi-bitcode-file flag you're using. But that flag is still quite new and relatively untested. Although we have been able to use it for some tools, I'm not surprised if there are some issues with the flag at this time.

I believe there are technical reasons why some parts of a Cilktool, such as csirt.cpp, cannot be part of the bitcode file. If I recall correctly. the original work on CSI also required a traditional library — for the CSI runtime — to be linked in with the instrumented program under test, even when using LTO.

Have you tried using both a bitcode file and a traditional library together for your tool? You should be able to compile your tool to a traditional library, plus part of your tool — with the hook implementations — to a bitcode file. You can then pass the bitcode file to the compiler, using the -csi-bitcode-file flag, and also link the static library to the program under test. That approach might give you the important optimizations on hooks while resolving some of the issues you're encountering, such as the segfaults.

I don't know why you're getting assertion failures from using your cilkprace tool with cilksort. I suggest trying to use the combined approach of both a bitcode file and a traditional library for your tool and seeing if that approach resolves the issues you're encountering.

Hope this helps. Let us know if you still encounter problems.

@DanielDeLayo
Copy link
Author

Thanks for the explanation and the recommendation. I've managed to solve the bug and approximately understand what was going on.

While separating the hook implementations into their own .cpp file, I had to take away the static keyword from the pointer we accessed our tool's instance with. At that point, I realized that static's internal visibility was likely why the compiler was optimizing so heavily. I removed it and included all of the source code in the add_cilktools_bitcode call in the makefile. The resulting binary both had the expected optimizations and did not segfault or otherwise error.

I removed it from the other tool as well and it worked as is. I also peaked back at the fed tables and, as expected, saw that they're static.
I'm in the process of testing these changes against cilkbench, but I think it's a pretty compelling explanation.

We can also write up a minimal example if it would be useful for you.

Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants