Can't get different optimisation levels to work in c2m #339
Comments
I tried the test. It seems that a difference exists, but it is small:
On the bbv branch:
IMHO, the test is call-bound, and calls are actually not cheap in MIR, as they are done through thunks. A thunk on x86-64 is an additional pair of insns
The optimization options are working as intended. For me, the interesting result is that
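The thunk overhead described above can be illustrated with a minimal C sketch, assuming a thunk can be modelled as a patchable function-pointer slot. The names `square`, `square_slot`, `call_direct`, `call_via_slot`, and `demo_thunk` are hypothetical and are not part of the MIR API; real MIR thunks are small machine-code stubs rather than C function pointers, so this only shows the shape of the extra indirection.

```c
#include <assert.h>

typedef int (*int_fn)(int);

/* The callee; a hypothetical example function. */
static int square(int x) { return x * x; }

/* Stand-in for a thunk: a slot holding the callee's current address.
   A JIT could repoint such a slot after (re)generating code. */
static int_fn square_slot = square;

/* Direct call: the call target is fixed at compile time. */
static int call_direct(int x) { return square(x); }

/* Indirect call: the target address is loaded from the slot first. */
static int call_via_slot(int x) { return square_slot(x); }

/* Both paths compute the same value; only the call sequence differs. */
static void demo_thunk(void) {
  assert(call_direct(7) == 49);
  assert(call_via_slot(7) == 49);
}
```

In a call-bound loop, the extra hop is paid on every iteration, which may be why changing the optimization level makes little difference for such a test.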
Thanks so much for looking at that. I had large timing differences in clang (from O0 through to O3) and didn't check in the detail you have, as I assumed MIR might be similar. It is good to know where the optimisation is happening; apparently clang's LLVM output is post-optimisation, so I think I just assumed the same. I'm on macOS aarch64 (M1), if that's at all relevant, and for what it's worth the MIR version is a little faster than my QBE version.
You say calls are expensive. I'm assuming you are a Ruby enthusiast? My target is a functional-style language (think Smalltalk with multi-dispatch, mainly static, but with unions it needs a bit of dynamic dispatch), so pretty much everything is a function call (i.e. what would be a method in Smalltalk), including for loops, if-then-else, etc. I have the inference, REPL, and multi-dispatch working (in Python), plus a noddy AST interpreter. I haven't really thought through how the code gen should work, but I am anticipating dispatch to C-ABI-compliant functions as the norm, with quite a bit of inlining in the front end. I suspect this is something for me to think more about.
I guess that Clang and GCC can inline the calls to the abs function (although it is a library function), which this program uses a lot. That is why there is such a large time difference for them. MIR inlining is quite rudimentary and does not depend on the optimization level. There are a lot of ways to improve it, e.g. inlining in hot regions like loops. Currently, MIR inlining is done on the first calls until some threshold is reached. Good inlining is quite complicated and requires a big infrastructure, which is absent in MIR. I am thinking about implementing calls without thunks for some cases, but that requires considerable work. For now, you can call one function directly by using
I'd not say that. I use Ruby myself very rarely, although I like some aspects of the language. But I found improving Ruby performance to be the hardest problem I have seen. Therefore I did some research work on improving Ruby performance over the last 6 years. But it seems Shopify's YJIT is progressing fine. It takes a more pragmatic approach: it started with Ruby-specific machine code generation for one target, and now the Shopify team is introducing an IR to do optimizations. I started from the opposite direction by creating a general-purpose multi-target MIR JIT engine and trying to use it for CRuby. I am too late with this approach, and the manpower I can invest in a Ruby JIT is tiny in comparison with the Shopify team's resources. So recently I decided not to work on Ruby anymore.
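On the earlier point about abs calls, a workaround on the program side is to inline the operation by hand. The sketch below is only an illustration under assumptions: the functions `sum_abs_call`, `sum_abs_inlined`, and `demo_abs` are hypothetical and do not come from knight_mir.c, and inputs are assumed never to be `INT_MIN` (negating it overflows in both versions).

```c
#include <assert.h>
#include <stdlib.h>

/* Library call: one call to abs per iteration unless the compiler inlines it. */
static long sum_abs_call(const int *v, int n) {
  long s = 0;
  for (int i = 0; i < n; i++) s += abs(v[i]);
  return s;
}

/* Hand-inlined equivalent: no call inside the loop body. */
static long sum_abs_inlined(const int *v, int n) {
  long s = 0;
  for (int i = 0; i < n; i++) s += v[i] < 0 ? -v[i] : v[i];
  return s;
}

/* Both versions agree on a small sample: 3 + 4 + 5 + 0 = 12. */
static void demo_abs(void) {
  const int v[] = {-3, 4, -5, 0};
  assert(sum_abs_call(v, 4) == 12);
  assert(sum_abs_inlined(v, 4) == 12);
}
```

With Clang or GCC at higher optimization levels the two loops will likely compile to similar code; the difference should matter mainly for a code generator that leaves the call in place.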
One of the interesting things for me about using an IR is the case where more structure can be expressed in a high level language that can be exploited in code gen. Haskell I believe does this sort of thing. A trivial example would be
Where the types are:
Thus you can imagine a composable language at the frontend but with an efficient looping implementation. How hard would it be to translate non-strict SSA style into strict SSA style? This would make the frontend's job easier and I imagine might be generally applicable. This is one of the things I like about the QBE IR, as it makes the distance between the higher-level frontend language and the IR shorter. Some other problems I (personally) need to solve are:
Some of those probably need support from the backend, and some can be done in a library. When the user has profiling info they can add inline hints (my convention is to suffix function names with _ to indicate laziness or suggest inlining) etc. For example the user can change add to add_ and the frontend can inline the IR or indicate to the backend that this specific call should be inlined.
Btw do you have a debugger for the MIR interpreter?
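The trailing-underscore convention described above could be checked in a front end with something as small as the following C sketch. The helper `has_inline_hint` and the demo function are hypothetical names introduced for illustration; how the hint is then passed on to the backend is left open, since the thread does not specify it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical front-end helper: returns 1 when a function name ends in '_',
   which by the convention above marks it as lazy or as an inlining candidate. */
static int has_inline_hint(const char *name) {
  size_t len = strlen(name);
  return len > 0 && name[len - 1] == '_';
}

/* "add_" carries the hint, "add" and the empty name do not. */
static void demo_inline_hint(void) {
  assert(has_inline_hint("add_"));
  assert(!has_inline_hint("add"));
  assert(!has_inline_hint(""));
}
```

Keeping the hint in the name means the user can toggle it by renaming one call site, without any extra annotation syntax.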
Besides using existing JIT compilers, there is a lot of work to do to implement a JIT with good performance. The more dynamic the language, the more work there is to do. Inlining higher-order functions could be considered such work too. Unfortunately, it is not easy to implement a general-purpose JIT supporting most of these features. QBE is interesting, but according to my benchmarks it is slower and generates slower code (about 20% slower than MIR). Still, the wider the range of the simplicity/code-generation-performance spectrum covered by existing JITs, the better it is for JIT developers.
It is possible to switch on interpreter insn tracing by using macro
I am finally thinking about starting to work on debugging support of
Hi,
This may well be a really daft question but I thought I'd ask. If I do:
or similarly with the -S option, the output files are identical. And if I run the program:
It runs at the same speed. Is it my program (below) or am I doing something wrong in using c2m etc?
Many thx,
knight_mir.c is: