Update sycl_native_experimental branch #322
Merged
PietroGhg merged 175 commits into uxlfoundation:sycl_native_experimental from PietroGhg:pietro/update_30_jan on Jan 30, 2024
nightly tartan job using Intel nightly release
[Tartan CI]: Add separate actions to nightly Tartan build to build portBLAS and portDNN
LLVM 18 already sets everything we should need in LLVMConfig.cmake, which is included already. We can just skip DetectLLVMMSVCCRT entirely.
LLVM commit 7b9d73c2f90c0ed8497339a16fc39785349d9610 removes this function, as with opaque pointers all pointers are equivalent. While we're at it, we don't need to do any bitcasts between pointer types in this pass.
Avoid ChooseMSVCCRT on LLVM 18+.
[compiler] Remove use of Type::getInt8PtrTy
This commit aims to simplify how we handle the management of different types of debug scope. Debug info is now attached to instructions when we close a range. We no longer store all of the ranges and process them at the end of the module. This keeps debug information better contained within the builder, and means we have to track less volatile data. The code should be simpler as a result, and hopefully easier to maintain.
We also introduce a new concept to help manage debug info. In addition to the old 'line' range which is built in to SPIR-V, we introduce another: a 'lexical scope'. This will be used for the translation of the various DebugInfo extended instruction sets.
The lexical scopes and line ranges do interact, in the same way that the DebugInfo instruction sets interact with the core spec: the DebugInfo instruction sets still rely on line number information provided by line ranges. A lexical scope is of no use without line information.
We may define lexical scopes in one of two ways, in priority order:
1. The DebugInfo extended instruction sets generate them using dedicated instructions
2. We generate them on the fly when attaching debug info as we process line ranges
Thus, when we close a line range or a lexical scope, we apply debug info to all instructions within the range. For the scope information, we take the lexical scope information if set; otherwise we generate one on the fly.
…info-scopes [spirv-ll] Refactor how debug ranges/scopes are handled
* LLVM 18 moves clang::CodeGenOptions::VectorLibrary to llvm::driver::VectorLibrary. Use decltype to handle both.
* LLVM 18 drops CallingConv::WebKit_JS.
* LLVM 18 drops <llvm/Transforms/Vectorize.h> which we include needlessly.
* LLVM 18 requires us to link in libLLVMFrontendDriver, libclangAPINotes, libclangBasic.
* LLVM 18 adds a disjoint flag to `or` instructions which we need to account for in tests.
* LLVM 18 moves <llvm/Support/Host.h> to <llvm/TargetParser/Host.h>.
* LLVM 18 is able to infer that we could potentially end up with a subvector size of zero, in which case we would end up with a division by zero. A subvector size of zero would be a bug elsewhere in OCK, so add an assert that it is not zero.
In the refsi simulator, there are two memory regions:
* "Main" memory, starting at 0x10000
* "Local" memory, starting at 0x10000000
When emitting the sections, we move the output cursor to location 0x10000000 before writing the sections for local memory. In some circumstances that are not yet understood, this results in the generated ELF file including 255MiB of padding in the file itself. Besides wasting memory, this also made the refsi simulator unable to load the binaries due to its 128MiB limit.
This patch updates the linker script to be more explicit about the memory layout, which also has the added advantage of having the linker verify that the binary isn't too large. It also reverts 73c6af2, which was a workaround for this issue.
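As a rough check of the figures quoted above, the gap between the two base addresses can be computed directly. This is only an arithmetic sketch: the constant and function names are made up for illustration, and only the two base addresses and the 128MiB limit come from the commit message.

```cpp
#include <cstdint>

// Base addresses and loader limit quoted in the commit message.
constexpr std::uint64_t kMainMemoryBase = 0x10000;
constexpr std::uint64_t kLocalMemoryBase = 0x10000000;
constexpr std::uint64_t kMiB = 1024 * 1024;
constexpr std::uint64_t kLoaderLimitBytes = 128 * kMiB;

// Bytes of padding if the file is laid out contiguously from the start of
// main memory up to the start of local memory (hypothetical helper).
constexpr std::uint64_t paddingBytes() {
  return kLocalMemoryBase - kMainMemoryBase;
}

// True if a file of the given size fits within the loader limit.
constexpr bool fitsLoaderLimit(std::uint64_t bytes) {
  return bytes <= kLoaderLimitBytes;
}

// 0x10000000 - 0x10000 is just under 256MiB, i.e. 255 whole MiB.
static_assert(paddingBytes() / kMiB == 255, "padding is about 255MiB");
static_assert(!fitsLoaderLimit(paddingBytes()), "padding alone exceeds 128MiB");
```

The padding alone is roughly twice the loader limit, which is consistent with the simulator refusing to load such binaries.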
[refsi] Update Refsi memory specifications
More LLVM 18 fixups.
The SYCL CTS generates binaries larger than 1MiB, so this patch increases the limit.
Increase main memory size limit in memory map
This removes any difference in behaviour depending on whether or not you create the new target within a git directory. With `git apply` the patch files are applied relative to the current git directory, which means the patch files could silently be skipped.
This wasn't updated after we changed/refactored the compiler utility APIs and kernel ABI. The wrapper function produced by the `RefSiWrapperPass` does indeed match the kernel ABI expected by the RefSi HAL. Furthermore, the compiler is now better at generating parameter names and parameter attributes than it once was, so the LLVM IR should be more legible.
We were returning a newly allocated kernel handle each time a kernel was found, while never deleting them. While this is okay for a tutorial, provided we documented/explained what's going on, the code to better manage memory isn't too much to handle and shouldn't detract from the tutorial itself. Another benefit is that the updated tutorial code better follows the reference HAL implementation (though not exactly), hopefully meaning there's less dissonance between the two targets.
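One way to avoid leaking a fresh handle on every lookup is to let the owning object cache its kernels and hand out non-owning pointers. The sketch below is not the tutorial's code: `Kernel`, `Program`, and `findKernel` are hypothetical names, and a real HAL would look the kernel up in the loaded binary rather than create it unconditionally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a kernel object.
struct Kernel {
  std::string name;
};

// Hypothetical owner of kernel handles. Handles live as long as the program.
class Program {
 public:
  // Returns the same handle for repeated lookups of the same name.
  // The caller must not delete the returned pointer.
  Kernel *findKernel(const std::string &name) {
    assert(!name.empty() && "kernel name must not be empty");
    auto it = kernels_.find(name);
    if (it == kernels_.end()) {
      // Assumption: every requested kernel exists; a real implementation
      // would search the binary here and return nullptr on failure.
      it = kernels_.emplace(name, std::make_unique<Kernel>(Kernel{name})).first;
    }
    return it->second.get();
  }

  std::size_t kernelCount() const { return kernels_.size(); }

 private:
  std::unordered_map<std::string, std::unique_ptr<Kernel>> kernels_;
};
```

Because the map owns each `Kernel` through `std::unique_ptr`, all handles are released when the `Program` is destroyed, with no explicit delete calls.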
The previous behaviour would be that the `patch` commands would enter `-R` mode and the symlink command would fail noisily. It now provides a warning explaining what's happened: that the patch files weren't applied (assumed to be already applied) and that the symlink already exists, so doesn't need to be set.
The helper function was unnecessarily padding the size of already aligned buffers by the align amount. For example, if a buffer was of size 64 and was to be aligned to 8, it would increase the size to 72. This isn't incorrect but is unintuitive behaviour.
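The difference between the two behaviours can be illustrated with a pair of small functions. Both are sketches with hypothetical names, not the helper from the repository: `alwaysPad` is merely one formula that reproduces the 64-to-72 example above, and `alignUp` is a common way to round up without touching already aligned sizes.

```cpp
#include <cstddef>

// Rounds `size` up to the next multiple of `align`, leaving sizes that are
// already aligned unchanged. `align` must be non-zero.
constexpr std::size_t alignUp(std::size_t size, std::size_t align) {
  return ((size + align - 1) / align) * align;
}

// Hypothetical reconstruction of the described behaviour: the result is
// aligned, but an already aligned size still grows by `align`.
constexpr std::size_t alwaysPad(std::size_t size, std::size_t align) {
  return (size / align + 1) * align;
}

static_assert(alignUp(64, 8) == 64, "aligned sizes are left alone");
static_assert(alwaysPad(64, 8) == 72, "the unintuitive extra padding");
```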
Passing relative paths to `--external-dir` would do strange things. I've observed it not creating all of the directories during the first step, and one of the post-gen hooks would raise an exception (possibly as a result of that). The simplest fix is to ensure the external directory is made absolute before working with it.
LLVM's own type legalization promotes floating point operations without truncation of intermediate results. We rely on that truncation, so run our own pass to legalize manually before LLVM's legalization runs.
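Why truncating intermediate results matters can be shown with ordinary `float` and `double`, used here as stand-ins for a narrow type and the wider type it gets promoted to. This is an illustration of the numerical effect only, not of the pass itself, and it assumes IEEE-754 round-to-nearest arithmetic with no excess precision for `float` expressions.

```cpp
#include <cassert>

// Rounds after every operation, which is the behaviour the commit relies on.
float addWithTruncation(float a, float b, float c) {
  float partial = a + b;  // intermediate result is rounded to float
  return partial + c;
}

// Promotes to a wider type and rounds only once at the end.
float addWithoutTruncation(float a, float b, float c) {
  double partial = static_cast<double>(a) + static_cast<double>(b);
  return static_cast<float>(partial + static_cast<double>(c));
}

// 2^24 + 1 is not representable as a float, so the two strategies disagree.
void demonstrateDifference() {
  const float big = 16777216.0f;  // 2^24
  assert(addWithTruncation(big, 1.0f, 1.0f) == 16777216.0f);
  assert(addWithoutTruncation(big, 1.0f, 1.0f) == 16777218.0f);
}
```

Code that expects the first result will observe a different value if a legalizer silently keeps the intermediate sum in the wider type.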
…tion Manual type legalization.
[tutorial] Fix various issues in the HAL tutorial and new-target scripts
This commit extends spirv-ll to consume OpenCL.DebugInfo.100 instructions and translate them to meaningful LLVM debug info. Although the legacy DebugInfo instruction set is similar, there are differences, and we aren't seeing real-world binaries with this extended instruction set and so lack tests to confidently enable it.
The translation process differs slightly from the translation of other extended instruction sets. There is a small core of 'root' instructions in the debug info hierarchy, and many more 'leaves' which don't meaningfully add debug information on their own. Thus the translator only processes the roots on demand as we visit them in program order. Those roots may then reference other instructions (roots or leaves) which we translate on the fly. We cache the translation of each instruction as, in practice, they are usually referenced multiple times.
Generally, instructions are translated to `llvm::Metadata*`. The special instruction `DebugInfoNone` translates to `nullptr`: this is not an error; each instruction has to decide which operands may or may not be `nullptr`. This usually ends up depending on what the LLVM APIs accept - some accept `nullptr` as a valid value, others crash or assert.
One thing to note about the `OpenCL.DebugInfo.100` instruction set is that there are numerous bugs all over the ecosystem:
* Even recent versions of `spirv-val` complain about what should be valid SPIR-V binaries
* `llvm-spirv` (and thus DPC++) contains known bugs that won't be fixed soon. We try to work around these as best as possible
  * Mixed instruction opcodes, explained below
  * Extra dummy operands which shouldn't be there
  * Forward references where none should be allowed
  * etc
* `llvm-spirv` (and thus DPC++) supports undocumented extensions of its own, such as the ability to translate more LLVM expression opcodes than the spec permits.
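The on-demand, cached translation with a `nullptr` result for `DebugInfoNone` can be sketched as follows. Everything here is a simplified stand-in: `Metadata`, `DebugInst`, and `DebugTranslator` are hypothetical types rather than the spirv-ll or LLVM ones, and the sketch assumes the reference graph is acyclic, so it does not model the forward-reference bugs mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Metadata.
struct Metadata {
  std::string description;
};

// Hypothetical debug instruction: a kind plus ids of referenced instructions.
struct DebugInst {
  bool isNone;  // models DebugInfoNone
  std::string kind;
  std::vector<std::uint32_t> operands;
};

class DebugTranslator {
 public:
  explicit DebugTranslator(std::unordered_map<std::uint32_t, DebugInst> insts)
      : insts_(std::move(insts)) {}

  // Translates `id` on demand and caches the result. Returns nullptr for
  // DebugInfoNone, which callers must treat as a valid (non-error) value.
  const Metadata *translate(std::uint32_t id) {
    auto cached = cache_.find(id);
    if (cached != cache_.end()) return cached->second.get();

    auto found = insts_.find(id);
    assert(found != insts_.end() && "reference to unknown instruction");
    const DebugInst &inst = found->second;

    std::unique_ptr<Metadata> result;
    if (!inst.isNone) {
      ++translations_;
      std::string text = inst.kind + "(";
      for (std::uint32_t op : inst.operands) {
        const Metadata *operand = translate(op);  // translated on the fly
        text += operand ? operand->description : "null";
        text += ";";
      }
      text += ")";
      result = std::make_unique<Metadata>(Metadata{text});
    }
    const Metadata *raw = result.get();
    cache_.emplace(id, std::move(result));
    return raw;
  }

  std::size_t translationCount() const { return translations_; }

 private:
  std::unordered_map<std::uint32_t, DebugInst> insts_;
  std::unordered_map<std::uint32_t, std::unique_ptr<Metadata>> cache_;
  std::size_t translations_ = 0;
};
```

An instruction referenced several times is translated once; later references hit the cache, including the cached `nullptr` for `DebugInfoNone`.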
The most egregious bug is that `llvm-spirv` has mixed up two instruction opcodes and thus generates binaries which a standards-compliant SPIR-V consumer will struggle with. To cope with this, we add a bitfield of 'workarounds' to the DebugInfoBuilder, which can toggle behaviour on and off. The only workaround added so far instructs the translator to, when faced with one of these two instructions, try to infer which one is intended. It does so by inspecting the operands, using rules to establish whether the instruction opcode is correct or is swapped with the other one. This workaround is *enabled* by default, as it's still not fixed upstream. We can toggle this behaviour off later, perhaps.
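A bitfield of workarounds of this kind can be sketched with a plain enum and a few mask helpers. The names below are hypothetical and do not come from the DebugInfoBuilder; the inference rules themselves are not shown because the commit message does not spell them out.

```cpp
#include <cstdint>

// Hypothetical workaround flags; each occupies one bit so they can be combined.
enum Workaround : std::uint32_t {
  WorkaroundNone = 0,
  // Hypothetical name for the 'infer which swapped opcode was meant' behaviour.
  WorkaroundInferSwappedOpcodes = 1u << 0,
};

// Assumption mirroring the commit message: the workaround is on by default.
constexpr std::uint32_t kDefaultWorkarounds = WorkaroundInferSwappedOpcodes;

constexpr bool hasWorkaround(std::uint32_t mask, Workaround w) {
  return (mask & w) != 0;
}

constexpr std::uint32_t withWorkaround(std::uint32_t mask, Workaround w) {
  return mask | w;
}

constexpr std::uint32_t withoutWorkaround(std::uint32_t mask, Workaround w) {
  return mask & ~static_cast<std::uint32_t>(w);
}

static_assert(hasWorkaround(kDefaultWorkarounds, WorkaroundInferSwappedOpcodes),
              "enabled by default");
```

New workarounds would take the next free bit, so existing callers that pass a mask keep working when more flags are added.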
…-dbg-full [spirv-ll] Fully support OpenCL.DebugInfo.100 instructions
This was left over from a previous implementation, but is no longer used.
[NFC] Remove unused denorm_support.
Clang emits these attributes by default when compiling C, C++, and OpenCL C. It does this through a default-on driver option which sets a default-off codegen option. Since we act as the driver, we weren't setting the codegen option on. This should enable more optimization opportunities.
[compiler] Emit 'noundef' attributes when compiling OpenCL C
Adding riscv testing to OCK repo
In SPIR-V, entry points can only be called from outside the module, and not from other functions inside the module. Given this, we can safely add 'noundef' parameter attributes to entry points with the "kernel" execution mode. Passing undefined or poison values to a kernel is undefined behaviour. We could perhaps extend this to other execution modes but we don't test those quite as well and they aren't as important to us right now. We'd ideally like to do this with *all* function parameters, which would better match the semantics of C, C++, OpenCL C, but as noted in the FIXME in the code, SPIR-V doesn't specify that passing an undefined value to a function call results in undefined behaviour, so we need to tread carefully. We might need to use `freeze` in more places to stop the propagation of undef/poison values around the module.
This change consolidates a lot of duplicated code into `cl_intel_unified_shared_memory_Test` (the base class of many tests). Specifically, it provides properties to check host and shared memory USM support, and helper functions to generate USM pointers. One of the design changes was adding a way to iterate through all available USM pointer types without having to copy-paste code. This is used to add shared allocations to many places alongside device and host allocations. This results in the tests doing more work, but this only causes a 10% increase in the time taken for testing, which is about 300ms on my system.
…f-attrs [spirv-ll] Add noundef param attrs to kernel entry points
USM testing rework for better USM support
This commit extends the Uniform Value Analysis with a fourth kind of uniformity: "true" uniformity. This represents a value that is uniform on both active and inactive lanes. The old class of "uniform" has been renamed to "active" uniformity to clarify this. The analysis pass has been extended with a method to query whether a value is truly uniform. It processes this recursively on demand and caches the result. This is because the initial analysis run works from varying "roots" and marks all dependent values as varying - uniform values aren't handled at all. Rather than negatively affecting the performance of all Uniform Value Analysis runs, this on-demand method keeps costs the same except for users that need to query true uniformity. The query is still conservative, but less so. This can be seen in the test changes, where some cases which were previously conservatively handled as possibly varying/active uniform are now seen as truly uniform.
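The shape of a recursive, on-demand query with caching can be sketched as below. The rule used here is an assumption made purely for illustration (a value is truly uniform if it is not varying, is not flagged as possibly differing on inactive lanes, and all of its operands are truly uniform); the actual analysis may use different criteria. All names are hypothetical, and the sketch assumes an acyclic operand graph.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-value facts, indexed by value id.
struct ValueInfo {
  bool varying;                 // set by the initial, eager analysis
  bool differsOnInactiveLanes;  // hypothetical flag, for illustration only
  std::vector<std::size_t> operands;
};

class TrueUniformQuery {
 public:
  explicit TrueUniformQuery(std::vector<ValueInfo> values)
      : values_(std::move(values)) {}

  // Computes the answer lazily and caches it, so values nobody asks about
  // cost nothing beyond the initial analysis.
  bool isTrueUniform(std::size_t id) {
    assert(id < values_.size() && "unknown value id");
    auto cached = cache_.find(id);
    if (cached != cache_.end()) return cached->second;

    ++evaluations_;
    const ValueInfo &info = values_[id];
    bool result = !info.varying && !info.differsOnInactiveLanes;
    for (std::size_t op : info.operands) {
      if (!result) break;  // conservative: one failing operand is enough
      result = isTrueUniform(op);
    }
    cache_[id] = result;
    return result;
  }

  std::size_t evaluations() const { return evaluations_; }

 private:
  std::vector<ValueInfo> values_;
  std::unordered_map<std::size_t, bool> cache_;
  std::size_t evaluations_ = 0;
};
```

Repeated queries for the same value, or for values sharing operands, reuse the cached answers instead of walking the operand graph again.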
…analysis [compiler] Improve analysis of 'true uniform' values
RISC-V supports these just fine.
[RISC-V] Report denormal support.
Remove debug code that snuck in
`compiler-utils` library has been split into `compiler-pipeline` and `compiler-binary-metadata` to allow use of compiler pipeline utilities without the binary metadata requirements. Both will be needed for `mux` targets.
…_utils_metadata Split compiler-utils into compiler-pipeline and compiler-binary-metadata
This commit lets LLVM know that the pointer to the packed argument structure is never null, is never undef/poison, and is dereferenceable. It also transfers `noundef` and `nonnull` attributes from the old parameters to the new loads from the argument struct. Those loads can take `!noundef` and `!nonnull` metadata. This should improve performance in certain cases, as this pass typically runs before the final O3 optimization pipeline and any extra information we can give LLVM should help.
[compiler] Set attributes on packed args and on loads from them
Fix missing "frees" on USM pointers
clc's driver::saveBinary and oclc's oclc::Driver::BuildProgram and oclc::Driver::WriteToFile have a result that allows them to indicate failure, but several I/O functions were not checked for errors. Additionally, driver::saveBinary would close stdin when it had not opened it, which is not its responsibility.
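The two points above, checking every I/O call and closing only streams the function itself opened, can be sketched with the C standard I/O API. `saveBinarySketch` is a hypothetical function, not the clc or oclc code, and treating a path of `-` as standard output is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Writes `data` to `path`, or to stdout when `path` is "-" (assumed
// convention). Returns false if any I/O step fails.
bool saveBinarySketch(const std::string &path, const std::string &data) {
  assert(!path.empty() && "output path must not be empty");
  const bool toStdout = (path == "-");
  std::FILE *out = toStdout ? stdout : std::fopen(path.c_str(), "wb");
  if (!out) return false;  // open failure is reported, not ignored

  bool ok = std::fwrite(data.data(), 1, data.size(), out) == data.size();
  if (toStdout) {
    // We did not open stdout, so we only flush it and never close it.
    ok = (std::fflush(out) == 0) && ok;
  } else {
    // fclose can fail too (for example when buffered data cannot be written).
    ok = (std::fclose(out) == 0) && ok;
  }
  return ok;
}
```

A caller can then propagate the boolean result as its own failure code instead of reporting success after a partial write.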
Improve error handling.
…lare Update findDbgDeclares for LLVM 18
coldav approved these changes on Jan 30, 2024
PietroGhg merged commit 558b76c into uxlfoundation:sycl_native_experimental on Jan 30, 2024
4 checks passed
Overview
Merges the latest commits from main into sycl_native_experimental.