Update sycl_native_experimental branch #322

PietroGhg · 2024-01-30T10:26:17Z

Overview

Merges the latest commits from main into sycl_native_experimental.

Add portBLAS and portDNN as separate actions to the nightly tartan job using Intel nightly release

[Tartan CI]: Add separate actions to nightly Tartan build to build portBLAS and portDNN

LLVM 18 already sets everything we should need in LLVMConfig.cmake, which is included already. We can just skip DetectLLVMMSVCCRT entirely.

LLVM commit 7b9d73c2f90c0ed8497339a16fc39785349d9610 removes this function, as with opaque pointers all pointers are equivalent. While we're at it, we don't need to do any bitcasts between pointer types in this pass.

Avoid ChooseMSVCCRT on LLVM 18+.

[compiler] Remove use of Type::getInt8PtrTy

This commit aims to simplify how we handle the management of different types of debug scope. Debug info is now attached to instructions when we close a range. We no longer store all of the ranges and process them at the end of the module. This keeps debug information better contained within the builder, and means we have to track less volatile data. The code should be simpler as a result, and hopefully easier to maintain.

We also introduce a new concept to help manage debug info. In addition to the old 'line' range which is built in to SPIR-V, we introduce another: a 'lexical scope'. This will be used for the translation of the various DebugInfo extended instruction sets.

The lexical scopes and line ranges do interact, in the same way that the DebugInfo instruction sets interact with the core spec: the DebugInfo instruction sets still rely on line number information provided by line ranges. A lexical scope is of no use without line information.

We may define lexical scopes in one of two ways, in priority order:

1. The DebugInfo extended instruction sets generate them using dedicated instructions
2. We generate them on the fly when attaching debug info when we process line ranges

Thus, when we close a line range or a lexical scope, we apply debug info to all instructions within the range. For the scope information, we take the lexical scope information if set; otherwise we generate one on the fly.

…info-scopes [spirv-ll] Refactor how debug ranges/scopes are handled

* LLVM 18 moves clang::CodeGenOptions::VectorLibrary to llvm::driver::VectorLibrary. Use decltype to handle both.
* LLVM 18 drops CallingConv::WebKit_JS.
* LLVM 18 drops <llvm/Transforms/Vectorize.h> which we include needlessly.
* LLVM 18 requires us to link in libLLVMFrontendDriver, libclangAPINotes, libclangBasic.
* LLVM 18 adds a disjoint flag to or instructions which we need to account for in tests.
* LLVM 18 moves <llvm/Support/Host.h> to <llvm/TargetParser/Host.h>.
* LLVM 18 is able to infer that we could potentially end up with a subvector size of zero, in which case we would end up with a division by zero. A subvector size of zero would be a bug elsewhere in OCK, so add an assert that it is not zero.
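The `decltype` approach from the first bullet can be sketched without any LLVM headers. The namespaces `old_api` and `new_api`, the `getVecLib()` accessor, and the `usesSVML` helper below are hypothetical stand-ins for illustration only; they are not the real clang or LLVM declarations, and the actual patch may deduce the type differently.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for two library versions (assumptions, not real APIs).
namespace old_api {
struct CodeGenOptions {
  enum VectorLibrary { NoLibrary, SVML };  // enum nested inside the class
  VectorLibrary getVecLib() const { return SVML; }
};
}  // namespace old_api

namespace new_api {
namespace driver {
enum class VectorLibrary { NoLibrary, SVML };  // enum moved to a namespace
}
struct CodeGenOptions {
  driver::VectorLibrary getVecLib() const {
    return driver::VectorLibrary::SVML;
  }
};
}  // namespace new_api

// Deduce the enum type from the accessor instead of spelling out its name,
// so the same source compiles against either layout.
template <typename Opts>
using VecLibTy = decltype(std::declval<const Opts &>().getVecLib());

template <typename Opts>
bool usesSVML(const Opts &O) {
  using VL = VecLibTy<Opts>;
  return O.getVecLib() == VL::SVML;
}
```

Because the enumerator is reached through the deduced type, the calling code never names the enclosing class or namespace, which is the only part that differs between the two layouts.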

In the refsi simulator, there are two memory regions:

* "Main" memory, starting at 0x10000
* "Local" memory, starting at 0x10000000

When emitting the sections, we move the output cursor to location 0x10000000 before writing the sections for local memory. In some unknown circumstances, this results in the generated ELF file including 255MiB of padding in the file itself. Besides wasting memory, this also made the refsi simulator unable to load the binaries due to its 128MiB limit.

This patch updates the linker script to be more explicit about the memory layout, which also has the added advantage of having the linker verify that the binary isn't too large. It also reverts 73c6af2, which was a workaround for this issue.

[refsi] Update Refsi memory specifications

More LLVM 18 fixups.

The SYCL CTS generates binaries larger than 1MiB, so this patch increases the limit.

Increase main memory size limit in memory map

This removes any difference in behaviour depending on whether or not you create the new target within a git directory. With `git apply` the patch files are applied relative to the current git directory, which means the patch files could silently be skipped.

This wasn't updated after we changed/refactored the compiler utility APIs and kernel ABI. The wrapper function produced by the `RefSiWrapperPass` does indeed match the kernel ABI expected by the RefSi HAL. Furthermore, the compiler is now better at generating parameter names and parameter attributes than it once was, so the LLVM IR should be more legible.

We were returning a newly allocated kernel handle each time a kernel was found, while never deleting them. While this is okay for a tutorial, provided we documented/explained what's going on, the code to better manage memory isn't too much to handle and shouldn't detract from the tutorial itself. Another benefit is that the updated tutorial code better follows the reference HAL implementation (though not exactly), hopefully meaning there's less dissonance between the two targets.
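A minimal sketch of the ownership change described above, assuming a program object that owns its kernel handles and hands out the same pointer on repeated lookups. The `Kernel` and `Program` types, `findKernel`, and the placeholder `exists` check are hypothetical and are not the tutorial's or the reference HAL's actual code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical kernel record; the tutorial's real handle type is not shown.
struct Kernel {
  std::string name;
};

class Program {
 public:
  // Returns a stable handle owned by the program, or nullptr if the kernel
  // is unknown. Repeated lookups reuse the same allocation.
  const Kernel *findKernel(const std::string &name) {
    auto it = kernels_.find(name);
    if (it != kernels_.end()) return it->second.get();
    if (!exists(name)) return nullptr;
    auto inserted =
        kernels_.emplace(name, std::make_unique<Kernel>(Kernel{name}));
    return inserted.first->second.get();
  }

 private:
  // Placeholder for a real symbol lookup (assumption for this sketch).
  static bool exists(const std::string &name) { return !name.empty(); }

  // Handles are released when the program is destroyed.
  std::map<std::string, std::unique_ptr<Kernel>> kernels_;
};
```

Keeping the handles in a container owned by the program means nothing leaks, and callers never need to know whether a handle was freshly created or reused.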

The previous behaviour would be that the `patch` commands would enter `-R` mode and the symlink command would fail noisily. It now provides a warning explaining what's happened: that the patch files weren't applied (assumed to be already applied) and that the symlink already exists, so doesn't need to be set.

The helper function was unnecessarily padding the size of already aligned buffers by the align amount. For example, if a buffer was of size 64 and was to be aligned to 8, it would increase the size to 72. This isn't incorrect but is unintuitive behaviour.
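The size-64, align-8 example can be reproduced with two small helpers. `padAlways` is a hypothetical reconstruction of the kind of arithmetic that yields 72; the original helper's code is not shown in this PR. `alignUp` is one common way to round up only when needed, and both assume `align` is non-zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reconstruction of the old behaviour: always moves to the
// *next* multiple, so an already aligned size grows by `align`.
constexpr std::size_t padAlways(std::size_t size, std::size_t align) {
  return (size / align + 1) * align;
}

// Rounds up to a multiple of `align` only when `size` is not one already.
constexpr std::size_t alignUp(std::size_t size, std::size_t align) {
  return (size + align - 1) / align * align;
}
```

With these definitions `padAlways(64, 8)` gives 72 while `alignUp(64, 8)` gives 64; for an unaligned size such as 65 both return 72.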

Passing relative paths to `--external-dir` would do strange things. I've observed it not creating all of the directories during the first step, and one of the post-gen hooks would raise an exception (possibly as a result of that). The simplest fix is to ensure the external directory is made absolute before working with it.

LLVM's own type legalization promotes floating point operations without truncation of intermediate results. We rely on that truncation, so run our own pass to legalize manually before LLVM's legalization runs.
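Why truncating intermediate results matters can be shown with an analogy in plain C++, using `float` as the narrow type and `double` as the promoted type. This is only an illustration of the rounding difference, not the pass itself, and the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Promote each operation to the wide type, then truncate back to the narrow
// type after every step (the behaviour the commit message says is relied on).
inline float sumTruncatingEachStep(float a, float b, float c) {
  float ab =
      static_cast<float>(static_cast<double>(a) + static_cast<double>(b));
  return static_cast<float>(static_cast<double>(ab) + static_cast<double>(c));
}

// Promote once and truncate only the final result (no intermediate
// truncation), which can produce a different value.
inline float sumTruncatingOnce(float a, float b, float c) {
  return static_cast<float>(static_cast<double>(a) + static_cast<double>(b) +
                            static_cast<double>(c));
}
```

Adding 2^-24 twice to 1.0f shows the difference under the default round-to-nearest mode: each individual addition rounds back to 1.0f, whereas the untruncated sum reaches 1 + 2^-23, the next representable `float` after 1.0f.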

…tion Manual type legalization.

[tutorial] Fix various issues in the HAL tutorial and new-target scripts

This commit extends spirv-ll to consume OpenCL.DebugInfo.100 instructions and translate them to meaningful LLVM debug info. Although the legacy DebugInfo instruction set is similar, there are differences, and we aren't seeing real-world binaries with that extended instruction set and so lack tests to confidently enable it.

The translation process differs slightly from the translation of other extended instruction sets. There is a small core of 'root' instructions in the debug info hierarchy, and many more 'leaves' which don't meaningfully add debug information on their own. Thus the translator only processes the roots on demand as we visit them in program order. Those roots may then reference other instructions (roots or leaves) which we translate on the fly. We cache the translation of each instruction as, in practice, they are usually referenced multiple times.

Generally, instructions are translated to `llvm::Metadata*`. The special instruction `DebugInfoNone` translates to `nullptr`: this is not an error; each instruction has to decide which operands may or may not be `nullptr`. This usually ends up depending on what the LLVM APIs accept - some accept `nullptr` as a valid value, others crash or assert.

One thing to note about the `OpenCL.DebugInfo.100` instruction set is that there are numerous bugs all over the ecosystem:

* Even recent versions of `spirv-val` complain about what should be valid SPIR-V binaries
* `llvm-spirv` (and thus DPC++) contains known bugs that won't be fixed soon. We try to work around these as best as possible
  * Mixed instruction opcodes, explained below
  * Extra dummy operands which shouldn't be there
  * Forward references where none should be allowed
  * etc
* `llvm-spirv` (and thus DPC++) support undocumented extensions of their own, such as the ability to translate more LLVM expression opcodes than the spec permits.

The most egregious bug is that `llvm-spirv` has mixed up two instruction opcodes and thus generates binaries which a standards-compliant SPIR-V consumer will struggle with. To cope with this, we add a bitfield of 'workarounds' to the DebugInfoBuilder, which can toggle behaviour on and off. The only workaround added so far instructs the translator to, when faced with one of these two instructions, try to infer which one is intended. It does so by inspecting the operands, using rules to establish whether the instruction opcode is correct or is swapped with the other one. This workaround is *enabled* by default, as it's still not fixed upstream. We can toggle this behaviour off later, perhaps.
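The on-demand, cached translation described above can be sketched as a memoized recursive function. Everything here is a simplified stand-in: `Metadata`, `DebugInst` and `Translator` are hypothetical types rather than spirv-ll's or LLVM's, the sketch assumes operand references form no cycles, and it leaves out the opcode workaround entirely.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the translated node and the SPIR-V instruction.
struct Metadata {
  unsigned id;
  std::vector<const Metadata *> operands;
};
struct DebugInst {
  unsigned id;
  bool isNone;                       // models DebugInfoNone
  std::vector<unsigned> operandIds;  // ids of other debug instructions
};

class Translator {
 public:
  explicit Translator(std::unordered_map<unsigned, DebugInst> insts)
      : insts_(std::move(insts)) {}

  // Translates `id` on demand; operands are translated on the fly and each
  // result is cached because instructions tend to be referenced repeatedly.
  const Metadata *translate(unsigned id) {
    auto hit = cache_.find(id);
    if (hit != cache_.end()) return hit->second;
    const DebugInst &inst = insts_.at(id);
    if (inst.isNone) {
      cache_[id] = nullptr;  // not an error; the consumer decides what to do
      return nullptr;
    }
    auto md = std::make_unique<Metadata>();
    md->id = id;
    for (unsigned op : inst.operandIds) md->operands.push_back(translate(op));
    ++translations_;
    const Metadata *result = md.get();
    storage_.push_back(std::move(md));
    cache_[id] = result;
    return result;
  }

  // Number of instructions actually translated (cache misses, excluding None).
  unsigned translations() const { return translations_; }

 private:
  std::unordered_map<unsigned, DebugInst> insts_;
  std::unordered_map<unsigned, const Metadata *> cache_;
  std::vector<std::unique_ptr<Metadata>> storage_;
  unsigned translations_ = 0;
};
```

A root that references the same leaf twice receives the same pointer both times, and a `DebugInfoNone` operand shows up as `nullptr` for the consuming instruction to accept or reject.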

…-dbg-full [spirv-ll] Fully support OpenCL.DebugInfo.100 instructions

This was left over from a previous implementation, but is no longer used.

[NFC] Remove unused denorm_support.

Clang emits these attributes by default when compiling C, C++, and OpenCL C. It does this through a default-on driver option which sets a default-off codegen option. Since we act as the driver, we weren't setting the codegen option on. This should enable more optimization opportunities.

[compiler] Emit 'noundef' attributes when compiling OpenCL C

Adding riscv testing to OCK repo

In SPIR-V, entry points can only be called from outside the module, and not from other functions inside the module. Given this, we can safely add 'noundef' parameter attributes to entry points with the "kernel" execution mode. Passing undefined or poison values to a kernel is undefined behaviour. We could perhaps extend this to other execution modes but we don't test those quite as well and they aren't as important to us right now. We'd ideally like to do this with *all* function parameters, which would better match the semantics of C, C++, OpenCL C, but as noted in the FIXME in the code, SPIR-V doesn't specify that passing an undefined value to a function call results in undefined behaviour, so we need to tread carefully. We might need to use `freeze` in more places to stop the propagation of undef/poison values around the module.

This change consolidates a lot of duplicated code into `cl_intel_unified_shared_memory_Test` (the base class of many tests). Specifically, it provides properties to check host and shared memory USM support, and helper functions to generate USM pointers. One of the design changes was adding a way to iterate through all available USM pointer types without having to copy-paste code. This is used to add shared allocations to many places alongside device and host allocations. This results in the tests doing more work, but this only causes a 10% increase in the time taken for testing, which is about 300ms on my system.

…f-attrs [spirv-ll] Add noundef param attrs to kernel entry points

USM testing rework for better USM support

This commit extends the Uniform Value Analysis with a fourth kind of uniformity: "true" uniformity. This represents a value that is uniform on both active and inactive lanes. The old class of "uniform" has been renamed to "active" uniformity to clarify this. The analysis pass has been extended with a method to query whether a value is truly uniform. It processes this recursively on demand and caches the result. This is because the initial analysis run works from varying "roots" and marks all dependent values as varying - uniform values aren't handled at all. Rather than negatively affecting the performance of all Uniform Value Analysis runs, this on-demand method keeps costs the same except for users that need to query uniformity. The query is still conservative, but less so. This can be seen in the test changes, where some cases which were previously conservatively handled as possibly varying/active uniform are now seen as truly uniform.
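The recursive, cached query can be sketched as follows. The rule used here (a value is truly uniform if it is not varying, is not flagged as uniform on active lanes only, and all of its operands are truly uniform) is an assumption made for illustration, as are the `Value` and `UniformityQuery` types; the actual analysis may use different criteria. The sketch also assumes the operand graph has no cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical value record; indices into a vector stand in for IR values.
struct Value {
  bool varying = false;     // set by the initial root-driven analysis
  bool activeOnly = false;  // assumed flag: uniform on active lanes only
  std::vector<int> operands;
};

class UniformityQuery {
 public:
  explicit UniformityQuery(std::vector<Value> values)
      : values_(std::move(values)) {}

  // Computed on demand and cached, so analysis runs that never ask this
  // question pay nothing extra.
  bool isTrueUniform(int id) {
    auto hit = cache_.find(id);
    if (hit != cache_.end()) return hit->second;
    const Value &v = values_[static_cast<std::size_t>(id)];
    bool result = !v.varying && !v.activeOnly;
    for (std::size_t i = 0; result && i < v.operands.size(); ++i)
      result = isTrueUniform(v.operands[i]);
    cache_[id] = result;
    return result;
  }

 private:
  std::vector<Value> values_;
  std::unordered_map<int, bool> cache_;
};
```

The query stays conservative: anything not proven truly uniform through all of its operands is reported as not truly uniform.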

…analysis [compiler] Improve analysis of 'true uniform' values

RISC-V supports these just fine.

[RISC-V] Report denormal support.

Remove debug code that snuck in

The `compiler-utils` library has been split into `compiler-pipeline` and `compiler-binary-metadata` to allow use of compiler pipeline utilities without the binary metadata requirements. Both will be needed for `mux` targets.

…_utils_metadata Split compiler-utils into compiler-pipeline and compiler-binary-metadata

This commit lets LLVM know that the pointer to the packed argument structure may not be null, must not be undef/poison, and is dereferenceable. It also transfers `noundef` and `nonnull` attributes from the old parameters to the new loads from the argument struct. Those loads can take `!noundef` and `!nonnull` metadata. This should improve performance in certain cases, as this pass typically runs before the final O3 optimization pipeline and any extra information we can give LLVM should help.

[compiler] Set attributes on packed args and on loads from them

Fix missing "frees" on USM pointers

clc's driver::saveBinary and oclc's oclc::Driver::BuildProgram and oclc::Driver::WriteToFile have a result that allows them to indicate failure, but several I/O functions were not checked for errors. Additionally, driver::saveBinary would close stdin when it had not opened it, which is not its responsibility.
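A minimal sketch of the two ideas in this message: check every I/O call that can fail, and close only the streams the function itself opened. This `saveBinary`, its `Result` type, and the convention that a path of "-" means standard output are hypothetical and do not reflect clc's or oclc's real signatures.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
enum class Result { Success, Failure };

// Writes `data` to `path`, or to standard output when `path` is "-"
// (an assumed convention). Every I/O call is checked.
Result saveBinary(const std::vector<unsigned char> &data,
                  const std::string &path) {
  const bool toStdout = (path == "-");
  std::FILE *out = toStdout ? stdout : std::fopen(path.c_str(), "wb");
  if (!out) return Result::Failure;

  Result result = Result::Success;
  if (!data.empty() &&
      std::fwrite(data.data(), 1, data.size(), out) != data.size()) {
    result = Result::Failure;
  }

  if (toStdout) {
    // We did not open this stream, so flush it but leave it open.
    if (std::fflush(out) != 0) result = Result::Failure;
  } else if (std::fclose(out) != 0) {
    // Closing can also report a deferred write error.
    result = Result::Failure;
  }
  return result;
}
```

Checking the return value of `fclose` matters because buffered data may only be written at that point, so a full disk can surface there rather than in `fwrite`.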

Improve error handling.

…lare Update findDbgDeclares for LLVM 18

MaryaSharf and others added 30 commits December 1, 2023 12:31

Add portBLAS and portDNN as separate actions to the

f928108

nightly tartan job using Intel nightly release

Merge pull request uxlfoundation#178 from MaryaSharf/marya/portBLAS

59aee4c

[Tartan CI]: Add separate actions to nightly Tartan build to build portBLAS and portDNN

Avoid ChooseMSVCCRT on LLVM 18+.

4ed567e

LLVM 18 already sets everything we should need in LLVMConfig.cmake, which is included already. We can just skip DetectLLVMMSVCCRT entirely.

[compiler] Remove use of Type::getInt8PtrTy

912e763

LLVM commit 7b9d73c2f90c0ed8497339a16fc39785349d9610 removes this function, as with opaque pointers all pointers are equivalent. While we're at it, we don't need to do any bitcasts between pointer types in this pass.

Merge pull request uxlfoundation#244 from hvdijk/llvm18-msvc

0b13d59

Avoid ChooseMSVCCRT on LLVM 18+.

Merge pull request uxlfoundation#243 from frasercrmck/remove-int8ptr

343a428

[compiler] Remove use of Type::getInt8PtrTy

Merge pull request uxlfoundation#235 from frasercrmck/spirv-ll-debug-…

1fde0b3

…info-scopes [spirv-ll] Refactor how debug ranges/scopes are handled

Merge pull request uxlfoundation#237 from RossBrunton/linkerfile

6d697de

[refsi] Update Refsi memory specifications

Merge pull request uxlfoundation#245 from hvdijk/llvm18

f8ff552

More LLVM 18 fixups.

Increase main memory size limit in memory map

94fbd3c

The SYCL CTS generates binaries larger than 1MiB, so this patch increases the limit.

Merge pull request uxlfoundation#247 from RossBrunton/sizeincrease

d0f09c8

Increase main memory size limit in memory map

[tutorial] Update patch line numbers to account for license headers

2c2ad21

[tutorial] Fix HAL tutorial

64bc9eb

[tutorial] Fix a couple of typos in the new-mux-target tutorial

fd3b422

[tutorial] Fix mismatched namespaces

fc70087

[tutorial] Fix a typo

6b677bd

Manual type legalization.

b3ad276

LLVM's own type legalization promotes floating point operations without truncation of intermediate results. We rely on that truncation, so run our own pass to legalize manually before LLVM's legalization runs.

Merge pull request uxlfoundation#236 from hvdijk/manual-type-legaliza…

379f43d

…tion Manual type legalization.

Merge pull request uxlfoundation#248 from frasercrmck/fix-hal-tutorial

2ee7cd0

[tutorial] Fix various issues in the HAL tutorial and new-target scripts

Merge pull request uxlfoundation#246 from frasercrmck/spirv-ll-opencl…

050d91a

…-dbg-full [spirv-ll] Fully support OpenCL.DebugInfo.100 instructions

hvdijk and others added 28 commits January 22, 2024 14:24

[NFC] Remove unused denorm_support.

b0468a1

This was left over from a previous implementation, but is no longer used.

Merge pull request uxlfoundation#308 from hvdijk/remove-denorm-support

58b063d

[NFC] Remove unused denorm_support.

Merge pull request uxlfoundation#309 from frasercrmck/opencl-c-noundef

d26a496

[compiler] Emit 'noundef' attributes when compiling OpenCL C

Adding riscv testing to OCK repo

4ac6ab4

Merge pull request uxlfoundation#299 from MaryaSharf/marya/add_riscv

6ac3ad7

Adding riscv testing to OCK repo

Merge pull request uxlfoundation#310 from frasercrmck/spirv-ll-nounde…

26a67bd

…f-attrs [spirv-ll] Add noundef param attrs to kernel entry points

Merge pull request uxlfoundation#298 from RossBrunton/usmtestup

9a111ee

USM testing rework for better USM support

Merge pull request uxlfoundation#307 from frasercrmck/better-uniform-…

b12be1f

…analysis [compiler] Improve analysis of 'true uniform' values

[RISC-V] Report denormal support.

a43516b

RISC-V supports these just fine.

Remove debug code that snuck in

755f862

Merge pull request uxlfoundation#312 from hvdijk/riscv-denorm

5d530be

[RISC-V] Report denormal support.

Merge pull request uxlfoundation#313 from RossBrunton/removedebug

43961e7

Remove debug code that snuck in

Merge pull request uxlfoundation#314 from coldav/colin/split_compiler…

3b9b883

…_utils_metadata Split compiler-utils into compiler-pipeline and compiler-binary-metadata

Fix missing "frees" on USM pointers

e1314ae

Merge pull request uxlfoundation#316 from frasercrmck/kernel-args-attrs

6efc9b8

[compiler] Set attributes on packed args and on loads from them

Merge pull request uxlfoundation#317 from RossBrunton/fixfree

60302e5

Fix missing "frees" on USM pointers

Update findDbgDeclare for LLVM 18

af4e37d

Merge pull request uxlfoundation#319 from hvdijk/error-handling

1dda1f5

Improve error handling.

Merge pull request uxlfoundation#320 from PietroGhg/pietro/finddbgdec…

49722f9

…lare Update findDbgDeclares for LLVM 18

Merge branch 'main' into pietro/update_30_jan

639d28e

compiler-utils -> compiler-pipeline

542ee12

coldav approved these changes Jan 30, 2024


PietroGhg merged commit 558b76c into uxlfoundation:sycl_native_experimental Jan 30, 2024
4 checks passed
