Update sycl_native_experimental branch #322

Merged

Conversation

PietroGhg
Collaborator

Overview

Merges the latest commits from main into sycl_native_experimental.

MaryaSharf and others added 30 commits December 1, 2023 12:31
nightly tartan job using Intel nightly release
[Tartan CI]: Add separate actions to nightly Tartan build to build portBLAS and portDNN
LLVM 18 already sets everything we should need in LLVMConfig.cmake,
which is included already. We can just skip DetectLLVMMSVCCRT entirely.
LLVM commit 7b9d73c2f90c0ed8497339a16fc39785349d9610 removes this
function, as with opaque pointers all pointers are equivalent.

While we're at it, we don't need to do any bitcasts between pointer
types in this pass.
[compiler] Remove use of Type::getInt8PtrTy
This commit aims to simplify how we handle the management of different
types of debug scope.

Debug info is now attached to instructions when we close a range. We no
longer store all of the ranges and process them at the end of the
module. This keeps debug information better contained within the
builder, and means we have to track less volatile data. The code should
be simpler as a result, and hopefully easier to maintain.

We also introduce a new concept to help manage debug info. In addition
to the old 'line' range which is built in to SPIR-V, we introduce
another: a 'lexical scope'. This will be used for the translation of the
various DebugInfo extended instruction sets.

The lexical scopes and line ranges do interact, in the same way that the
DebugInfo instruction sets interact with the core spec: the DebugInfo
instruction sets still rely on line number information provided by line
ranges. A lexical scope is of no use without line information. We may
define lexical scopes in one of two ways, in priority order:

1. The DebugInfo extended instruction sets generate them using dedicated
   instructions
2. We generate them on the fly when attaching debug info when we process
   line ranges

Thus, when we close a line range or a lexical scope, we apply debug info
to all instructions within the range. For the scope information, we take
the lexical scope information, if set, and else we generate one on the
fly.
…info-scopes

[spirv-ll] Refactor how debug ranges/scopes are handled
* LLVM 18 moves clang::CodeGenOptions::VectorLibrary to
  llvm::driver::VectorLibrary. Use decltype to handle both.
* LLVM 18 drops CallingConv::WebKit_JS.
* LLVM 18 drops <llvm/Transforms/Vectorize.h> which we include
  needlessly.
* LLVM 18 requires us to link in libLLVMFrontendDriver,
  libclangAPINotes, libclangBasic.
* LLVM 18 adds a disjoint flag to or instructions which we need to
  account for in tests.
* LLVM 18 moves <llvm/Support/Host.h> to <llvm/TargetParser/Host.h>.
* LLVM 18 is able to infer that we could potentially end up with a
  subvector size of zero, in which case we would end up with a division
  by zero. A subvector size of zero would be a bug elsewhere in OCK, so
  add an assert that it is not zero.
In the refsi simulator, there are two memory regions:
* "Main" memory, starting at 0x10000
* "Local" memory, starting at 0x10000000

When emitting the sections, we move the output cursor to location
0x10000000 before writing the sections for local memory. In some
unknown circumstances, this results in the generated ELF file
including 255MiB of padding in the file itself.
Besides wasting memory, this also made the refsi simulator unable
to load the binaries due to its 128MiB limit.

This patch updates the linker script to be more explicit about the
memory layout, which also has the added advantage of having the
linker verify that the binary isn't too large.

It also reverts 73c6af2, which was a workaround for this issue.
[refsi] Update Refsi memory specifications
The SYCL CTS generates binaries larger than 1MiB, so this patch
increases the limit.
Increase main memory size limit in memory map
This removes any difference in behaviour depending on whether or not you
create the new target within a git directory. With `git apply` the patch
files are applied relative to the current git directory, which means the
patch files could silently be skipped.
This wasn't updated after we changed/refactored the compiler utility
APIs and kernel ABI.

The wrapper function produced by the `RefSiWrapperPass` does indeed
match the kernel ABI expected by the RefSi HAL.

Furthermore, the compiler is now better at generating parameter names
and parameter attributes than it once was, so the LLVM IR should be more
legible.
We were returning a newly allocated kernel handle each time a kernel was
found, while never deleting them.

While this is okay for a tutorial, provided we documented/explained
what's going on, the code to better manage memory isn't too much to
handle and shouldn't detract from the tutorial itself.

Another benefit is that the updated tutorial code better follows the
reference HAL implementation (though not exactly), hopefully meaning
there's less dissonance between the two targets.
The previous behaviour would be that the `patch` commands would enter
`-R` mode and the symlink command would fail noisily.

It now provides a warning explaining what's happened: that the patch
files weren't applied (assumed to be already applied) and that the
symlink already exists, so doesn't need to be set.
The helper function was unnecessarily padding the size of already
aligned buffers by the align amount. For example, if a buffer was of
size 64 and was to be aligned to 8, it would increase the size to 72.

This isn't incorrect but is unintuitive behaviour.
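
A minimal sketch of the intended rounding, for illustration only. The name `alignUp` is hypothetical and is not the helper touched by this commit; it only shows that an already aligned size should come back unchanged.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the tutorial's actual function): rounds `size`
// up to the next multiple of `align`. A size that is already a multiple of
// `align` is returned unchanged.
inline std::size_t alignUp(std::size_t size, std::size_t align) {
  assert(align != 0 && "alignment must be non-zero");
  return ((size + align - 1) / align) * align;
}
```

With this form, `alignUp(64, 8)` stays at 64, while `alignUp(65, 8)` rounds up to 72.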
Passing relative paths to `--external-dir` would do strange things.

I've observed it not creating all of the directories during the first
step, and one of the post-gen hooks would raise an exception (possibly
as a result of that).

The simplest fix is to ensure the external directory is made absolute
before working with it.
LLVM's own type legalization promotes floating point operations without
truncation of intermediate results. We rely on that truncation, so run
our own pass to legalize manually before LLVM's legalization runs.
[tutorial] Fix various issues in the HAL tutorial and new-target scripts
This commit extends spirv-ll to consume OpenCL.DebugInfo.100
instructions and translate them to meaningful LLVM debug info.

The legacy DebugInfo instruction set is similar but not identical, and
we aren't seeing real-world binaries that use it, so we lack the tests
to confidently enable it.

The translation process differs slightly from the translation of other
extended instruction sets. There is a small core of 'root' instructions
in the debug info hierarchy, and many more 'leaves' which don't
meaningfully add debug information on their own. Thus the translator
only processes the roots on demand as we visit them in program order.
Those roots may then reference other instructions (roots or leaves)
which we translate on the fly. We cache the translation of each
instruction as, in practice, they are usually referenced multiple times.

Generally, instructions are translated to `llvm::Metadata*`. The special
instruction `DebugInfoNone` translates to `nullptr`: this is not an
error; each instruction has to decide which operands may or may not be
`nullptr`. This usually ends up depending on what the LLVM APIs accept -
some accept `nullptr` as a valid value, others crash or assert.
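
The on-demand, cached translation described above can be sketched roughly as follows. Everything here is a stand-in: `Node` takes the place of `llvm::Metadata`, and `DebugInst` and `Translator` are hypothetical names rather than spirv-ll types. The sketch also assumes references between instructions are acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Metadata.
struct Node {
  std::string text;
};

// Hypothetical stand-in for a debug-info extended instruction.
struct DebugInst {
  bool isNone;                          // models DebugInfoNone
  std::string name;
  std::vector<std::uint32_t> operands;  // IDs of referenced instructions
};

class Translator {
 public:
  explicit Translator(std::unordered_map<std::uint32_t, DebugInst> insts)
      : insts_(std::move(insts)) {}

  // Translates instruction `id` on demand, translating any referenced
  // instructions first. Returns nullptr for DebugInfoNone, which is a valid
  // result rather than an error.
  const Node *translate(std::uint32_t id) {
    auto cached = cache_.find(id);
    if (cached != cache_.end()) return cached->second;

    const DebugInst &inst = insts_.at(id);
    const Node *result = nullptr;
    if (!inst.isNone) {
      std::string text = inst.name + "(";
      for (std::size_t i = 0; i < inst.operands.size(); ++i) {
        if (i != 0) text += ",";
        // Each consumer decides how to treat a nullptr operand.
        const Node *child = translate(inst.operands[i]);
        text += child ? child->text : "null";
      }
      text += ")";
      storage_.push_back(std::make_unique<Node>(Node{text}));
      result = storage_.back().get();
      ++numTranslated_;
    }
    cache_[id] = result;  // nullptr results are cached too
    return result;
  }

  std::size_t numTranslated() const { return numTranslated_; }

 private:
  std::unordered_map<std::uint32_t, DebugInst> insts_;
  std::unordered_map<std::uint32_t, const Node *> cache_;
  std::vector<std::unique_ptr<Node>> storage_;
  std::size_t numTranslated_ = 0;
};
```

An instruction referenced several times is built once; later references return the cached pointer.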

One thing to note about the `OpenCL.DebugInfo.100` instruction set is
that there are numerous bugs all over the ecosystem:

* Even recent versions of `spirv-val` complain about what should be
  valid SPIR-V binaries
* `llvm-spirv` (and thus DPC++) contains known bugs that won't be fixed
  soon. We try to work around these as best as possible
  * Mixed instruction opcodes, explained below
  * Extra dummy operands which shouldn't be there
  * Forward references where none should be allowed
  * etc
* `llvm-spirv` (and thus DPC++) supports undocumented extensions of its
  own, such as the ability to translate more LLVM expression opcodes
  than the spec permits.

The most egregious bug is that `llvm-spirv` has mixed up two instruction
opcodes and thus generates binaries which a standards-compliant SPIR-V
consumer will struggle with. To cope with this, we add a bitfield of
'workarounds' to the DebugInfoBuilder, which can toggle on and off
behaviour.

The only workaround added so far instructs the translator to, when faced
with one of these two instructions, try and infer which one is intended.
It does so by inspecting the operands, using rules to establish whether the
instruction opcode is correct or is swapped with the other one. This
workaround is *enabled* by default, as it's still not fixed upstream. We
can toggle this behaviour off later, perhaps.
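
A bitfield of workarounds that is enabled by default but can be toggled might look something like the sketch below. The enumerator and class names are hypothetical and are not the identifiers used in the DebugInfoBuilder; the operand-inspection rules themselves are not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical workaround flags; each one occupies its own bit.
enum Workaround : std::uint32_t {
  WorkaroundNone = 0,
  // Try to infer which of the two mixed-up opcodes was intended.
  WorkaroundInferSwappedOpcodes = 1u << 0,
};

// Hypothetical holder for the flags, standing in for the builder.
class WorkaroundSet {
 public:
  // The single workaround is enabled by default.
  explicit WorkaroundSet(
      std::uint32_t workarounds = WorkaroundInferSwappedOpcodes)
      : workarounds_(workarounds) {}

  bool isEnabled(Workaround w) const {
    assert(w != WorkaroundNone && "query a specific workaround");
    return (workarounds_ & w) != 0;
  }
  void enable(Workaround w) { workarounds_ |= w; }
  void disable(Workaround w) {
    workarounds_ &= ~static_cast<std::uint32_t>(w);
  }

 private:
  std::uint32_t workarounds_;
};
```

Further workarounds would each take the next free bit, so they can be switched on and off independently.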
…-dbg-full

[spirv-ll] Fully support OpenCL.DebugInfo.100 instructions
hvdijk and others added 28 commits January 22, 2024 14:24
This was left over from a previous implementation, but is no longer
used.
Clang emits these attributes by default when compiling C, C++, and
OpenCL C. It does this through a default-on driver option which sets a
default-off codegen option. Since we act as the driver, we weren't
setting the codegen option on. This should enable more optimization
opportunities.
[compiler] Emit 'noundef' attributes when compiling OpenCL C
In SPIR-V, entry points can only be called from outside the module, and
not from other functions inside the module.

Given this, we can safely add 'noundef' parameter attributes to entry
points with the "kernel" execution mode. Passing undefined or poison
values to a kernel is undefined behaviour. We could perhaps extend this
to other execution modes but we don't test those quite as well and they
aren't as important to us right now.

We'd ideally like to do this with *all* function parameters, which would
better match the semantics of C, C++, OpenCL C, but as noted in the
FIXME in the code, SPIR-V doesn't specify that passing an undefined
value to a function call results in undefined behaviour, so we need to
tread carefully. We might need to use `freeze` in more places to stop
the propagation of undef/poison values around the module.
This change brings a lot of duplicated code into
`cl_intel_unified_shared_memory_Test` (the base class of many
tests). Specifically, it provides properties to check host and
shared memory USM support, and helper functions to generate USM
pointers.

One of the design changes was adding a way to iterate through all
available USM pointer types without having to copy-paste code. This
is used to add shared allocations to many places alongside device
and host allocations.

This results in the tests doing more work, but this only causes a
10% increase in the time taken for testing, which is about 300ms
on my system.
…f-attrs

[spirv-ll] Add noundef param attrs to kernel entry points
USM testing rework for better USM support
This commit extends the Uniform Value Analysis with a fourth kind of
uniformity: "true" uniformity. This represents a value that is uniform
on both active and inactive lanes. The old class of "uniform" has been
renamed to "active" uniformity to clarify this.

The analysis pass has been extended with a method to query whether a
value is truly uniform. It processes this recursively on demand and
caches the result. This is because the initial analysis run works from
varying "roots" and marks all dependent values as varying - uniform
values aren't handled at all. Rather than negatively affecting the
performance of all Uniform Value Analysis runs, this on-demand method
keeps costs the same except for users that need to query uniformity.

The query is still conservative, but less so. This can be seen in the
test changes, where some cases which were previously conservatively
handled as possibly varying/active uniform are now seen as truly
uniform.
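
The recursive, cached query can be illustrated with a toy model. This is not the Uniform Value Analysis itself: `Value`, `laneMaskDependent`, and `UniformQuery` are hypothetical, and the rule used here (a value is truly uniform only if it is not varying, is not itself lane-mask dependent, and all of its operands are truly uniform) is an assumption made for the sake of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy value: results of the initial analysis plus operand indices.
struct Value {
  bool varying;            // marked by the initial root-driven analysis
  bool laneMaskDependent;  // hypothetical: may differ on inactive lanes
  std::vector<std::size_t> operands;
};

class UniformQuery {
 public:
  explicit UniformQuery(std::vector<Value> values)
      : values_(std::move(values)) {}

  // Computes true uniformity on demand and caches the answer, so the cost
  // is only paid by callers that actually ask. Assumes an acyclic graph.
  bool isTrueUniform(std::size_t id) {
    assert(id < values_.size());
    auto it = cache_.find(id);
    if (it != cache_.end()) return it->second;

    const Value &v = values_[id];
    bool result = !v.varying && !v.laneMaskDependent;
    // Stop at the first operand that is not truly uniform.
    for (std::size_t i = 0; result && i < v.operands.size(); ++i)
      result = isTrueUniform(v.operands[i]);

    cache_[id] = result;
    return result;
  }

  std::size_t cachedCount() const { return cache_.size(); }

 private:
  std::vector<Value> values_;
  std::unordered_map<std::size_t, bool> cache_;
};
```

Values that are never queried never enter the cache, which mirrors the goal of keeping ordinary analysis runs at their previous cost.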
…analysis

[compiler] Improve analysis of 'true uniform' values
RISC-V supports these just fine.
`compiler-utils` library has been split into `compiler-pipeline` and
`compiler-binary-metadata` to allow use of compiler pipeline utilities
without the binary metadata requirements. Both will be needed for `mux`
targets.
…_utils_metadata

Split compiler-utils into compiler-pipeline and compiler-binary-metadata
This commit lets LLVM know that the pointer to the packed argument
structure is never null, is never undef/poison, and is
dereferenceable.

It also transfers `noundef` and `nonnull` attributes from the old
parameters to the new loads from the argument struct. Those loads can
take `!noundef` and `!nonnull` metadata.

This should improve performance in certain cases, as this pass typically
runs before the final O3 optimization pipeline and any extra information
we can give LLVM should help.
[compiler] Set attributes on packed args and on loads from them
clc's driver::saveBinary and oclc's oclc::Driver::BuildProgram and
oclc::Driver::WriteToFile have a result that allows them to indicate
failure, but several I/O functions were not checked for errors.
Additionally, driver::saveBinary would close stdin when it had not
opened it, which is not its responsibility.
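
As a rough illustration of the pattern (not the actual `driver::saveBinary` code), a save routine can check every I/O call and only close streams it opened itself. The function name `saveBinarySketch` and the convention that a path of `-` means standard output are assumptions for this sketch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: writes `data` to `path`, or to standard output when
// `path` is "-". Returns false if any I/O step fails.
bool saveBinarySketch(const std::string &path,
                      const std::vector<unsigned char> &data) {
  const bool useStdout = (path == "-");
  std::FILE *out = useStdout ? stdout : std::fopen(path.c_str(), "wb");
  if (!out) return false;  // opening the file can fail

  bool ok = data.empty() ||
            std::fwrite(data.data(), 1, data.size(), out) == data.size();

  if (useStdout) {
    // We did not open this stream, so flush it but leave it open.
    ok = (std::fflush(out) == 0) && ok;
  } else {
    // Closing can report deferred write errors, so check it as well.
    ok = (std::fclose(out) == 0) && ok;
  }
  return ok;
}
```

The caller can then turn a `false` result into the failure code the driver already has room for.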
@PietroGhg PietroGhg merged commit 558b76c into uxlfoundation:sycl_native_experimental Jan 30, 2024
4 checks passed