Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with ce7c828e (Aug 30) (16) #369

Open
wants to merge 92 commits into
base: bump_to_fc110202
Choose a base branch
from

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

philnik777 and others added 30 commits August 29, 2024 17:05
They're never used in `constexpr` functions, so we can simply use
`std::isnan` and `std::isfinite` instead.
- Use formatv() and raw string literals to simplify emission code.
- Use range based for loops and structured bindings to simplify loops.
- Use const Pointers to Records.
- Rename `ComputeFixedEncoding` to `ComputeTypeSignature` to reflect
  what the function actually does, cnd change it to return a vector.
- Use reverse() and range based for loop to pack 8 nibbles into 32-bits.
- Rename some variables to follow LLVM coding standards.
- For function memory effects, print human readable effects in comment.
There were only codegen tests for the fadd vector case,
so round out the test coverage for the scalar cases
and all the other operations.
- The subtest, if enabled correctly, will fail with assert in Debug
  builds and validation is disabled in Release builds.
- Hence deleting the test to fix test failures in CI.
…03702)

Use LLSC or cmpxchg in the same cases as for the unsupported
integer operations. This required some fixups to the LLSC
implementatation to deal with the fp128 case.

The comment about floating-point exceptions was wrong,
because floating-point exceptions are not really exceptions at all.
fneg/fabs are not supposed to canonicalize nans. Promoting to f32 will
go through an fp_extend which will canonicalize. The generic Promote
handler needs to be removed from LegalizeDAG.

We need to use integer bit manip to clear the bit instead.

Unfortunately, this is going through the stack due to i16 not being a
legal type. Fixing that will require custom legalization or some other
generic SelectionDAG change.
These are getting different output on some build hosts for some reason.
The stack offsets of temporaries are different.
SCF loops now can operate on integer-typed IV, thus I'm changing the
loop unroller correspondingly.
- Eliminate comma at end of a MemoryEffects print.
- Added basic unit test to validate that.
…st checks for fallback to llvm intrinsics

Check for cases where there isn't a amdlib call but it still vectorises the math call
The new class implements a deduplication table to convert import list
elements:

  {SourceModule, GUID, Definition/Declaration}

into 32-bit integers, and vice versa.  This patch adds a unit test but
does not add a use yet.

To be precise, the deduplication table holds {SourceModule, GUID}
pairs.  We use the bottom one bit of the 32-bit integers to indicate
whether we have a definition or declaration.

A subsequent patch will collapse the import list hierarchy --
FunctionsToImportTy holding many instances of FunctionsToImportTy --
down to DenseSet<uint32_t> with each element indexing into the
deduplication table above.  This will address multiple sources of
space inefficiency.
…06440)

We'd previously just deferred to the base implementation, but that more
or less always returns 1. This underestimates the cost of the
insert/extract, biases the SLP vectorizer towards forming illegally
typed vectors, and underestimates the cost of scalarized operations
(like unaligned scatter/gather).
This is a partial revert of c66e1d6.  Even though that
allowed us to declare v9.2-a support without picking up SVE2
in both the backend and the driver, the frontend itself still
enabled SVE via the arch version's default extensions.

Avoid that by reverting back to v8.7-a while we look into
longer-term solutions.
llvm#86149)

This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <[email protected]>
)

These lists are quite static and several of the parameters are actually
constant across all users. Heavy use of macros is undesirable, and not
idiomatic in LLVM, so let's just use the naive switch cases.

I'll probably continue with removing the other property macros. These
two just happened to be the two I actually had to figure out for an
unrelated change.
This reverts commit 42d3ccc which
caused a test failure.
…and (llvm#98414)

This patch addresses an issue with lit's internal shell when env is
without any arguments, it fails with exit code 127 because `env`
requires a subcommand. This patch addresses the issue by encoding the
command to properly return environment variables even when no arguments
are provided.

The error occurred when running the command 
` LIT_USE_INTERNAL_SHELL=1 ninja check-llvm`.

fixes: llvm#102383
This is part of the test cleanups proposed in the RFC: [[RFC] Enabling
the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
Previously, functions named "main" got the NoRecurse attribute
consistent with the behavior of C++, which HLSL largely follows.
However, standard recursion is not allowed in HLSL, so all functions
should really have this attribute. This doesn't prevent recursion, but
rather signals that these functions aren't expected to recurse.

Practically, this was done so that entry point functions named "main"
would have all have the same attributes as otherwise identical entry
points with other names.

This required small changes to the this assignment tests because they no
longer generate so many attribute sets since more of them match.

related to llvm#105244
but done to simplify testing for llvm#89806
…llvm#106437)

These tests have been failing since db279c7 "[HLSL] Change default
linkage of HLSL functions to internal (llvm#95331)". This presumably went
unnoticed because they're not run by default since they rely on an
external tool (dxil-dis).
llvm#105574)

These lists are quite static. Heavy use of macros is undesirable, and
not idiomatic in LLVM, so let's just use the naive switch cases.

Note that the first two fields in the CONSTRAINEDFP property were
utterly unused (aside from a C++ test).

In the same vien as llvm#105551.

Once both changes have landed, we'll be left with _BINARYOP which needs
a bit of additional untangling, and the actual opcode mappings.
armv7a and armv8a are common names for the application subarch for arm.

These names in particular are used in ChromeOS, Android, and a few other
known applications. In ChromeOS, we encountered a bug where armv7a arch
was not recognised and segfaulted when starting an executable on an
arm32 device.

Google Issue Tracker:
https://issuetracker.google.com/361414339
)

This is a split-off from llvm#96023, where this change has already been
reviewed by libcxx maintainers.

This will prevent that PR from triggering libcxx-ci from now on.
…atv()"" (llvm#106592)

Reverts llvm#106589
The fix for bot failures caused by the reverted commit was committed
already, so this revert is not needed.
Several tests for the new fake use intrinsic are failing on NVPTX
buildbots due to relying on behaviour for their expected triple;
this commit adds that triple to each of them to prevent failures.

Fixes commit 3d08ade (llvm#86149).

Example buildbot failures:
https://lab.llvm.org/buildbot/#/builders/160/builds/4175
https://lab.llvm.org/buildbot/#/builders/180/builds/4173
…106582)

This isn't quite just code motion as the four different versions we had
of this routine differed in whether they ignored the "size" marker used
to represent undef. I doubt this matters in practice, but it is a
functional change.

---------

Co-authored-by: Alexey Bataev <[email protected]>
Add ResourceType, ResourceKind and ResourceFlag enum class for PSV
resource.

This is for llvm#103275
arichardson and others added 29 commits August 29, 2024 15:59
The interceptor types are supposed to match size_t (and the non-Windows
ssize_t) exactly, but on 32-bit Windows `size_t` uses `unsigned int`
whereas `SIZE_T` is `unsigned long`. The current definition results in
`uptr` not matching `uintptr_t` since we otherwise get typedef
redefinition errors. Work around this by using a #define instead of
a typedef when defining SIZE_T.

It would probably be cleaner to stop using these uppercase types, but
that is a rather invasive change and this one is the minimal change to
allow uptr to match uintptr_t on Windows.

To ensure this compiles on Windows, we also remove the interceptor.h
defines of uptr (that do not always match __sanitizer::uptr) and rely
on __sanitizer::uptr instead. The interceptor types most likely predate
those other types so clean up the unnecessary definition while here.

This also reverts commit 18e06e3 and
commit bb27dd8.

Reviewed By: mstorsjo, vitalybuka

Pull Request: llvm#106311
These functions in interception_win.cpp already exist in
sanitizer_common. Use those instead.

Reviewed By: mstorsjo

Pull Request: llvm#106488
…lvm#106517)

In llvm#106110 we had to mark v[f]slide1down.vx as
ActiveElementsAffectResult since the elements in the body depend on VL.
However it doesn't depend on the mask, so this was overly conservative
and broke the vmerge peephole.

We can recover this by splitting up ActiveElementsAffectResult into VL
and Mask bits, so we can more accurately model v[f]slide1down.vx and
re-enable the peephole.
This patch implements sandboxir::Type, a thin wrapper of llvm::Type.
This is designed very similarly to sandbox::Value. Context owns all
sandboxir::Type objects and maintains a map between llvm::Type and
sandboxir::Type.

There are a couple of reasons for migrating from llvm::Type to
sandboxir::Type:
- Creating an llvm::Type from within SandboxIR-only code doesn't work
well because it requires you to pass llvm::Context to functions like
llvm::Type::getInt32Ty(C), but you wouldn't normally have access to
llvm::Context C. In unit tests this is not such a big deal because you
have access to both, but it will become an issue in SandboxIR-only code.
- Not being able to get the sandboxir::Context from llvm::Type results
in awkward sandboir APIs with additional sandboxir::Context arguments.
- llvm::Type::getContext() can basically give you access to the whole
LLVM IR, which we should try to avoid.
This works for MinGW, but the MSVC linker apparently doens't pull in
those symbols. Reverting for now since I won't be able to reproduce it today.

https://lab.llvm.org/buildbot/#/builders/107/builds/2337

This reverts commit 9df92cb.
When [lowering return
values](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L3422)
from LLVM IR to SelectionDAG, we check that [the number of values
`SelectionDAG` tells us to return is equal to the number of values that
`ComputePTXValueVTs()` tells us to
return](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L3441).
However, this check can fail on valid IR. For example:

```
define <6 x half> @foo() {
  ret <6 x half> zeroinitializer
}
```

`ComputePTXValueVTs()` tells us to return ***3*** `v2f16` values, while
`SelectionDAG` tells us to return ***6*** `f16` values. Thus, the
compiler will crash.

`ComputePTXValueVTs()` [supports all `half` element vectors with an even
number of
elements](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L213).
Whereas `SelectionDAG` [only supports power-of-2 sized
vectors](https://github.com/llvm/llvm-project/blob/4e078e3797098daa40d254447c499bcf61415308/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1580).
This is the root of the discrepancy.

Assuming that the developers who added the code to
`ComputePTXValueVTs()` overlooked this, I've restricted
`ComputePTXValueVTs()` to compute the same number of return values as
`SelectionDAG`, instead of extending `SelectionDAG` to support
non-power-of-2 sized vectors.
…nctionPropertiesAnalysis (llvm#104867) (llvm#106309)

Reverts c992690.

The problem is that if there is a sequence "{delete A->B} {delete A->B}
{insert A->B}" the net result is "{delete A->B}", which is not what we
want.

Duplicate successors may happen in cases like switch statements (as
shown in the unit test).

The second problem was that in `invoke` cases, some edges we speculate may get deleted don't, but are also not reachable from the inlined call site's basic block. We just need to check which edges are actually not present anymore.

The fix is to sanitize the list of deletes, just like we do for inserts.
…er. (llvm#106481)

This patch add a legality check that checks if target machine support
vector of address in `isLegalMaskedGatherScatter()`.
Solve clangd/clangd#2094

Due clangd will enable PCH automatically, the previous mechanism to skip
ODR check in GMF may be invalid. This patch fixes this for a case.
…m#106477)

Address space information may be encoded anywhere along the use-def
chain. Take advantage of this by traversing the chain until we find a
non-generic addrspace.
Introducing `HLSLAttributedResourceType` - a new type that is similar to
`AttributedType` but with additional data specific to HLSL resources.
`AttributeType` currently only stores an attribute kind and no
additional data from the type attribute parameters. This does not really
work for HLSL resources since its type attributes contain non-boolean
values that need to be retained as well.

For example:

```
template <typename T> class RWBuffer {
  __hlsl_resource_t  [[hlsl::resource_class(uav)]] [[hlsl::is_rov]] handle;
};
```

The data `HLSLAttributedResourceType` needs to eventually store are:
- resource class (SRV, UAV, CBuffer, Sampler)
- texture dimension(1-3)
- flags is_rov, is_array, is_feedback and is_multisample
- contained type

All of these values except contained type will be stored in
`HLSLAttributedResourceType::Attributes` struct and accessed
individually via the fields. There is also `Data` alias that covers all
of these values as a `unsigned` which is used for hashing and the AST
type serialization.

During type attribute processing all HLSL type attributes will be
validated and collected by SemaHLSL (by
`SemaHLSL::handleResourceTypeAttr`) and in the end combined into a
single `HLSLAttributedResourceType` instance (in
`SemaHLSL::ProcessResourceTypeAttributes`). `SemaHLSL` will also need to
short-term store the `TypeLoc` information for the new type that will be
grabbed by `TypeSpecLocFiller` soon after the type is created.

Part 1/2 of llvm#104861
…106628)

ALLOCATE and DEALLOCATE statements can be inlined in device function.
This patch updates the condition that determined to inline these actions
in lowering.

This avoid runtime calls in device function code and can speed up the
execution.

Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used
elsewhere.
Allow some interaction between LLVM and FIR dialect by allowing
conversion between FIR memory types and llvm.ptr type.
This is meant to help experimentation where FIR and LLVM dialect
coexists, and is useful to deal with cases where LLVM type makes it
early into the MLIR produced by flang, like when inserting LLVM stack
intrinsic here:
https://github.com/llvm/llvm-project/blob/0a00d32c5f88fce89006dcde6e235bc77d7b495e/flang/lib/Optimizer/Transforms/StackReclaim.cpp#L57
The current pattern was failing OpenACC semantics in acc parse tree
canonicalization:

```
!acc loop
!dir vector aligned
do i=1,n
...
```

Fix it by moving the directive before the OpenACC construct node.

Note that I think it could make sense to propagate the $dir info to the
acc.loop, at least with classic flang, the $dir seems to make a
difference. This is not done here since few directives are supported
anyway.
…oDate function

This patch extracts ModuleFile class from StandalonePrerequisiteModules
so that we can reuse it further. And also we implement
IsModuleFileUpToDate function to implement
StandalonePrerequisiteModules::CanReuse. Both of them aims to ease the
future improvements to the support of modules in clangd. And both of
them should be NFC.
__getauxval is a libgcc function that doesn't exist on Android.
Also on Linux let's use getauxval as it is anyway used other places in compiler-rt.
…Date to reference

It is better to use references instead of pointers as the argument type
of IsModuleFileUpToDate. Since the PrerequisiteModules is always
expected to exist.
…#106564)

shortloop is a non standard OpenACC extension
(https://docs.nvidia.com/hpc-sdk/pgi-compilers/2015/pgirn157.pdf) that
can be found on loop directives.

f18 parser was choking when seeing it. Since it can be found in existing
apps and is mainly an optimization hint, parse it on loop directives and
ignore it with a warning.

For the records, here is shortloop meaning according to the manual linked above:

"If the shortloop clause appears on a loop directive with the vector clause, it tells the compiler that the
loop trip count is less than or equal to the number of vector lanes created for that loop. This means the
value of the vector() clause on the loop directive in a kernels region, or the value of the
vector_length() clause on the parallel directive in a parallel region will be greater than or
equal to the loop trip count. This allows the compiler to generate more efficient code for the loop"
…lvm#106621)

Without these explicit includes, removing other headers, who implicitly
include llvm-config.h, may have non-trivial side effects.
Similarly to the existing range attribute inference, also infer the
nonnull attribute on function return values.

I think in practice FunctionAttrs will handle nearly all cases, the main
one I think it doesn't is cases involving branch conditions. But as we
already have the information here, we may as well materialize it.
…intrinsic (llvm#103037)

Adds support for wider-than-legal vector types for the histogram
intrinsic (llvm.experimental.vector.histogram.add) by splitting the
vector. Also adds integer promotion for the Inc operand.
…-bundles`. NFC. (llvm#106661)

With `-view-edge-bundles`, before the change, the dot file output is
kinda like
```dot
digraph {
        "%bb.0" [ shape=box ]
        0 -> "%bb.0"
        "%bb.0" -> 1
        "%bb.0" -> "%bb.1" [ color=lightgray ]
        "%bb.0" -> "%bb.6" [ color=lightgray ]
        "%bb.1" [ shape=box ]
        1 -> "%bb.1"
        "%bb.1" -> 1
        "%bb.1" -> "%bb.2" [ color=lightgray ]
        "%bb.1" -> "%bb.6" [ color=lightgray ]
        "%bb.2" [ shape=box ]
        1 -> "%bb.2"
        "%bb.2" -> 1
        "%bb.2" -> "%bb.3" [ color=lightgray ]
        "%bb.3" [ shape=box ]
        1 -> "%bb.3"
        "%bb.3" -> 2
        "%bb.3" -> "%bb.4" [ color=lightgray ]
        "%bb.4" [ shape=box ]
        2 -> "%bb.4"
        "%bb.4" -> 2
        "%bb.4" -> "%bb.4" [ color=lightgray ]
        "%bb.4" -> "%bb.5" [ color=lightgray ]
        "%bb.5" [ shape=box ]
        2 -> "%bb.5"
        "%bb.5" -> 1
        "%bb.5" -> "%bb.6" [ color=lightgray ]
        "%bb.5" -> "%bb.3" [ color=lightgray ]
        "%bb.6" [ shape=box ]
        1 -> "%bb.6"
        "%bb.6" -> 3
}
```
However, the graph output by graphviz is

![t](https://github.com/user-attachments/assets/24056c0a-3ba9-49c3-a5da-269f3140e619)
The node name corresponding to the MBB is incorrect.
After the change, the node name is consistent with MBB's name.

![s](https://github.com/user-attachments/assets/38c649d1-7222-4de1-971c-56f7721ab64c)
[D156118](https://reviews.llvm.org/D156118) states that this note is
always present, but it is better to check it explicitly, as otherwise
`lldb` may crash when trying to read registers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.