[AutoBump] Merge with 51365212 (Aug 25) (10) #363

Open
wants to merge 427 commits into base: bump_to_b96f18b2
Conversation

mgehre-amd
Collaborator

No description provided.

jayfoad and others added 30 commits August 22, 2024 11:46
…05549)

Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.
…lvm#83807)

This fixes an odd problem with the regex when `CMAKE_INSTALL_LIBDIR` is
not defined:

`string sub-command REGEX, mode REPLACE: regex "$" matched an empty
string.`

Fixes llvm#83802
…eclare} (llvm#105570)

Constify debug DbgVariableRecord::{isDbgValue,isDbgDeclare}.
This reverts commit 6528157.

I'm reverting llvm#104523
(llvm@f01f80c)
and this fixup belongs to the same series of changes.
This reverts commit 6f45602, which
depends on llvm#104523, which I'm
reverting.
llvm#104523)"

This reverts commit f01f80c.

This commit introduces an msan violation. See the discussion on llvm#104523.
…#105544)

- Refactor SetTheory code to use const pointers when possible.
- Use auto for variables initialized using dyn_cast<>.
- Use range based for loops and early continue.
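As a rough, self-contained sketch of the idioms listed above (hypothetical types, using `dynamic_cast` in place of LLVM's `dyn_cast`):

```cpp
#include <vector>

// Hypothetical stand-ins for TableGen records; not the actual SetTheory types.
struct Node { virtual ~Node() = default; };
struct SetNode : Node { int Size = 0; };

int sumSetSizes(const std::vector<const Node *> &Nodes) {
  int Total = 0;
  // Range-based for loop over const pointers.
  for (const Node *N : Nodes) {
    // `auto` for the variable initialized by the cast.
    const auto *S = dynamic_cast<const SetNode *>(N);
    if (!S)
      continue;  // early continue instead of nesting
    Total += S->Size;
  }
  return Total;
}
```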
There was a duplicate link target.
This region is intended to separate alloca operations from reduction
variable initialization. This makes it easier to hoist allocas to the
entry block before control flow and complex code for initialization.

The verifier checks that there is at most one block in the alloc region.
This is not sufficient to avoid control flow in general MLIR, but by the
time we are converting to LLVMIR, structured control flow should already
have been lowered to the cf dialect.

1/3
Part 2: llvm#102524
Part 3: llvm#102525
The intention of this change is to ensure that allocas end up in the
entry block not spread out amongst complex reduction variable
initialization code.

The tests we have are quite minimized for readability and
maintainability, making the benefits less obvious. The use case for this
is when there are multiple reduction variables, each with multiple blocks
inside of its init region.

2/3
Part 1: llvm#102522
Part 3: llvm#102525
I removed the `*-hlfir*` tests because they are duplicates now that the
other tests have been updated to use the HLFIR lowering.

3/3
Part 1: llvm#102522
Part 2: llvm#102524
…finitions and partial specializations (llvm#104030)

We need to rebuild the template parameters of out-of-line
definitions/specializations of member templates in the context of the
current instantiation for the purposes of declaration matching. We
already do this for function templates and class templates, but not for
variable templates, partial specializations of variable templates, or
partial specializations of class templates. This patch fixes these
remaining cases.
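A minimal hypothetical example (not taken from the patch) of the kind of out-of-line partial specialization whose template parameters must be matched against the in-class declaration:

```cpp
// Hypothetical example of declaration matching for an out-of-line partial
// specialization of a member class template.
template <typename T>
struct Outer {
  template <typename U, typename V>
  struct Inner { static constexpr bool value = false; };

  template <typename U>
  struct Inner<U, U*>;  // partial specialization declared in-class
};

// Out-of-line definition: its template parameters must be rebuilt in the
// context of the current instantiation to match the declaration above.
template <typename T>
template <typename U>
struct Outer<T>::Inner<U, U*> {
  static constexpr bool value = true;
};

static_assert(Outer<int>::Inner<char, char*>::value, "specialization chosen");
```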
)

Convert them to Pointers, do the offset calculation and then convert
them back to function pointers.
…105644)

This can be handled in ODS instead of writing custom parsing/printing
code.

Thanks for the idea @skatrak
…vm#102752)

Currently `mlir.llvm.constant` of structure types requires that the
structure type effectively represent a complex type: it must have
exactly two fields of the same type, and the field type must be either an
integer type or a float type.

This PR relaxes this restriction and allows the structure type to
have an arbitrary number of fields.
…ble objects (llvm#104778)

Whilst dealing with review comments on

llvm#96752

I discovered that SCEV does not know about the dereferenceable attribute
on function arguments, so I have updated getRangeRef to make use of it
by calling getPointerDereferenceableBytes.
These builtins currently return CR0, which has the format
[0, 0, flag_true_if_saved, XER].
We only want to return flag_true_if_saved, so this patch adds a shift to
remove the XER bit before returning.
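A hedged sketch of the bit manipulation described above (hypothetical helper, not the actual builtin lowering):

```cpp
// CR0 layout, most significant bit first: [0, 0, flag_true_if_saved, XER].
// Shifting right by one drops the XER bit so only flag_true_if_saved remains.
static unsigned extractFlagTrueIfSaved(unsigned CR0) {
  return (CR0 >> 1) & 1u;
}
```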
This aligns the transform with what foldLogOpOfMaskedICmp() does.
- Landing page: add link to the libc++ Discord channel
- Landing page: reorder "Getting Involved" above "Design documents"
- Landing page: remove "Notes and Known Issues" which was completely outdated
- Rename "Using Libc++" to "User Documentation" and update contents
- Rename "Building Libc++" to "Vendor Documentation" and update contents

The "BuildingLibcxx" and "UsingLibcxx" pages have basically been used for
vendor and user documentation respectively. However, they were named in
a way that doesn't really make that clear. Renaming the pages now gives
us a location to clearly document what we target at vendors and what we
target at users, and to do that separately.
…ns (llvm#105455)

This allows the use of a single wider operation with a restricted EVL
instead of padding the vector with the neutral element.

For RISCV specifically, it's worth noting that an alternate padded
lowering is available when VL is one less than a power of two, and LMUL
<= m1. We could slide the vector operand up by one, and insert the
padding via a vslide1up. We don't currently pattern match this, but we
could. This form would arguably be better iff the surrounding code
wanted VL=4. This patch will force a VL toggle in that case instead.

Basically, it comes down to a question of whether we think odd sized
vectors are going to appear clustered with odd size vector operations,
or mixed in with larger power of two operations.

Note there is a potential downside of using vp nodes: we lose any
generic DAG combines which might have applied to the widened form.
…lvm#104689)

This is a fairly narrow transform (at the moment) to reduce the VLs of
instructions feeding a store with a smaller VL. Note that the goal of
this transform isn't really to reduce VL - it's to reduce VL *toggles*.
To our knowledge, small reductions in VL without also changing LMUL are
generally not profitable on existing hardware.

For a single-use instruction without side effects, fp exceptions, or a
result dependency on VL, reducing VL is legal if only a subset of
elements are used. We'd already implemented this logic for vmv.v.v, and
this patch simply applies it to stores as an alternate root.

Longer term, I plan to extend this to other root instructions (i.e.
different kind of stores, reduces, etc..), and add a more general
recursive walkback through operands.

One risk with the dataflow based approach is that we could be reducing
VL of an instruction scheduled in a region with the wider VL (i.e. mixed
mode computations) forcing an additional VL toggle. An example of this
is the @insert_subvector_dag_loop test case, but it doesn't appear to
happen widely. I think this is a risk we should accept.
This patch extends llvm#73964 and
optimises SVE cmp intrinsics to a zero vector when the predicate is zero.
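For illustration, a source-level ACLE analogue of the fold (the patch itself operates on the LLVM intrinsics; this is only a hedged sketch, compiled for an SVE target):

```cpp
#include <arm_sve.h>

// With an all-false governing predicate the compare can set no lanes,
// so the whole call can be folded to an all-false (zero) result.
svbool_t cmp_with_false_predicate(svint32_t a, svint32_t b) {
  svbool_t pg = svpfalse_b();    // predicate is all-false
  return svcmpeq_s32(pg, a, b);  // foldable away
}
```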
This patch removes obsolete status pages for projects that were
completed: LLVM 18 release, C++20 Ranges and Spaceship support.

Co-authored-by: Hristo Hristov <[email protected]>
tbaederr and others added 29 commits August 24, 2024 09:23
Since this must be true, add an assertion instead of just documenting it
via the comment.
…and has the `nuw` or `nsw` property. (llvm#105914)

This patch updates the select operand when the condition has the nuw or nsw
property. Considering the semantics of the nuw and nsw flags, if there is
no poison value in this expression, this code assumes that X can only be
0, 1 or -1.

close: llvm#96765
alive2: https://alive2.llvm.org/ce/z/3n3n2Q
The intent is that the tests should not be running on PowerPC as the fp128 type
will differ. This attempts to fix the bots by using __powerpc__ instead, which
appears to be defined in godbolt.
…ephole (llvm#105792)

Currently we move the source down to where the vmv.v.v is, to make sure that the
new passthru dominates, but we do this even if it already does.

This adds a simple local dominance check (taken from
X86FastPreTileConfig.cpp) and avoids doing the move if it can.

It also modifies the move to only move it to just past the passthru
definition, and not all the way down to the vmv.v.v.

This allows folding to succeed in some edge cases, which prevents
regressions in an upcoming patch.
TLI might not be valid for all contexts in which constant folding is performed. Add
a quick guard that it is not null.
On macOS the dynamic loader prunes dyld specific environment variables
such as `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, etc. If these are
set in the lit config, it's safe to assume that the user actually wanted
their subprocesses to run with these variables, rather than only the Python
interpreter that gets executed with them before they are pruned. This
change exports all known variables in the shell script instead of
relying on them being passed through.
Followup to llvm#90109.

In Microsoft, our automated scans are warning that LLVM has vulnerable
dependencies. Specifically:

* [CVE-2024-35195](https://nvd.nist.gov/vuln/detail/CVE-2024-35195) was
fixed in `requests` 2.32.0.
* [CVE-2024-37891](https://nvd.nist.gov/vuln/detail/CVE-2024-37891) was
fixed in `urllib3` 2.2.2.

I've updated LLVM's dependencies by running the following commands in
`llvm/utils/git`:

```
pip-compile --upgrade --generate-hashes --output-file=requirements.txt requirements.txt.in
pip-compile --upgrade --generate-hashes --output-file=requirements_formatting.txt requirements_formatting.txt.in
```

Note that for `requirements_formatting.txt` this adds
`--generate-hashes` (according to my vague understanding, it's highly
desirable and was already used for `requirements.txt`) and was locally
run within `llvm/utils/git` (changing the recorded command, which
apparently was originally run from the repo root - again,
`requirements.txt` was already being regenerated with a locally run
command, so this increases consistency).

I observe that this has updated the relevant components to pick up the
CVE fixes. Note that I am largely clueless in this area, so I hope that
(like llvm#90109) no other changes will be necessary.
Followup to llvm#99570.

* `TEST_COMPILER_MSVC` must be tested for `defined`ness, as it is
everywhere else.
+ Definition:
https://github.com/llvm/llvm-project/blob/52a7116f5c6ada234f47f7794aaf501a3692b997/libcxx/test/support/test_macros.h#L71-L72
+ Example usage:
https://github.com/llvm/llvm-project/blob/52a7116f5c6ada234f47f7794aaf501a3692b997/libcxx/test/std/utilities/function.objects/func.not_fn/not_fn.pass.cpp#L248
+ Fixes: `llvm-project\libcxx\test\support\atomic_helpers.h(33): fatal
error C1017: invalid integer constant expression`
* Fix bogus return type: `msvc_is_lock_free_macro_value()` returns `2`
or `0`, so it needs to return `int`.
+ Fixes: `llvm-project\libcxx\test\support\atomic_helpers.h(41): warning
C4305: 'return': truncation from 'int' to 'bool'`
* Clarity improvement: also add parens when mixing bitwise with
arithmetic operators.
Fix bug introduced in llvm#105730

The bug is in how the batch RAUW is implemented. If we have 

```
%0 = mov %src
%1 = mov %0

use %0
use %1
```

The use of `%1` is rewritten to `%0`, not `%src`. This PR just looks for
a replacement when it maps to the src register, which should
transitively propagate the replacements.
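A minimal sketch of that transitive lookup, with assumed types rather than the pass's actual data structures:

```cpp
#include <unordered_map>

using Reg = unsigned;

// Follow the replacement chain so a register that maps to another replaced
// register ends up at the original source (e.g. %1 -> %0 -> %src).
Reg resolveReplacement(const std::unordered_map<Reg, Reg> &Replacements, Reg R) {
  auto It = Replacements.find(R);
  while (It != Replacements.end()) {
    R = It->second;
    It = Replacements.find(R);
  }
  return R;
}
```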
…tions (llvm#105840)

This is a follow-up to llvm#105455 which updates the VPIntrinsic mappings
for the fadd and fmul cases, and supports both ordered and unordered
reductions. This allows the use of a single wider operation with a
restricted EVL instead of padding the vector with the neutral element.

This has all the same tradeoffs as the previous patch.
… mangling for MSVC 1920+ / VS2019+ (llvm#104722)

Reapply llvm#102848.

The description in this PR will detail the changes from the reverted
original PR above.

For `auto&&` return types that can partake in reference collapsing, we
weren't properly handling the mangling that can arise.
When collapsing occurs, an inner reference is created with the collapsed
reference type. If we return `int&` from such a function, then an inner
reference of `int&` is created within the `auto&&` return type.
`getPointeeType` on a reference type goes through all inner references
before returning the pointee type, which ends up being a builtin type,
`int`, which is unexpected.

We can use `getPointeeTypeAsWritten` to get the `AutoType` as expected;
however, for the instantiated template declaration, reference collapsing
has already occurred on the return type. This means `auto&&` is turned into
`auto&` in our example above, and we end up mangling an lvalue reference type.
This is unintended, as MSVC mangles based on the declaration of the return
type, `auto&&` in this case, which is treated as an rvalue reference.
```
template<class T>
auto&& AutoReferenceCollapseT(int& x) { return static_cast<int&>(x); }

void test() 
{
    int x = 1;
    auto&& rref = AutoReferenceCollapseT<void>(x); // "??$AutoReferenceCollapseT@X@@ya$$QEA_PAEAH@Z"
    // Mangled as an rvalue reference to auto
}
```

If we are mangling a template with a placeholder return type, we want to
get the first template declaration and use its return type to do the
mangling of any instantiations.

This fixes the bug reported in the original PR that caused the revert
with libcxx `std::variant`.
I also tested locally with libcxx and the following test code which
fails in the original PR but now works in this PR.
```
#include <variant>

void test()
{
    std::variant<int> v{ 1 };
    int& r = std::get<0>(v);
    (void)r;
}
```
)

Currently, the process of replacing bitwise operations consisting of
`LSR`/`LSL` with `And` is performed by `DAGCombiner`.

However, in certain cases, the `AND` generated by this process
can be removed.

Consider following case:
```
        lsr x8, x8, #56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret
```

In this case, we can remove the `AND` by changing the address operand of the `LDR`
to `[X2, X8, LSL #2]` and changing the right-shift amount from 56 to 58.

After the change:
```
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl #2]
        ret
```

This patch checks whether the `SHIFTING` + `AND` operation on the load
target can be optimized, and performs the optimization if it can.
v16i8 VECTOR_REG_CAST (v16i8 Op) can use v16i8 Op directly, as the
VECTOR_REG_CAST is a noop.
We assign I->getNumOperands() to J and immediately print that out as a
debug message.  We don't need to keep J across iterations.
…uble (llvm#104929)"

ConstantFolding behaves differently depending on the host's `HAS_IEE754_FLOAT128`.
LLVM should not change its behavior depending on host configurations.

This reverts commit 14c7e4a.
(llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)
Currently VCIX ISDs are placed after FIRST_TARGET_STRICTFP_OPCODE, which is
not expected; they should be in the normal OPCODE area.
…iveIn/removeLiveIn. NFC

We already used it for addLiveIn.
If we fail to initialize the ASTContext builtins, LLDB
may crash in non-obvious ways down the line, e.g., when
it tries to call `ASTContext::getTypeSize` on a builtin like
`ast.UnsignedCharTy`, which would dereference a `null` `QualType`.

The initialization can fail if we either didn't set the
`TypeSystemClang` target triple, or if the embedded clang isn't
enabled for a certain target.

This patch attempts to help pinpoint the failure case post-mortem
by adding a log message here that prints the triple.

rdar://134260837