Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes) #451

Open
wants to merge 859 commits into
base: feature/fused-ops
Choose a base branch
from

Conversation

jorickert
Copy link

@jorickert jorickert commented Jan 14, 2025

kazutakahirata and others added 30 commits October 21, 2024 06:50
llvm#113119)

StringMap::find takes StringRef.  We don't need to create an instance
of std::string from StringRef only to convert it right back to
StringRef.
This code intentionally discards the high bits, so set
implicitTrunc=true. This is currently NFC but will enable an
APInt assertion in the future.
…vm#111575)

Adds a new mlir-opt test-only pass, -test-spirv-cpu-runner-pipeline,
which runs the set of MLIR passes needed for the mlir-spirv-cpu-runner,
and removes them from the runner. The tests are changed to invoke
mlir-opt with this flag before running the runner. The eventual goal is
to move all host/device code generation steps out of the runner, like
with some of the other runners.
6bac414 added this opcode with the wrong
number of operands. It didn't fail on check-llvm for me or on pre-commit CI,
but once committed we got buildbot failures. This patch fixes the definition
of the instruction and fixes the failing test.
With the truncssat nodes these are relatively simple tablegen patterns to add.
The existing intrinsics are converted to shift+truncsat to they can lower using
the new patterns.

Fixes llvm#112925.
This patch adds assembly/disassembly for the following instructions:
   ldfadd{a,al,l,}, ldbfadd{a,al,l,}
   ldfmax{a,al,l,}, ldbfmax{a,al,l,}
   ldfmaxnm{a,al,l,}, ldbfmaxnm{a,al,l,}
   ldfmin{a,al,l,}, ldbfmin{a,al,l,}
   ldfminnm{a,al,l,} ldbfminnm{a,al,l,}
   stfadd{l,}, stbfadd{l,}
   stfmax{l,}, stbfmax{l,}
   stfmaxnm{l,}, stbfmaxnm{l,}
   stfmin{l,}, stbfmin{l,}
   stfminnm{l,}, stbfminnm{l,}

According to [1]

[1]https://developer.arm.com/documentation/ddi0602

Co-authored-by: Spencer Abson
[[email protected]](mailto:[email protected])
Co-authored-by: Caroline Concatto
[[email protected]](mailto:[email protected])
Add new register classes/operands and their encoder/decoder behaviour
required for the new Armv9.6 instructions (see
https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension).

This work is the basis ofthe 2024 Armv9.6 architecture update effort for
SME.

Co-authored-by: Caroline Concatto [email protected]
Co-authored-by: Marian Lukac [email protected]
Co-authored-by: Momchil Velikov [email protected]
Improve the codegen for uaddo node for i64 in 64-bit mode and i32 in
32-bit mode by custom lowering.
Test added by commit 47a6da2 fails on the AIX bot. So XFAIL for now to investigate further.
Previously we were attempting to remove the memprof-related metadata
when iterating through instructions in the LTO backend. However, we
missed some as there are a number of cases where we skip instructions,
or even entire functions. Simplify the cleanup and ensure all is removed
by doing a full sweep over all instructions after completing cloning.

This is largely NFC except with -memprof-report-hinted-sizes enabled,
because we were propagating and simplifying the metadata after inlining
in the LTO backend, which caused some stray messages as metadata was
re-converted to attributes.
  movzx  r11d,BYTE PTR [rdx]

is four bytes long.

Follow-up to llvm#111638
Remove the unused functions and register classes from the change below
llvm@4679583
Root gather/buildvector node should be ignored when SLP vectorizer tries
to find matching gather nodes, vectorized earlier. This node is
definitely the last one in the pipeline and it does not have users. It
may cause the compiler crash

Fixes llvm#113143
This patch adds a LegalityResultWithReason class for describing the
reason why legality decided not to vectorize the code.
…llvm#109850)

Special case small int constant in the PPC custom lowering of
scalar_to_vector.
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on.
Built by GCC 11.

Fix warning

In destructor ‘llvm::APInt::~APInt()’,
inlined from ‘llvm::APInt::~APInt()’ at
llvm-project/llvm/include/llvm/ADT/APInt.h:190:3,
inlined from ‘llvm::APSInt::~APSInt()’ at
llvm-project/llvm/include/llvm/ADT/APSInt.h:23:21,
inlined from ‘bool
checkOMPArraySectionConstantForReduction(clang::ASTContext&, const
clang::ArraySectionExpr*, bool&, llvm::SmallVectorImpl<llvm::APSInt>&)’
at llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18357:45,
inlined from ‘bool actOnOMPReductionKindClause(clang::Sema&,
{anonymous}::DSAStackTy*, clang::OpenMPClauseKind,
llvm::ArrayRef<clang::Expr*>, clang::SourceLocation,
clang::SourceLocation, clang::SourceLocation, clang::SourceLocation,
clang::CXXScopeSpec&, const clang::DeclarationNameInfo&,
llvm::ArrayRef<clang::Expr*>, {anonymous}::ReductionData&)’ at
llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18715:68:
llvm-project/llvm/include/llvm/ADT/APInt.h:192:18: error: ‘void operator
delete [](void*)’ called on a pointer to an unallocated object ‘1’
[-Werror=free-nonheap-object]
  192 |       delete[] U.pVal;
      |                  ^~~~
This patch fixes:

  llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/Legality.h:85:16:
  error: private field 'Reason' is not used
  [-Werror,-Wunused-private-field]
…2765)

MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for VGPR limited
kernels. A kernel becomes VGPR limited if a total number of VGPRs per
SIMD / number of used VGPRs is more than a number of wave slots.
…#112990)

Renames LegalizeData to LegalizeDataValues since this pass fixes up SSA
values. LegalizeData suggested that it fixed data mapping.

This change also adds support to fix up ssa values for data clause
operations. Effectively, compute regions within a data region use the
ssa values from data operations also. The ssa values within data regions
but not within compute regions are not updated.

This change is to support the requirement in the OpenACC spec which
notes that a visible data clause is not just one on the current compute
construct but on the lexically containing data construct or visible
declare directive.
Based on this RFC:
https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306

LLVM intrinsics do not support out params. To get around this limitation
implementers will make intrinsics return structs to capture a return
type and an out param. This implementation detail should not impact
scalarization since these cases should be elementwise operations.

## Three changes are needed. 
- The CallInst visitor needs to be updated to handle Structs
- A new visitor is needed for `ExtractValue` instructions
- finsh needs to be update to handle structs so that insert elements are
properly propogated.

## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`

fixes llvm#111437
…vm#112997)

The x86-fold-tables.td has been failing for me and [in
CI](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4)
if Git happens to decide to check out the baseline file with Windows
line endings.

This fix for this is to add the `--strip-trailing-cr` option to diff to
normalize the line endings before comparing them.
…lvm#112995)

llvm#98060 introduced a warning for unterminated string constants, however
it was only checking for `\n` which means that it produced strange
results on Windows (always blaming column 1) including having the
[associated test
fail](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4)
if Git happened to use Windows newlines when creating the file.

This fix for this is to detect both `\r` and `\n`, but don't double-warn
for Windows newlines.
cjacek and others added 17 commits October 23, 2024 13:10
…lazy archive symbol to the symbol table on ARM64EC (llvm#113284)

On ARM64EC, a function symbol may appear in both mangled and demangled
forms:
- ARM64EC archives contain only the mangled name, while the demangled
symbol is defined by the object file as an alias.
- x86_64 archives contain only the demangled name (the mangled name is
usually defined by an object referencing the symbol as an alias to a
guess exit thunk).
- ARM64EC import files contain both the mangled and demangled names for
thunks.

If more than one archive defines the same function, this could lead to
different libraries being used for the same function depending on how
they are referenced. Avoid this by checking if the paired symbol is
already defined before adding a symbol to the table.
…m#112928)

Member pointers refer to data or function members of a `CXXRecordDecl` and
require a `MSInheritanceAttr` in order to be complete. Without that we
cannot calculate their size in memory. The attempt has been causing a crash
further down in the clang AST context. In order to implement the feature,
DWARF will need a new attribtue to convey the information. For the moment,
this patch teaches LLDB to handle to situation and avoid the crash.
…lvm#111130)

Before this patch, redundant COPY couldn't be removed for the following
case:
```
  $R0 = OP ...
  ... // Read of %R0
  $R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
  $R1 = OP ...
  ... // Replace all uses of %R0 with $R1
```
This PR merges large offsets into the base address loading.
llvm#113309)

llvm-cxxfilt can demangle names of data symbols, in addition to function
names.

    $ llvm-cxxfilt _ZN6garden5gnomeE
    garden::gnome

And type names too, on request:

    $ llvm-cxxfilt -t i
    int

Update some overly specific the wording in the --help and documentation
that suggests otherwise.
This patch adds these new vector sizes for neon:
   mfloat8x16_t and mfloat8x8_t

    According to the ARM ACLE PR#323[1].

    [1] ARM-software/acle#323
llvm#111531)

Bot maintainers should be aware and it became too much of a burden
for developers. In particular on Windows, where make.exe won't be
found in Path typically.
…2867)

The Intel C++ Compiler (ICX) passes linker flags through the driver
unlike MSVC and clang-cl, and therefore needs them to be prefixed with
`/Qoption,link` (the equivalent of -Wl, for gcc on *nix).

Use the `LINKER:` prefix for the `/EXPORT:` options in clang-repl, this
expands to the correct flag for ICX and nothing for MSVC / clang-cl.

RFC:
https://discourse.llvm.org/t/rfc-cmake-linker-flags-need-wl-equivalent-for-intel-c-icx-on-windows/82446
These two veclibs are only available for AArch64 targets, and as
mentioned in https://discourse.llvm.org/t/rfc-should-fveclib-imply-fno-math-errno-for-all-targets/81384,
we (Arm) think that `-fveclib` should imply `-fno-math-errno`. By
setting `-fveclib` the user shows they intend to use the vector math
functions, which implies they don't care about errno. However,
currently, the vector mappings won't be used in many cases without
setting `-fno-math-errno` separately.

Making this change would also help resolve some inconsistencies in how
vector mappings are applied (see llvm#108980 (comment)).

Note: Both SLEEF and ArmPL state that they do not set `errno`:

- https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
  * "The vector functions in libamath which are available on Linux may not set errno nor raise exceptions"
- https://sleef.org/2-references/libm/
  *  "These functions do not set errno nor raise an exception."
…llvm#113167)

Define `OmpIteratorSpecifier` and `OmpIteratorModifier` parser classes,
and add parsing for them. Those are reusable between any clauses that
use iterator modifiers.

Add support for iterator modifiers to the MAP clause up to lowering,
where a TODO message is emitted.
…at container-inserter does (llvm#113103)

This patch implements LWG4016: container-insertable checks do not match
what container-inserter does.
When compiling for an SVE target we can use INDEX to generate constant
fixed-length step vectors, e.g.:
```
uint32x4_t foo() {
  return (uint32x4_t){0, 1, 2, 3};
}
```
Currently:
```
foo():
        adrp    x8, .LCPI1_0
        ldr     q0, [x8, :lo12:.LCPI1_0]
        ret
```
With INDEX:
```
foo():
        index   z0.s, #0, #1
        ret
```

The logic for this was already in `LowerBUILD_VECTOR`, though it was
hidden under a check for `!Subtarget->isNeonAvailable()`. This patch
refactors this to enable the corresponding code path unconditionally for
constant step vectors (as long as we can use SVE for them).
@jorickert jorickert changed the title [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump) Jan 14, 2025
Base automatically changed from bump_to_f035d9f0 to feature/fused-ops January 23, 2025 14:10
@mgehre-amd
Copy link
Collaborator

Does a corresponding PR exist on Xilinx/onnx-mlir? If yes, could you please link them?

@jorickert
Copy link
Author

Xilinx/onnx-mlir#272 I will update it today

@jorickert
Copy link
Author

I update the onnx-bump, but we will need to include #455 in the same bump too

[AutoBump] Merge with a8d506b (Oct 22) (15)
[AutoBump] Merge with 8a9921f (Oct 23) (17)
[AutoBump] Merge with fixes of 519eef3 (Oct 22) (16) (Needs downstream changes)
@jorickert jorickert changed the title [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump) [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes) Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment