[AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes) #451

jorickert · 2025-01-14T09:57:15Z

Onnx Bump [AutoBump] Merge with fixes of b8000366 (Needs LLVM bump)(Dec 19) (10) onnx-mlir#272
As we are blocked by the onnx windows issue, this can be worked around by removing the test "parallel/krnl_parallel_clause_to_omp.mlir"

llvm#113119) StringMap::find takes StringRef. We don't need to create an instance of std::string from StringRef only to convert it right back to StringRef.
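The same idea can be sketched with the standard library only. This is an analogy, not the LLVM `StringMap` API: a `std::map` with a transparent comparator accepts a `std::string_view` key directly, so no temporary `std::string` is built for the lookup. The names `SymbolTable` and `lookupOrDefault` are hypothetical.

```cpp
#include <map>
#include <string>
#include <string_view>

// Hypothetical table type. std::less<> is a transparent comparator, which
// enables heterogeneous lookup with std::string_view keys.
using SymbolTable = std::map<std::string, int, std::less<>>;

// Looks up Key without materializing a std::string first.
// Returns Default when Key is not present.
int lookupOrDefault(const SymbolTable &Table, std::string_view Key,
                    int Default) {
  auto It = Table.find(Key); // no std::string(Key) temporary is created
  return It == Table.end() ? Default : It->second;
}
```

Calling `lookupOrDefault(Table, "bar", -1)` on a table that holds `"bar"` returns its stored value; a missing key returns `-1`.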

This code intentionally discards the high bits, so set implicitTrunc=true. This is currently NFC but will enable an APInt assertion in the future.
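A plain-integer sketch of what "intentionally discards the high bits" means, assuming a hypothetical helper `truncateToWidth`. It does not use `APInt`; it only illustrates a truncation that the caller asks for explicitly instead of one that happens silently.

```cpp
#include <cassert>
#include <cstdint>

// Keeps only the low `Bits` bits of `Value` and drops everything above them.
// Hypothetical helper for illustration; not part of LLVM.
uint64_t truncateToWidth(uint64_t Value, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "width must be in [1, 64]");
  if (Bits == 64)
    return Value; // shifting by 64 would be undefined, so handle it separately
  return Value & ((uint64_t{1} << Bits) - 1);
}
```

For example, `truncateToWidth(0x1FF, 8)` yields `0xFF`.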

…vm#111575) Adds a new mlir-opt test-only pass, -test-spirv-cpu-runner-pipeline, which runs the set of MLIR passes needed for the mlir-spirv-cpu-runner, and removes them from the runner. The tests are changed to invoke mlir-opt with this flag before running the runner. The eventual goal is to move all host/device code generation steps out of the runner, like with some of the other runners.

6bac414 added this opcode with the wrong number of operands. It didn't fail on check-llvm for me or on pre-commit CI, but once committed we got buildbot failures. This patch fixes the definition of the instruction and fixes the failing test.

With the truncssat nodes these are relatively simple tablegen patterns to add. The existing intrinsics are converted to shift+truncsat so they can lower using the new patterns. Fixes llvm#112925.

…eturn value is different in pointer / lvalue ref / rvalue ref (llvm#112853) Per https://cplusplus.github.io/CWG/issues/960.html.

This patch adds assembly/disassembly for the following instructions:
ldfadd{a,al,l,}, ldbfadd{a,al,l,}
ldfmax{a,al,l,}, ldbfmax{a,al,l,}
ldfmaxnm{a,al,l,}, ldbfmaxnm{a,al,l,}
ldfmin{a,al,l,}, ldbfmin{a,al,l,}
ldfminnm{a,al,l,} ldbfminnm{a,al,l,}
stfadd{l,}, stbfadd{l,}
stfmax{l,}, stbfmax{l,}
stfmaxnm{l,}, stbfmaxnm{l,}
stfmin{l,}, stbfmin{l,}
stfminnm{l,}, stbfminnm{l,}
According to [1]
[1] https://developer.arm.com/documentation/ddi0602
Co-authored-by: Spencer Abson [email protected]
Co-authored-by: Caroline Concatto [email protected]

…llvm#113164) Fixes llvm#113123 Alive proof: https://alive2.llvm.org/ce/z/hnqeLC

…m#112935) Done in preparation of exploring rtsan on windows.

Add new register classes/operands and their encoder/decoder behaviour required for the new Armv9.6 instructions (see https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension). This work is the basis of the 2024 Armv9.6 architecture update effort for SME. Co-authored-by: Caroline Concatto [email protected] Co-authored-by: Marian Lukac [email protected] Co-authored-by: Momchil Velikov [email protected]

…-opt" (llvm#113176) Reverts llvm#111575 This caused build failures: https://lab.llvm.org/buildbot/#/builders/138/builds/5244

Improve the codegen for uaddo node for i64 in 64-bit mode and i32 in 32-bit mode by custom lowering.

Test added by commit 47a6da2 fails on the AIX bot. So XFAIL for now to investigate further.

Previously we were attempting to remove the memprof-related metadata when iterating through instructions in the LTO backend. However, we missed some as there are a number of cases where we skip instructions, or even entire functions. Simplify the cleanup and ensure all is removed by doing a full sweep over all instructions after completing cloning. This is largely NFC except with -memprof-report-hinted-sizes enabled, because we were propagating and simplifying the metadata after inlining in the LTO backend, which caused some stray messages as metadata was re-converted to attributes.

movzx r11d,BYTE PTR [rdx] is four bytes long. Follow-up to llvm#111638

Remove the unused functions and register classes from the change below llvm@4679583

The root gather/buildvector node should be ignored when the SLP vectorizer tries to find matching gather nodes that were vectorized earlier. This node is definitely the last one in the pipeline and it does not have users. Matching against it may cause a compiler crash. Fixes llvm#113143

This patch adds a LegalityResultWithReason class for describing the reason why legality decided not to vectorize the code.

…llvm#109850) Special case small int constant in the PPC custom lowering of scalar_to_vector.

This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on. Built by GCC 11. Fix warning:
In destructor ‘llvm::APInt::~APInt()’,
inlined from ‘llvm::APInt::~APInt()’ at llvm-project/llvm/include/llvm/ADT/APInt.h:190:3,
inlined from ‘llvm::APSInt::~APSInt()’ at llvm-project/llvm/include/llvm/ADT/APSInt.h:23:21,
inlined from ‘bool checkOMPArraySectionConstantForReduction(clang::ASTContext&, const clang::ArraySectionExpr*, bool&, llvm::SmallVectorImpl<llvm::APSInt>&)’ at llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18357:45,
inlined from ‘bool actOnOMPReductionKindClause(clang::Sema&, {anonymous}::DSAStackTy*, clang::OpenMPClauseKind, llvm::ArrayRef<clang::Expr*>, clang::SourceLocation, clang::SourceLocation, clang::SourceLocation, clang::SourceLocation, clang::CXXScopeSpec&, const clang::DeclarationNameInfo&, llvm::ArrayRef<clang::Expr*>, {anonymous}::ReductionData&)’ at llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18715:68:
llvm-project/llvm/include/llvm/ADT/APInt.h:192:18: error: ‘void operator delete [](void*)’ called on a pointer to an unallocated object ‘1’ [-Werror=free-nonheap-object]
192 | delete[] U.pVal;
    | ^~~~

This patch fixes: llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/Legality.h:85:16: error: private field 'Reason' is not used [-Werror,-Wunused-private-field]

…2765) MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's been identified this message is only really needed for VGPR limited kernels. A kernel becomes VGPR limited if a total number of VGPRs per SIMD / number of used VGPRs is more than a number of wave slots.

…#112990) Renames LegalizeData to LegalizeDataValues since this pass fixes up SSA values. LegalizeData suggested that it fixed data mapping. This change also adds support to fix up ssa values for data clause operations. Effectively, compute regions within a data region use the ssa values from data operations also. The ssa values within data regions but not within compute regions are not updated. This change is to support the requirement in the OpenACC spec which notes that a visible data clause is not just one on the current compute construct but on the lexically containing data construct or visible declare directive.

Based on this RFC: https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306
LLVM intrinsics do not support out params. To get around this limitation implementers will make intrinsics return structs to capture a return type and an out param. This implementation detail should not impact scalarization since these cases should be elementwise operations.
## Three changes are needed.
- The CallInst visitor needs to be updated to handle Structs
- A new visitor is needed for `ExtractValue` instructions
- finish needs to be updated to handle structs so that insert elements are properly propagated.
## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`
fixes llvm#111437
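To illustrate why a struct-returning operation can still be elementwise, here is a small standard C++ sketch built on `std::frexp`. It is not the Scalarizer pass; `FrexpResult` and `frexpElementwise` are hypothetical names. The struct carries both the "return value" (fraction) and the "out param" (exponent), and each lane is computed independently of the others.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical result type: one field for the returned fraction,
// one for the value that would otherwise be an out parameter.
template <std::size_t N> struct FrexpResult {
  std::array<double, N> Fraction;
  std::array<int, N> Exponent;
};

// Applies std::frexp lane by lane. Lane I of the output depends only on
// lane I of the input, which is what makes the operation scalarizable.
template <std::size_t N>
FrexpResult<N> frexpElementwise(const std::array<double, N> &In) {
  FrexpResult<N> Out{};
  for (std::size_t I = 0; I < N; ++I)
    Out.Fraction[I] = std::frexp(In[I], &Out.Exponent[I]);
  return Out;
}
```

Splitting the loop body into N separate scalar calls gives the same result, which is the property the scalarization relies on.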

…vm#112997) The x86-fold-tables.td test has been failing for me and [in CI](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4) if Git happens to decide to check out the baseline file with Windows line endings. The fix for this is to add the `--strip-trailing-cr` option to diff to normalize the line endings before comparing them.

…lvm#112995) llvm#98060 introduced a warning for unterminated string constants. However, it only checked for `\n`, which means that it produced strange results on Windows (always blaming column 1), including having the [associated test fail](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4) if Git happened to use Windows newlines when creating the file. The fix for this is to detect both `\r` and `\n`, without double-warning for Windows newlines.
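A minimal sketch of newline detection that treats `\r\n` as a single terminator, so that a Windows line ending is reported once and not twice. This is not the code from the patch; `LineEnd`, `findLineEnd`, and `countLineBreaks` are hypothetical names.

```cpp
#include <cstddef>
#include <string_view>

struct LineEnd {
  std::size_t Pos;    // index of the first terminator char, or Text.size()
  std::size_t Length; // 0 if none found, 1 for "\n" or "\r", 2 for "\r\n"
};

// Finds the next line terminator at or after Start.
LineEnd findLineEnd(std::string_view Text, std::size_t Start) {
  std::size_t Pos = Text.find_first_of("\r\n", Start);
  if (Pos == std::string_view::npos)
    return {Text.size(), 0};
  bool IsCRLF =
      Text[Pos] == '\r' && Pos + 1 < Text.size() && Text[Pos + 1] == '\n';
  return {Pos, IsCRLF ? std::size_t{2} : std::size_t{1}};
}

// Counts line breaks, counting each "\r\n" pair only once.
std::size_t countLineBreaks(std::string_view Text) {
  std::size_t Count = 0;
  std::size_t Pos = 0;
  while (true) {
    LineEnd End = findLineEnd(Text, Pos);
    if (End.Length == 0)
      break;
    ++Count;
    Pos = End.Pos + End.Length; // skip the whole terminator
  }
  return Count;
}
```

A diagnostic that scans with `findLineEnd` would see one line break per `\r\n`, which avoids the double warning described above.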

…lazy archive symbol to the symbol table on ARM64EC (llvm#113284) On ARM64EC, a function symbol may appear in both mangled and demangled forms:
- ARM64EC archives contain only the mangled name, while the demangled symbol is defined by the object file as an alias.
- x86_64 archives contain only the demangled name (the mangled name is usually defined by an object referencing the symbol as an alias to a guest exit thunk).
- ARM64EC import files contain both the mangled and demangled names for thunks.
If more than one archive defines the same function, this could lead to different libraries being used for the same function depending on how they are referenced. Avoid this by checking if the paired symbol is already defined before adding a symbol to the table.

…m#112928) Member pointers refer to data or function members of a `CXXRecordDecl` and require a `MSInheritanceAttr` in order to be complete. Without that we cannot calculate their size in memory. The attempt has been causing a crash further down in the clang AST context. In order to implement the feature, DWARF will need a new attribute to convey the information. For the moment, this patch teaches LLDB to handle the situation and avoid the crash.

…lvm#111130) Before this patch, redundant COPY couldn't be removed for the following case:
```
$R0 = OP ...
... // Read of %R0
$R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register during backward propagation, so that we can remove the redundant COPY in the above case and optimize it to:
```
$R1 = OP ...
... // Replace all uses of %R0 with $R1
```

This PR merges large offsets into the base address loading.

llvm#113309) llvm-cxxfilt can demangle names of data symbols, in addition to function names.
$ llvm-cxxfilt _ZN6garden5gnomeE
garden::gnome
And type names too, on request:
$ llvm-cxxfilt -t i
int
Update some overly specific wording in the --help and documentation that suggests otherwise.

This patch adds these new vector sizes for neon: mfloat8x16_t and mfloat8x8_t According to the ARM ACLE PR#323[1]. [1] ARM-software/acle#323

llvm#111531) Bot maintainers should be aware and it became too much of a burden for developers. In particular on Windows, where make.exe won't be found in Path typically.

…2867) The Intel C++ Compiler (ICX) passes linker flags through the driver unlike MSVC and clang-cl, and therefore needs them to be prefixed with `/Qoption,link` (the equivalent of -Wl, for gcc on *nix). Use the `LINKER:` prefix for the `/EXPORT:` options in clang-repl, this expands to the correct flag for ICX and nothing for MSVC / clang-cl. RFC: https://discourse.llvm.org/t/rfc-cmake-linker-flags-need-wl-equivalent-for-intel-c-icx-on-windows/82446

These two veclibs are only available for AArch64 targets, and as mentioned in https://discourse.llvm.org/t/rfc-should-fveclib-imply-fno-math-errno-for-all-targets/81384, we (Arm) think that `-fveclib` should imply `-fno-math-errno`. By setting `-fveclib` the user shows they intend to use the vector math functions, which implies they don't care about errno. However, currently, the vector mappings won't be used in many cases without setting `-fno-math-errno` separately. Making this change would also help resolve some inconsistencies in how vector mappings are applied (see llvm#108980 (comment)).
Note: Both SLEEF and ArmPL state that they do not set `errno`:
- https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
  * "The vector functions in libamath which are available on Linux may not set errno nor raise exceptions"
- https://sleef.org/2-references/libm/
  * "These functions do not set errno nor raise an exception."

…llvm#113167) Define `OmpIteratorSpecifier` and `OmpIteratorModifier` parser classes, and add parsing for them. Those are reusable between any clauses that use iterator modifiers. Add support for iterator modifiers to the MAP clause up to lowering, where a TODO message is emitted.

Reverts llvm#112603

…llvm#113452)

…at container-inserter does (llvm#113103) This patch implements LWG4016: container-insertable checks do not match what container-inserter does.

…lvm#111236) The underlying issue with msan was fixed by llvm#113200

When compiling for an SVE target we can use INDEX to generate constant fixed-length step vectors, e.g.:
```
uint32x4_t foo() {
  return (uint32x4_t){0, 1, 2, 3};
}
```
Currently:
```
foo():
  adrp x8, .LCPI1_0
  ldr q0, [x8, :lo12:.LCPI1_0]
  ret
```
With INDEX:
```
foo():
  index z0.s, #0, #1
  ret
```
The logic for this was already in `LowerBUILD_VECTOR`, though it was hidden under a check for `!Subtarget->isNeonAvailable()`. This patch refactors this to enable the corresponding code path unconditionally for constant step vectors (as long as we can use SVE for them).
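What counts as a "constant step vector" can be sketched as a check for an arithmetic sequence. This is only an illustration in plain C++, not the logic in `LowerBUILD_VECTOR`; `StepSequence` and `matchStepSequence` are hypothetical names, and the sketch assumes element values small enough that the subtractions cannot overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Start value and per-lane increment of a step vector such as {0, 1, 2, 3}.
struct StepSequence {
  int64_t Start;
  int64_t Step;
};

// Returns the (Start, Step) pair if every lane equals Start + I * Step,
// or std::nullopt if the constants do not form such a sequence.
// Assumes the differences between neighbouring lanes do not overflow.
std::optional<StepSequence>
matchStepSequence(const std::vector<int64_t> &Elts) {
  if (Elts.size() < 2)
    return std::nullopt; // a step is only defined for two or more lanes
  int64_t Step = Elts[1] - Elts[0];
  for (std::size_t I = 2; I < Elts.size(); ++I)
    if (Elts[I] - Elts[I - 1] != Step)
      return std::nullopt;
  return StepSequence{Elts[0], Step};
}
```

For the vector `{0, 1, 2, 3}` from the example above, the match gives start 0 and step 1, which corresponds to the operands of `index z0.s, #0, #1`.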

mgehre-amd · 2025-01-23T14:11:05Z

Does a corresponding PR exist on Xilinx/onnx-mlir? If yes, could you please link them?

jorickert · 2025-01-23T14:17:18Z

Xilinx/onnx-mlir#272 I will update it today

jorickert · 2025-01-24T08:00:44Z

I updated the onnx-bump, but we will need to include #455 in the same bump too

[AutoBump] Merge with a8d506b (Oct 22) (15)

[AutoBump] Merge with 8a9921f (Oct 23) (17)

[AutoBump] Merge with fixes of 519eef3 (Oct 22) (16) (Needs downstream changes)

kazutakahirata and others added 30 commits October 21, 2024 06:50

[tools] Don't call StringRef::str() when calling StringMap::find (NFC) (

61a286a

llvm#113119) StringMap::find takes StringRef. We don't need to create an instance of std::string from StringRef only to convert it right back to StringRef.

[lldb] Avoid repeated map lookups (NFC) (llvm#113121)

1bf1e92

[mlir] Avoid repeated map lookups (NFC) (llvm#113122)

af6e188

[AArch64] Use implicitTrunc in isBitfieldDstMask() (NFC)

e2074c6

This code intentionally discards the high bits, so set implicitTrunc=true. This is currently NFC but will enable an APInt assertion in the future.

[AArch64] Add some basic patterns for qshrn.

bd861d0

With the truncssat nodes these are relatively simple tablegen patterns to add. The existing intrinsics are converted to shift+truncsat so they can lower using the new patterns. Fixes llvm#112925.

[clang] Add covariance tests that make sure we return an error when r…

1dfdbf7

…eturn value is different in pointer / lvalue ref / rvalue ref (llvm#112853) Per https://cplusplus.github.io/CWG/issues/960.html.

[InstCombine] Preserve the flag from RHS only if the and is bitwise (…

a2ba438

…llvm#113164) Fixes llvm#113123 Alive proof: https://alive2.llvm.org/ce/z/hnqeLC

[rtsan][NFC] Rename *interceptors.cpp to *interceptors_posix.cpp (llv…

1e07c48

…m#112935) Done in preparation of exploring rtsan on windows.

[mlir] Fix shared build. NFC

e730231

Revert "[mlir][mlir-spirv-cpu-runner] Move MLIR pass pipeline to mlir…

17e9752

…-opt" (llvm#113176) Reverts llvm#111575 This caused build failures: https://lab.llvm.org/buildbot/#/builders/138/builds/5244

[PPC] Add custom lowering for uaddo (llvm#110137)

c5ca1b8

Improve the codegen for uaddo node for i64 in 64-bit mode and i32 in 32-bit mode by custom lowering.

[AIX][test] XFAIL constant folding log1p test

900b636

Test added by commit 47a6da2 fails on the AIX bot. So XFAIL for now to investigate further.

[win/asan] Fix instruction size for 44 0f b6 1a

8417f6a

movzx r11d,BYTE PTR [rdx] is four bytes long. Follow-up to llvm#111638

[NFC] Fix -WError for unused Encode/Decode ZK methods

42ba452

Remove the unused functions and register classes from the change below llvm@4679583

[SandboxVec][Legality] Scaffolding for Legality (llvm#112623)

54c93aa

This patch adds a LegalityResultWithReason class for describing the reason why legality decided not to vectorize the code.

[PowerPC] special case small int constant for custom scalar_to_vector (…

fc59f2c

…llvm#109850) Special case small int constant in the PPC custom lowering of scalar_to_vector.

[Vectorize] Fix a warning

d4630ae

This patch fixes: llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/Legality.h:85:16: error: private field 'Reason' is not used [-Werror,-Wunused-private-field]

[gn build] Port 54c93aa

006fb09

cjacek and others added 17 commits October 23, 2024 13:10

[CodeGen][NewPM] Port OptimizePHIs to NPM (llvm#113433)

c4c60c0

[LoongArch] Merge base and offset for large offsets (llvm#113277)

b225b15

This PR merges large offsets into the base address loading.

[CLANG][AArch64]Add Neon vectors for mfloat8_t (llvm#99865)

6dad29a

This patch adds these new vector sizes for neon: mfloat8x16_t and mfloat8x8_t According to the ARM ACLE PR#323[1]. [1] ARM-software/acle#323

[lldb][CMake] If make isn't found, print a warning but don't error out (

ba19e98

llvm#111531) Bot maintainers should be aware and it became too much of a burden for developers. In particular on Windows, where make.exe won't be found in Path typically.

Revert "[PowerPC] Expand global named register support" (llvm#113457)

a19f05b

Reverts llvm#112603

[PS5][Driver] Query OPT_r/OPT_shared/OPT_static just once (NFC) (…

5560f7e

…llvm#113452)

[libc++][ranges] LWG4016: container-insertable checks do not match wh…

7c72199

…at container-inserter does (llvm#113103) This patch implements LWG4016: container-insertable checks do not match what container-inserter does.

Reapply "[InstCombine] Folding (icmp eq/ne (and X, -P2), INT_MIN)" (l…

294726d

…lvm#111236) The underlying issue with msan was fixed by llvm#113200

[AutoBump] Merge with fixes of 0a17bdf (Oct 15)

60a0124

jorickert changed the title ~~[AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14)~~ [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump) Jan 14, 2025

jorickert added 3 commits January 14, 2025 05:31

[AutoBump] Merge with a8d506b (Oct 22)

1b28c53

[AutoBump] Merge with fixes of 519eef3 (Oct 22)

dc1a1db

[AutoBump] Merge with 8a9921f (Oct 23)

3f5d12c

Base automatically changed from bump_to_f035d9f0 to feature/fused-ops January 23, 2025 14:10

jorickert added 3 commits January 24, 2025 09:43

Merge pull request #452 from Xilinx/bump_to_a8d506b3

d3ab502

[AutoBump] Merge with a8d506b (Oct 22) (15)

Merge pull request #454 from Xilinx/bump_to_8a9921f5

cc2a236

[AutoBump] Merge with 8a9921f (Oct 23) (17)

Merge pull request #453 from Xilinx/bump_to_519eef3b

ee69b74

[AutoBump] Merge with fixes of 519eef3 (Oct 22) (16) (Needs downstream changes)

jorickert changed the title ~~[AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)~~ [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes) Jan 24, 2025

mgehre-amd mentioned this pull request Jan 24, 2025

[AutoBump] Merge with fixes of aca33f17 (Oct 22, needs LLVM Oct 19) (92) Xilinx/torch-mlir#476

Open

1 task
