[AutoBump] Merge with 5ec73b7d (Aug 21) (8) #361

) The original implementation provided a simple method to check whether the forest of nested cycles is well-formed. This is now augmented with other methods to check well-formedness of all cycles, either individually, or as the entire forest. These will be used by future transforms that modify CycleInfo.

Use the nuw attribute of GEPs to prove that pointers do not alias, in cases matching the following:
   +                +                     +
   | BaseOffset     |   +<nuw> Indices    |
   ---------------->|-------------------->|
   |-->V2Size       |                     |-------> V1Size
  LHS                                    RHS
If the difference between pointers is Offset +<nuw> Indices then we know that the addition does not wrap the pointer index type (add nuw) and the constant Offset is a lower bound on the distance between the pointers. We can then prove NoAlias via Offset u>= V2Size.
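The final check can be sketched in a few lines. This is a minimal illustration, not the alias-analysis code itself; the function name and parameters are hypothetical, and sizes are assumed to be in bytes.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual alias-analysis implementation.
// `offset` is the constant part of (RHS - LHS) and `v2_size` is the size of
// the access that starts at LHS.
// Assumption: when `indices_are_nuw` is true, the variable indices are added
// without unsigned wrap, so the real distance RHS - LHS is at least `offset`.
bool provesNoAliasViaNuw(std::uint64_t offset, std::uint64_t v2_size,
                         bool indices_are_nuw) {
  if (!indices_are_nuw)
    return false; // The sum may wrap, so `offset` is not a lower bound.
  // The access [LHS, LHS + v2_size) ends at or before RHS (Offset u>= V2Size).
  return offset >= v2_size;
}
```

With a constant offset of 8 bytes and a 4-byte access at LHS the check succeeds; with an offset of 2 it cannot conclude anything.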

…102613) After decomposition of OpenMP compound constructs and assignment of applicable clauses to each leaf construct, composite constructs are then combined again into a single element in the construct queue. This helped later lowering stages easily identify composite constructs. However, as a result of the re-composition stage, the same list of clauses is used to produce all MLIR operations corresponding to each leaf of the original composite construct. This undoes existing logic introducing implicit clauses and deciding to which leaf construct(s) each clause applies. This patch removes construct re-composition logic and updates Flang lowering to be able to identify composite constructs from a list of leaf constructs. As a result, the right set of clauses is produced for each operation representing a leaf of a composite construct. PR stack: - llvm#102612 - llvm#102613

…vm#104595) This new interface is supposed to capture the core functionality of DLTI: querying for values at keys. As such this new interface unifies the ability to query DLTI attributes in a single method: query(). All existing DLTI interfaces exposing their own query methods now 1) extend this new interface and 2) provide a default implementation for `query()`. As DLTIQueryInterface::query() returns an attribute, it naturally enables recursive queries on nested DLTI attrs. A utility function, `dlti::query()`, implements the logic for nested lookups. A new `#dlti.map` attribute is introduced to capture the most generic form of a finite DLTI-mapping. One of the benefits is that it allows for more easily encoding hierarchical information that is suitably queryable, i.e. by means of nested attributes. In line with the above, `transform.dlti.query` is modified so as to take an arbitrary number of keys and to perform a nested lookup using the above utility function.
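The nested lookup can be pictured with a toy model. The sketch below does not use MLIR; `Attr`, `queryOne` and `queryNested` are hypothetical stand-ins for an attribute, a single-key `query()` and the nested-lookup utility.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a queryable attribute: a leaf value plus optional nested
// entries. These names are illustrative only and are not MLIR types.
struct Attr {
  std::string leaf;
  std::map<std::string, std::shared_ptr<Attr>> entries;
};

// Single-level lookup: returns the attribute stored at `key`, or nullptr.
const Attr *queryOne(const Attr &attr, const std::string &key) {
  auto it = attr.entries.find(key);
  return it == attr.entries.end() ? nullptr : it->second.get();
}

// Nested lookup: applies the single-level query once per key, feeding each
// result into the next step, and fails as soon as one key is missing.
const Attr *queryNested(const Attr &root,
                        const std::vector<std::string> &keys) {
  const Attr *current = &root;
  for (const std::string &key : keys) {
    current = queryOne(*current, key);
    if (!current)
      return nullptr;
  }
  return current;
}
```

Because each step returns something that can itself be queried, a multi-key query is just a loop over single-key queries.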

…atalyst (llvm#104872) Mac Catalyst is the iOS platform, but it builds against the macOS SDK and so it needs to be checking the macOS SDK version instead of the iOS one. Add tests against a greater-than SDK version just to make sure this works beyond the initially supporting SDKs.

…lvm#102300)" This reverts commit b432afc. Reverted due to linker failures in expensive-checks.

…#104805) This extends SimplifyCFG hoisting to also hoist instructions with commuted operands, for example a+b on one side and b+a on the other side. This should address the issue mentioned in: llvm#91185 (comment)

Avoids implicit sint_to_fp which wasn't occurring on strict fp codegen Fixes llvm#104848

…cy zero (llvm#102915) A long time ago (back in 2009) there was a commit 52d4d82 that changed the scheduler to not dirty height/depth when adding or removing SUnit predecessors when the latency on the edge was zero. That commit message is claiming that the depth or height isn't affected when the latency is zero. As a matter of fact, the depth/height can change even with a zero latency on the edge. If for example adding a new SUnit A, with zero latency, but as a predecessor to a SUnit B, then both height of A and depth of B should be marked as dirty. If for example B has a greater height than A, then the height of A needs to be adjusted even if the latency is zero. I think this has been wrong for many years. Downstream we have had commit 52d4d82 reverted since back in 2016. There is no motivating lit test for 52d4d82 (only an incomplete C level reproducer in llvm#3613). After commit 13d04fa there finally appeared an upstream lit test that shows that we get better code if marking height/depth as dirty (llvm/test/CodeGen/AArch64/abds.ll).
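The zero-latency case is easy to reproduce on a toy DAG. The sketch below is not the scheduler's SUnit code; `Node`, `Edge` and `height` are hypothetical, and height is taken to be the longest latency-weighted path to a leaf.

```cpp
#include <algorithm>
#include <vector>

// Toy scheduling DAG: each node lists its successor edges with a latency.
struct Edge {
  int succ;
  unsigned latency;
};
struct Node {
  std::vector<Edge> succs;
};

// Height of node `n`: the maximum over its successors of
// (successor height + edge latency); a node without successors has height 0.
unsigned height(const std::vector<Node> &dag, int n) {
  unsigned h = 0;
  for (const Edge &e : dag[n].succs)
    h = std::max(h, height(dag, e.succ) + e.latency);
  return h;
}
```

If B already has height 3 and A has height 0, adding the edge A -> B with latency 0 raises the height of A to 3, so a cached height for A would be stale unless it is marked dirty.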

…lvm#104781) E.g.: https://godbolt.org/z/G8zK5svjK Based on Evgenii's work.

This change does two kinds of splits: - Splits each target into a different file. Some targets are left in the same files, such as riscv32/64 and x86/_64 as these tests and lists are very similar. - Splits up the very long 'note:' lines which contain a list of CPUs, using `CHECK-SAME`. There was a note about this not being possible before, but with `{{^}}`, this is now possible -- I have verified that this does the right thing if a single CPU anywhere in the list is left out. These tests had become quite annoying to change when adding a CPU, and I believe this change makes these easier to maintain, and should cut down on conflicts in these files (or at least makes conflicts easier to resolve). I apologise in advance for downstream conflicts, but hopefully that's a small amount of short term pain, in return for fewer conflicts in future.

Small PR to add additional getters for LLVMContextRef in the C API.

…lvm#104775) Another upstreaming of C API extensions we have in Julia/LLVM.jl. Although [we went](maleadt/LLVM.jl#431) with a string-based API there, here I'm proposing something that's similar to existing metadata/attribute APIs:
- explicit functions to map syncscope names to IDs, and back
- `LLVM*SyncScope` versions of builder APIs that already take a `SingleThread` argument: atomic rmw, atomic xchg, fence
- `LLVMGetAtomicSyncScopeID` and `LLVMSetAtomicSyncScopeID` for other atomic instructions
- testing through `llvm-c-test`'s `--echo` functionality

Add a hint to use the no-verify-fixpoint option.

These are annoying to update, and are redundant since the tests in clang/test/Driver/print-enabled-extensions/ were added.

@davemgreen

) Patterns were previously added to allow the following reductions:
- fminimum(abs(a), abs(b)) -> famin(a, b)
- fmaximum(abs(a), abs(b)) -> famax(a, b)
- llvm#103027
It was suggested by @davemgreen that the following reductions are also possible:
- fminnum[nnan](abs(a), abs(b)) -> famin(a, b)
- fmaxnum[nnan](abs(a), abs(b)) -> famax(a, b)
('nnan' documentation: https://llvm.org/docs/LangRef.html#fast-math-flags)
The 'no NaNs' flag allows optimisations to assume that neither argument is a NaN, and so the differing NaN propagation semantics of llvm.maxnum/llvm.minnum and FAMAX/FAMIN can be ignored in this reduction. (llvm.maxnum/llvm.minnum: https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic)
Changes to LLVM:
- lib/target/AArch64/AArch64InstrInfo.td
  - add 'fminnm_nnan' and 'fmaxnm_nnan'; patfrags on fminnm/fmaxnm that are predicated on the intrinsic call having the 'nnan' flag.
  - add AArch64famin and AArch64famax patfrags, containing the new and existing reductions.
- test/CodeGen/AArch64/aarch64-neon-faminmax.ll
  - add positive and negative tests for the new reduction, based on the presence of 'nnan' in the IR intrinsic call.
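A scalar model shows why the 'nnan' flag matters. This is only a sketch: `std::fmin` stands in for minnum-style behaviour (it returns the non-NaN operand when exactly one input is NaN), and `nanPropagatingMinOfAbs` is a hypothetical stand-in for a NaN-propagating minimum, which is how this message describes the FAMIN side.

```cpp
#include <cmath>

// Model of fminnum(abs(a), abs(b)): a single NaN input is dropped.
double minnumOfAbs(double a, double b) {
  return std::fmin(std::fabs(a), std::fabs(b));
}

// Hypothetical model of a NaN-propagating minimum of absolute values:
// any NaN input produces a NaN result.
double nanPropagatingMinOfAbs(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::nan("");
  return std::fmin(std::fabs(a), std::fabs(b));
}
```

The two functions agree on every pair of non-NaN inputs and differ only when a NaN is present, which is exactly the case 'nnan' rules out.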

This patch moves utilities from `offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h` to `llvm/Frontend/Offloading/Utility.h` to be reused by other projects. Concretely the following changes were made:
- Rename `KernelMetaDataTy` to `AMDGPUKernelMetaData`.
- Remove unused fields `KernelObject`, `KernelSegmentSize`, `ExplicitArgumentCount` and `ImplicitArgumentCount` from `AMDGPUKernelMetaData`.
- Return the produced error if `ELFObj.sections()` failed instead of using `cantFail`.
- Added `AGPRCount` field to `AMDGPUKernelMetaData`.
- Added a default invalid value to all the fields in `AMDGPUKernelMetaData`.

…vm#104692) Inline asm operands could contain any kind of relocation, so remove the checks. Fixes llvm#103493

…e_map (llvm#104918) This test is already disabled for Windows because of symlinks. Disable it for cross build on Windows host too.

This change looks for stores of 64-bit constants whose upper and lower 32-bit halves are identical. Such constants are usually materialized with several 'MOV' instructions and at most one 'ORR'. When one is found, only the lower 32-bit constant is materialized, and an 'STP' instruction stores it to both the lower and the upper half. For example:
    renamable $x8 = MOVZXi 49370, 0
    renamable $x8 = MOVKXi $x8, 320, 16
    renamable $x8 = ORRXrs $x8, $x8, 32
    STRXui killed renamable $x8, killed renamable $x0, 0
becomes
    $w8 = MOVZWi 49370, 0
    $w8 = MOVKWi $w8, 320, 16
    STPWi killed renamable $w8, killed renamable $w8, killed renamable $x0, 0
Related issue: llvm#51483
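The property the transformation relies on can be stated as a one-line predicate. The sketch below is illustrative only; `hasSymmetricHalves` and `kExampleLow` are hypothetical names, and the constant is the value built by the MOVZ/MOVK pair in the example.

```cpp
#include <cstdint>

// True when the upper and lower 32-bit halves of a 64-bit constant match,
// i.e. storing the low half twice produces the same bytes in memory.
bool hasSymmetricHalves(std::uint64_t imm) {
  return static_cast<std::uint32_t>(imm >> 32) ==
         static_cast<std::uint32_t>(imm);
}

// Low half from the example: MOVZ 49370, then MOVK 320 shifted left by 16.
constexpr std::uint32_t kExampleLow = 49370u | (320u << 16);
```

The ORR with a 32-bit shift in the original sequence copies this low half into the upper half, which is why the 64-bit value passes the predicate.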

…lvm#102300)" This reverts commit 4aacc60. The original implementation provided a simple method to check whether the forest of nested cycles is well-formed. This is now augmented with other methods to check well-formedness of every cycle, either individually, or as the entire forest. These will be used by future transforms that modify CycleInfo.

This extends the existing sxtw peephole optimization (llvm#96293) to uxtw, which in LLVM is an ORRWrr which clears the top bits. Fixes llvm#98481

…odel=aggressive (llvm#100453) This change modifies -ffp-model=fast to select options that more closely match -funsafe-math-optimizations, and introduces a new model, -ffp-model=aggressive which matches the existing behavior (except for a minor change in the fp-contract behavior). The primary motivation for this change is to make -ffp-model=fast more user friendly, particularly in light of LLVM's aggressive optimizations when -fno-honor-nans and -fno-honor-infinities are used. This was previously proposed here: https://discourse.llvm.org/t/making-ffp-model-fast-more-user-friendly/78402

Based off worst case llvm-mca numbers for CTLZ/CTTZ(+ZERO_UNDEF) codegen. Prep work for llvm#102885

Reland [CGData] llvm-cgdata llvm#89884 using `Opt` instead of `cl` - Action options are required, `--convert`, `--show`, `--merge`. This was similar to sub-commands previously implemented, but having a prefix `--`. - `--format` option is added, which specifies `text` or `binary`. --------- Co-authored-by: Kyungwoo Lee <[email protected]>

…tadata Analysis (llvm#104828) Add Validator Version to information collected by Module Metadata Analysis pass. An earlier change (llvm#104040) added a default hardcoded value for validator version to be associated with DXIL module created during HLSL source compilation. Add tests to verify validator version info collected - Updated existing tests - Added a test with validator version specified in DXIL metadata

Add back missing includes and revert revert "[clang][ExtractAPI] Stop dropping fields of nested anonymous record types when they aren't attached to variable declaration (llvm#104600)"

/llvm-project/llvm/tools/llvm-cgdata/llvm-cgdata.cpp:349:3: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] default: ^ 1 error generated.

Failure with -Werror buildbot caused by llvm#104587

It is only used by CodeGen so does not need to be shared with the assembler/disassembler.

This patch adds an NVVM intrinsic and NVPTX codegen for the elect.sync PTX instruction. Lit tests are added in elect.ll and verified through ptxas. PTX ISA reference: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync Signed-off-by: Durgadoss R <[email protected]>

…m#102618) This explicit listing of the bitcodes is out of date, and had fallen out of date in the past as well. Delete the explicit listing and point users to where they can find it.

This explicitly allows the `emitc.ptrdiff_t` type for the result of subtracting two pointers and changes the example accordingly.

Delete an unnecessary test added in an earlier PR.

…#104430)

…lvm#104148) `hasOperands` does not always execute matchers in the order they are written. This can cause issues in code using bindings when one operand matcher is relying on a binding set by the other. With this change, the first matcher present in the code is always executed first and any bindings it sets are available to the second matcher. Simple example with current version (1 match) and new version (2 matches):
```bash
> cat tmp.cpp
int a = 13;
int b = ((int) a) - a;
int c = a - ((int) a);

> clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
int a = 13;
^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
int b = ((int)a) - a;
        ^~~~~~~~~~~~
1 match.

> ./build/bin/clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
    2 | int b = ((int)a) - a;
      |         ^~~~~~~~~~~~

Match #2:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:3:9: note: "root" binds here
    3 | int c = a - ((int)a);
      |         ^~~~~~~~~~~~
2 matches.
```
If this should be documented or regression tested anywhere please let me know where.

This is a C++17 feature implemented in all supported compilers. Pull Request: llvm#104898

We can remove the variable from https://reviews.llvm.org/D5610 since link.h is available on Linux (glibc/musl/Bionic), FreeBSD, and NetBSD. Use `__has_include(<link.h>)` before including it. Pull Request: llvm#104893

Updates some of the irdl documentation to be in line with the current state of IRDL. Also removes some trailing spaces in this documentation.

If we can't transform the region to SPMD, we should not wait till the end to decide that. Other AAs might assume SPMD, and we did set the constant initializer to indicate SPMD, but we did not change the code properly.

This fixes a build break from [llvm/llvm-project] Reland [CGData] llvm-cgdata llvm#89884 (PR llvm#101461)

Was missing remainder on `Start` value. Also changed logic as nikic suggested (getting loop from `PN` instead of `Rem`). The prior impl increased the complexity of the code and made debugging it more difficult. Closes llvm#104877

This already works, just adding coverage to show that before a change which depends on this functionality.

https://cplusplus.github.io/CWG/issues/722.html nullptr passed to a variadic function is now converted to void* in C++. This does not affect C23 nullptr. Also fixes -Wformat-pedantic so that it no longer warns for nullptr passed to %p (because it is converted to void* in C++ and it is allowed for va_arg(ap, void*) in C23)

Add dep on ControlFlowInterfaces for arith td files

…gression from llvm#101751. (llvm#104114) If c1 is a shifted mask with c3 leading zeros and c4 trailing zeros. If c2 is greater than c3, we can use (srli (srai y, c2 - c3), c3 + c4) followed by a SHXADD with c4 as the X amount. Without Zba we can use (slli (srli (srai y, c2 - c3), c3 + c4), c4). Alive2: https://alive2.llvm.org/ce/z/AwhheR
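The rewrite can be checked with plain integer arithmetic. The commit title is truncated here, so the exact pattern being matched is not visible; the sketch below assumes it is `(and (sra y, c2), c1)` on 64-bit values, and all function names are hypothetical. It also relies on `>>` of a negative `int64_t` being an arithmetic shift, which is guaranteed from C++20 and is the usual behaviour in practice.

```cpp
#include <cassert>
#include <cstdint>

// Builds a shifted mask with c3 leading zeros and c4 trailing zeros.
std::uint64_t shiftedMask(unsigned c3, unsigned c4) {
  assert(c3 + c4 < 64 && "mask must keep at least one bit");
  return (~std::uint64_t{0} >> (c3 + c4)) << c4;
}

// Assumed original form: arithmetic shift right by c2, then mask with c1.
std::uint64_t original(std::int64_t y, unsigned c2, std::uint64_t c1) {
  return static_cast<std::uint64_t>(y >> c2) & c1;
}

// Rewritten form from the message: srai by c2 - c3, srli by c3 + c4, then a
// left shift by c4 (an slli, or the shift folded into SHXADD with Zba).
std::uint64_t rewritten(std::int64_t y, unsigned c2, unsigned c3,
                        unsigned c4) {
  assert(c2 > c3 && c2 < 64 && c3 + c4 < 64);
  std::uint64_t t = static_cast<std::uint64_t>(y >> (c2 - c3)); // srai
  t >>= (c3 + c4);                                               // srli
  return t << c4;                                                // slli
}
```

Under these assumptions both functions return the same value for any y, since the logical shift right clears the c3 top bits and the final left shift clears the c4 low bits.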

Implement support for HLSL intrinsic saturate. Implement DXIL codegen for the intrinsic saturate by lowering it to DXIL Op dx.saturate. Implement SPIRV codegen by transforming saturate(x) to clamp(x, 0.0f, 1.0f). Add tests for DXIL and SPIRV CodeGen.
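As a scalar reference for the SPIRV lowering described above, saturate can be modelled as a clamp to the range [0, 1]. This is only a sketch of the arithmetic; NaN handling is not modelled.

```cpp
#include <algorithm>

// Scalar model of saturate(x) as clamp(x, 0.0f, 1.0f).
// NaN inputs are deliberately out of scope for this sketch.
float saturate(float x) {
  return std::min(std::max(x, 0.0f), 1.0f);
}
```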

…4837)
- Replace use of std::isalpha, std::isdigit, std::isxdigit with LLVM's StringExtras versions, to avoid possibly locale dependent behavior (e.g. glibc).
- Create helper function for common checks for valid identifier characters.
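Locale-independent classification amounts to comparing against fixed ASCII ranges. The helpers below are hypothetical stand-ins written for illustration, not the StringExtras functions themselves, and the identifier rule (letters, digits, underscore) is an assumption since the message does not spell it out.

```cpp
// ASCII-only classification: the result never depends on the C locale.
bool isAsciiAlpha(char c) {
  return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
}

bool isAsciiDigit(char c) { return '0' <= c && c <= '9'; }

bool isAsciiHexDigit(char c) {
  return isAsciiDigit(c) || ('a' <= c && c <= 'f') || ('A' <= c && c <= 'F');
}

// Assumed identifier rule for this sketch: letters, digits and underscore.
bool isIdentifierChar(char c) {
  return isAsciiAlpha(c) || isAsciiDigit(c) || c == '_';
}
```

A byte such as 0xE9 is rejected by `isAsciiAlpha` under every locale, whereas `std::isalpha` may accept it depending on the active locale.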

Summary: Currently, we assign this to private memory. This causes failures on some SOLLVE tests. The standard isn't clear on the semantics of this allocation type, but there seems to be a consensus that it's supposed to be shared memory.

…104713) The function
    template<class Duration> requires three_way_comparable_with<sys_seconds, sys_time<Duration>> constexpr auto operator<=>(const leap_second& x, const sys_time<Duration>& y) noexcept;
has a recursive constraint. This caused an infinite loop in GCC and is now hit by llvm#102857. A fix would be to make this function a hidden friend; this solution is proposed in LWG4139. For consistency all comparisons are made hidden friends. Since the issue causes compilation failures no additional tests are needed. Fixes: llvm#104700

This PR adds conversion patterns for GPU to the `convert-to-spirv` pass, introduced in llvm#95942. Now the pass is able to convert each `gpu.module` and its ops within a `builtin.module` into a `spirv.module`. **Future Plans** - Use `gpu.launch_func` to invoke kernel from host functions - Potentially integrate into the `mlir-vulkan-runner` for e2e testing

…#104851) This enables performing several reductions in parallel, each smaller than the size of the subgroup. One potential application is flash attention with subgroup-wide matrix multiplication and reduction combined in one kernel. The multiplication operation requires a 2D matrix to be distributed over the lanes of the subgroup, which then constrains the shape the following reduction can have if we want to keep data in registers.

…ernalASTSources (llvm#104799) When we use `SemaSourceWithPriorities` as the `ASTContext`s ExternalASTSource, we allocate a `ClangASTSourceProxy` (via `CreateProxy`) and two `ExternalASTSourceWrapper`s. Then we push these sources into a vector in `SemaSourceWithPriorities`. The allocated `SemaSourceWithPriorities` itself will get properly deallocated because the `ASTContext` wraps it in an `IntrusiveRefCntPtr`. But the three sources we allocated earlier will never get released. This patch fixes this by mimicking what `MultiplexExternalSemaSource` does (which is what `SemaSourceWithPriorities` is based on anyway). I.e., when `SemaSourceWithPriorities` gets constructed, it increments the use count of its sources. And on destruction it decrements them. Similarly, to make sure we deallocate the `ClangASTProxy` properly, the `ExternalASTSourceWrapper` now assumes shared ownership of the underlying source.

…lvm#105162) Reverts llvm#99953

…inedButUsed (llvm#104817) While parsing an expression, Clang tries to diagnose usage of decls (with possibly non-external linkage) for which it hasn't been provided with a definition. This is the case, e.g., for functions with parameters that live in an anonymous namespace (those will have `UniqueExternal` linkage, this is computed [here in computeTypeLinkageInfo](https://github.com/llvm/llvm-project/blob/ea8bb4d633683f5cbfd82491620be3056f347a02/clang/lib/AST/Type.cpp#L4647-L4653)). Before diagnosing such situations, Clang calls `ExternalSemaSource::ReadUndefinedButUsed`. The intended use of this API is to extend the set of "used but not defined" decls with additional ones that the external source knows about. However, in LLDB's case, we never provide `FunctionDecl`s with a definition, and instead rely on the expression parser to resolve those symbols by linkage name. Thus, to avoid the Clang parser from erroring out in these situations, this patch implements `ReadUndefinedButUsed` which just removes the "undefined" non-external `FunctionDecl`s that Clang found. We also had to add an `ExternalSemaSource` to the `clang::Sema` instance LLDB creates. We previously didn't have any source on `Sema`. Because we add the `ExternalASTSourceWrapper` here, that means we'd also technically be adding the `ClangExpressionDeclMap` as an `ExternalASTSource` to `Sema`, which is fine because `Sema` will only be calling into the `ExternalSemaSource` APIs (though nothing currently strictly enforces this, which is a bit worrying). Note, the decision for whether to put a function into `UndefinedButUsed` is done in [Sema::MarkFunctionReferenced](https://github.com/llvm/llvm-project/blob/ea8bb4d633683f5cbfd82491620be3056f347a02/clang/lib/Sema/SemaExpr.cpp#L18083-L18087). The `UniqueExternal` linkage computation is done in [getLVForNamespaceScopeDecl](https://github.com/llvm/llvm-project/blob/ea8bb4d633683f5cbfd82491620be3056f347a02/clang/lib/AST/Decl.cpp#L821-L833). Fixes llvm#104712

This patch tries to fix an issue with the Windows debug builds where the PDB file for python scripted interfaces cannot be opened since its path length exceeds the Windows `MAX_PATH` limit: llvm#101672 (comment) This patch addresses the issue by building all the interfaces as a single library plugin that initializes each component as part of its `Initialize` method, instead of building each interface as its own library plugin. This keeps the build artifact path length smaller while respecting the naming convention and without making any exception in the build system. Fixes llvm#104895. Signed-off-by: Med Ismail Bennani <[email protected]>

This will be needed when maintaining the contextual profile for ICP or inlining - we'll need to first fetch the ID of a callsite, which is in an instrumentation instruction (intrinsic) preceding the callsite.

This introduces an anonymous class "OpLowerer" to help with lowering DXIL ops, and moves the DXILOpBuilder there instead of creating a new one for every operation. DXILOpBuilder is also changed to own its IRBuilder, since that makes it simpler to ensure that it isn't misused. Pull Request: llvm#104248

The `Complex` class in tablegen tries to store its element type, but due to a name collision it actually ends up storing the `type` field of the `ConfinedType` superclass, and so `elementType` is always set to `AnyComplex`. This renames the field so that it gets correctly set.

…102546)

…vm#98950) Both IRVerifier and Machine Verifier are updated

This test causes the assert in clang CodeGen and python crashes with the error code 0x80000003. See llvm#105019 for more details. Note the similar test lldb/test/API/lang/c/bitfields/TestBitfields.py is already disabled on Windows.

This patch hardens the "test iterators" we use to test algorithms by ensuring that they don't get double-moved. As a result of this hardening, the tests started reporting multiple failures where we would double-move iterators, which are being fixed in this patch. In particular: - Fixed a double-move in pstl.partition - Add coverage for begin()/end() in subrange tests - Fix tests for ranges::ends_with and ranges::contains, which were incorrectly calling begin() twice on the same subrange containing non-copyable input iterators. Fixes llvm#100709

…104650) In a mach_header, the cpusubtype is a 32-bit field, but it's split in 2 subfields: - the low 24 bits containing the cpu subtype proper, (e.g., CPU_SUBTYPE_ARM64E 2) - the high 8 bits containing a capability field used for additional feature flags. Notably, it's only the subtype subfield that participates in fat file slice discrimination: the caps are ignored. arm64e uses the caps subfield to encode a ptrauth ABI version: - 0x80 (CPU_SUBTYPE_PTRAUTH_ABI) denotes a versioned binary - 0x40 denotes a kernel-ABI binary - 0x00-0x0F holds the ptrauth ABI version This teaches the basic obj tools to decode that (or ignore it when unneeded). It also teaches the MachO writer to default to emitting versioned binaries, but with a version of 0 (and without the kernel ABI flag). Modern arm64e requires versioned binaries: a binary with 0x00 caps in cpusubtype is now rejected by the linker and everything after. We can live without the sophistication of specifying the version and kernel ABI for now. Co-authored-by: Francis Visoiu Mistrih <[email protected]>
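The field layout described above can be decoded with a few masks. The struct and function names in this sketch are hypothetical; only the bit positions come from the description.

```cpp
#include <cstdint>

// Decoded view of a 32-bit cpusubtype field, using the arm64e caps layout.
struct Arm64eSubtypeInfo {
  std::uint32_t subtype;         // low 24 bits: the cpu subtype proper
  bool versioned;                // caps bit 0x80: versioned binary
  bool kernel_abi;               // caps bit 0x40: kernel-ABI binary
  std::uint32_t ptrauth_version; // caps bits 0x0F: ptrauth ABI version
};

Arm64eSubtypeInfo decodeCpuSubtype(std::uint32_t cpusubtype) {
  const std::uint32_t caps = cpusubtype >> 24; // high 8 bits
  Arm64eSubtypeInfo info;
  info.subtype = cpusubtype & 0x00FFFFFFu;
  info.versioned = (caps & 0x80u) != 0;
  info.kernel_abi = (caps & 0x40u) != 0;
  info.ptrauth_version = caps & 0x0Fu;
  return info;
}
```

For example, `0x80000002` decodes to subtype 2 (CPU_SUBTYPE_ARM64E), versioned, not kernel ABI, version 0, which matches the default the MachO writer is described as emitting.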

Expand the accepted types for gpu.shuffle to any integer, float or 1d vector of integers or floats. Also updated the gpu-to-llvm-spv pass to support those types.

This patch makes changes to improve syntax in tests and to add more strict checks on cat output. This is a prerequisite for llvm#101530.

…ernal shell (llvm#104878) This patch changes the test that uses the `cat -e` option to `cat -v` so that the test can be run using lit's internal shell. For `cat`, the `-v` option prints non-printing characters in ^ and M- notation, while the `-e` option adds `$` to the end of lines in addition to printing non-printing characters in ^ and M- notation. This is an alternative patch to llvm#102061, opting to rewrite the test that uses `cat -e` instead of extending support to the `-e` option. Fixes llvm#102377

…ltins (llvm#96873)

llvm#102642) …uctors A non-conforming extension to Fortran present in a couple other compilers is allowing an anonymous component in a structure constructor to initialize a parent (or greater ancestor) component. This was working in this compiler only for direct parents, and only when the type was not use-associated. Fixes llvm#102557.

Interfaces don't inherit the IMPLICIT typing rules of their enclosing scope, and separate MODULE PROCEDUREs inherit the IMPLICIT typing rules of submodule in which they are defined, not the rules from their interface. Fixes llvm#102558.

A bare ALLOCATE statement with no SOURCE= rightly earns a warning about an undefined function result, if that result is an allocatable that appears in the ALLOCATE. But in the case of a pointer, where the warning should care more about the pointer's association status than the value of its target, a bare ALLOCATE should suffice to silence the warning.

Don't complain about a local object with an impure final procedure in a pure subprogram when the local object is a named constant. Fixes llvm#104796.

Conversions of infinities from other kinds to real(10) were incorrect, and comparisons of real(2) vs real(3) are dicey as conversions in one direction can overflow and conversions in the other can lose precision. Use real(16) as the common type for comparisons in IEEE_NEAREST_AFTER.

- Allocas and GlobalValues cannot be simplified, so we should not try.
- If we never used any assumed state, the AAUnderlyingObjects doesn't require an additional update.
- If we have seen an object (or its underlying object) before, we do not need to inspect it anymore.
The original logic for "SeenObjects" was flawed and caused us to add intermediate values to the underlying object list if a PHI or select instruction referenced the same underlying object twice. The test changes are all instances of this situation and we now correctly derive `memory(none)` for the functions that only access stack memory.
---------
Co-authored-by: Shilei Tian <[email protected]>

…m#96874)

…vm#96875)

…ins (llvm#96876)

This patch implements sandboxir::CatchSwitchInst mirroring llvm::CatchSwitchInst.

AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it should be soon removable.

This is not complete, but gets AtomicExpand running. I was able to get further than I expected; we're quite close to having all the IR codegen passes ported.

Co-authored-by: Fangrui Song <[email protected]>

Fixed two typos: 1. `__builin_va_list` --> `__builtin_va_list` 2. `__builin_suspend` --> `__builtin_suspend`

This part of the manual describes uses of `ActOnXXX` and `BuildXXX`.

flang/test/Evaluate/fold-nearest.f90 is failing oddly on ppc64le; disable it for now while I sort things out.

) Based on experience with SelectionDAG and experimental-rv64-legal-i32, I don't believe making s32 a legal type is viable without introducing an invariant that s32 values are always sign extended like Mips64 does. Mips64 does this with a separate 32-bit register class. `experimental-rv64-legal-i32` was removed in llvm#102509. This patch is part of a series to remove s32 support so we can remove the isel patterns that SelectionDAG is no longer using. To restore code quality, we will need to add custom W nodes like SelectionDAG.

@efriedma-quic

Introduce "-fsanitize-undefined-ignore-overflow-pattern=" which can be used to disable sanitizer instrumentation for common overflow-dependent code patterns.
For a wide selection of projects, proper overflow sanitization could help catch bugs and solve security vulnerabilities. Unfortunately, in some cases the integer overflow sanitizers are too noisy for their users and are often left disabled. Providing users with a method to disable sanitizer instrumentation of common patterns could mean more projects actually utilize the sanitizers in the first place.
One such project that has opted to not use integer overflow (or truncation) sanitizers is the Linux Kernel. There has been some discussion[1] recently concerning mitigation strategies for unexpected arithmetic overflow. This discussion is still ongoing and a succinct article[2] accurately sums up the discussion. In summary, many Kernel developers do not want to introduce more arithmetic wrappers when most developers understand the code patterns as they are.
Patterns like:
    if (base + offset < base) { ... }
or
    while (i--) { ... }
or
    #define SOME -1UL
are extremely common in a code base like the Linux Kernel. It is perhaps too much to ask of kernel developers to use arithmetic wrappers in these cases. For example:
    while (wrapping_post_dec(i)) { ... }
which wraps some builtin would not fly. This would incur too many changes to existing code; the code churn would be too much, at least too much to justify turning on overflow sanitizers.
Currently, this commit tackles three pervasive idioms:
1. "if (a + b < a)" or some logically-equivalent re-ordering like "if (a > b + a)"
2. "while (i--)" (for unsigned) a post-decrement always overflows here
3. "-1UL, -2UL, etc" negation of unsigned constants will always overflow
The patterns that are excluded can be chosen from the following list:
- add-overflow-test
- post-decr-while
- negated-unsigned-const
These can be enabled with a comma-separated list:
    -fsanitize-undefined-ignore-overflow-pattern=add-overflow-test,negated-unsigned-const
"all" or "none" may also be used to specify that all patterns should be excluded or that none should be.
[1] https://lore.kernel.org/all/202404291502.612E0A10@keescook/
[2] https://lwn.net/Articles/979747/
CCs: @efriedma-quic @kees @jyknight @fmayer @vitalybuka
Signed-off-by: Justin Stitt <[email protected]>
Co-authored-by: Bill Wendling <[email protected]>
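The three idioms all depend on well-defined unsigned wraparound, which a short sketch makes concrete. The function and constant names below are hypothetical and only illustrate the patterns; they are not taken from the patch.

```cpp
#include <climits>

// Idiom 1 (add-overflow-test): the sum wraps exactly when it ends up
// smaller than one of the operands.
bool addWraps(unsigned base, unsigned offset) {
  return base + offset < base;
}

// Idiom 2 (post-decr-while): the loop body runs `i` times; the final
// post-decrement takes the unsigned counter from 0 to UINT_MAX.
unsigned countIterations(unsigned i) {
  unsigned iterations = 0;
  while (i--)
    ++iterations;
  return iterations;
}

// Idiom 3 (negated-unsigned-const): negating an unsigned constant wraps,
// so -1UL is the all-ones value.
constexpr unsigned long kAllOnes = -1UL;
```

In each case the wraparound is the intended behaviour, which is why instrumenting these patterns produces reports that developers consider noise.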

Summary: This test mysteriously fails on the bots but not locally, disable until I can figure out why.

This breaks using -passes=atomic-expand (but only sometimes?). Somehow an AtomicExpand pass ends up running without a TargetMachine, despite always being constructed with one.

A new section of a test is failing on aarch64 and ppc64le; disable it while I sort things out.

CFI programs may have more saves than restores and this is completely benign from BOLT's perspective. Reduce the verbosity and print the warning only under `-v=1` and above.

…ich can trap (llvm#105214) This allows the use of a single wider operation with a restricted EVL instead of having to split and cover via decreasing powers-of-two sizes. On RISCV, this avoids the need for a bunch of vslidedown and vslideup instructions to extract subvectors, and VL toggles to switch between the various widths. Note there is a potential downside of using vp nodes; we lose any generic DAG combines which might have applied to the split form.

Previously the libc startup code was marked `EXCLUDE_FROM_ALL` due to build issues. This patch removes that as no longer necessary.

Nowadays, an ASM_MASM tool is required for building the BLAKE3 assembly in llvm/lib/Support - the llvm-ml tool can do this.

In llvm#100024 we moved from safe_load to load for reading the yaml in newheadergen due to dependency issues. Those should be resolved by now so this should be a simple safety improvement.

Rework `IntrinsicEmitter::EmitIntrinsicToBuiltinMap` for improved performance as well as refactor the code.
Performance:
- Current generated code does a linear search on the TargetPrefix, followed by a binary search on the builtin names for that target's builtins.
- Improve the performance of this code in 2 ways: (a) Use binary search on the target prefix to lookup the builtin table for the target. (b) Improve the (common) case of when all builtins for a target share a common prefix. Check this common prefix first, and then do the binary search in the builtin table using the builtin name with the common prefix removed. This should help both data size (by creating a smaller static string table) and runtime (by reducing the cost of binary search on smaller strings).
Refactor:
- Use range based for loops for iterating over maps.
- Use formatv() and C++ raw string literals to simplify the emission code.
- Change the generated `getIntrinsicForClangBuiltin` and `getIntrinsicForMSBuiltin` to take a `StringRef` instead of `const char *` for the prefix.
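The common-prefix optimization in (b) can be sketched as follows. This is not the generated code; `Entry`, `lookupBuiltin`, the use of 0 as a "not found" id, and the builtin names used with it are all hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One row of a per-target table: the builtin name with the shared prefix
// removed, plus an id. The table must be sorted by `suffix`.
struct Entry {
  std::string suffix;
  int id;
};

// Returns the id for `name`, or 0 (assumed here to mean "not found").
int lookupBuiltin(const std::string &name, const std::string &common_prefix,
                  const std::vector<Entry> &sorted_table) {
  // Step 1: reject names that do not start with the shared prefix.
  if (name.compare(0, common_prefix.size(), common_prefix) != 0)
    return 0;
  // Step 2: binary-search the table using only the shorter suffix.
  const std::string suffix = name.substr(common_prefix.size());
  auto it = std::lower_bound(
      sorted_table.begin(), sorted_table.end(), suffix,
      [](const Entry &e, const std::string &s) { return e.suffix < s; });
  return (it != sorted_table.end() && it->suffix == suffix) ? it->id : 0;
}
```

Storing only suffixes keeps the string table smaller, and each comparison in the binary search touches fewer characters, which is the data-size and runtime benefit the message describes.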

Don't try to fold x87 extended precision operations in a test unless it's targeting x86-64.

The mismatch between the comment on this test and the test itself was pointed out in llvm#100699 (comment), but apparently I failed to actually commit the fix.

…a `cold` function Closes llvm#101298

This recently added test is failing on Windows with: ``` c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe --no-lldbinit -S C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\Expr\Output\TestAnonNamespaceParamFunc.cpp.tmp -o run -o "expression func(a)" -o exit | c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\Expr\TestAnonNamespaceParamFunc.cpp executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe' --no-lldbinit -S 'C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\Expr\Output\TestAnonNamespaceParamFunc.cpp.tmp' -o run -o 'expression func(a)' -o exit .---command stderr------------ | TestAnonNamespaceParamFunc.cpp.tmp :: Class 'tagARRAYDESC' has a member 'tdescElem' of type 'tagTYPEDESC' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'tagARRAYDESC' has a member 'tdescElem' of type 'tagTYPEDESC' which does not have a complete definition. | (lldb) TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::partial_ordering' has a member 'less' of type 'std::partial_ordering' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::partial_ordering' has a member 'less' of type 'std::partial_ordering' which does not have a complete definition. | (lldb) TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::strong_ordering' has a member 'less' of type 'std::strong_ordering' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::strong_ordering' has a member 'less' of type 'std::strong_ordering' which does not have a complete definition. 
| (lldb) TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::weak_ordering' has a member 'less' of type 'std::weak_ordering' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::weak_ordering' has a member 'less' of type 'std::weak_ordering' which does not have a complete definition. | (lldb) error: Couldn't look up symbols: | int func(struct `anonymous namespace'::InAnon) | Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler. `----------------------------- executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\Expr\TestAnonNamespaceParamFunc.cpp' .---command stderr------------ | C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\Expr\TestAnonNamespaceParamFunc.cpp:10:11: error: CHECK: expected string not found in input | // CHECK: (int) $0 = 15 | ^ | <stdin>:16:26: note: scanning from here | (lldb) expression func(a) | ^ ``` So the function is still not callable. But AFAICT, this is not a regression, since this function wasn't callable prior to the patch anyway. I currently do not have a Windows setup to test this on, so XFAIL for now.

Remove widenToNextPow2 from StoreActions. Reorder clampScalar and lowerIfMemSizeNotByteSizePow2 for StoreActions. These match AArch64 and got me further on a test case I was playing with that contained an i129 store.

…104523) Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`. This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden. My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB. rdar://126629381 Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example). 
before: ``` (lldb) thread backtrace --filtered=false * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10 frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25 frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12 frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12 frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10 frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12 frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10 frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10 frame #8: 0x0000000183cdf154 dyld`start + 2476 (lldb) ``` after ``` (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10 frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, 
int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25 frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12 frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10 frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10 frame #8: 0x0000000183cdf154 dyld`start + 2476 Note: Some frames were hidden by frame recognizers ```
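The behaviour shown in the two backtraces can be modelled with a short sketch. It is not LLDB's implementation: the regex, the simplified frame names, and the helpers `visible_frames` and `step_up` are assumptions made for illustration. The point it shows is that hiding is a filter over the frame list, so the original frame indices are preserved.

```python
import re

# Hypothetical recognizer pattern: hide libc++ std::function internals.
HIDDEN = re.compile(r"^std::__1::__function::")


def visible_frames(frames):
    """Return (index, name) pairs for frames that are not hidden.

    Indices are the original frame numbers, so hidden frames leave gaps.
    """
    return [(i, name) for i, name in enumerate(frames) if not HIDDEN.search(name)]


def step_up(frames, current):
    """Index of the next visible frame above `current`, skipping hidden ones."""
    for i in range(current + 1, len(frames)):
        if not HIDDEN.search(frames[i]):
            return i
    return current  # already at the outermost visible frame


# Simplified names mirroring the nine frames in the example output.
FRAMES = [
    "foo",
    "std::__1::__invoke",
    "std::__1::__invoke_void_return_wrapper::__call",
    "std::__1::__function::__alloc_func::operator()",
    "std::__1::__function::__func::operator()",
    "std::__1::__function::__value_func::operator()",
    "std::__1::function::operator()",
    "main",
    "start",
]
```

With these inputs the visible frame numbers are 0, 1, 2, 6, 7, 8, matching the filtered backtrace, and stepping up from frame 2 lands on frame 6. Frames 3 to 5 still exist at their original indices, which is why selecting them by number continues to work.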

…entwiseOpFusion (llvm#104409) This commit changes the getPreservedProducerResults function so that it takes the consumer into account along with the producer, in order to predict which of the producer’s outputs can be dropped during the fusion process. It provides a more accurate prediction, considering that the fusion process also depends on the consumer.

This wires up dxil-op-lower, dxil-intrinsic-expansion, dxil-translate-metadata, and dxil-pretty-printer to the new pass manager, both as a matter of future proofing the backend and so that they can be used more flexibly in tests. A few arbitrary tests are updated in order to test the new PM path, and we drop the "print-dxil-resource-md" pass since it's redundant with the pretty printer. Pull Request: llvm#104250

It didn't crash so I thought this worked now, but upon further review it miscalculates the stack address for the return.

Specifically, to illustrate our general lowering strategy for non-power of two vectors.

…vm#104826) With -fsanitize-cfi-icall-experimental-normalize-integers, Clang appends ".normalized" to KCFI types in CodeGenModule::CreateKCFITypeId, which changes type hashes also for functions that don't have integer types in their signatures. However, llvm::setKCFIType does not take integer normalization into account, which means LLVM generated functions with KCFI types, e.g. sanitizer constructors, will fail KCFI checks when integer normalization is enabled in Clang. Add a cfi-normalize-integers module flag to indicate integer normalization is used, and append ".normalized" to KCFI types also in llvm::setKCFIType to fix the type mismatch.
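The mismatch can be illustrated with a small sketch. It is only a model: SHA-256 stands in for the real hash, `type_id` is a hypothetical helper, and `_ZTSFvvE` is used as an example type string for a `void()` function. What matters is that the two sides must hash the same string, including the `.normalized` suffix.

```python
import hashlib


def type_id(mangled_type: str, normalize_integers: bool) -> int:
    """Model of a KCFI type ID: a 32-bit hash of the (optionally suffixed) type name."""
    name = mangled_type + (".normalized" if normalize_integers else "")
    # Stand-in hash for illustration; this is not the hash LLVM uses.
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:4], "little")


# Clang side, with integer normalization enabled.
clang_id = type_id("_ZTSFvvE", normalize_integers=True)

# LLVM-generated function before the fix: the suffix is never appended.
llvm_id_before = type_id("_ZTSFvvE", normalize_integers=False)

# After the fix: the module flag tells LLVM to append the suffix as well.
module_flag_set = True
llvm_id_after = type_id("_ZTSFvvE", normalize_integers=module_flag_set)
```

Here `clang_id` differs from `llvm_id_before`, which models the failing check, even though `void()` has no integer types in its signature. `clang_id` equals `llvm_id_after` once the flag is honoured.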

For the tests I just added +sve instead of what actual hardware has, which is only SME, since otherwise all the test functions need to be marked as streaming mode. rdar://121864771

…Lowering::lowerReturn. NFC This is similar to X86 and AArch64 structure.

/llvm-project/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp:124:7: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result] opOperandsToIgnore.pop_back_val(); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated.

Since 2.2, the `fmin.s/fmax.s` instructions follow IEEE754-2019 if the F extension is available, and `fmin.d/fmax.d` also follow IEEE754-2019 if the D extension is available. So, let's mark them as Legal.
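For readers unfamiliar with the semantics involved, the sketch below models the minimum-number behaviour that, as I understand the RISC-V F/D specification, `fmin` provides: a single NaN operand is ignored, two NaNs give NaN, and -0.0 is treated as smaller than +0.0. The helper name `fmin_number` is hypothetical, and the code is a model of the semantics rather than anything taken from the patch.

```python
import math


def fmin_number(a: float, b: float) -> float:
    """Model of an IEEE 754-2019 minimumNumber-style operation."""
    if math.isnan(a) and math.isnan(b):
        return math.nan          # both NaN: result is NaN
    if math.isnan(a):
        return b                 # one NaN: return the other operand
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # Zeros compare equal, so use the sign bit: -0.0 is the minimum.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b
```

This differs from a plain `a < b ? a : b` select, which would propagate or drop NaNs depending on operand order and could not distinguish the two zeros.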

This reverts commit 6476a1d. d3fb41d relanded in 9bb5556. ...amended to incorporate changes from the reland.

Commits on Aug 20, 2024

Commits on Aug 21, 2024

Commits on Sep 20, 2024
