Allow link to llvm shared library for current distros #68

…lvm#90486) Noticed that there already was a function in APInt that updated a FoldingSet so there was no need for me to add it in llvm#84617.

This ensures the explicit value is generated (and not a load into the values array). Note that not storing the values array at all is still TBD; this is just the very first step.

…imental late parsing mode "extension" (llvm#88596)
This patch changes the `LateParsed` field of `Attr` in `Attr.td` to be an instantiation of the new `LateAttrParseKind` class. The instantiation can be one of the following:
* `LateAttrParseNever` - Corresponds with the false value of `LateParsed` prior to this patch (the default for an attribute).
* `LateAttrParseStandard` - Corresponds with the true value of `LateParsed` prior to this patch.
* `LateAttrParseExperimentalExt` - A new mode described below.
`LateAttrParseExperimentalExt` is an experimental extension to `LateAttrParseStandard`. Essentially this allows `Parser::ParseGNUAttributes(...)` to distinguish between these cases:
1. Only `LateAttrParseExperimentalExt` attributes should be late parsed.
2. Both `LateAttrParseExperimentalExt` and `LateAttrParseStandard` attributes should be late parsed.
Callers (and indirect callers) of `Parser::ParseGNUAttributes(...)` indicate the desired behavior by setting a flag in the `LateParsedAttrList` object that is passed to the function.
In addition to the above, a new driver and frontend flag (`-fexperimental-late-parse-attributes`) with a corresponding LangOpt (`ExperimentalLateParseAttributes`) is added that changes how `LateAttrParseExperimentalExt` attributes are parsed.
* When the flag is disabled (default), in cases where only `LateAttrParsingExperimentalOnly` late parsing is requested, the attribute will be parsed immediately (i.e. **NOT** late parsed). This allows the attribute to act just like a `LateAttrParseStandard` attribute when the flag is disabled.
* When the flag is enabled, in cases where only `LateAttrParsingExperimentalOnly` late parsing is requested, the attribute will be late parsed.
The motivation behind this change is to allow the new `counted_by` attribute (part of `-fbounds-safety`) to support late parsing but **only** when `-fexperimental-late-parse-attributes` is enabled. This attribute needs to support late parsing to allow it to refer to fields later in a struct definition (or function parameters declared later). However, there isn't a precedent for supporting late attribute parsing in C, so this flag allows the new behavior to exist in Clang but not be on by default. This behavior was requested as part of the `-fbounds-safety` RFC process (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/68).
This patch doesn't introduce any uses of `LateAttrParseExperimentalExt`. This will be added for the `counted_by` attribute in a future patch (llvm#87596). A consequence is that the new behavior added in this patch is not yet testable, hence the lack of tests covering the new behavior.
rdar://125400257

In the new pass system, `MachineFunction` can be an analysis result again, so machine module passes can now fetch them from the analysis manager. `MachineModuleInfo` no longer owns them. Remove `FreeMachineFunctionPass`, replaced by `InvalidateAnalysisPass<MachineFunctionAnalysis>`. Now that `FreeMachineFunction` is replaced by `InvalidateAnalysisPass<MachineFunctionAnalysis>`, the workaround in `MachineFunctionPassManager` is no longer needed, and there is no difference between `unittests/MIR/PassBuilderCallbacksTest.cpp` and `unittests/IR/PassBuilderCallbacksTest.cpp`.

This is used when -march=native is run on a CPU that is unknown to an old version of LLVM.

Skip updating references for operands that do not directly refer to jump table symbols but fall within a jump table's address range to prevent unintended modifications.

within module purview Close llvm#90259 Technically, static declarations shouldn't be leaked from the module interface; otherwise the program is illegal according to the spec. So, technically, we can drop the static declarations from the reduced BMI and then close the above issue. However, there is too much `static inline` code in existing headers, so it would be a pretty big breaking change if we did this globally.

…g/ec/llvm-project into amd-staging

Change-Id: Icf8748fff11482f16cbeb1f19baf5a3404b57c6e

Disable this test on x86_64h for LSan. This test is failing with malformed object only on x86_64h. Disabling for now. rdar://125052424

…elism with multi-frame parallelism https://reviews.llvm.org/D133679 utilizes zstd's multithread API to create one single frame. This provides a higher compression ratio but is significantly slower than concatenating multiple frames. With manual parallelism, it is easier to parallelize memcpy in OutputSection::writeTo. In addition, as the individually allocated decompression buffers are much smaller, we can make a wild guess (compressed_size/4) without worrying that a resize (due to a wrong guess) would waste memory.

…ng-parentheses` (llvm#90279) When a binary operator is the last operand of a macro, the end location that is past the `BinaryOperator` will be inside the macro and therefore an invalid location to insert a `FixIt` into, which is why the check bails when encountering such a pattern. However, the end location is only required for the `FixIt` and the diagnostic can still be emitted, just without an attached fix.

…te module file for C++20 modules instead of PCHGenerator Previously we were reusing PCHGenerator to generate the module file for C++20 modules, which is somewhat odd. This patch tries to use a new class, 'CXX20ModulesGenerator', to generate the module file for C++20 modules.

llvm#90522) LegalizeVectorType is responsible for legalizing nodes that perform an operation on each element and may need to scalarize. This is not true for nodes like VP_REDUCE.*, BUILD_VECTOR, SHUFFLE_VECTOR, EXTRACT_SUBVECTOR, etc. This patch drops any nodes with a scalar result from LegalizeVectorOps and handles them in LegalizeDAG instead. This required moving the reduction promotion to LegalizeDAG. I have removed the support for integer promotion as it was incorrect for integer min/max reductions. Since it was untested, it was best to assert on it until it was really needed. There are a couple of regressions that can be fixed with a small DAG combine, which I will do as a follow-up.

Close llvm#75057 Previously, I incorrectly thought the diagnostic mappings were not meaningful with modules. This problem was revealed by another recent change. So this patch partially reverts the previous "optimization".

…uctions Marking them as `hasSideEffects=1` stops some optimizations. According to `Target.td`:
> // Does the instruction have side effects that are not captured by any
> // operands of the instruction or other flags?
> bit hasSideEffects = ?;
It seems we don't need to set `hasSideEffects` for vleNff since we have modelled `vl` as an output operand. As for saturating instructions, I think that an explicit Def/Use list is a kind of side effect captured by the operands of the instruction, so we don't need to set `hasSideEffects` either. I have also looked at AArch64's implementation: it doesn't set this flag and doesn't add a `Def` list. These changes make optimizations like `performCombineVMergeAndVOps` and MachineCSE possible for these instructions. As a consequence, `copyprop.mir` can't test what we want to test in https://reviews.llvm.org/D155140, so we replace `vssra.vi` with a VCIX instruction (it has side effects).
Reviewers: jacquesguan, topperc, preames, asb, lukel97
Reviewed By: topperc, lukel97
Pull Request: llvm#90049

and "[NFC] [C++20] [Modules] Use new class CXX20ModulesGenerator to generate module file for C++20 modules instead of PCHGenerator" This reverts commit fb21343. and commit 18268ac. It looks like there are some problems about linking the compiler

) We can use the original vector as long as the type of X matches the result type of the vmv_s_x_vl.

Simultaneously implemented parsing support for the `%desc_*` modifiers. Reviewers: SixWeining, heiher, xen0n Reviewed By: xen0n, SixWeining Pull Request: llvm#90158

Close llvm#75057 Previously, I incorrectly thought the diagnostic mappings were not meaningful with modules. This problem was revealed by another recent change. So this patch partially reverts the previous "optimization".

Extends `omp.private` with a new region: `dealloc` where deallocation logic for Fortran deallocatables will be outlined (this will happen in later PRs).

…89247)

This removes various subtitles or converts them to bold text so that the table of contents is less cluttered. This includes "Example", "Notes", "Priority To Implement" and "Response".

The implementation is only enabled when the `-enable-tlsdesc` option is passed and the TLS model is `dynamic`. LoongArch's GCC has the same option (-mtls-dialect=) as RISC-V. Reviewers: heiher, MaskRay, SixWeining Reviewed By: SixWeining, MaskRay Pull Request: llvm#90159

…ixed. (llvm#90484) The original PR llvm#90083 had to be reverted in PR llvm#90444 as it caused one of the gfortran tests to fail. The issue was using `isIntOrIndex` to check for an integer type. It allowed the index type, which later caused an assertion when calling `getIntOrFloatBitWidth`. I have now replaced it with `isInteger`, which should fix this regression.

…90471) In the debug intrinsic class hierarchy, a dbg.assign is a (inherits from) dbg.value, so `findDbgValues` returns dbg.values and dbg.assigns (by design). That hierarchy doesn't exist for DbgRecords, so fix findDbgValues to return dbg_assign records as well as dbg_values, and add a unittest.

…9361) Some of the online sync-ups on our Getting Involved page seem to no longer be happening. Document them as no longer happening, so that people don't get confused when dialing in to one of these.

This is part of the "no transitive change" patch series: "no transitive source location change". I discussed this with @Bigcheese at the WG21 meeting in Tokyo. The idea comes from a post by @jyknight on the LLVM Discourse. Given:
```
// A.cppm
export module A;
...
// B.cppm
export module B;
import A;
...
//--- C.cppm
export module C;
import B;
```
Almost every time A.cppm changes, we need to recompile `B`, because we consider the source location significant to the semantics. But it would be good if we could avoid recompiling `C` when the change in `A` doesn't change the BMI of `B`.
# Motivation Example
This patch only cares about source locations, so let's focus on a source location example. The full example is in the attached test.
```
//--- A.cppm
export module A;
export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}
//--- A.v1.cppm
export module A;

export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}
//--- B.cppm
export module B;
import A;
export int funcB() {
    return funcA();
}
//--- C.cppm
export module C;
import A;
export void testD() {
    C<int> c;
    c.func();
}
```
Here the only difference between `A.cppm` and `A.v1.cppm` is that `A.v1.cppm` has an additional blank line. The test then shows that the two BMIs of `B.cppm`, one built with `-fmodule-file=A=A.pcm` and the other with `-fmodule-file=A=A.v1.pcm`, should have bit-wise identical contents. However, it is a different story for C: since C instantiates templates from A, and the instantiation records source information from module A, which differs between `A` and `A.v1`, it is expected that the BMIs `C.pcm` and `C.v1.pcm` can and should differ.
# Internal perspective of status quo
To fully understand the patch, we need to understand how we encode source locations and how we serialize and deserialize them.
For source locations, we encode them as:
```
|
|
| _____ base offset of an imported module
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0
```
As the diagram shows, we encode the local (unloaded) source locations from 0 toward higher bits, and we allocate the space for source locations from loaded modules from the high bits toward 0. The source locations from the loaded modules are then mapped into our source location space according to the allocated offset. For example, for
```
// a.cppm
export module a;
...
// b.cppm
export module b;
import a;
...
```
assume the offset of a source location (let's call it `S`) in a.cppm is 45, so we record the value `45` in the BMI `a.pcm`. Then in b.cppm, when we import a, the source manager allocates a space for module 'a' (according to the recorded number of source locations) as the base offset of module 'a' in the current source location space. Let's assume the allocated base offset is 90 in this example. Then, to get the location of `S` in the current source location space, we simply add `45` to `90`, giving `135`. So the source location of `S` in module `b` is `135`. When we write module `b`, we also write the source location of `S` as `135` directly in the BMI. And to make clear that the location `S` comes from module `a`, we also need to record the base offset of module `a`, 90, in the BMI of `b`.
Then the problem comes. The base offset of module 'a' is computed from the number of source locations in module 'a', so in module 'b' the recorded base offset of module 'a' changes every time the number of source locations in module 'a' increases or decreases. In other words, the contents of the BMI of `b` change every time the number of locations in module 'a' changes. This is pretty sensitive: almost every change alters the number of locations. This is the problem this patch wants to solve.
Let's continue with the existing design to understand what's going on. Another interesting case is:
```
// c.cppm
export module c;
import whatever;
import a;
import b;
...
```
In `c.cppm`, when we import `a`, we still need to allocate a base location offset for it; let's say the value is `200`. Then when we reach the location `S` recorded in module `b`, we need to translate it into the current source location space. The solution is quite simple: we can get it by `135 + (200 - 90) = 245`. In other words, the offset of a source location in the current module can be computed as `Recorded Offset + Base Offset of its module file - Recorded Base Offset`. That is almost all there is to how we handle source location offsets in the serializers.
# The high level design of current patch
At an abstract level, what we want to do is remove the hardcoded base offset of imported modules while retaining the ability to calculate the source location in a new module unit. To achieve this, we need to be able to find the module file owning a source location from the encoding of the source location. So in this patch, for each source location, we store the local offset of the location and the module file index. For the above example, in `b.pcm`, the source location of `S` was previously recorded as `135` directly. In the new design, the source location of `S` is recorded as `<1, 45>`. Here `1` stands for the module file index of `a` in module `b`, and `45` is the offset of `S` from the base offset of module `a`.
So the trade-off is that, to make the BMI more independent, we need to record more abstract information. I feel it is worth it. The recompilation problem of modules is really annoying and people still complain about it. If we can make this work (including stopping other changes from propagating transitively), I think it may be a killer feature for modules. And according to @Bigcheese, this should be helpful for clang explicit modules too.
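The offset arithmetic above can be sketched in a few lines of C++. This is an illustrative sketch only, not code from the patch: the function names are hypothetical, and plain `uint32_t` values stand in for Clang's source location types.

```cpp
#include <cassert>
#include <cstdint>

// Existing scheme (hypothetical helper name):
// Recorded Offset + Base Offset of its module file - Recorded Base Offset.
inline uint32_t translateWithRecordedBase(uint32_t recordedOffset,
                                          uint32_t recordedBase,
                                          uint32_t currentBase) {
  assert(recordedOffset >= recordedBase && "offset must lie in the module's range");
  return recordedOffset + currentBase - recordedBase;
}

// New scheme (hypothetical helper name): the BMI stores
// <module file index, local offset>, so only the base offset allocated in the
// importing unit is needed to place the location.
inline uint32_t translateWithLocalOffset(uint32_t localOffset,
                                         uint32_t currentBase) {
  return currentBase + localOffset;
}

// Numbers taken from the example in the text.
inline void exampleTranslate() {
  assert(translateWithRecordedBase(135, 90, 200) == 245);
  assert(translateWithLocalOffset(45, 200) == 245);
}
```

Both helpers place `S` at `245` in `c.cppm`, but only the first one needs the base offset `90` to be stored in `b.pcm`, which is the value that changes whenever module `a` gains or loses source locations.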
On the benchmarking side, I tested this patch against https://github.com/alibaba/async_simple/tree/CXX20Modules. There was no significant change in compilation time. The size of the .pcm files grew from 200M to 204M. I think the trade-off is pretty fair.
# Some low level details
I didn't use another slot to record the module file index. Instead, I use the higher 32 bits of the existing source location encoding to store that information. This design should be safe, since we use `unsigned` to store source locations but `uint64_t` in serialization, and `unsigned` is generally 32 bits wide on most platforms. None of the bits we use to store the module file index were used before. So the new encoding may be:
```
|-----------------------|-----------------------|
|           A           |         B         | C |

* A: 32 bit. The index of the module file in the module manager + 1. The +1 here is necessary since we wish 0 stands for the current module file.
* B: 31 bit. The offset of the source location to the module file containing it.
* C: The macro bit. We rotate it to the lowest bit so that we can save some space in case the index of the module file is 0.
```
(B and C are the existing raw encoding for source locations.)
Another reason to reuse the same slot of the source location is to reduce the impact of the patch: many places assume we can store and get a source location from a single slot, and when I tried to add another slot, a lot of code broke. I don't feel it is worth it.
Another impact of this decision is that the existing small optimizations for encoding source locations may be invalidated. The key to those optimizations is that we can turn large values into small values and then use the VBR6 format to reduce the size. But if we put the module file index into the higher bits, that may simply not work. An example is the `SourceLocationSequence` optimization. This only affects the size of on-disk .pcm files.
I don't expect this to impact the speed or memory use of compilations. And judging by my small experiments above, I feel this trade-off is worth it.
# Correctness
The mental model for handling source location offsets is not so complex, and I believe we can solve it by adding the module file index to each stored source location. On the practical side, since source locations are pretty sensitive, and the patch passes all the in-tree tests and a small-scale project, I feel it should be correct.
# Future Plans
I'll continue to work on no transitive decl change and no transitive identifier change (if it matters) to achieve the goal of stopping the propagation of unnecessary changes. But all of this depends on this patch, since, clearly, source locations are the most sensitive thing.
---
The release notes and documentation will be added separately.
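As a rough illustration of the 64-bit layout described under "Some low level details", the following C++ sketch packs and unpacks the three fields. The names `DecodedLoc`, `encodeLoc`, `decodeLoc`, and `roundTrips` are hypothetical and not taken from the patch. The sketch assumes a 32-bit raw location whose top bit is the macro bit and whose low 31 bits are the offset, as the message describes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of a serialized source location.
struct DecodedLoc {
  uint32_t moduleFileIndexPlusOne; // field A: 0 means "the current module file"
  uint32_t raw;                    // bit 31 = macro bit (C), bits 0..30 = offset (B)
};

inline uint64_t encodeLoc(uint32_t moduleFileIndexPlusOne, uint32_t raw) {
  // Rotate left by one so the macro bit lands in the lowest bit.
  const uint32_t rotated = (raw << 1) | (raw >> 31);
  return (static_cast<uint64_t>(moduleFileIndexPlusOne) << 32) | rotated;
}

inline DecodedLoc decodeLoc(uint64_t encoded) {
  const uint32_t rotated = static_cast<uint32_t>(encoded);
  // Rotate right by one to restore the raw 32-bit encoding.
  const uint32_t raw = (rotated >> 1) | (rotated << 31);
  return {static_cast<uint32_t>(encoded >> 32), raw};
}

inline bool roundTrips(uint32_t moduleFileIndexPlusOne, uint32_t raw) {
  const DecodedLoc d = decodeLoc(encodeLoc(moduleFileIndexPlusOne, raw));
  return d.moduleFileIndexPlusOne == moduleFileIndexPlusOne && d.raw == raw;
}

inline void exampleEncode() {
  // A local, non-macro location stays a small number (good for VBR encoding).
  assert(encodeLoc(0, 45) == 90);
  // The same offset in module file index 1 sets bit 32.
  assert(encodeLoc(1, 45) == ((1ULL << 32) | 90));
  assert(roundTrips(1, 0x80000000u | 45u)); // macro location
}
```

When the module file index field is 0, the encoded value fits in 32 bits, which is the space saving the rotation of the macro bit is meant to preserve.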

Should hopefully help shave some minutes off developer debugging time in the future.

For all intents and purposes llvm.mlir.addressof acts like a constant, and should be treated as such by passes. In particular, the operation should be propagated rather than passed whenever possible.

…#90200) Adds an example that combines loop peeling and scalable vectorisation of `linalg.depthwise_conv_2d_nhwc_hwc`. This is similar to transform-op-peel-and-vectorize.mlir and is meant to demonstrate how to avoid masking when vectorising using scalable vectors.

We need to adjust the RHS to account for the LHS bitwidth.

Guard object destroyed immediately after creation without naming.
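The bug class named here is easy to reproduce in isolation. The following is a minimal sketch with a hypothetical `CountingGuard` type (not the guard from the actual fix) showing that an unnamed guard is a temporary that dies at the end of its own statement, while a named guard lives until the end of the scope.

```cpp
#include <cassert>

// Hypothetical RAII guard that counts how many guards are currently alive.
struct CountingGuard {
  explicit CountingGuard(int &live) : live_(live) { ++live_; }
  ~CountingGuard() {
    assert(live_ > 0);
    --live_;
  }
  CountingGuard(const CountingGuard &) = delete;
  CountingGuard &operator=(const CountingGuard &) = delete;

private:
  int &live_;
};

// Buggy pattern: the guard has no name, so it is a temporary that is
// destroyed at the end of the full expression, before any protected code runs.
inline int liveAfterUnnamedGuard() {
  int live = 0;
  CountingGuard{live};
  return live; // the guard is already gone here
}

// Fixed pattern: naming the guard keeps it alive until the scope ends.
inline int liveWithNamedGuard() {
  int live = 0;
  CountingGuard guard{live};
  return live; // the guard is still alive here
}

inline void exampleGuard() {
  assert(liveAfterUnnamedGuard() == 0);
  assert(liveWithNamedGuard() == 1);
}
```

The same applies to standard guards such as `std::lock_guard`: without a variable name, the lock is released immediately.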

attempt to fix llvm#68885 (comment) Deduction of an NTTP whose type is `decltype(auto)` would create an implicit cast expression to a dependent type, making the type of the primary template definition (`InjectedClassNameSpecialization`) differ from that of its partial specialization. Prevent emitting the cast expression, so that clang knows their types are identical, by removing `CTAK == CTAK_Deduced` when the type is `decltype(auto)`. Co-authored-by: huqizhi <[email protected]>

… on deduction of nontype template parameter (llvm#90376) Fix llvm#68885 When building an expression from a deduced argument whose kind is `Declaration` and whose `NTTPType` (declared as `decltype(auto)`) is deduced as a reference type, `BuildExpressionFromDeclTemplateArgument` just creates a `DeclRef`. This is incorrect when we later get the type from the expression, since we can't recover the original reference type from a `DeclRef`. Create a `SubstNonTypeTemplateParmExpr` expression to make the deduction correct. The `Replacement` expression of `SubstNonTypeTemplateParmExpr` just helps the deduction and may not be the same as the original expression. Co-authored-by: huqizhi <[email protected]>

…sions (llvm#84387) Depends on llvm#84384 and llvm#90329 This adds support for `DW_TAG_LLVM_ptrauth_type` entries corresponding to explicitly signed types (e.g. free function pointers) in lldb user expressions. Applies PR swiftlang#8239 from Apple's downstream and also adds tests and related code. --------- Co-authored-by: Jonas Devlieghere <[email protected]>

…m#90413) This also removes the member overload in TypeSwitch. All other users have been removed in fac349a and bd9fdce.

modules After llvm#86912, for the following example,
```
export module A;
export import B;
```
the generated BMI of `A` won't change if the source location in `A` changes. Further, we plan to avoid more such changes. However, this is slightly problematic since `export import` should propagate all the changes. So this patch adds a signature to the BMI of C++20 modules so that we can propagate the changes correctly.

llvm#90570) …te module file for C++20 modules instead of PCHGenerator Previously we were reusing PCHGenerator to generate the module file for C++20 modules, which is somewhat odd. This patch tries to use a new class, 'CXX20ModulesGenerator', to generate the module file for C++20 modules.

91a8cb7 was originally written before 8d53866 landed. The latter changed how main is emitted, which changed the numbering of the subprograms in the test output. To fix this I've added a check for the new _QQmain and renumbered the existing checks.

In the test of clang/test/Modules/no-transitive-source-location-change.cppm, there were reports about invalid directory names in windows. The reason may be that we may remove and create the same directory. This patch tries to avoid such patterns for that.

Added line which has been dropped from the 'deinitRuntime()' during merge-conflict resolution. Change-Id: Iee2c8b2fe63d8cd36cdb9befca2e8c93384087d9

…vm#90467) The AltiVec (POWER) and ZVector (IBM Z) language extensions do not support using the "vector" keyword when the element type is a complex type, but current code does not verify this. Add a Sema check and diagnostic for this case. Fixes: llvm#88399

image_msaa_load is actually encoded as a VSAMPLE instruction and requires the appropriate waitcnt variant.

Current dir can be read-only. Use a temp path instead.

…ut repeated SDLoc(). NFC.

…ffsets. As noted on llvm#66991 - we sometimes share vector constant pool entries, referencing subvectors within them via pointer offsets

…g/ec/llvm-project into amd-staging

…90408) Fold:
```llvm
define i1 @src(i8 %x, i8 %y) {
  %xor = xor i8 %x, %y
  %r = trunc nuw/nsw i8 %xor to i1
  ret i1 %r
}

define i1 @tgt(i8 %x, i8 %y) {
  %r = icmp ne i8 %x, %y
  ret i1 %r
}
```
Proof: https://alive2.llvm.org/ce/z/dcuHmn
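The linked Alive2 proof is the authoritative check. As an informal cross-check, the fold can also be brute-forced over all `i8` pairs in plain C++. The sketch below is an assumption-laden model, not LLVM code: it treats `nuw` as "zero-extending the `i1` result gives back the `i8` value" and `nsw` as "sign-extending it gives back the value", and it ignores inputs where the flag would make the result poison. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// trunc i8 -> i1 keeps only the lowest bit.
inline bool truncToI1(uint8_t v) { return (v & 1) != 0; }

// Without any flag the fold is wrong, e.g. for x = 2, y = 0.
inline bool plainTruncMatchesIcmpNe(uint8_t x, uint8_t y) {
  return truncToI1(static_cast<uint8_t>(x ^ y)) == (x != y);
}

// Checks src == tgt for every input pair on which the trunc is not poison
// under nuw (value is 0 or 1) or nsw (value is 0 or -1, i.e. 0xFF).
inline bool foldHoldsForAllI8() {
  for (int xi = 0; xi < 256; ++xi) {
    for (int yi = 0; yi < 256; ++yi) {
      const uint8_t x = static_cast<uint8_t>(xi);
      const uint8_t y = static_cast<uint8_t>(yi);
      const uint8_t v = static_cast<uint8_t>(x ^ y);
      const bool nuwOk = v <= 1;
      const bool nswOk = v == 0 || v == 0xFF;
      if ((nuwOk || nswOk) && truncToI1(v) != (x != y))
        return false;
    }
  }
  return true;
}

inline void exampleFold() {
  assert(foldHoldsForAllI8());
  assert(!plainTruncMatchesIcmpNe(2, 0));
}
```

The counterexample `x = 2, y = 0` shows why the flag matters: the xor is `2`, its low bit is `0`, yet `x != y` is true.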

This flag has been enabled by default for almost two years now since 1f06398, and at this stage we probably shouldn't be falling back to the fixups. This removes the flag so we always perform the assertion, as well as making sure that CurInfo is always valid on exit: We shouldn't leave emitVSETVLIs with an uninitialized VSETVLIInfo.

llvm#90577) Once llvm#85613 is fixed, we can mark this feature fully supported. Signed-off-by: yronglin <[email protected]>

libLTO parses options late, so at the moment the option is ignored. To fix that, re-set it in optimize(), as at this point the options have been parsed. When LTOCodeGenerator's constructor executes, the options haven't been parsed by the linker to libLTO yet. Note that we keep the value name of `%add = add..` because when the module is imported, DiscardValueNames is still set to false (the default when building with assertions). I tried to improve this in libLTO, but I am not sure if there's a suitable callback when all options have been set. PR: llvm#78705

…m#88890)

As discussed in llvm#88039, support different strides with isSafeDependenceDistance by passing the maximum of both strides. isSafeDependenceDistance tries to prove that |Dist| > BackedgeTakenCount * Step holds. Choosing the maximum stride computes the maximum range accessed by the loop for all strides. PR: llvm#90036
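The inequality can be illustrated with concrete integers. The real implementation reasons about symbolic SCEV expressions, so the following is only a sketch under simplifying assumptions, with a hypothetical function name and constant inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical constant-only model of the check |Dist| > BackedgeTakenCount * Step,
// where Step is the larger of the two access strides.
inline bool distanceIsProvablySafe(int64_t dist, uint64_t backedgeTakenCount,
                                   uint64_t strideA, uint64_t strideB) {
  const uint64_t step = std::max(strideA, strideB);
  // Unsigned negation gives the magnitude even for the most negative value.
  const uint64_t absDist = dist < 0 ? 0 - static_cast<uint64_t>(dist)
                                    : static_cast<uint64_t>(dist);
  // If the product would overflow, nothing can be proven.
  if (step != 0 &&
      backedgeTakenCount > std::numeric_limits<uint64_t>::max() / step)
    return false;
  return absDist > backedgeTakenCount * step;
}

inline void exampleDistance() {
  // 99 backedges with strides 4 and 8 cover at most 99 * 8 = 792 bytes.
  assert(distanceIsProvablySafe(1000, 99, 4, 8));
  assert(distanceIsProvablySafe(-1000, 99, 8, 4));
  // A distance of 500 is only safe if both strides are 4 (99 * 4 = 396).
  assert(!distanceIsProvablySafe(500, 99, 4, 8));
  assert(distanceIsProvablySafe(500, 99, 4, 4));
}
```

Using the smaller stride in the third case would wrongly report the distance as safe, which is why the maximum is passed.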

…ets (llvm#90573) In llvm#70452 DAGCombiner::mayAlias was taught to handle scalable sizes, but when it checks via AA->isNoAlias it didn't take into account the case where the size is scalable but there was an offset too. For the fixed length case the offset was just accounted for by adding to the LocationSize, but for the scalable case there doesn't seem to be a way to represent both a scalable and fixed part in it. So this patch works around it by bailing if there is an offset. Fixes llvm#90559

Thanks to ExtensionSet::toLLVMFeatureList, all values of ArchExtKind should correspond to a particular -target-feature. The valid values of -target-feature are in turn defined by SubtargetFeature defs. Therefore we can generate ArchExtKind from the tablegen data. This is done by adding an Extension class which derives from SubtargetFeature. Because the Has* FieldNames do not always correspond to the AEK_ names ("extensions", as defined in TargetParser), and AEK_ names do not always correspond to -march strings, some additional enum entries have been added to remap the names. I have renamed these to make the naming consistent, but split them into a separate PR to keep the diff reasonable (llvm#90320)

Change-Id: I95739002226a44f9c97a6b2ea2e349ec57b7a9f1

…lvm#90441) See https://clang.llvm.org/docs/LibTooling.html

…0440)

…es (llvm#90497) GUIDs often have content in the higher bits of a 64-bit entry, so using the unabbrev encoding is inefficient (lots of VBR control bits). Instead, use an abbrev with two 32-bit fixed width chunks. The abbrev also helps encode the "count" in one place instead of in every record. Reduces size of distributed backend summary files by 8.7% in one example app. Co-authored-by: Jan Voung <[email protected]>
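To see where the "VBR control bits" come from, the bit cost of the two encodings can be compared directly. This is a back-of-the-envelope sketch with hypothetical helper names. It assumes the standard bitstream VBR layout, in which each N-bit chunk carries N-1 payload bits plus one continuation bit, and VBR6 for unabbreviated operands.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed to emit `value` as VBR with chunks of `chunkWidth` bits.
inline unsigned vbrBits(uint64_t value, unsigned chunkWidth) {
  assert(chunkWidth >= 2 && chunkWidth <= 32);
  const unsigned payloadBits = chunkWidth - 1;
  unsigned chunks = 1;
  while ((value >>= payloadBits) != 0)
    ++chunks;
  return chunks * chunkWidth;
}

// Bits needed when a GUID is split into two 32-bit fixed-width fields.
inline unsigned fixedGuidBits() { return 2 * 32; }

inline void exampleVbr() {
  // Hypothetical GUID with its top bit set: 64 significant bits.
  const uint64_t guid = 0x8000000000000001ULL;
  assert(vbrBits(guid, 6) == 78); // 13 chunks of 6 bits
  assert(fixedGuidBits() == 64);
  // VBR only wins for small values, which hashes rarely are.
  assert(vbrBits(7, 6) == 6);
}
```

For a value that uses all 64 bits, VBR6 spends 78 bits where the fixed-width form spends 64.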

…pp (llvm#90606)

….py (llvm#90607)

This reverts commit 61b2a0e. Reason: AArch64TargetParserDef.inc not found while building clang

… summaries" (llvm#90610) Reverts llvm#90497 Broke some LLD tests.

llvm#90609)

When compiling for thumbv8.1m with +pacbti and making an indirect tail call, the compiler was free to put the function pointer into R12. This is incorrect because R12 is restored to contain authentication code for the caller's return address. This patch excludes R12 from the set of registers the compiler can put the function pointer in. Fixes llvm#75998

This reverts commit 6c31104. Required by the post commit comments: llvm#86912

Added support for memref-normalization for prefetch. Signed-off-by: Alexandre Eichenberger <[email protected]>

Enable MachineCombining for FP add, sub and mul. In order for this to work, the default instruction selection of reg/mem opcodes is disabled for ISD nodes that carry the flags that allow reassociation. The reg/mem folding is instead done after MachineCombiner by PeepholeOptimizer. SystemZInstrInfo optimizeLoadInstr() and foldMemoryOperandImpl() ("LoadMI version") have been implemented for this purpose also by this patch.

…rchString. NFC (llvm#90562) This replaces some starts_with calls with consume_front. This allows us to remove a later assumption that the prefix was 4 characters. We would eventually need to fix this anyway if we ever support rv128. Noticed while reviewing the RISCVISAInfo code for other reasons.
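The difference between the two styles can be sketched without LLVM. Below, `consumeFront` is a stand-in written with `std::string` that mimics what `StringRef::consume_front` does. `ParsedArch` and `splitArch` are hypothetical names, not the RISCVISAInfo code.

```cpp
#include <cassert>
#include <string>

// Stand-in for consume_front: if `s` starts with `prefix`, drop the prefix
// from `s` and return true; otherwise leave `s` untouched.
inline bool consumeFront(std::string &s, const std::string &prefix) {
  if (s.compare(0, prefix.size(), prefix) != 0)
    return false;
  s.erase(0, prefix.size());
  return true;
}

struct ParsedArch {
  unsigned xlen;    // 0 if no known prefix matched
  std::string rest; // extensions following the prefix
};

// The prefix is removed by the same call that tests for it, so no later code
// has to assume the prefix is exactly 4 characters long.
inline ParsedArch splitArch(std::string arch) {
  if (consumeFront(arch, "rv32"))
    return {32, arch};
  if (consumeFront(arch, "rv64"))
    return {64, arch};
  return {0, arch};
}

inline void exampleArch() {
  assert(splitArch("rv64imac").xlen == 64);
  assert(splitArch("rv64imac").rest == "imac");
  assert(splitArch("x86_64").xlen == 0);
}
```

Adding a longer prefix such as `rv128` would then only need one more `consumeFront` call, with no hard-coded length to update.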

Store of the current induction value to the user IV was not placed correctly in the body of the cuf kernel. @ImanHosseini

This patch introduces fir.cuda_alloc/fir.cuda_free. These operations will be used instead of fir.alloca for local CUDA device, managed and unified variables.

COPY operands are always registers.

Previously we weren't printing expressions correctly, so this patch adds a test to ensure we do, and fixes how expressions are printed.

…vm#88838)" This reverts commit 9d5411f. Breaks aarch64 buildbot: https://lab.llvm.org/buildbot/#/builders/221/builds/22130

…ultiple times in the same instruction. (llvm#89601) Previously, multiple uses of a register within the same instruction were being counted as multiple uses. This has been corrected to only count as a single use as per the specification allowing for more optimisation candidates.

…lvm#90593) Before this patch we crashed lowering intrinsic array reductions. I think this was lost during a rebase. I've added a test to make sure it doesn't break again. Also fixed the TODO message to be more accurate.

…vm#90597) We might use polymorphic ops in top-level operations other than functions some time in the future. We need to ensure that these operations can be lowered. See RFC: https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations Some of the changes are from moving declaration and definition of the constructor function into tablegen (as requested in code review when altering another pass).

…0502) This intrinsic is the VP version of `experimental.cttz.elts`.

…90040) This patch generalizes tensor.expand_shape and memref.expand_shape to consume the output shape as a list of SSA values. This enables us to implement generic reshape operations with dynamic shapes using collapse_shape/expand_shape pairs. The output_shape input to expand_shape follows the static/dynamic representation that's also used in `tensor.extract_slice`. Differential Revision: https://reviews.llvm.org/D140821 --------- Signed-off-by: Gaurav Shukla<[email protected]> Signed-off-by: Gaurav Shukla <[email protected]> Co-authored-by: Ramiro Leal-Cavazos <[email protected]>

Based off llvm#90355 - add basic tests for cases when to extend i16 comparisons to i32

We were always calling SDLoc(N) at the top of each visitSHL/SRL/SRA for the FoldConstantArithmetic call, so just reuse this as much as possible.

…NFC (llvm#90086)

llvm#90087) … NFC

This will unify the interface a bit more.

…ation of an upstream patch. This patch refactors the checkIfAPU method. The revised checkIfAPU() method, using the HSA symbols HSA_AGENT_INFO_AMD_MEMORY_PROPERTIES and HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU, will be upstreamed. This patch reduces merge conflicts with the upstream method, as the detection of the GFX90a and MI300x is moved to separate methods. As such, the downstream method can be replaced by the upstream implementation. Change-Id: Id10605e7ea2248538f26ebc717341b1735495a01

…lvm#90110)

Return an SDValue instead of pushing to the Results vector. Let the caller do the push.

Forgot to add vp.cttz.elts into the unittest. Also, I didn't specify the positions of overloaded type parameters.

Expand `arith.minsi`, `arith.minui`, `arith.maxsi`, `arith.maxui` into `arith.cmpi` and `arith.select`. --------- Co-authored-by: Jakub Kuderski <[email protected]>
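The shape of the expansion is a compare followed by a select. The C++ sketch below mirrors that shape for 32-bit values. The function names echo the op names for readability, and the choice of `<` and `>` as the comparison is an assumption, since the message does not state which `arith.cmpi` predicates the pattern uses.

```cpp
#include <cassert>
#include <cstdint>

// Each function is "cmp" followed by "select", mirroring arith.cmpi + arith.select.
inline int32_t minsi(int32_t a, int32_t b) {
  const bool cmp = a < b; // signed less-than (assumed predicate)
  return cmp ? a : b;
}
inline int32_t maxsi(int32_t a, int32_t b) {
  const bool cmp = a > b; // signed greater-than (assumed predicate)
  return cmp ? a : b;
}
inline uint32_t minui(uint32_t a, uint32_t b) {
  const bool cmp = a < b; // unsigned less-than (assumed predicate)
  return cmp ? a : b;
}
inline uint32_t maxui(uint32_t a, uint32_t b) {
  const bool cmp = a > b; // unsigned greater-than (assumed predicate)
  return cmp ? a : b;
}

inline void exampleMinMax() {
  // The same bit pattern orders differently under signed and unsigned compares.
  assert(minsi(-1, 1) == -1);
  assert(minui(0xFFFFFFFFu, 1u) == 1u);
  assert(maxsi(-1, 1) == 1);
  assert(maxui(0xFFFFFFFFu, 1u) == 0xFFFFFFFFu);
}
```

The signed and unsigned variants differ only in the comparison, which is why four ops can share one expansion pattern.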

Ensure it's clear that:
- Infinite loops in non-mustprogress functions are well-defined, even if they're called by mustprogress functions.
- Infinite recursion in mustprogress functions is not well-defined.
Looking at D86233, it's clear this was the intent, but the "transitive" wording is ambiguous. Instead, just explicitly state that infinite loops written in non-mustprogress functions count as progress.

…uts are close to max. (llvm#90558) Fixes llvm#89668

… in a test (llvm#90399) - Adds a status page note for P3142R0 - Fixes a copy & paste error in tuple protocol for `complex`

…BVECTOR (llvm#84107) This is the insert_subvector equivalent to llvm#79949, where we can avoid sliding up by the full LMUL amount if we know the exact subregister the subvector will be inserted into. This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this in two parts:
- We handle fixed length subvector types by converting the subvector to a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to convert the vector being inserted into too.
- Whenever we don't need a vslideup because either the subvector fits exactly into a vector register group *or* the vector is undef, we need to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select doesn't correctly handle fixed length subvectors yet: see d7a28f7
A subvector exactly fits into a vector register group if its size is a known multiple of the size of a vector register, and this adds a new overload for TypeSize::isKnownMultipleOf for scalable to scalable comparisons to help reason about this.
I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an invariant), so that the insert_subvector and extract_subvector code paths are the same. We should teach it to properly handle fixed length subvectors in a follow-up patch, so that the "exact subregister" logic is handled in one place instead of being spread across both RISCVISelDAGToDAG.cpp and RISCVISelLowering.cpp.

…g. NFC

) Noticed while attempting microsoft/STL#4634

Adds support for applying LLVM formatting to variables. The reason for this is to support cases such as the following. Let's say you have two separate bytes that you want to print as a combined hex value. Consider the following summary string: ``` ${var.byte1%x}${var.byte2%x} ``` The output of this will be: `0x120x34`. That is, a `0x` prefix is unconditionally applied to each byte. This is unlike printf formatting where you must include the `0x` yourself. Currently, there's no way to do this with summary strings, instead you'll need a summary provider in python or c++. This change introduces formatting support using LLVM's formatter system. This allows users to achieve the desired custom formatting using: ``` ${var.byte1:x-}${var.byte2:x-} ``` Here, each variable is suffixed with `:x-`. This is passed to the LLVM formatter as `{0:x-}`. For integer values, `x` declares the output as hex, and `-` declares that no `0x` prefix is to be used. Further, one could write: ``` ${var.byte1:x-2}${var.byte2:x-2} ``` Where the added `2` results in these bytes being written with a minimum of 2 digits. An alternative considered was to add a new format specifier that would print hex values without the `0x` prefix. The reason that approach was not taken is because in addition to forcing a `0x` prefix, hex values are also forced to use leading zeros. This approach lets the user have full control over formatting.
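The difference between the prefixed and unprefixed forms can be reproduced with an analogy. The sketch below uses Python's own format specs as a stand-in; it does not call LLDB or LLVM's formatter, and the mapping (`#x` for the `%x` behaviour, `x` for `:x-`, `02x` for `:x-2`) is an assumption made only to show the resulting strings.

```python
def format_bytes(byte1, byte2, spec):
    """Format two bytes with the same spec and concatenate the results."""
    return format(byte1, spec) + format(byte2, spec)


# "#x" always emits a 0x prefix, like ${var.byte1%x}${var.byte2%x}.
prefixed = format_bytes(0x12, 0x34, "#x")

# "x" emits bare hex digits, like ${var.byte1:x-}${var.byte2:x-}.
bare = format_bytes(0x12, 0x34, "x")

# "02x" pads to at least two digits, like ${var.byte1:x-2}${var.byte2:x-2}.
padded = format_bytes(0x1, 0x2, "02x")
unpadded = format_bytes(0x1, 0x2, "x")
```

Without a minimum width, the bytes `0x01` and `0x02` collapse to `12`, which is indistinguishable from the single byte `0x12`; the width suffix removes that ambiguity.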

We use pwrite() in RewriteInstance to update contents of existing sections. pwrite() requires file position to be set past the written offset which we guarantee at the start of rewriteFile(). Then we had an implicit assumption in patchBuildID() that the file position will be set again in patchELFSymTabs() after being reset in patchELFPHDRTable(). That assumption was broken in llvm#90300. The fix is to save and restore file position in patchELFPHDRTable(). Then we don't have to update it again in patchELFSymTabs().
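The save-and-restore pattern described here can be sketched with an ordinary seekable stream. This is a Python illustration, not BOLT code: `preserved_position` and `patch_header` are hypothetical names, and an in-memory `io.BytesIO` stands in for the output file.

```python
import io
from contextlib import contextmanager


@contextmanager
def preserved_position(stream):
    """Remember the stream position and put it back on exit."""
    saved = stream.tell()
    try:
        yield stream
    finally:
        stream.seek(saved)


def patch_header(stream, data):
    """Overwrite bytes at offset 0 without disturbing the caller's position."""
    with preserved_position(stream):
        stream.seek(0)
        stream.write(data)


# The caller positions the stream at the end, patches the header,
# and can rely on the position being unchanged afterwards.
out = io.BytesIO(b"\0" * 16)
out.seek(16)
patch_header(out, b"ELF")
```

Because the helper restores the position itself, later patching steps do not need to set it again.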

Fix compiler warning caused by using deprecated interface (llvm#90413)

This adds the preprocessor define for the half-precision feature and also adds preprocessor tests.

…unction declarations (llvm#90517) According to [class.mem.general] p8:
> A complete-class context of a class (template) is a
> - function body,
> - default argument,
> - default template argument,
> - _noexcept-specifier_, or
> - default member initializer
>
> within the member-specification of the class or class template.

When testing llvm#90152, it came to my attention that we do _not_ consider the _noexcept-specifier_ of a friend function declaration to be a complete-class context (something which the Microsoft standard library depends on). Although a comment states that this is "consistent with what other implementations do", the only other implementation that exhibits this behavior is GCC (MSVC and EDG both late-parse the _noexcept-specifier_). This patch changes _noexcept-specifiers_ of friend function declarations to be late parsed, which is in agreement with the standard & majority of implementations. Pre-llvm#90152, our existing implementation falls "in between" the implementation consensus: within non-template classes, we would not find later-declared members (qualified and unqualified), while within class templates we would not find later-declared members when named with an unqualified name, but we would find members named with a qualified name (even when the lookup context is the current instantiation). Therefore, this _shouldn't_ be a breaking change -- any code that didn't compile will continue to not compile (since a _noexcept-specifier_ is not part of the deduction substitution loci; see [temp.deduct.general] p7), and any code which did compile should continue to do so.

…g non-existent members of the current instantiation prior to instantiation in the absence of dependent base classes (llvm#84050)" (llvm#90152) Reapplies llvm#84050, addressing a bug which causes a crash when an expression with the type of the current instantiation is used as the _postfix-expression_ in a class member access expression (arrow form).

The private clause is the first that takes a 'var-list', thus this has a lot of additional work to enable the var-list type. A 'var' is a traditional variable reference, subscript, member-expression, or array-section, so checking of these is pretty minor. Note: This ran into some issues with array-sections (aka sub-arrays) that will be fixed in a follow-up patch.

…#88440) As mentioned in llvm#68882 and https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699 Gep arithmetic isn't consistent with different types. GVNSink didn't realize this and sank all geps as long as their operands can be wired via PHIs in a post-dominator. Fixes: llvm#85333
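A small arithmetic model shows why the element type matters when sinking. This is a Python sketch, not the GVNSink implementation: the `ELEM_SIZE` table and the `safe_to_sink` predicate are hypothetical, and only single-index GEPs over scalar types are modelled.

```python
# Assumed byte sizes for a few scalar element types.
ELEM_SIZE = {"i8": 1, "i32": 4, "i64": 8}


def gep_offset(elem_type, index):
    """Byte offset computed by a single-index GEP over elem_type."""
    return ELEM_SIZE[elem_type] * index


def safe_to_sink(type_a, type_b):
    """Hypothetical check: two GEPs may share one sunk instruction only
    when the same index means the same byte offset in both."""
    return ELEM_SIZE[type_a] == ELEM_SIZE[type_b]


# The same index yields different addresses under different element types,
# so merging the indices through a PHI and keeping one type is wrong.
offset_i8 = gep_offset("i8", 4)
offset_i32 = gep_offset("i32", 4)
```

Here index 4 means 4 bytes for `i8` but 16 bytes for `i32`, so a sunk GEP that keeps only one of the two types would compute the wrong address on one path.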

Implement - LWG4053 Unary call to `std::views::repeat` does not decay the argument - LWG4054 Repeating a `repeat_view` should repeat the view Signed-off-by: yronglin <[email protected]>

Seemingly some other patch went in that altered how much dependence was printed vs the actual names, and it changed the ast-dump results. Commit to fix this test.

…0637) Due to generalization introduced in llvm#90040

…g/ec/llvm-project into amd-staging

This patch fixes: third-party/unittest/googletest/include/gtest/gtest.h:1379:11: error: comparison of integers of different signs: 'const unsigned int' and 'const int' [-Werror,-Wsign-compare]

…vm#90550) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator== outnumbers StringRef::equals by a factor of 22 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".

Due to generalization introduced in llvm#90040

…" (llvm#90658) Reverts llvm#88440

Test failing on Windows: https://lab.llvm.org/buildbot/#/builders/233/builds/9396
```
Input file: <stdin>
# | Check file: C:\buildbot\as-builder-8\llvm-nvptx-nvidia-win\llvm-project\llvm\test\Transforms\GVNSink\different-gep-types.ll
# |
# | -dump-input=help explains the following input dump.
# |
# | Input was:
# | <<<<<<
# | .
# | .
# | .
# | 42: br label %if.end6
# | 43:
# | 44: if.else5: ; preds = %if.else
# | 45: br label %if.end6
# | 46:
# | 47: if.end6: ; preds = %if.else5, %if.then3, %if.then
# | next:67'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# | next:67'1 with "IF_THEN" equal to "%if\\.then"
# | next:67'2 with "IF_THEN3" equal to "%if\\.then3"
# | next:67'3 with "IF_ELSE5" equal to "%if\\.else5"
# | 48: %.sink1 = phi i32 [ -8, %if.then3 ], [ -4, %if.else5 ], [ 8, %if.then ]
# | next:67'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:67'4 ? possible intended match
# | 49: %0 = load ptr, ptr %__i, align 4
# | next:67'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | 50: %incdec.ptr4 = getelementptr inbounds i8, ptr %0, i32 %.sink1
# | next:67'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | 51: store ptr %incdec.ptr4, ptr %__i, align 4
# | next:67'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | 52: ret void
# | next:67'0 ~~~~~~~~~~
# | 53: }
# | next:67'0 ~~
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1
```

Add pre-commit MIR test for PR "[Promote Pseudo Opcode from 32-bit to 64-bit after eliminating the extsw instruction in PPCMIPeepholes optimization](llvm#85451)" which fixes bug reported in the issue "[Inconsistent Output at -O1 and -O2 Optimization Levels on PowerPC64 Due to Complex Type Casting and Nested Loop Structure](llvm#71030)".

…st. Add inference from V extension. (llvm#90650) We weren't fully checking that we parsed Zve*x/f/d correctly. This could break if a new extension is added that starts with Zve. We were assuming that Zve64d is present whenever V is, so we only inferred from Zve*. It's more correct to infer ELEN from V itself too.
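The stricter matching and the extra inference from V can be sketched as below. This is a Python illustration, not the RISC-V backend code: `infer_elen` and the lookup table are hypothetical, and the table assumes the standard Zve names plus an ELEN of 64 for V.

```python
# Exact names only: a future extension that merely starts with "zve"
# must not be mistaken for one of these.
ZVE_ELEN = {
    "zve32x": 32,
    "zve32f": 32,
    "zve64x": 64,
    "zve64f": 64,
    "zve64d": 64,
}


def infer_elen(extensions):
    """Return the largest ELEN implied by the given extension names,
    or 0 when no vector extension is present."""
    elen = 0
    for ext in extensions:
        name = ext.lower()
        if name == "v":
            # Infer directly from V instead of relying on an implied Zve64d.
            elen = max(elen, 64)
        elif name in ZVE_ELEN:
            elen = max(elen, ZVE_ELEN[name])
    return elen
```

Matching whole names rather than a `zve` prefix is what keeps an unrelated future extension from changing the inferred ELEN.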

…6607) Since some places, like SimplifyCFG, work with 64-bit weights, we supply an API in ProfDataUtils to extract the weights accordingly. We change the API slightly to disambiguate the 64-bit version from the 32-bit version.

`CMake` supports [this command](https://cmake.org/cmake/help/latest/manual/cmake.1.html#cmdoption-cmake-E-arg-cat) as of version 3.18. [D151344](https://reviews.llvm.org/D151344) bumped the minimum version to 3.20, so, it is now possible to remove the dependency on the external utility. This helps to cross-compile from Windows to Linux without installing additional tools, such as MSYS2.

This patch implements `__kmp_is_address_mapped()` for AIX by calling `loadquery()` to get the load info of the process and then checking if the address falls within the range of the data segment of one of the loaded modules.

I have no idea if this is correct and I probably swapped the element ordering somewhere.

) UNIFIED variables are accepted in program scope. Update the check to allow them.

The Linux kernel expects ORC tables to be sorted by IP address (for binary search to work). Add a post-emit pass in LinuxKernelRewriter that validates the written .orc_unwind_ip against that expectation.
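The validation amounts to a single ordered scan, and the reason it matters is that a binary search silently returns wrong entries on unsorted input. The sketch below is plain Python, not the LinuxKernelRewriter pass: `first_out_of_order` and `lookup` are hypothetical helpers operating on a list of integer addresses.

```python
import bisect


def first_out_of_order(ips):
    """Return the index of the first entry smaller than its predecessor,
    or None when the table is sorted."""
    for i in range(1, len(ips)):
        if ips[i] < ips[i - 1]:
            return i
    return None


def lookup(ips, ip):
    """Binary-search a sorted table for the last entry at or below ip."""
    i = bisect.bisect_right(ips, ip) - 1
    return i if i >= 0 else None


sorted_table = [0x10, 0x20, 0x30]
broken_table = [0x10, 0x30, 0x20]
```

Running the check after emission catches a table like `broken_table` before the kernel's lookup ever sees it.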

…90549) Resolve test failure on non-x86 linux host

…lvm#90615) A double pointer was being passed to the call to FortranStart rather than just a pointer to the EnvironmentDefaults.list. This now passes `null` directly when there's no EnvironmentDefaults.list and passes the list directly when there is, removing the original global variable which was a pointer to a pointer containing null or the EnvironmentDefaults.list global. Fixes llvm#90537

…ex expr as 0. (llvm#90375) During analysis, we incorrectly leave the offset part of an address info struct as zero, when in actual fact we failed to decompose it into base + offset. This results in incorrectly assuming that the address is adjacent to another store addr. To fix this we wrap the offset in an optional<> so we can distinguish between real zero and unknown. Fixes issue llvm#90242
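The distinction between a real zero offset and an unknown one can be sketched with an optional value. This is a Python illustration of the idea, not the pass itself: `AddrInfo` and `are_adjacent` are hypothetical names, and bases are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddrInfo:
    base: str
    # None means "could not decompose into base + offset",
    # which is different from a known offset of 0.
    offset: Optional[int]


def are_adjacent(first, second, size):
    """True only when both offsets are known and second starts
    exactly where first ends."""
    if first.offset is None or second.offset is None:
        return False
    return first.base == second.base and second.offset - first.offset == size


known_zero = AddrInfo("p", 0)
next_word = AddrInfo("p", 4)
unknown = AddrInfo("p", None)
```

With a plain integer defaulting to 0, `unknown` would have compared equal to `known_zero` and been treated as adjacent to `next_word`; the optional makes that case fall through to "not adjacent".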

…g values. Added a limit of at most 128 incoming values for PHI nodes to be vectorized, and improved performance by using logarithmic search instead of linear search when the number of incoming values is > 4.

This changes the handling of -fno-unsafe-fp-math to stop having that option imply -ftrapping-math. In gcc, -fno-unsafe-math-optimizations sets -ftrapping-math, but that dependency is based on the fact that -ftrapping-math is enabled by default in gcc. Because clang does not enable -ftrapping-math by default, there is no reason for -fno-unsafe-math-optimizations to set it. On the other hand, -funsafe-math-optimizations continues to imply -fno-trapping-math because this option necessarily disables strict exception semantics. This fixes llvm#87523

Update the `AreCompatibleCUDADataAttrs` function to return true when one argument has the `PINNED` attribute and the other argument is just host data.

This PR implements a part of WG14 N2653:
- Define C23 char8_t
- Define C11 char16_t
- Define C11 char32_t

Missing goals are:
- The type of UTF-8 character literals is changed from unsigned char to char8_t. (Since UTF-8 character literals already have type unsigned char, this is not a semantic change).
- New mbrtoc8() and c8rtomb() functions declared in <uchar.h> enable conversions between multibyte characters and UTF-8.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro.
- A new atomic_char8_t typedef name.

This patch fixes: bolt/lib/Rewrite/LinuxKernelRewriter.cpp:855:12: error: variable 'PrevIP' set but not used [-Werror,-Wunused-but-set-variable]

I accidentally named it pr90688 instead of pr90668.

```
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
  t6: i64,ch = CopyFromReg t0, Register:i64 %2
  t8: i1 = truncate t6
  t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t7: i1 = truncate t4
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t10: i64,i1 = saddo t2, Constant:i64<1>
  t11: i1 = or t8, t10:1
  t12: i1 = select t7, t8, t11
  t13: i64 = any_extend t12
  t15: ch,glue = CopyToReg t0, Register:i64 $x10, t13
  t16: ch = RISCVISD::RET_GLUE t15, Register:i64 $x10, t15:1
```
`OtherOpVT` should be i1, but `OtherOp->getValueType(0)` returns `i64`, which ignores `ResNo` in `SDValue`. Fix llvm#90652.
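The bug class is easy to reproduce in a toy model of multi-result nodes. The sketch below is Python, not SelectionDAG: the `Node` and `Value` classes are hypothetical stand-ins for a node and a (node, result number) pair.

```python
class Node:
    """A node that may produce several results, each with its own type."""

    def __init__(self, opcode, value_types):
        self.opcode = opcode
        self.value_types = value_types

    def get_value_type(self, res_no):
        return self.value_types[res_no]


class Value:
    """A reference to one particular result of a node."""

    def __init__(self, node, res_no):
        self.node = node
        self.res_no = res_no

    def get_value_type(self):
        # Uses the result number carried by this value.
        return self.node.get_value_type(self.res_no)


# saddo produces the sum (i64) and the overflow bit (i1), as in t10 above.
saddo = Node("saddo", ["i64", "i1"])
overflow = Value(saddo, 1)  # corresponds to the operand written t10:1

wrong_type = overflow.node.get_value_type(0)  # hard-coded result 0
right_type = overflow.get_value_type()        # honours the result number
```

Asking the node for result 0 reports the type of the sum, while asking the value itself reports the type of the overflow bit that the `or` actually consumes.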

Add validation in the FileList reader to check that the headers exist and use similar diagnostics in Options.cpp

…)" This reverts commit 7a8d15e.

…llvm#90681) If this flag is set, Xor will not be considered AddLike. If an Xor were treated as an Add it may wrap. If we can prove there would be no carry out and thus no wrap, the Xor would be turned into a disjoint Or by DAGCombine. Use this new flag to fix a bug in X86 where an Xor is incorrectly being treated as an NUWAdd. Fixes llvm#90668.
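The relationship between Xor, Or, and Add that this reasoning relies on can be checked directly. This is a small Python sketch on fixed-width integers; `no_common_bits` and `add_wraps` are hypothetical helpers, and the 8-bit width is chosen only to keep the numbers small.

```python
BITS = 8
MASK = (1 << BITS) - 1


def no_common_bits(a, b):
    """True when no bit position is set in both operands, so an
    addition produces no carries at all."""
    return (a & b) == 0


def add_wraps(a, b):
    """True when the unsigned BITS-wide addition overflows."""
    return a + b > MASK


# Disjoint operands: xor, or, and add all agree, and nothing can wrap.
a, b = 0b0101, 0b1010
agree = (a ^ b) == (a | b) == (a + b)

# Overlapping operands: treating xor as add would be wrong, and the
# addition it is mistaken for would also wrap.
c, d = 0xFF, 0x01
xor_result = c ^ d
add_result = (c + d) & MASK
```

Only the disjoint case justifies rewriting the Xor as a disjoint Or; in the overlapping case the Xor and the wrapped Add produce different values, so no-unsigned-wrap reasoning about an Add does not carry over.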

…acks (llvm#90678) Previously if the producer tensor.unpack op had "unpadding" semantics, the folding pattern would construct a destination that does not match with the result type of the transpose. Because both ops are DPS we can just reuse the destination of the transpose. Additionally cleans up a bunch of trailing whitespace in the test file.

If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file` string. The revision can be set with `LLVM_FORCE_VC_REVISION`. Before: `.file "git_revision.cpp",,"LLVM version 19.0.0git"` After: `.file "git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`

…ssociation. (llvm#90642) The new operation is just an abstract attribute that is attached to [hl]fir.declare operations of dummy arguments of a subroutine. Dummy arguments of the same subroutine refer to the same fir.dummy_scope, so they can be recognized as such during FIR AliasAnalysis. Note that the fir.dummy_scope must be specific to the runtime instantiation of a subroutine, so any MLIR inlining/cloning should duplicate and unique it vs using the same fir.dummy_scope for different runtime instantiations. This is why I made it an operation rather than an attribute. The new operation uses a write effect on DebuggingResource, same as [hl]fir.declare, to avoid optimizing it away.

…vm#90672) Previous fix llvm#90549 didn't completely address the Buildbot failures. Some target may not recognize the target triple. This time, only run the test under x86_64-linux.

/llvm-project/cross-project-tests/debuginfo-tests/llvm-prettyprinters/gdb/mlir-support.cpp:41:16: error: 'cast' is deprecated: Use mlir::cast<U>() instead [-Werror,-Wdeprecated-declarations] VectorType.cast<mlir::ShapedType>(), llvm::ArrayRef<float>{2.0f, 3.0f}); ^ /llvm-project/llvm/../mlir/include/mlir/IR/Types.h:345:9: note: 'cast' has been explicitly marked deprecated here U Type::cast() const { ^ /llvm-project/cross-project-tests/debuginfo-tests/llvm-prettyprinters/gdb/mlir-support.cpp:41:16: error: 'cast<mlir::ShapedType>' is deprecated: Use mlir::cast<U>() instead [-Werror,-Wdeprecated-declarations] VectorType.cast<mlir::ShapedType>(), llvm::ArrayRef<float>{2.0f, 3.0f}); ^ /llvm-project/llvm/../mlir/include/mlir/IR/Types.h:112:5: note: 'cast<mlir::ShapedType>' has been explicitly marked deprecated here [[deprecated("Use mlir::cast<U>() instead")]] ^ 2 errors generated.

LoadLibraryW will lookup dlls in user directories if its search path is left unrestricted. This is a security vulnerability as one can name a shared library the same as that of a system dll in order to run arbitrary code when the shared library is loaded from the path in a user directory. This change modifies it to only search within sys32 when loading dbghelp.dll.

… rule (llvm#90679) This patch updates the compatibility checks for CUDA attributes in preparation for implementing the matching rules described in section 3.2.3. With this patch the compiler will still emit an error when multiple specific procedures match, since the matching distance is not yet implemented. This will be done in a separate patch. https://docs.nvidia.com/hpc-sdk/archive/24.3/compilers/cuda-fortran-prog-guide/index.html#cfref-var-attr-unified-data gpu=unified and gpu=managed are not part of this patch since these options are not recognized by flang yet.

Fixes llvm#60040.

…matching rule" (llvm#90696) Reverts llvm#90679

Change-Id: I4f3510230f3d590f3d875dc0cc78d816bce8bff8

…ingType (llvm#90195) A pack indexing type can appear in a larger pack expansion, e.g `Pack...[pack_of_indexes]...` so we need to temporarily disable substitution of pack elements. Besides, this patch also fixes an assertion failure in `PackIndexingExpr::classify`: dependent `PackIndexingExpr`s are always LValues and thus we don't need to consider their `IndexExpr`s. Fixes llvm#88925 --------- Co-authored-by: cor3ntin <[email protected]>

…classes. (llvm#90180)

…atching rule

…0592) When delayed privatization is enabled, this PR emits the deallocation logic to the newly introduced `dealloc` region on `omp.private` ops.

…nk pipeline (llvm#90690) Follow up to llvm#90310, limit the tune up only to ThinLTO pre-link as coroutine passes are not in MonoLTO backend

See `llvm/unittests/IR/BasicBlockDbgInfoTest.cpp` for a test case.

… String classes." (llvm#90701) Reverts llvm#90180

…llvm#90569) Canonicalize getelementptr instructions for scalable vector types into ptradd representation with an explicit llvm.vscale call. This representation has better support in BasicAA, which can reason about llvm.vscale, but not plain scalable GEPs.

…ef. NFC This matches what comes out of isel since a63bd7e. It also adds the undef flag to more closely match the output after regalloc, which will help with the test diffs in llvm#70549

This takes the form of three consecutive but related changes:
- Mark the fast path of BumpPtrAllocator as likely-taken.
- Move the slow path of BumpPtrAllocator to a separate function.
- Mark the slow path of BumpPtrAllocator as noinline.

Overall, this saves geomean 0.4% userspace instructions on CTMark -O3, and 0.98% on CTMark -O0 -g. http://llvm-compile-time-tracker.com/compare.php?from=e1622e189e8c0ef457bfac528f90a7a930d9aad2&to=9eb53a4ed3af4a55e769ae1dd22d034b63d046e3&stat=instructions%3Au

…lvm#90600) This patch fixes the crash reported in: llvm#90589

For the platform and extension doc. Also add links in the extension doc to the GDB specs we're extending.

There were two diffs that introduced some options useful when you build modules externally and cannot rely on file modification time as the key for detecting input file changes:
- [D67249](https://reviews.llvm.org/D67249) introduced the `-fmodules-validate-input-files-content` option, which allows the use of file content hash in addition to the modification time.
- [D141632](https://reviews.llvm.org/D141632) propagated the use of `-fno-pch-timestamps` with Clang modules.

There is a problem when the size of the input file (header) is not modified but the content is. In this case, Clang cannot detect the file change when the `-fno-pch-timestamps` option is used. The `-fmodules-validate-input-files-content` option should help, but there is an issue with its application: it's not applied when the modification time is stored as zero, which is the case for `-fno-pch-timestamps`. The issue can be fixed using the same trick that was applied during the processing of `ForceCheckCXX20ModulesInputFiles`:
```
// When ForceCheckCXX20ModulesInputFiles and ValidateASTInputFilesContent
// enabled, it is better to check the contents of the inputs. Since we can't
// get correct modified time information for inputs from overriden inputs.
if (HSOpts.ForceCheckCXX20ModulesInputFiles && ValidateASTInputFilesContent &&
    F.StandardCXXModule && FileChange.Kind == Change::None)
  FileChange = HasInputContentChanged(FileChange);
```
The patch suggests a solution similar to the one presented above and includes a LIT test to verify it.

The read_exec builtins are implemented with the ballot intrinsic anyway. In the wave32 case, these will optimize down to just use the low 32-bits. This converts a few uses, but others remain. Apparently you can just use exec_hi as a GPR in wave32 though, so I'm not sure we should be treating the raw exec read as assumed 0. Change-Id: Id5621bf31b0bb7fa27456938942138f3dea85a0a

…peline. Previously ObjectLinkingLayer held unique ownership of Plugins, and links always used the Layer's plugin list at each step. This can cause problems if plugins are added while links are in progress however, as the newly added plugin may receive only some of the callbacks for links that are already running. In this patch each link gets its own copy of the pipeline that remains consistent throughout the link's lifetime, and it is guaranteed that Plugin objects (now with shared ownership) will remain valid until the link completes. Coding my way home: 9.80469S, 139.03167W

…vm#90220) Fixes llvm#89544

* Replace "we" with either "you" (when talking to the reader) or "lldb" (when talking about the project).
* Refer to lldb as lldb not LLDB, to match what the user sees on the command line (I am going to come back later and put the proper name in places where it's talking about the projects themselves).
* Remove a bunch of contractions, for example "won't". These don't (pun intended) seem like a big deal at first, but even I as a native English speaker find the text clearer with them expanded.
* Use RST's plain text highlighting for keywords and command names.
* Split some very long lines for easier editing in future.

…cally. (llvm#90576) Part of <llvm#62629>.

…g/ec/llvm-project into amd-staging

…llvm#90213) We are effectively performing type and operation legalisation very early within the code generation flow. This results in worse code quality because the DAG is not in canonical form, which DAGCombiner corrects through the introduction of operations that are not legal. This patch splits and moves the code to where type and operation legalisation is typically implemented.

llvm#90716) The autogenerated memory legalizer tests use -O0 so this allows us to see the exact waitcnts that were inserted by the memory legalizer without them being optimized away.

Closes llvm#89193

…0595) Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait so need to enhance the existing code to look for a barrier start (which is just an S_BARRIER for earlier architectures).

) llvm#90201 made some fixes for gfx12 image_msaa_load waitcnt insertion. That fix might break in some situations for pre-gfx12 - this fixes that by explicitly checking for VSAMPLE which always requires a s_wait_samplecnt and leaves the previous logic intact for non-gfx12.

The intent is to test lowering of vector operations by scalarization, for functions that are streaming-compatible (and thus cannot use NEON) and also don't have the +sve attribute. The generated code is clearly wrong at the moment, but a series of patches will follow to fix up all cases to use scalar instructions. A bit of context: This work will form the base to decouple SME from SVE later on, as it will make sure that no NEON instructions are used in streaming[-compatible] mode. Later this will be followed by a patch that changes `useSVEForFixedLengthVectors` to only return `true` if SVE is available for the given runtime mode, at which point I'll change the `-mattr=+sme -force-streaming-compatible-sve` to `-mattr=+sme -force-streaming-sve` in the RUN lines, so that the tests are considered to be executed in Streaming-SVE mode.

582c6a8 removed a constructor of 'ResourceSegments' that is needed in LLVM unit tests. * Revert 582c6a8 * Update the constructor to take a const reference of `std::list` as pointed out in llvm#89193.

…profitable. Adds transformation of consecutive vector store + reverse to strided stores with stride -1, if it is profitable Reviewers: RKSimon, preames Reviewed By: RKSimon Pull Request: llvm#90464

…improved analysis. Improved detection of const/splat candidates, their matching and analysis of instructions from same nodes.

Metric: size..text

| Program | results | results0 | diff |
|---|---|---|---|
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test | 92952.00 | 93096.00 | 0.2% |
| test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test | 779832.00 | 780136.00 | 0.0% |
| test-suite :: MultiSource/Applications/JM/lencod/lencod.test | 839923.00 | 840179.00 | 0.0% |
| test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test | 392708.00 | 392740.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test | 1171131.00 | 1171147.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test | 1391089.00 | 1391073.00 | -0.0% |
| test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test | 1391089.00 | 1391073.00 | -0.0% |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 12352780.00 | 12352636.00 | -0.0% |

- MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE - small reordering
- External/SPEC/CINT2006/464.h264ref/464.h264ref - small better code after reordering
- MultiSource/Applications/JM/lencod/lencod - smaller code with less shuffles
- MultiSource/Applications/JM/ldecod/ldecod - same
- External/SPEC/CFP2017rate/511.povray_r/511.povray_r - 2 extra loads vectorized, smaller code
- External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - better code, size increased because of more constant vectors.
- External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same
- External/SPEC/CFP2017rate/526.blender_r/526.blender_r - small change in the vectorized code, some code a bit better, some a bit worse.

Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: llvm#87091

…rs (llvm#89995) Update the wrappers for the C std headers so that they always forward to the z/OS system headers.

This is a second attempt to land llvm#84501 which failed on several targets. This patch adds the HAS_IEE754_FLOAT128 define which makes the check for typedef'ing float128 more precise by checking whether __uint128_t is available and checking if the host does not use __ibm128 which is prevalent on power pc targets and replaces IEEE754 float128s.

…ion (llvm#90449) Fixes llvm#84732, llvm#81947, llvm#81946 Note: This is a fix till we enable delayed privatization.

If we want to turn this on on some platforms, we'll also want to define HAS_LOGF128 for AnalysisTest, see llvm/unittests/Analysis/CMakeLists.txt

Setting the correct build flags on z/OS to build LLVM as 64-bit ASCII application.

This reverts commit 088aa81.

This reverts commit 68b863b. 088aa81 was reverted in efce8a0.

Summary: This variable could be unset if not found or when building standalone. We should check for that and set it to true or false. Fixes: llvm#90708

…lvm#90722) This is needed for a workaround to make sure the link later succeeds. I don't know the reason for that but it is definitely needed. llvm#89234 will/wants to correct the triple normalisation for -none- and this means that clang prior to 19, and clang 19 and above will have different answers and therefore different library paths. I don't want to bootstrap a clang just for libcxx CI, or require that anyone building for Arm do the same, so ask the compiler what the triple should be. This will be compatible with 17 and 19 when we do update to that version. I'm assuming $CC is what anyone locally would set to override the compiler, and `cc` is the binary name in our CI containers. It's not perfect but it should cover most use cases.

Re-land 61b2a0e. Some Windows builds were failing because AArch64TargetParserDef.inc is a generated header which is included transitively into some clang components, but this information is not available to the build system and therefore there is a missing edge in the dependency graph. This patch incorporates the fixes described in ac1ffd3/D142403. Thanks to ExtensionSet::toLLVMFeatureList, all values of ArchExtKind should correspond to a particular -target-feature. The valid values of -target-feature are in turn defined by SubtargetFeature defs. Therefore we can generate ArchExtKind from the tablegen data. This is done by adding an Extension class which derives from SubtargetFeature. Because the Has* FieldNames do not always correspond to the AEK_ names ("extensions", as defined in TargetParser), and AEK_ names do not always correspond to -march strings, some additional enum entries have been added to remap the names. I have renamed these to make the naming consistent, but split them into a separate PR to keep the diff reasonable (llvm#90320)

…eBSD (llvm#81355) FreeBSD ports will now install debuginfo under $LOCALBASE/lib/debug/, where $LOCALBASE is typically /usr/local. On FreeBSD search this path in addition to existing debug info paths. Relevant change on the FreeBSD side: https://reviews.freebsd.org/D43515

The MSVC linker merges functions having comdat which have an identical set of instructions. CUDA uses the kernel stub function as a key to look up kernels in device executables. If kernel stub functions for different kernels are merged by ICF, incorrect kernels will be launched. To prevent ICF from merging kernel stub functions, a unique global variable is created for each kernel stub function having comdat and a store is added to the kernel stub function. This makes the set of instructions in each kernel function unique. Fixes: llvm#88883

In OpenMP 5.2 the map-type modifier changes from "ultimate" to "default". This change allows map-type to be placed at any location within the map modifiers, not only at the last location in the modifier list; map-type can also be omitted.

…lvm#90092) If the value is not boolean and we are checking for `Undef` or `UndefOrPoison`, we can avoid the potentially expensive IDom walk. This should improve compile time for isGuaranteedNotToBeUndefOrPoison and isGuaranteedNotToBeUndef.

…m#90128) To support auto-conversion on z/OS, text files need to be opened as text files. These changes will fix a number of LIT failures due to text files not being converted to the internal code page.
- Update a number of tools so they open the text files as text files.
- Add support in cat.py to open a text file as a text file (Windows will continue to treat all files as binary so new lines are handled correctly).
- Add env var definitions to enable auto-conversion in the lit config file.

…tor (llvm#90447) This test shows a few cases (not at all complete) where the current ArmSME tile allocator produces incorrect results. The plan is to resolve these issues with a future tile allocator that uses liveness information.

Change the order of fp and sp in the kernel prologue, and update the related codegen tests, to make it easier to merge code into our downstream branches. Signed-off-by: gangc <[email protected]>

- Revert 8009bbe - Revert "Reapply "[Clang][Sema] Diagnose class member access expressions naming non-existent members of the current instantiation prior to instantiation in the absence of dependent base classes (llvm#84050)" (llvm#90152)" - Breaks composable kernels and rocthrust builds - Revert 41f9c78 - Revert "[OpenACC] Fix test failure from fa67986" - Fixes some issues in 8009bbe, so depends on it - Cherry-pick 803e03f from trunk - Fixes unit test failures introduced in trunk earlier Change-Id: I718574c8a26745a52845d0b5a914ed00db611956

…89799) This patch enables parsing and creating modules directly into the new debug info format. Prior to this patch, all modules were constructed with the old debug info format by default, and would be converted into the new format just before running LLVM passes. This is an important milestone, in that this means that every tool will now be exposed to debug records, rather than only those that run LLVM passes. As far as I've tested, all LLVM tools/projects now either handle debug records, or convert them to the old intrinsic format. There are a few unit tests that need updating for this patch; these are either cases of tests that previously needed to set the debug info format to function, or tests that depend on the old debug info format in some way. There should be no visible change in the output of any LLVM tool as a result of this patch, although the likelihood of this patch breaking downstream code means an NFC tag might be a little misleading, if not technically incorrect: This will probably break some downstream tools that don't already handle debug records. If your downstream code breaks as a result of this change, the simplest fix is to convert the module in question to the old debug format before you process it, using `Module::convertFromNewDbgValues()`. For more information about how to handle debug records or about what has changed, see the migration document: https://llvm.org/docs/RemoveDIsDebugInfo.html

llvm#89799)" A unit test was broken by the above commit: https://lab.llvm.org/buildbot/#/builders/139/builds/64627 This reverts commit 2f01fd9.

Previously if you passed an ELF binary it would be silently copied with no changes.

…ve (llvm#90581) Fixes: llvm#89376.

Section unification cannot just use names, because it's valid for ELF binaries to have multiple sections with the same name. We should check other section properties too. Fixes llvm#88001. rdar://124467787
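
To illustrate why a name alone is not a safe key, here is a minimal sketch that groups sections by name plus a few other properties. The `Section` type, the `unify_sections` helper, and the choice of type and flags as the extra properties are assumptions made for illustration; the commit message only says that "other section properties" should be checked.

```python
from typing import Dict, List, NamedTuple, Tuple


class Section(NamedTuple):
    """Hypothetical, simplified view of an ELF section header."""
    name: str
    sh_type: int
    flags: int
    addr: int


def unify_sections(sections: List[Section]) -> Dict[Tuple[str, int, int], List[Section]]:
    """Group sections that agree on name, type, and flags (assumed key)."""
    groups: Dict[Tuple[str, int, int], List[Section]] = {}
    for sec in sections:
        # Keying on the name alone would merge distinct same-named sections.
        key = (sec.name, sec.sh_type, sec.flags)
        groups.setdefault(key, []).append(sec)
    return groups
```

With this key, two `.text` sections that differ in flags stay in separate groups, whereas a name-only key would treat them as the same section.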

I strongly suspect nobody ever used that macro since it wasn't very well known. Furthermore, it only affects a handful of diagnostics and I think it makes sense to either provide them unconditionally, or to not provide them at all.

…n emulation (llvm#89131) This PR builds on llvm#79494 with an additional path for efficient unsigned `i4 -> i8` type extension for 1D/2D operations. This will impact any i4 -> i8/i16/i32/i64 unsigned extensions as well as sitofp i4 -> f8/f16/f32/f64.

…0508) Empty ISG1 and OSG1 parts are generated for compute shader since there's no signature for compute shader. Fixes llvm#88778

The output on eel.is has similar oddities, so I expect this was copy pasted.

…m#89992) The dependency scanner only puts top-level affecting module map files on the command line for explicitly building a module. This is done because any affecting child module map files should be referenced by the top-level one, meaning listing them explicitly does not have any meaning and only makes the command lines longer. However, a problem arises whenever the definition of an affecting module lives in a module map that is not top-level. Considering the rules explained above, such a module map file would not make it to the command line. That's why 83973cf started marking the parents of an affecting module map file as affecting too. This way, the top-level file does make it into the command line. This can be problematic, though. On macOS, for example, the Darwin module lives in "/usr/include/Darwin.modulemap", one of many module map files included by "/usr/include/module.modulemap". Reporting the parent on the command line forces explicit builds to parse all the other module map files included by it, which is not necessary and can get expensive in terms of file system traffic. This patch solves that performance issue by no longer marking parent module map files as affecting, and by marking module map files as top-level whenever they are top-level among the set of affecting files, not among the set of all known files. This means that the top-level "/usr/include/module.modulemap" is now not marked as affecting and "/usr/include/Darwin.modulemap" is.
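
The "top-level among the set of affecting files" rule can be sketched as follows. This is only a model of the idea, not the scanner's code: the `parent_of` mapping (file to the module map that includes it) and the function name `top_level_among_affecting` are hypothetical.

```python
from typing import Dict, Set


def top_level_among_affecting(affecting: Set[str], parent_of: Dict[str, str]) -> Set[str]:
    """Return the affecting files that have no affecting ancestor."""
    result = set()
    for path in affecting:
        # Walk up the include chain, skipping ancestors that are not affecting.
        ancestor = parent_of.get(path)
        while ancestor is not None and ancestor not in affecting:
            ancestor = parent_of.get(ancestor)
        if ancestor is None:
            # No affecting ancestor exists, so this file goes on the command line.
            result.add(path)
    return result
```

In the Darwin example, only "/usr/include/Darwin.modulemap" is affecting, so it is reported even though "/usr/include/module.modulemap" includes it. If both files were affecting, only the parent would be reported.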

For functions without a stack frame, no "stack" field is serialized into MIR, which leads to isCalleeSavedInfoValid being false when reading a MIR file back in. To fix this we should serialize MachineFrameInfo::isCalleeSavedInfoValid() into MIR.

ROTL and ROTR can take a shift amount larger than the element size, in which case the effective shift amount should be the shift amount modulo the element size. This patch adds the modulo step when the shift amount isn't known at compile time. Without it the existing implementation would end up shifting beyond the type size and give incorrect results.
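
The modulo step can be shown on plain integers. This is a sketch of the arithmetic only, not of the lowering in the patch; the function names and the `bits` parameter are chosen for illustration.

```python
def rotl(value: int, amount: int, bits: int) -> int:
    """Rotate `value` left by `amount` within a `bits`-wide element."""
    mask = (1 << bits) - 1
    amount %= bits  # effective shift amount is the amount modulo the element size
    value &= mask
    return ((value << amount) | (value >> (bits - amount))) & mask


def rotr(value: int, amount: int, bits: int) -> int:
    """Rotate `value` right by `amount` within a `bits`-wide element."""
    mask = (1 << bits) - 1
    amount %= bits  # same modulo step as rotl
    value &= mask
    return ((value >> amount) | (value << (bits - amount))) & mask
```

For an 8-bit element, rotating by 9 gives the same result as rotating by 1. Without the `amount %= bits` line, the shifts would run past the element width, which is the failure the commit describes.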

…lvm#90700) Instead of hardcoding the 4 current profile prefixes, treat profile selection as a fallback if we don't find "rv32" or "rv64". Update the error message accordingly.

…function. (llvm#90665) This simplifies the callers.

…O summaries" (llvm#90610) (llvm#90692) This reverts commit 2aabfc8. Add fixes to LLD and Gold tests missed in original change. Co-authored-by: Jan Voung <[email protected]>

It was pointed out in post commit review of llvm#90597 that the pass should never have been run in parallel over all functions (and now other top level operations) in the first place. The mutex used in the pass was ineffective at preventing races since each instance of the pass would have a different mutex.
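
The reason a per-instance mutex cannot prevent races can be demonstrated in a few lines. The two classes below are hypothetical stand-ins for a pass, not the actual pass. The shared-lock variant is included only for contrast; the commit's conclusion is that the pass should not run in parallel, not that the mutex should be shared.

```python
import threading


class PassWithOwnMutex:
    def __init__(self) -> None:
        # Every instance creates its own lock, so instances never exclude each other.
        self.mutex = threading.Lock()


class PassWithSharedMutex:
    # A single class-level lock is shared by all instances.
    mutex = threading.Lock()


def second_instance_can_enter(pass_cls) -> bool:
    """Hold the first instance's mutex and check whether a second instance can still lock."""
    first, second = pass_cls(), pass_cls()
    first.mutex.acquire()
    try:
        entered = second.mutex.acquire(blocking=False)
        if entered:
            second.mutex.release()
        return entered
    finally:
        first.mutex.release()
```

`second_instance_can_enter(PassWithOwnMutex)` returns `True`: the second instance enters its critical section while the first still holds its lock, which is the ineffective behavior described above.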

Fixes llvm#84968. Implements the `fcntl()` function defined in the `fcntl.h` header.

…ons. (llvm#90414) Treat a compound operator such as |=, an array subscript, sizeof, and a non-type template parameter as trivial so long as their subexpressions are also trivial. Also treat the true/false boolean literals as trivial.

The function may return Z_MEM_ERROR or Z_STREAM_ERROR. The former does not have a good way of testing. The latter will be possible with a pending change that allows setting the compression level, which will come with a test.

zstd excels at scaling from low-ratio-very-fast to high-ratio-pretty-slow. Some users prioritize speed and prefer disk read speed, while others focus on achieving the highest compression ratio possible, similar to traditional high-ratio codecs like LZMA. Add an optional `level` to `--compress-sections` (llvm#84855) to cater to these diverse needs. While we initially aimed for a one-size-fits-all approach, this no longer seems to work. (https://richg42.blogspot.com/2015/11/the-lossless-decompression-pareto.html) When --compress-debug-sections is used together, make --compress-sections take precedence since --compress-sections is usually more specific. Remove the level distinction between -O/-O1 and -O2 for --compress-debug-sections=zlib for a more consistent user experience. Pull Request: llvm#90567
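
The precedence rule can be sketched as a small lookup. This models the described behavior only and is not lld's implementation: the rule tuples, the function name `pick_compression`, the `.debug` prefix check, and the "first matching rule wins" order are all assumptions for illustration.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical parsed form of a --compress-sections rule: (glob, format, optional level).
Rule = Tuple[str, str, Optional[int]]


def pick_compression(section: str, compress_sections: List[Rule],
                     compress_debug_sections: Optional[str]) -> Tuple[str, Optional[int]]:
    """Choose (format, level) for a section; the more specific option wins."""
    for pattern, fmt, level in compress_sections:
        if fnmatchcase(section, pattern):
            # --compress-sections takes precedence when it matches.
            return fmt, level
    if compress_debug_sections and section.startswith(".debug"):
        # Fall back to --compress-debug-sections, which carries no level here.
        return compress_debug_sections, None
    return "none", None
```

For example, with a rule `(".debug_*", "zstd", 3)` and `--compress-debug-sections` set to zlib, `.debug_info` is compressed with zstd at level 3, while other sections are left uncompressed.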

…g/ec/llvm-project into amd-staging

Change-Id: I4968e32ce2fcf8592f4ab65f9b2eb89b5fbb67dc

Change-Id: I8d57fc9053f1ee71230ac48337f73b474581188f

…g/ec/llvm-project into amd-staging

Signed-off-by: "Yiyang Wu <[email protected]>"

Commits on Apr 30, 2024

Commits on May 1, 2024

Commits on May 2, 2024