[AutoBump] Merge with e55d6f5e (Sep 11) (22) #376

mgehre-amd · 2024-09-26T14:22:41Z

No description provided.

) Migrate CodeGenHWModes to use const RecordKeeper and const Record pointers. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

The `@llvm.dx.typedBufferLoad` intrinsic is lowered to `@dx.op.bufferLoad`. There's some complexity here in translating to scalarized IR, which I've abstracted out into a function that should be useful for samples, gathers, and CBuffer loads. I've also updated the DXILResources.rst docs to match what I'm doing here and the proposal in llvm/wg-hlsl#59. I've removed the content about stores and raw buffers for now with the expectation that it will be added along with the work. Note that this change includes a bit of a hack in how it deals with `getOverloadKind` for the `dx.ResRet` types - we need to adjust how we deal with operation overloads to generate a table directly rather than proxy through the OverloadKind enum, but that's left for a later change here. Part of llvm#91367 Pull Request: llvm#104252

The inconsistency surfaced in llvm#95305. Split off to reduce the diff.

…lvm#107899) This reapplies 8fa66c6 ([asan][windows] Eliminate the static asan runtime on windows) for a second time. That PR bounced off the tests because it caused failures in the other sanitizer runtimes; these have been fixed by building only interception, sanitizer_common, and asan with /MD, and continuing to build the rest of the runtimes with /MT. This does mean that any use of the static ubsan/fuzzer/etc. runtimes mixes different runtime library linkages in the same app. The interception, sanitizer_common, and asan runtimes are designed for this, although it does result in some linker warnings. Additionally, it turns out that when building in release mode with LLVM_ENABLE_PDBs, the build system forced /OPT:ICF. This totally breaks asan's "new" method of doing "weak" functions on Windows, so /OPT:NOICF was explicitly added to asan's link flags.

--------- Co-authored-by: Amy Wishnousky <[email protected]>

Fills in many missing functions from VectorType

This information helps with tuning the heuristic of selecting memory groups to release the unused pages.

Fix a bug where `lto_runtime_lib_symbols_list` returns the address of a local variable that is freed when it goes out of scope. This is a regression from llvm#98512, which rewrote the runtime libcall function lists into a SmallVector. rdar://135559037
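The class of bug described here, handing out a pointer into a container that is destroyed when the function returns, can be illustrated with a minimal sketch. This is not the actual LTO code: `buildSymbolList` and `getSymbols` are hypothetical names, `std::vector` stands in for `SmallVector`, and static storage is only one possible fix, not necessarily the one the patch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a helper that builds the symbol list by value.
static std::vector<std::string> buildSymbolList() {
  return {"memcpy", "memset", "__stack_chk_fail"};
}

// BUGGY pattern, shown only as a comment because using its result is
// undefined behavior:
//
//   const std::string *getSymbols(std::size_t &Size) {
//     std::vector<std::string> Local = buildSymbolList();
//     Size = Local.size();
//     return Local.data(); // Local is destroyed on return; pointer dangles.
//   }

// One possible fix (an assumption): keep the list in storage that outlives
// the call, so the returned pointer stays valid.
const std::string *getSymbols(std::size_t &Size) {
  static const std::vector<std::string> Symbols = buildSymbolList();
  assert(!Symbols.empty() && "symbol list should not be empty");
  Size = Symbols.size();
  return Symbols.data();
}
```

A caller would write `std::size_t N; const std::string *P = getSymbols(N);` and may read `P[0]` through `P[N - 1]` for the rest of the program's lifetime.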

partially fixes llvm#70078

### Changes
- Added `int_spv_sign` intrinsic in `IntrinsicsSPIRV.td`
- Added lowering and map to `int_spv_sign` in `SPIRVInstructionSelector.cpp`
- Added SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/sign.ll`

### Related PRs
- llvm#101988
- llvm#101989

…lvm#107858) Change CGIOperandList::OperandInfo::Rec and CGIOperandList::TheDef to const pointers. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

This patch implements the Pass base class and the FunctionPass sub-class that operate on Sandbox IR.

…07919) Fixes generation of invalid loads leading to misaligned access errors. The bug got exposed by SLP vectorizer change ec360d6 which allowed SLP to produce `v16i8` vectors. Also updated the tests to use automatic check generator.

Shifts are handled the same way as `sub`, where rhs == 0 is the identity. `and` is the inverted case, with -1 as the identity: `SELECT (AND(X,1) == 0), (AND Y, Z), Y` -> `(AND Y, (OR NEG(AND(X, 1)), Z))`. Closes llvm#107910

…m#107498) Apparently, there are two almost identical implementations: one for MachO and another one for ELF. The ELF bits somehow slipped while llvm#84573 was reviewed. This implementation is identical to the MachO case.

This patch implements sandboxir::UndefValue mirroring llvm::UndefValue.

Lower `fcopysign` SDNodes into `copysign` PTX instructions where possible. See [PTX ISA: 9.7.3.2. Floating Point Instructions: copysign](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-copysign).

…vm#107499) This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore. Issue [llvm#89287](llvm#89287)

This patch drops the redundant rankReductionStrategy in `populateFoldUnitExtentDimsViaSlicesPatterns` and fixes comment typos.

…lvm#107432) After llvm#92205, LoongArch ISel selects `div.w` for `trunc i64 (sdiv i64 3202030857, (sext i32 X to i64)) to i32`. This is incorrect because `3202030857` is not a signed 32-bit constant. It produces a wrong result when `X == 2`: https://alive2.llvm.org/ce/z/pzfGZZ This patch adds additional `sexti32` checks to the operands of `PatGprGpr_32`. Alive2 proof: https://alive2.llvm.org/ce/z/AkH5Mp Fix llvm#107414.
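The mismatch for `X == 2` can be reproduced with plain integer arithmetic. The sketch below is a simplified model, not the LoongArch backend: `divWide` follows the IR (divide in 64 bits, then truncate), while `divNarrow` assumes the narrowed division simply sees the low 32 bits of the constant as a signed value. Both function names are hypothetical, and the unsigned-to-signed casts rely on two's-complement wrapping (guaranteed since C++20, the usual behavior before that).

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics from the IR: 64-bit signed division, then truncation
// of the quotient to 32 bits.
int32_t divWide(int32_t X) {
  assert(X != 0 && "division by zero");
  int64_t Quot = INT64_C(3202030857) / static_cast<int64_t>(X);
  return static_cast<int32_t>(static_cast<uint32_t>(Quot));
}

// Simplified model of the incorrect selection: the constant is reinterpreted
// as a signed 32-bit value (it wraps to -1092936439) before dividing.
int32_t divNarrow(int32_t X) {
  assert(X != 0 && "division by zero");
  int32_t Dividend = static_cast<int32_t>(UINT32_C(3202030857));
  return Dividend / X;
}
```

With `X == 2`, `divWide` yields 1601015428 while `divNarrow` yields -546468219, which is why the operands need to be known sign-extended 32-bit values before the narrow division is safe.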

…#107909) Make constructors that take const Record * implicit, allowing us to simplify some range based loops to use that class instance as the loop variable. Change remaining constructor calls to use () instead of {} to construct objects.

Fixes: llvm#107355 Reviewed By: SixWeining Pull Request: llvm#107523

After 2773719 e.g.

```
external/llvm-project/libc/test/src/math/smoke/NextTowardTest.h:12:10: error: module llvm-project//libc/test/src/math/smoke:nexttowardf_test does not depend on a module exporting 'src/__support/CPP/bit.h'
```

…lvm#107897) After landing llvm#99285 we found that the call graph update was causing the following crash when expensive checks are turned on

```
llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(TargetRC)) && "New call edge is not trivial!"' failed.
```

I have to admit I believe that the call graph update process I did for that patch could be wrong. After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced that `CoroAnnotationElidePass` can be a FunctionPass and rely on the adaptor to update the call graph for us, so long as we properly invalidate the caller's analyses. After this patch, `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer fails under expensive checks.

We shouldn't assume that we're using system zlib installation.

This patch tries to infer is-power-of-2 from assumptions. I don't see that this kind of assumption exists in my dataset. Related issue: rust-lang/rust#129795 Close llvm#58996.

Data type conversion between fp16 and bf16 will generate fptrunc and fpextend nodes, but they are actually bitcast nodes.

partially fixes llvm#70078

### Changes
- Implemented `sign` clang builtin
- Linked `sign` clang builtin with `hlsl_intrinsics.h`
- Added sema checks for `sign` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- Add codegen for `sign` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- Add codegen tests to `clang/test/CodeGenHLSL/builtins/sign.hlsl`
- Add sema tests to `clang/test/SemaHLSL/BuiltIns/sign-errors.hlsl`

### Related PRs
- llvm#101987
- llvm#101988

### Discussion
- Should there be a `usign` intrinsic that handles the unsigned cases?

…m#108171)

Handle the case of same pointer used as both inputs to the `CompareOptionRecords`, to avoid emitting errors for equivalent options. Follow-up to llvm#107696.

…lvm#108207) We will deref<>() it later, so this is the right check.

…lvm#53045 Add tests for "sub(select(icmp(a,b),a,b),select(icmp(a,b),b,a)) -> abd(a,b)" patterns that still fail to match to abd nodes This will hopefully be helped by llvm#108218

Vector cases are broken, so leave those for later.

Check cost of all instructions in an interleave group, to prepare for follow-up changes.

…uilding (llvm#108076)

* Split buildCoroutineFrame into code related to normalization and code related to actually building the coroutine frame.
* This will enable future specialization of buildCoroutineFrame for different ABIs while the normalization can be done by splitCoroutine prior to calling buildCoroutineFrame.

See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

This is a three deep expression which is deeper than we've otherwise gone for multiple expansions, but I think it's reasonable to do so. This covers mul by 50, 100, and 200 which are reasonably common naturally arising numbers.
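To make the "three deep" shape concrete: 50, 100, and 200 all factor as 5 × 5 × 2^N, so each can be computed with two shift-and-add steps followed by one shift. The sketch below only illustrates that arithmetic. Whether this is the exact sequence the patch emits is an assumption, and the helper names are hypothetical.

```cpp
#include <cstdint>

// x * 5 as a single shift-and-add: (x << 2) + x.
constexpr uint64_t mul5(uint64_t X) { return (X << 2) + X; }

// Three operations deep: two shift-and-add steps (x * 25), then one shift.
constexpr uint64_t mul50(uint64_t X) { return mul5(mul5(X)) << 1; }
constexpr uint64_t mul100(uint64_t X) { return mul5(mul5(X)) << 2; }
constexpr uint64_t mul200(uint64_t X) { return mul5(mul5(X)) << 3; }

// Compile-time sanity checks against ordinary multiplication.
static_assert(mul50(7) == 7 * 50, "mul50 mismatch");
static_assert(mul100(7) == 7 * 100, "mul100 mismatch");
static_assert(mul200(7) == 7 * 200, "mul200 mismatch");
```

Because the arithmetic is unsigned, the decomposition wraps the same way a plain multiply would on overflow.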

…lvm#107785)" This reverts commit 15106c2. Commit does not pass check-flang on x86 host.

This large patch adds MC-level support for V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format. It includes the asm/disasm changes to encode/decode the 16-bit vsrc, vdst and src modifiers for the vop and dpp formats. This patch is a dependency for many 16-bit instructions, but only three instructions are updated to make it easier to review. There will be another patch to support these three instructions at the codegen level; this patch just replaces these two instructions with its fake16 format.

This return is dead code as the return just above will always be taken.

There was a mistake in a comment regarding the deprecation of dyn_cast_or_null. It suggested using cast_if_present instead of dyn_cast_or_null, but that was probably a copy-paste mistake; dyn_cast_if_present is the function that should be used instead of dyn_cast_or_null. Authored-by: Ofri Frishman <[email protected]>

…#108215) Right now `describe()`ing a `FunctionDecl` dups the whole code of the function. Dump only its name.

…108020) The check for IV increments in collectUsersInEntryBlock currently triggers for exit-block PHIs which use the IV start value, resulting in us failing to add the input value for the middle block to these PHIs. Fix this by amending the check for IV increments to only include incoming values that are instructions inside the loop. Fixes llvm#108004

…7921) Change CodeGenInstruction::{TheDef, InferredFrom} to const pointers. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

Print a warning when the debugger detects a mismatch between the MD5 checksum in the DWARF 5 line table and the file on disk. The warning is printed only once per file.

…lvm#108013) Change SubtargetFeatureInfo to use const Record pointers. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…08027) Change CodeGenRegister to use const Record pointer. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…vm#108193) Change ASTTableGen to use const Record pointers. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…lvm#108195) Change Builtins emitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

This is a preparation for upcoming changes to Dense[Map|Set] regarding hardening against OOM scenarios (see [this RFC](https://discourse.llvm.org/t/rfc-malfunction-safe-densemap-denseset/81036/7)). We have changed a lot of code inside Dense[Map|Set] and this preparation change helps to isolate the relevant parts from pure formatting stuff.

This makes it slightly easier to see what's different between the two.

…lvm#107889) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifying exec generally causes pipeline stalls.

alexey-bataev and others added 30 commits September 9, 2024 12:33

[SLP][NFC]Reorder code for better structural complexity, NFC

b3d2d50

[VPlan] Consistently use VTC for vector trip count in vplan-printing.ll.

3403438

The inconsistency surfaced in llvm#95305. Split off to reduce the diff.

[SandboxIR] Add missing VectorType functions (llvm#107650)

6f8d278

Fills in many missing functions from VectorType

[scudo] Add fragmentation info for each memory group (llvm#107475)

d9a9960

This information helps with tuning the heuristic of selecting memory groups to release the unused pages.

[SandboxVec] Implement Pass class (llvm#107617)

f12e10b

This patch implements the Pass base class and the FunctionPass sub-class that operate on Sandbox IR.

[X86] Add tests support shifts + and in LowerSELECTWithCmpZero; NFC

d148a1a

[X86] Handle shifts + and in LowerSELECTWithCmpZero

88bd507

Shifts are handled the same way as `sub`, where rhs == 0 is the identity. `and` is the inverted case, with -1 as the identity: `SELECT (AND(X,1) == 0), (AND Y, Z), Y` -> `(AND Y, (OR NEG(AND(X, 1)), Z))`. Closes llvm#107910
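The `and` rewrite can be checked exhaustively on small integers. The sketch below models the two forms on 8-bit values. It is an illustration of the identity only, not the X86 lowering code, and the function names are hypothetical.

```cpp
#include <cstdint>

// Original form: SELECT (AND(X,1) == 0), (AND Y, Z), Y
uint8_t selectForm(uint8_t X, uint8_t Y, uint8_t Z) {
  return ((X & 1) == 0) ? static_cast<uint8_t>(Y & Z) : Y;
}

// Rewritten form: AND Y, (OR NEG(AND(X,1)), Z)
// NEG(AND(X,1)) is 0x00 when the low bit is clear and 0xFF (-1) when it is
// set, so the OR yields either Z or the AND identity (all ones).
uint8_t foldedForm(uint8_t X, uint8_t Y, uint8_t Z) {
  uint8_t Mask = static_cast<uint8_t>(0u - (X & 1u));
  return static_cast<uint8_t>(Y & (Mask | Z));
}

// Exhaustively compare both forms over every 8-bit X, Y, Z.
bool formsAgreeOnAll8BitInputs() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      for (unsigned Z = 0; Z < 256; ++Z)
        if (selectForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Y),
                       static_cast<uint8_t>(Z)) !=
            foldedForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Y),
                       static_cast<uint8_t>(Z)))
          return false;
  return true;
}
```

The branch-free form works because ANDing with -1 leaves `Y` unchanged, which is what "-1 as the identity" refers to.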

[SandboxIR] Implement UndefValue (llvm#107628)

ae02211

This patch implements sandboxir::UndefValue mirroring llvm::UndefValue.

[NFC][sanitizer] Extract GetDTLSRange (llvm#107934)

81ef8e2

[mlir][linalg][NFC] Drop redundant rankReductionStrategy (llvm#107875)

f3b4e47

This patch drops the redundant rankReductionStrategy in `populateFoldUnitExtentDimsViaSlicesPatterns` and fixes comment typos.

Fix unintended extra commit in PR llvm#107499

e64a1c0

[LoongArch] Codegen for concat_vectors with LASX

1ca411c

Fixes: llvm#107355 Reviewed By: SixWeining Pull Request: llvm#107523

[Fuzzer] Passthrough zlib CMake paths into the test (llvm#107926)

eb0e4b1

We shouldn't assume that we're using system zlib installation.

[ValueTracking] Infer is-power-of-2 from assumptions. (llvm#107745)

ffcff4a

This patch tries to infer is-power-of-2 from assumptions. I don't see that this kind of assumption exists in my dataset. Related issue: rust-lang/rust#129795 Close llvm#58996.

[clang] fix half && bfloat16 convert node expr codegen (llvm#89051)

56905da

Data type conversion between fp16 and bf16 will generate fptrunc and fpextend nodes, but they are actually bitcast nodes.

kazutakahirata and others added 29 commits September 11, 2024 06:40

[Transforms] Avoid repeated hash lookups (NFC) (llvm#108139)

4b1b450

[Interfaces] Avoid repeated hash lookups (NFC) (llvm#108140)

6ffa7cd

[AMDGPU] Shrink a live interval instead of recomputing it. NFCI. (llv…

01967e2

…m#108171)

[AMDGPU] Make more use of getWaveMaskRegClass. NFC. (llvm#108186)

7a30b9c

[TableGen] Fix MacOS failure in Option Emitter. (llvm#108225)

ccc4fa1

Handle the case of same pointer used as both inputs to the `CompareOptionRecords`, to avoid emitting errors for equivalent options. Follow-up to llvm#107696.

[clang][bytecode] Check for Pointer dereference in EvaluationResult (l…

35f7cfb

…lvm#108207) We will deref<>() it later, so this is the right check.

[DAG] Add test coverage for ABD "sub of selects" patterns based off l…

43da8a7

…lvm#53045 Add tests for "sub(select(icmp(a,b),a,b),select(icmp(a,b),b,a)) -> abd(a,b)" patterns that still fail to match to abd nodes This will hopefully be helped by llvm#108218
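The "sub of selects" shape is the larger operand minus the smaller one, which is an absolute difference. The sketch below checks that on 8-bit values, assuming the `icmp` is an unsigned greater-than (the commit message does not name the predicate). The function names are hypothetical and this is not the DAG combine itself.

```cpp
#include <cstdint>

// sub(select(icmp(a,b),a,b), select(icmp(a,b),b,a)) with icmp assumed to be
// unsigned greater-than.
uint8_t subOfSelects(uint8_t A, uint8_t B) {
  bool Cmp = A > B;
  uint8_t Hi = Cmp ? A : B; // select(icmp(a,b), a, b)
  uint8_t Lo = Cmp ? B : A; // select(icmp(a,b), b, a)
  return static_cast<uint8_t>(Hi - Lo);
}

// Reference unsigned absolute difference, computed in a wider type.
uint8_t absDiffUnsigned(uint8_t A, uint8_t B) {
  int Wide = static_cast<int>(A) - static_cast<int>(B);
  return static_cast<uint8_t>(Wide < 0 ? -Wide : Wide);
}

// Exhaustively compare the pattern against the reference.
bool patternMatchesAbd() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      if (subOfSelects(static_cast<uint8_t>(A), static_cast<uint8_t>(B)) !=
          absDiffUnsigned(static_cast<uint8_t>(A), static_cast<uint8_t>(B)))
        return false;
  return true;
}
```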

AMDGPU: Add tests for minimumnum/maximumnum intrinsics

ee61a4d

Vector cases are broken, so leave those for later.

[LV] Generalize check lines for interleave group costs.

1741b9c

Check cost of all instructions in an interleave group, to prepare for follow-up changes.

Revert "[flang][runtime] Fix odd "invalid descriptor" runtime crash (l…

050f785

…lvm#107785)" This reverts commit 15106c2. Commit does not pass check-flang on x86 host.

[AMDGPU] Remove dead code in SIISelLowering (NFC) (llvm#108198)

ccc52a8

This return is dead code as the return just above will always be taken.

[clang][transformer] Make describe() terser for NamedDecls. (llvm…

512ceca

…#108215) Right now `describe()`ing a `FunctionDecl` dups the whole code of the function. Dump only its name.

[lldb] Print a warning on checksum mismatch (llvm#107968)

ffa2f53

Print a warning when the debugger detects a mismatch between the MD5 checksum in the DWARF 5 line table and the file on disk. The warning is printed only once per file.

[AMDGPU] Simplify API of matchFPExtFromF16. NFC. (llvm#108223)

ff7eb1d

[SandboxVec][NFC] Rename a variable

d5bc1f4

[RISCV] Reorder zvfbfmin operation actions to match zvfhmin. NFC

30fbfe5

This makes it slightly easier to see what's different between the two.

[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (l…

e55d6f5

…lvm#107889) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifying exec generally causes pipeline stalls.

[AutoBump] Merge with e55d6f5 (Sep 11)

6118705

cferry-AMD approved these changes Sep 30, 2024
