
Releases/gcc 12 #65

Open · wants to merge 2,603 commits into master

Conversation

jacopobrusini

Support for Apple Silicon!!!

@jwakely (Contributor) commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

GCC Administrator and others added 28 commits August 11, 2024 00:19
For the pattern below, the RA may still allocate r162 as a v/k register and
try to reload the address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi,
which results in a linker error.

(set (reg:DI 162)
     (mem/u/c:DI
       (const:DI (unspec:DI
		 [(symbol_ref:DI ("a") [flags 0x60]  <var_decl 0x7f621f6e1c60 a>)]
		 UNSPEC_GOTNTPOFF))

Quote from H.J. on why the linker issues an error:
>What do these do:
>
>        leaq    __libc_tsd_CTYPE_B@gottpoff(%rip), %rax
>        vmovq   (%rax), %xmm0
>
>From x86-64 TLS psABI:
>
>The assembler generates for the x@gottpoff(%rip) expressions a
>R_X86_64_GOTTPOFF relocation for the symbol x which requests the linker to
>generate a GOT entry with a R_X86_64_TPOFF64 relocation. The offset of
>the GOT entry relative to the end of the instruction is then used in
>the instruction. The R_X86_64_TPOFF64 relocation is processed at
>program startup time by the dynamic linker by looking up the symbol x
>in the modules loaded at that point. The offset is written in the GOT
>entry and later loaded by the addq instruction.
>
>The above code sequence looks wrong to me.
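
A minimal sketch of the kind of TLS access that goes through @gottpoff (an
illustration of the initial-exec access pattern only, not the actual
gcc.target/i386/pr116043.c testcase):

/* Hypothetical reduction: with -fPIC, an initial-exec TLS variable is
   reached through a GOT slot addressed as sym@gottpoff(%rip); the fix
   keeps reload from putting that address into a v/k register.  */
extern __thread int a __attribute__ ((tls_model ("initial-exec")));

int
f (void)
{
  return a;
}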

gcc/ChangeLog:

	PR target/116043
	* config/i386/constraints.md (Bk): Refine to
	define_special_memory_constraint.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr116043.c: New test.

(cherry picked from commit bc1fda0)
Not sure how this happened, but: svsudot is supposed to be expanded
as USDOT with the operands swapped.  However, a thinko in the
expansion of svsudot meant that the arguments weren't in fact
swapped; the attempted swap was just a no-op.  And the testcases
blithely accepted that.

gcc/
	PR target/114607
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svusdot_impl::expand): Fix botched attempt to swap the operands
	for svsudot.

gcc/testsuite/
	PR target/114607
	* gcc.target/aarch64/sve/acle/asm/sudot_s32.c: New test.

(cherry picked from commit 2c1c248)
aarch64-sve.md had a pattern that combined:

	cmpeq	pb.T, pa/z, zc.T, #0
	mov	zd.T, pb/z, #1

into:

	cnot	zd.T, pa/m, zc.T

But this is only valid if pa.T is a ptrue.  In other cases, the
original would set inactive elements of zd.T to 0, whereas the
combined form would copy elements from zc.T.

gcc/
	PR target/114603
	* config/aarch64/aarch64-sve.md (@aarch64_pred_cnot<mode>): Replace
	with...
	(@aarch64_ptrue_cnot<mode>): ...this, requiring operand 1 to be
	a ptrue.
	(*cnot<mode>): Require operand 1 to be a ptrue.
	* config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
	Use aarch64_ptrue_cnot<mode> for _x operations that are predicated
	with a ptrue.  Represent other _x operations as fully-defined _m
	operations.

gcc/testsuite/
	PR target/114603
	* gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.

(cherry picked from commit 67cbb1c)
The test was too optimistic, alas.  We used to vectorize shifts by
clamping the shift counts below the bit width of the types (e.g. at 15
for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
well defined (because of promotion to 32-bit int) and must yield 0,
not 1 (as before the fix).

Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.

So the test that expected the vectorization we no longer performed
needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization, and Christophe Lyon suggested a further
simplification.
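
A small self-contained illustration of the C semantics in question (not the
mve-vshr.c testcase itself):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint16_t x = 32768, n = 16;
  /* Both operands promote to 32-bit int, so the shift is well defined
     and yields 0.  */
  printf ("%d\n", x >> n);
  /* Clamping the count at 15, as the old vectorized code effectively did,
     would give 1 instead.  */
  printf ("%d\n", x >> 15);
  return 0;
}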

Co-Authored-By: Richard Earnshaw <[email protected]>

for  gcc/testsuite/ChangeLog

	PR tree-optimization/113281
	* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.

(cherry picked from commit 54d2339)
When none of -mprefer-vector-width, avx256_optimal/avx128_optimal, or
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC sets
ix86_{move_max,store_max} to the maximum available vector length, except
for the AVX case:

	      if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
		  && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
		opts->x_ix86_move_max = PVW_AVX512;
	      else
		opts->x_ix86_move_max = PVW_AVX128;

So for -mavx2, the vectorizer will choose 256-bit vectors for vectorization,
but 128-bit moves are used for the struct copy, so there could be a potential
STLF (store-to-load forwarding) issue due to this mismatch.

The patch fixes that.
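
A hedged sketch of the kind of mismatch described (illustrative only, not one
of the listed testcases): with -O2 -mavx2 the loop below can be vectorized
with 256-bit accesses, while the struct copy used to be expanded by pieces
with 128-bit moves, so the wider loads can span two of the narrower stores
and defeat store-to-load forwarding.

struct s { double d[8]; };

double
sum_copy (struct s *p)
{
  struct s tmp = *p;            /* copy expanded by pieces (move_max/store_max) */
  double acc = 0.0;
  for (int i = 0; i < 8; i++)   /* loop vectorized with 256-bit vectors */
    acc += tmp.d[i];
  return acc;
}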

gcc/ChangeLog:

	* config/i386/i386-options.cc (ix86_option_override_internal):
	Set ix86_{move_max,store_max} to PVW_AVX256 when TARGET_AVX
	instead of PVW_AVX128.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/pieces-memcpy-10.c: Add -mprefer-vector-width=128.
	* gcc.target/i386/pieces-memcpy-6.c: Ditto.
	* gcc.target/i386/pieces-memset-38.c: Ditto.
	* gcc.target/i386/pieces-memset-40.c: Ditto.
	* gcc.target/i386/pieces-memset-41.c: Ditto.
	* gcc.target/i386/pieces-memset-42.c: Ditto.
	* gcc.target/i386/pieces-memset-43.c: Ditto.
	* gcc.target/i386/pieces-strcpy-2.c: Ditto.
	* gcc.target/i386/pieces-memcpy-22.c: New test.
	* gcc.target/i386/pieces-memset-51.c: New test.
	* gcc.target/i386/pieces-strcpy-3.c: New test.

(cherry picked from commit aea3742)
gcc/testsuite/ChangeLog:

	* gcc.target/i386/pieces-memcpy-10.c: Use -mmove-max=256 and
	-mstore-max=256.
	* gcc.target/i386/pieces-memcpy-6.c: Ditto.
	* gcc.target/i386/pieces-memset-38.c: Ditto.
	* gcc.target/i386/pieces-memset-40.c: Ditto.
	* gcc.target/i386/pieces-memset-41.c: Ditto.
	* gcc.target/i386/pieces-memset-42.c: Ditto.
	* gcc.target/i386/pieces-memset-43.c: Ditto.
	* gcc.target/i386/pieces-strcpy-2.c: Ditto.

(cherry picked from commit ea9c508)
GCC Administrator and others added 30 commits January 7, 2025 00:21
This patch adds support for a new fusion in znver5 documented in the
optimization manual:

   The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
   with certain ALU instructions. The following conditions need to be met for
   fusion to happen:
     - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
     - The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
     - The ALU instruction may source only registers or immediate data. There cannot be any memory source.
     - The ALU instruction sources either the source or dest of MOV instruction.
     - If ALU instruction has 2 reg sources, they should be different.
     - The following ALU instructions can fuse with an older qualified MOV instruction:
       ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
       (I assume OP is OR)

I also increased the issue rate from 4 to 6.  Theoretically znver5 can do more,
but with our model we can't really use it.
Increasing the issue rate to 8 leads to an infinite loop in the scheduler.

Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).

The new fusion pattern moves quite a few instructions around in common code:
@@ -2210,13 +2210,13 @@
        .cfi_offset 3, -32
        leaq    63(%rsi), %rbx
        movq    %rbx, %rbp
+       shrq    $6, %rbp
+       salq    $3, %rbp
        subq    $16, %rsp
        .cfi_def_cfa_offset 48
        movq    %rdi, %r12
-       shrq    $6, %rbp
-       movq    %rsi, 8(%rsp)
-       salq    $3, %rbp
        movq    %rbp, %rdi
+       movq    %rsi, 8(%rsp)
        call    _Znwm
        movq    8(%rsp), %rsi
        movl    $0, 8(%r12)
@@ -2224,8 +2224,8 @@
        movq    %rax, (%r12)
        movq    %rbp, 32(%r12)
        testq   %rsi, %rsi
-       movq    %rsi, %rdx
        cmovns  %rsi, %rbx
+       movq    %rsi, %rdx
        sarq    $63, %rdx
        shrq    $58, %rdx
        sarq    $6, %rbx
which should help decoder bandwidth and perhaps also the caches, though I was
not able to measure an off-noise effect on SPEC.

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5.
	(ix86_adjust_cost): Add TODO about znver5 memory latency.
	(ix86_fuse_mov_alu_p): New.
	(ix86_macro_fusion_pair_p): Use it.
	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
	(X86_TUNE_FUSE_MOV_AND_ALU): New tune.

(cherry picked from commit e2125a6)
Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in
3 of them.  FP units can do 2 additions and 2 multiplications with latency 2
and 3.  This patch updates the reassociation width accordingly.  This has the
potential of increasing register pressure, but unlike when benchmarking znver1
tuning, I did not notice it actually causing problems on SPEC, so this patch
bumps the reassociation width up to 6 for everything except integer vectors,
where there are 4 units with a typical latency of 1.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_reassociation_width): Update for Znver5.
	* config/i386/x86-tune-costs.h (znver5_costs): Update reassociation
	widths.

(cherry picked from commit f0ab3de)
gcc/ChangeLog:

	* doc/cpp.texi (Common Predefined Macros): Fix syntax.
The following makes analysis and transform agree on constraints.

	PR tree-optimization/115646
	* tree-call-cdce.cc (check_pow): Check for bit_sz values
	as allowed by transform.

	* gcc.dg/pr115646.c: New testcase.

(cherry picked from commit 453b1d2)
The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.

Once we have achieved SLP-only we can move and update this metadata.

	PR tree-optimization/115669
	* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
	chains that participate in a reduction.

	* gcc.dg/vect/pr115669.c: New testcase.

(cherry picked from commit 7886830)
The following fixes an issue with CCP's likely_value when faced with
a vector CTOR containing undef SSA names and constants.  This should
be classified as CONSTANT and not UNDEFINED.
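
A hedged sketch of the shape in question (using GNU vector extensions; not
necessarily the pr116057.c testcase):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
build (void)
{
  int u;                    /* intentionally uninitialized (undef SSA name) */
  v4si v = { u, 1, 2, 3 };  /* CTOR mixing an undef element with constants */
  return v;
}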

	PR tree-optimization/116057
	* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt
	operands to look for constants.

	* gcc.dg/torture/pr116057.c: New testcase.

(cherry picked from commit 1ea5515)
	PR fortran/106692

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_expr_op): Inhibit excessive optimization
	of Cray pointers by treating them as volatile in comparisons.

gcc/testsuite/ChangeLog:

	* gfortran.dg/cray_pointers_13.f90: New test.

(cherry picked from commit c7754a2)
There is nothing exciting in this patch.  I measured latencies and also
compared them with the newly released optimization guide.  There are no
dramatic changes compared to zen4.  One interesting new bit is that addss is
faster and can be 2 cycles when fed by another addss.

I also increased the large insn bound since the decoders no longer seem to
require instructions to be 8 bytes or less.

gcc/ChangeLog:

	* config/i386/x86-tune-costs.h (znver5_cost): Update instruction
	costs.

(cherry picked from commit 4292297)
The following addresses a long-standing issue with not preserving
accesses to non-volatile objects through volatile-qualified
pointers in the case that the object gets expanded to a register.  The
fix is to treat accesses to an object through a volatile-qualified
access as forcing that object to memory.  This issue became more
exposed recently, so it regressed further since GCC 11.
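
A classic illustration of the issue (a hedged sketch, not necessarily the
pr69482-1.c testcase): the object itself is not volatile, but it is accessed
through a volatile-qualified pointer, so those accesses must be kept.

int
f (void)
{
  int x = 0;
  *(volatile int *) &x = 1;       /* volatile-qualified store; forces x to memory */
  return *(volatile int *) &x;    /* volatile-qualified load must not be removed */
}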

	PR middle-end/69482
	* cfgexpand.cc (discover_nonconstant_array_refs_r): Volatile
	qualified accesses also force objects to memory.

	* gcc.target/i386/pr69482-1.c: New testcase.
	* gcc.target/i386/pr69482-2.c: Likewise.

(cherry picked from commit a5a8242)
Loop distribution does different analysis with -g0/-g because it counts
a debug stmt starting a BB against a limit, which will eventually
lead to different IVOPTs choices.  I've fixed a possible IVOPTs
issue on the way even though it doesn't make a difference here.

	PR tree-optimization/116290
	* tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs
	have no debug variants.  Start with first non-debug real stmt.
	* tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze
	debug stmts.

	* gcc.dg/pr116290.c: New testcase.

(cherry picked from commit 5667400)
The following reverts a bogus fix done for PR101009 and instead makes
sure we get into the same_access_functions () case when computing
the distance vector for g[1] and g[1] where the constants ended up
having different types.  The generic code doesn't seem to handle
loop invariant dependences.  The special case gets us both
( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ),
which the PR101009 fix changed to ( 0 ) with bad effects on other
cases as shown in this PR.
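
A hedged sketch of the access shape being discussed (illustrative only; the
pr116768.c testcase differs): both references in the loop are to the same
loop-invariant element g[1], giving a loop-invariant dependence whose
subscript constants may carry different types internally.

int g[2];

void
f (int n)
{
  for (int i = 0; i < n; i++)
    g[1] = g[1] + i;
}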

	PR tree-optimization/116768
	* tree-data-ref.cc (build_classic_dist_vector_1): Revert
	PR101009 change.
	* tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1
	and (int)1 compare equal.

	* gcc.dg/torture/pr116768.c: New testcase.

(cherry picked from commit 5b5a36b)

Transforming -fma (-a, b, -c) to fma (a, b, c) is only valid when
not rounding towards -inf or +inf as the sign of the multiplication
changes.
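
A hedged numeric sketch (not the GCC testcase; compile with -frounding-math so
the dynamic rounding mode must be honored): under FE_UPWARD the two
expressions can differ in the last bit, because negating the product flips the
direction in which the single rounding of the fma moves.

#include <fenv.h>
#include <math.h>
#include <stdio.h>

int
main (void)
{
  volatile double a = 1.0 + 0x1p-52, b = 1.0 + 0x1p-52, c = 0.0;
  fesetround (FE_UPWARD);
  double x = -fma (-a, b, -c);   /* rounds -a*b - c toward +inf, then negates */
  double y = fma (a, b, c);      /* rounds a*b + c toward +inf */
  printf ("%a\n%a\n", x, y);     /* the results differ in the last bit */
  return 0;
}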

	PR middle-end/116891
	* match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
	Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.

(cherry picked from commit c53bd48)
On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> >     PR middle-end/116891
> >     * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> >     Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
>
> Guess it would be nice to have a testcase which FAILs without the patch and
> PASSes with it, but it can be added later.

I've added such a testcase now, and additionally found the fix only fixed
one of the 4 problematic similar cases.

Here is a patch which fixes the others too and adds the testcases.
fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
only due to the bar/baz/qux checks) and PASSes with the patch.

2024-10-15  Jakub Jelinek  <[email protected]>

	PR middle-end/116891
	* match.pd ((negate (fmas@3 @0 @1 @2)) -> (IFN_FNMS @0 @1 @2)):
	Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
	((negate (IFN_FMS@3 @0 @1 @2)) -> (IFN_FNMA @0 @1 @2)): Likewise.
	((negate (IFN_FNMA@3 @0 @1 @2)) -> (IFN_FMS @0 @1 @2)): Likewise.

	* gcc.dg/pr116891.c: New test.
	* gcc.target/i386/fma-pr116891.c: New test.

(cherry picked from commit 4366f0c)

For vector types we have to make sure the comparison result is a vector
type and the resulting compare operation is supported.  As the resulting
compare is never an equality compare I didn't bother to check for the
cbranch case.

	PR tree-optimization/117104
	* match.pd ((cmp:c (minmax:c @0 @1) @0) -> (out @0 @1)): Properly
	guard the vector case.

	* gcc.dg/pr117104.c: New testcase.

(cherry picked from commit f54d42e)
The diagnostics code fails to handle non-constant domain max.

	PR tree-optimization/117254
	* gimple-ssa-warn-access.cc (maybe_warn_nonstring_arg):
	Check the array domain max is constant before using it.

	* gcc.dg/pr117254.c: New testcase.

(cherry picked from commit d464a52)
STMT_VINFO_SLP_VECT_ONLY isn't properly computed as the union over all
group members, and when the group is later split due to duplicates,
not all sub-groups inherit the flag.

	PR tree-optimization/117307
	* tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
	Properly compute STMT_VINFO_SLP_VECT_ONLY.  Set it on all
	parts of a split group.

	* gcc.dg/vect/pr117307.c: New testcase.

(cherry picked from commit 1972230)
When we decompose a complex load that is only used as its real and imaginary
parts, we fail to honor the IL constraint that a BIT_FIELD_REF
of register type should be outermost in a ref.  The following
simply avoids the transform when the complex load has such a
BIT_FIELD_REF.
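
A hedged sketch of the general transform (the reproducer in pr117417.c is
different): a complex load whose only uses are its real and imaginary parts
can be split by forwprop into two scalar loads, which is the transform the fix
now skips when the load sits under a BIT_FIELD_REF.

float
f (_Complex float *p)
{
  _Complex float c = *p;            /* complex load */
  return __real__ c + __imag__ c;   /* only used as real and imaginary parts */
}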

	PR tree-optimization/117417
	* tree-ssa-forwprop.cc (pass_forwprop::execute): Avoid
	decomposing BIT_FIELD_REF complex load.

	* gcc.dg/torture/pr117417.c: New testcase.

(cherry picked from commit d976daa)
This patch removes the (unnecessary) CPP_PRAGMA_EOL case from
cp_parser_cache_defarg, which currently has the result that any pragmas
in the NSDMI cause an error.

	PR c++/118147

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_cache_defarg): Don't error when
	CPP_PRAGMA_EOL.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/nsdmi-defer7.C: New test.

Signed-off-by: Nathaniel Shead <[email protected]>
(cherry picked from commit f3ccc57)
We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.

The count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to zero but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:

 profile_count num = node->count;
 profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
 bool scale = num.initialized_p () && !(num == den);

Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in

if (scale)
  bb->count = bb->count.apply_scale (num, den);

This is what happened in the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests).
Commit 3d9e676 made an improvement in
AutoFDO count propagation, which caused a mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.

The fix is to update the call graph node count once we've done count propagation.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
	PR gcov-profile/116743
	* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
	and the entry block count.

(cherry picked from commit e683c6b)