Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Releases/gcc 12 #65

Open
wants to merge 2,646 commits into
base: master
Choose a base branch
from
Open

Releases/gcc 12 #65

wants to merge 2,646 commits into from

Conversation

jacopobrusini
Copy link

Support for Apple Silicon!!!

@jwakely
Copy link
Contributor

jwakely commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

GCC Administrator and others added 28 commits September 15, 2024 00:18
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
in a TImode register.  Otherwise, the TImode variable will be put in
the GPR save area which guarantees only 8-byte alignment.

gcc/

	PR target/116621
	* config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
	a PARALLEL BLKmode container of an EXPR_LIST expression in a
	TImode register.

gcc/testsuite/

	PR target/116621
	* gcc.target/i386/pr116621.c: New test.

Signed-off-by: H.J. Lu <[email protected]>
(cherry picked from commit fa7bbb0)
r12-3495 added maybe_warn_about_constant_value which will crash if
it gets a nameless VAR_DECL, which is what happens in this PR.

We created this VAR_DECL in cp_parser_decomposition_declaration.

	PR c++/116676

gcc/cp/ChangeLog:

	* constexpr.cc (maybe_warn_about_constant_value): Check DECL_NAME.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/constexpr-116676.C: New test.

Reviewed-by: Jason Merrill <[email protected]>
(cherry picked from commit dfe0d43)
This patch is backported from GCC15 with some tweaks.

Since r15-3539, there are requests coming in to add other alias option
documentation. This patch will add all of them, including corei7, corei7-avx,
core-avx-i, core-avx2, atom and slm.

Also in the patch, I reordered that part of documentation, currently all
the CPUs/products are just all over the place. I regrouped them by
date-to-now products (since the very first CPU to latest Panther Lake), P-core
(since the clients become hybrid cores, starting from Sapphire Rapids) and
E-core (since Bonnell). In GCC14 and eariler GCC, Xeon Phi CPUs are still
there, I put them after E-core CPUs.

And in the patch, I refined the product names in documentation.

gcc/ChangeLog:

	* doc/invoke.texi: Add corei7, corei7-avx, core-avx-i,
	core-avx2, atom, and slm. Reorder the -march documentation by
	splitting them into date-to-now products, P-core, E-core and
	Xeon Phi. Refine the product names in documentation.
In s390_expand_insv(), if generating code for ICM et al. src is a MEM
and gen_lowpart might force src into a register such that we end up with
patterns which do not match anymore.  Use adjust_address() instead in
order to preserve a MEM.

Furthermore, it is not straight forward to enforce a subreg.  For
example, in case of a paradoxical subreg, gen_lowpart() may return a
register.  In order to compensate this, s390_gen_lowpart_subreg() emits
a reference to a pseudo which does not coincide with its definition
which is wrong.  Additionally, if dest is a paradoxical subreg, then do
not try to emit a strict_low_part since it could mean that dest was not
initialized even though this might be fixed up later by init-regs.

Splitter for insn *get_tp_64, *zero_extendhisi2_31,
*zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload.
Thus, operands[0] is a hard register and gen_lowpart (m, operands[0])
just returns the hard register for mode m which is fine to use as an
argument for strict_low_part, i.e., we do not need to enforce subregs
here since after reload subregs are supposed to be eliminated anyway.

This fixes gcc.dg/torture/pr111821.c.

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove.
	* config/s390/s390.cc (s390_gen_lowpart_subreg): Remove.
	(s390_expand_insv): Use adjust_address() and emit a
	strict_low_part only in case of a natural subreg.
	* config/s390/s390.md: Use gen_lowpart() instead of
	s390_gen_lowpart_subreg().

(cherry picked from commit 9ebc9fb)
When a memory copy operation is analyzed by analyze_ssa_name, if both the
load and store are made through the same SSA name, the store is overlooked.

gcc/
	* ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Always
	process both the load and the store of a memory copy operation.

gcc/testsuite/
	* gcc.dg/ipa/modref-4.c: New test.
gcc/fortran/ChangeLog:

	PR fortran/100273
	* trans-decl.cc (gfc_create_module_variable): Handle module
	variable also when it is needed for the result specification
	of a contained function.

gcc/testsuite/ChangeLog:

	PR fortran/100273
	* gfortran.dg/pr100273.f90: New test.

(cherry picked from commit 1f462b5)
Ensure for AQ and AR constraints that the resulting displacement after
adding any positive offset less than the size of the object being
referenced is still valid.

gcc/ChangeLog:

	* config/s390/s390.cc (s390_mem_constraint): Check displacement
	for AQ and AR constraints.

(cherry picked from commit 1a71ff3)
Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
survive register allocation.  This in turn leads to wrong register
renaming.  Keeping the current approach would mean we need two insns for
*tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
lines

(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
        (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
                   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "#")

(define_insn "*tf_to_fprx2_0"
  [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
        (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
                   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "vpdi\t%v0,%v1,%v0,1
  [(set_attr "op_type" "VRR")])

and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
has mode FPRX2 and afterwards DF once subregs have been eliminated.

Since we always copy a whole vector register into a floating-point
register pair, another way to fix this is to merge *tf_to_fprx2_0 and
*tf_to_fprx2_1 into a single insn which means we don't have to use
subregs at all.  The downside of this is that the assembler template
contains two instructions, now.  The upside is that we don't have to
come up with some artificial insn before RA which might be more
readable/maintainable.  That is implemented by this patch.

In commit r11-4872-ge627cda5686592, the output operand specifier %V was
introduced which is used in tf_to_fprx2 only, now.  Instead of coming up
with its counterpart %F for floating-point registers, which would also
only be used in tf_to_fprx2, I print the operands directly.  This
renders %V unused which is why it is removed by this patch.

gcc/ChangeLog:

	PR target/115860
	* config/s390/s390.cc (print_operand): Remove operand specifier
	%V.
	* config/s390/s390.md (UNSPEC_TF_TO_FPRX2): New.
	* config/s390/vector.md (*tf_to_fprx2_0): Remove.
	(*tf_to_fprx2_1): Remove.
	(tf_to_fprx2): New.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-abi.c: Adapt
	scan-assembler directive.
	* gcc.target/s390/vector/long-double-to-i64.c: Adapt
	scan-assembler directive.
	* gcc.target/s390/pr115860-1.c: New test.

(cherry picked from commit 46c2538)
Address override only applies to the (reg32) part in the thread address
fs:(reg32).  Don't rewrite thread address like

(set (reg:CCZ 17 flags)
    (compare:CCZ (reg:SI 98 [ __gmpfr_emax.0_1 ])
        (mem/c:SI (plus:SI (plus:SI (unspec:SI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:SI 107))
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("previous_emax") [flags 0x1a] <var_decl 0x7fffe9a11cf0 previous_emax>)
                        ] UNSPEC_DTPOFF))) [1 previous_emax+0 S4 A32])))

if address override is used to avoid the invalid memory operand like

	cmpl	%fs:previous_emax@dtpoff(%eax), %r12d

gcc/

	PR target/116839
	* config/i386/i386.cc (ix86_rewrite_tls_address_1): Make it
	static.  Return if TLS address is thread register plus an integer
	register.

gcc/testsuite/

	PR target/116839
	* gcc.target/i386/pr116839.c: New file.

Signed-off-by: H.J. Lu <[email protected]>
(cherry picked from commit c79cc30)
2024-02-14  Jan Hubicka  <[email protected]>
	    Karthiban Anbazhagan  <[email protected]>

gcc/ChangeLog:
	* common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver5.
	* common/config/i386/i386-common.cc (processor_names): Add znver5.
	(processor_alias_table): Likewise.
	* common/config/i386/i386-cpuinfo.h (processor_types): Add new zen
	family.
	(processor_subtypes): Add znver5.
	* config.gcc (x86_64-*-* |...): Likewise.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Let
	march=native detect znver5 cpu's.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Add
	znver5.
	* config/i386/i386-options.cc (m_ZNVER5): New definition
	(processor_cost_table): Add znver5.
	* config/i386/i386.cc (ix86_reassociation_width): Likewise.
	* config/i386/i386.h (processor_type): Add PROCESSOR_ZNVER5
	(PTA_ZNVER5): New definition.
	* config/i386/i386.md (define_attr "cpu"): Add znver5.
	(Scheduling descriptions) Add znver5.md.
	* config/i386/x86-tune-costs.h (znver5_cost): New definition.
	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add znver5.
	(ix86_adjust_cost): Likewise.
	* config/i386/x86-tune.def (avx512_move_by_pieces): Add m_ZNVER5.
	(avx512_store_by_pieces): Add m_ZNVER5.
	* doc/extend.texi: Add znver5.
	* doc/invoke.texi: Likewise.
	* config/i386/znver4.md: Rename to zn4zn5.md; combine znver4 and znver5 Scheduler.

gcc/testsuite/ChangeLog:
	* g++.target/i386/mv29.C: Handle znver5 arch.
	* gcc.target/i386/funcspec-56.inc:Likewise.

(cherry picked from commit d0aa0af)
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply copied from the bogus znver4 costs.  The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.

	* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
	load and store cost from the aligned costs.

(cherry picked from commit 8963937)
testing matrix multiplication benchmarks shows that FMA on a critical chain
is a perofrmance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2) the problem is that all values needs to
be ready before computation starts.

While on znver4 AVX512 code fared well with FMA, it was because of the split
registers. Znver5 benefits from avoding FMA on all widths.  This may be different
with the mobile version though.

On naive matrix multiplication benchmark the difference is 8% with -O3
only since with -Ofast loop interchange solves the problem differently.
It is 30% win, for example, on S323 from TSVC:

real_t s323(struct args_t * func_args)
{

//    recurrences
//    coupled recurrence

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations/2; nl++) {
        for (int i = 1; i < LEN_1D; i++) {
            a[i] = b[i-1] + c[i] * d[i];
            b[i] = a[i] + c[i] * e[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
	znver5.
	(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
	(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.

(cherry picked from commit d6360b4)
peter-bergner and others added 5 commits January 24, 2025 09:23
For invalid constant operand values used in built-in functions, return
const0_rtx to signify an error occurred during expansion.

2025-01-16  Peter Bergner  <[email protected]>

gcc/
	* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Return
	const0_rtx when there is an error.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-error.c: New test.

(cherry picked from commit 0696af7)
NinaRanns pushed a commit to NinaRanns/gcc that referenced this pull request Jan 28, 2025
…on-r15-7214-g0710024b5bd861

Contracts nonattr rebase on r15 7214 g0710024b5bd861
GCC Administrator and others added 24 commits January 29, 2025 00:21
At the end of a sequence like:
 #pragma GCC push_options
 ...
 #pragma GCC pop_options

the handler for pop_options calls cl_optimization_compare() (as generated by
optc-save-gen.awk) to make sure that all global state has been restored to
the value it had prior to the push_options call. The verification is
performed for almost all entries in the global_options struct. This leads to
unexpected checking asserts, as discussed in the PR, in case the state of
warnings-related options has been intentionally modified in between
push_options and pop_options via a call to #pragma GCC diagnostic. Address
that by skipping the verification for CL_WARNING-flagged options.

gcc/ChangeLog:

	PR middle-end/115913
	* optc-save-gen.awk (cl_optimization_compare): Skip options with
	CL_WARNING flag.

gcc/testsuite/ChangeLog:

	PR middle-end/115913
	* c-c++-common/cpp/pr115913.c: New test.
x is not a macro argument.  It just happens to work as final.cc passes
x for 2nd argument:

final.cc:      ASM_OUTPUT_SYMBOL_REF (file, x);

	PR target/118825
	* config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with
	SYM.

Signed-off-by: H.J. Lu <[email protected]>
(cherry picked from commit 7317fc0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.