mirrored from git://gcc.gnu.org/git/gcc.git
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Releases/gcc 12 #65
Open
jacopobrusini
wants to merge
2,646
commits into
master
Choose a base branch
from
releases/gcc-12
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Releases/gcc 12 #65
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time. Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place. |
atahanozbayram
approved these changes
Apr 2, 2024
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression in a TImode register. Otherwise, the TImode variable will be put in the GPR save area which guarantees only 8-byte alignment. gcc/ PR target/116621 * config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for a PARALLEL BLKmode container of an EXPR_LIST expression in a TImode register. gcc/testsuite/ PR target/116621 * gcc.target/i386/pr116621.c: New test. Signed-off-by: H.J. Lu <[email protected]> (cherry picked from commit fa7bbb0)
r12-3495 added maybe_warn_about_constant_value which will crash if it gets a nameless VAR_DECL, which is what happens in this PR. We created this VAR_DECL in cp_parser_decomposition_declaration. PR c++/116676 gcc/cp/ChangeLog: * constexpr.cc (maybe_warn_about_constant_value): Check DECL_NAME. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/constexpr-116676.C: New test. Reviewed-by: Jason Merrill <[email protected]> (cherry picked from commit dfe0d43)
This patch is backported from GCC15 with some tweaks. Since r15-3539, there are requests coming in to add other alias option documentation. This patch will add all of them, including corei7, corei7-avx, core-avx-i, core-avx2, atom and slm. Also in the patch, I reordered that part of documentation, currently all the CPUs/products are just all over the place. I regrouped them by date-to-now products (since the very first CPU to latest Panther Lake), P-core (since the clients become hybrid cores, starting from Sapphire Rapids) and E-core (since Bonnell). In GCC14 and eariler GCC, Xeon Phi CPUs are still there, I put them after E-core CPUs. And in the patch, I refined the product names in documentation. gcc/ChangeLog: * doc/invoke.texi: Add corei7, corei7-avx, core-avx-i, core-avx2, atom, and slm. Reorder the -march documentation by splitting them into date-to-now products, P-core, E-core and Xeon Phi. Refine the product names in documentation.
In s390_expand_insv(), if generating code for ICM et al. src is a MEM and gen_lowpart might force src into a register such that we end up with patterns which do not match anymore. Use adjust_address() instead in order to preserve a MEM. Furthermore, it is not straight forward to enforce a subreg. For example, in case of a paradoxical subreg, gen_lowpart() may return a register. In order to compensate this, s390_gen_lowpart_subreg() emits a reference to a pseudo which does not coincide with its definition which is wrong. Additionally, if dest is a paradoxical subreg, then do not try to emit a strict_low_part since it could mean that dest was not initialized even though this might be fixed up later by init-regs. Splitter for insn *get_tp_64, *zero_extendhisi2_31, *zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload. Thus, operands[0] is a hard register and gen_lowpart (m, operands[0]) just returns the hard register for mode m which is fine to use as an argument for strict_low_part, i.e., we do not need to enforce subregs here since after reload subregs are supposed to be eliminated anyway. This fixes gcc.dg/torture/pr111821.c. gcc/ChangeLog: * config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove. * config/s390/s390.cc (s390_gen_lowpart_subreg): Remove. (s390_expand_insv): Use adjust_address() and emit a strict_low_part only in case of a natural subreg. * config/s390/s390.md: Use gen_lowpart() instead of s390_gen_lowpart_subreg(). (cherry picked from commit 9ebc9fb)
When a memory copy operation is analyzed by analyze_ssa_name, if both the load and store are made through the same SSA name, the store is overlooked. gcc/ * ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Always process both the load and the store of a memory copy operation. gcc/testsuite/ * gcc.dg/ipa/modref-4.c: New test.
gcc/fortran/ChangeLog: PR fortran/100273 * trans-decl.cc (gfc_create_module_variable): Handle module variable also when it is needed for the result specification of a contained function. gcc/testsuite/ChangeLog: PR fortran/100273 * gfortran.dg/pr100273.f90: New test. (cherry picked from commit 1f462b5)
Ensure for AQ and AR constraints that the resulting displacement after adding any positive offset less than the size of the object being referenced is still valid. gcc/ChangeLog: * config/s390/s390.cc (s390_mem_constraint): Check displacement for AQ and AR constraints. (cherry picked from commit 1a71ff3)
Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1 survive register allocation. This in turn leads to wrong register renaming. Keeping the current approach would mean we need two insns for *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along the lines (define_insn "*tf_to_fprx2_0" [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0) (unspec:DF [(match_operand:TF 1 "general_operand" "v")] UNSPEC_TF_TO_FPRX2_0))] "TARGET_VXE" "#") (define_insn "*tf_to_fprx2_0" [(set (match_operand:DF 0 "nonimmediate_operand" "=f") (unspec:DF [(match_operand:TF 1 "general_operand" "v")] UNSPEC_TF_TO_FPRX2_0))] "TARGET_VXE" "vpdi\t%v0,%v1,%v0,1 [(set_attr "op_type" "VRR")]) and similar for *tf_to_fprx2_1. Note, pre register allocation operand 0 has mode FPRX2 and afterwards DF once subregs have been eliminated. Since we always copy a whole vector register into a floating-point register pair, another way to fix this is to merge *tf_to_fprx2_0 and *tf_to_fprx2_1 into a single insn which means we don't have to use subregs at all. The downside of this is that the assembler template contains two instructions, now. The upside is that we don't have to come up with some artificial insn before RA which might be more readable/maintainable. That is implemented by this patch. In commit r11-4872-ge627cda5686592, the output operand specifier %V was introduced which is used in tf_to_fprx2 only, now. Instead of coming up with its counterpart %F for floating-point registers, which would also only be used in tf_to_fprx2, I print the operands directly. This renders %V unused which is why it is removed by this patch. gcc/ChangeLog: PR target/115860 * config/s390/s390.cc (print_operand): Remove operand specifier %V. * config/s390/s390.md (UNSPEC_TF_TO_FPRX2): New. * config/s390/vector.md (*tf_to_fprx2_0): Remove. (*tf_to_fprx2_1): Remove. (tf_to_fprx2): New. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/long-double-asm-abi.c: Adapt scan-assembler directive. * gcc.target/s390/vector/long-double-to-i64.c: Adapt scan-assembler directive. * gcc.target/s390/pr115860-1.c: New test. (cherry picked from commit 46c2538)
Address override only applies to the (reg32) part in the thread address fs:(reg32). Don't rewrite thread address like (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 98 [ __gmpfr_emax.0_1 ]) (mem/c:SI (plus:SI (plus:SI (unspec:SI [ (const_int 0 [0]) ] UNSPEC_TP) (reg:SI 107)) (const:SI (unspec:SI [ (symbol_ref:SI ("previous_emax") [flags 0x1a] <var_decl 0x7fffe9a11cf0 previous_emax>) ] UNSPEC_DTPOFF))) [1 previous_emax+0 S4 A32]))) if address override is used to avoid the invalid memory operand like cmpl %fs:previous_emax@dtpoff(%eax), %r12d gcc/ PR target/116839 * config/i386/i386.cc (ix86_rewrite_tls_address_1): Make it static. Return if TLS address is thread register plus an integer register. gcc/testsuite/ PR target/116839 * gcc.target/i386/pr116839.c: New file. Signed-off-by: H.J. Lu <[email protected]> (cherry picked from commit c79cc30)
2024-02-14 Jan Hubicka <[email protected]> Karthiban Anbazhagan <[email protected]> gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver5. * common/config/i386/i386-common.cc (processor_names): Add znver5. (processor_alias_table): Likewise. * common/config/i386/i386-cpuinfo.h (processor_types): Add new zen family. (processor_subtypes): Add znver5. * config.gcc (x86_64-*-* |...): Likewise. * config/i386/driver-i386.cc (host_detect_local_cpu): Let march=native detect znver5 cpu's. * config/i386/i386-c.cc (ix86_target_macros_internal): Add znver5. * config/i386/i386-options.cc (m_ZNVER5): New definition (processor_cost_table): Add znver5. * config/i386/i386.cc (ix86_reassociation_width): Likewise. * config/i386/i386.h (processor_type): Add PROCESSOR_ZNVER5 (PTA_ZNVER5): New definition. * config/i386/i386.md (define_attr "cpu"): Add znver5. (Scheduling descriptions) Add znver5.md. * config/i386/x86-tune-costs.h (znver5_cost): New definition. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add znver5. (ix86_adjust_cost): Likewise. * config/i386/x86-tune.def (avx512_move_by_pieces): Add m_ZNVER5. (avx512_store_by_pieces): Add m_ZNVER5. * doc/extend.texi: Add znver5. * doc/invoke.texi: Likewise. * config/i386/znver4.md: Rename to zn4zn5.md; combine znver4 and znver5 Scheduler. gcc/testsuite/ChangeLog: * g++.target/i386/mv29.C: Handle znver5 arch. * gcc.target/i386/funcspec-56.inc:Likewise. (cherry picked from commit d0aa0af)
Currently unaligned YMM and ZMM load and store costs are cheaper than aligned which causes the vectorizer to purposely mis-align accesses by adding an alignment prologue. It looks like the unaligned costs were simply copied from the bogus znver4 costs. The following makes the unaligned costs equal to the aligned costs like in the fixed znver4 version. * config/i386/x86-tune-costs.h (znver5_cost): Update unaligned load and store cost from the aligned costs. (cherry picked from commit 8963937)
testing matrix multiplication benchmarks shows that FMA on a critical chain is a perofrmance loss over separate multiply and add. While the latency of 4 is lower than multiply + add (3+2) the problem is that all values needs to be ready before computation starts. While on znver4 AVX512 code fared well with FMA, it was because of the split registers. Znver5 benefits from avoding FMA on all widths. This may be different with the mobile version though. On naive matrix multiplication benchmark the difference is 8% with -O3 only since with -Ofast loop interchange solves the problem differently. It is 30% win, for example, on S323 from TSVC: real_t s323(struct args_t * func_args) { // recurrences // coupled recurrence initialise_arrays(__func__); gettimeofday(&func_args->t1, NULL); for (int nl = 0; nl < iterations/2; nl++) { for (int i = 1; i < LEN_1D; i++) { a[i] = b[i-1] + c[i] * d[i]; b[i] = a[i] + c[i] * e[i]; } dummy(a, b, c, d, e, aa, bb, cc, 0.); } gettimeofday(&func_args->t2, NULL); return calc_checksum(__func__); } gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for znver5. (X86_TUNE_AVOID_256FMA_CHAINS): Likewise. (X86_TUNE_AVOID_512FMA_CHAINS): Likewise. (cherry picked from commit d6360b4)
For invalid constant operand values used in built-in functions, return const0_rtx to signify an error occurred during expansion. 2025-01-16 Peter Bergner <[email protected]> gcc/ * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Return const0_rtx when there is an error. gcc/testsuite/ * gcc.target/powerpc/mma-builtin-error.c: New test. (cherry picked from commit 0696af7)
NinaRanns
pushed a commit
to NinaRanns/gcc
that referenced
this pull request
Jan 28, 2025
…on-r15-7214-g0710024b5bd861 Contracts nonattr rebase on r15 7214 g0710024b5bd861
At the end of a sequence like: #pragma GCC push_options ... #pragma GCC pop_options the handler for pop_options calls cl_optimization_compare() (as generated by optc-save-gen.awk) to make sure that all global state has been restored to the value it had prior to the push_options call. The verification is performed for almost all entries in the global_options struct. This leads to unexpected checking asserts, as discussed in the PR, in case the state of warnings-related options has been intentionally modified in between push_options and pop_options via a call to #pragma GCC diagnostic. Address that by skipping the verification for CL_WARNING-flagged options. gcc/ChangeLog: PR middle-end/115913 * optc-save-gen.awk (cl_optimization_compare): Skip options with CL_WARNING flag. gcc/testsuite/ChangeLog: PR middle-end/115913 * c-c++-common/cpp/pr115913.c: New test.
x is not a macro argument. It just happens to work as final.cc passes x for 2nd argument: final.cc: ASM_OUTPUT_SYMBOL_REF (file, x); PR target/118825 * config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with SYM. Signed-off-by: H.J. Lu <[email protected]> (cherry picked from commit 7317fc0)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Support for Apple Silicon!!!