[i386] : Fix PR 91654: Runtime SPEC regression on Haswell

Message ID CAFULd4bb6LM8xDg3HgKqcA3DK7_67_yhvg+DOqvAjkL54Aim8A@mail.gmail.com
State New

Commit Message

Uros Bizjak Sept. 6, 2019, 7:31 p.m.
On Thu, Sep 5, 2019 at 10:53 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Sep 5, 2019 at 7:47 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Changing the cost from 2 to 6 gives:
> > -------------
> > 531.deepsjeng_r   9.64%
> > 548.exchange2_r  10.24%
> > 557.xz_r          7.99%
> > 508.namd_r        1.08%
> > 527.cam4_r        6.91%
> > 544.nab_r         3.06%
> > -------------
> >
> > For 531, 548, 557 and 527, this is even better than the version before the regression.
> > For 508 and 544, there are still small regressions compared to the version before the regression.
>
> Good, that brings us into the "noise" region.
>
> Based on these results and other findings, I propose the following solution:
>
> - The inter-regset move costs of architectures that were defined
> before r125951 remain the same. These are: size, i386, i486, pentium,
> pentiumpro, geode, k6, athlon, k8, amdfam10, pentium4 and nocona.
> - bdver, btver1 and btver2 have costs higher than 8, so they are not affected.
> - lakemont, znver1, znver2, atom, slm, intel and generic have
> inter-regset costs above the intra-regset cost and below or equal to the
> memory load/store cost, so they should remain as they are. Additionally,
> intel and generic costs are regularly re-tuned.
> - Only skylake and core costs remain problematic.
>
> So, I propose to raise the XMM<->intreg costs of the skylake and core
> architectures to 6 to solve the regression. These can be fine-tuned
> later, since we are now able to change the cost for RA independently of
> RTX costs. Also, the RA cost can be asymmetrical.
>
> Attached patch implements the proposal. If there are no other
> proposals or discussions, I plan to commit it on Friday.
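
For illustration only, the criterion used in the list above (inter-regset
cost above the intra-regset cost and no higher than the memory load/store
cost) can be sketched in C as follows. The struct regset_costs, the function
inter_move_cost_in_range and the intra-regset value of 2 are assumptions made
for this sketch; they are not the layout or contents of GCC's struct
processor_costs. The store costs 8 and 6 are the 32-bit SSE store costs
visible in the patch context for skylake_cost and core_cost.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of three register-allocator costs.
   This is NOT the layout of GCC's struct processor_costs.  */
struct regset_costs
{
  int intra_move;  /* move inside one register set (assumed value) */
  int inter_move;  /* SSE<->integer move */
  int mem_store;   /* store of a 32-bit SSE value to memory */
};

/* Return true when the inter-regset move cost is above the intra-regset
   cost and below or equal to the memory store cost.  */
bool
inter_move_cost_in_range (const struct regset_costs *c)
{
  return c->inter_move > c->intra_move && c->inter_move <= c->mem_store;
}

/* Values before and after the patch.  The intra_move value of 2 is an
   assumption; inter_move and mem_store come from the diff context.  */
const struct regset_costs skylake_before = { 2, 2, 8 };
const struct regset_costs skylake_after  = { 2, 6, 8 };
const struct regset_costs core_before    = { 2, 2, 6 };
const struct regset_costs core_after     = { 2, 6, 6 };
```

Under these assumed values, the old cost of 2 fails the check for both
skylake and core, while the new cost of 6 passes it (for core it equals the
store cost, which the "below or equal" wording allows).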


2019-09-06  Uroš Bizjak  <ubizjak@gmail.com>

    PR target/91654
    * config/i386/x86-tune-costs.h (skylake_cost): Raise the
    cost of SSE->integer and integer->SSE moves from 2 to 6.
    (core_cost): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

Patch

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 3381b8bf143c..00edece3eb68 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1610,7 +1610,7 @@  struct processor_costs skylake_cost = {
 					   in 32,64,128,256 and 512-bit */
   {8, 8, 8, 12, 24},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  2, 2,					/* SSE->integer and integer->SSE moves */
+  6, 6,					/* SSE->integer and integer->SSE moves */
   /* End of register allocator costs.  */
   },
 
@@ -2555,7 +2555,7 @@  struct processor_costs core_cost = {
 					   in 32,64,128,256 and 512-bit */
   {6, 6, 6, 6, 12},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
-  2, 2,					/* SSE->integer and integer->SSE moves */
+  6, 6,					/* SSE->integer and integer->SSE moves */
   /* End of register allocator costs.  */
   },