[i386] : Do not limit the cost of moves to/from XMM register to minimum 8.

Message ID CAFULd4ZOgSxG5sn2Y7Rbpn8mcykO29mV_LenpKK+KuAte-0m_g@mail.gmail.com
State New
Headers show
Series
  • [i386] : Do not limit the cost of moves to/from XMM register to minimum 8.
Related show

Commit Message

Uros Bizjak Aug. 29, 2019, 6:08 p.m.
2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>

    * config/i386/i386.c (ix86_register_move_cost): Do not
    limit the cost of moves to/from XMM register to minimum 8.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Actually committed as r274994 with the wrong ChangeLog.

Uros.

Comments

Hongtao Liu Aug. 30, 2019, 12:10 a.m. | #1
On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>

> 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>

>

>     * config/i386/i386.c (ix86_register_move_cost): Do not

>     limit the cost of moves to/from XMM register to minimum 8.

>

> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

>

> Actually committed as r274994 with the wrong ChangeLog.

>

> Uros.


There is 11% regression in 548.exchange_r of SPEC2017.

Reason for the regression:
For 548.exchange_r, a lot of movements between gpr and xmm are
generated as expected,
and it reduced  clocksticks by 3%.
But  however maybe too many xmm registers are used,
a frequency reduction issue is triggered(average frequency reduced by 13%).
So totally it takes more time.



-- 
BR,
Hongtao
Hongtao Liu Aug. 30, 2019, 12:15 a.m. | #2
On Fri, Aug 30, 2019 at 8:10 AM Hongtao Liu <crazylht@gmail.com> wrote:
>

> On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:

> >

> > 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>

> >

> >     * config/i386/i386.c (ix86_register_move_cost): Do not

> >     limit the cost of moves to/from XMM register to minimum 8.

> >

> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

> >

> > Actually committed as r274994 with the wrong ChangeLog.

> >

> > Uros.

>

> There is 11% regression in 548.exchange_r of SPEC2017.

>

> Reason for the regression:

> For 548.exchange_r, a lot of movements between gpr and xmm are

> generated as expected,

> and it reduced  clocksticks by 3%.

> But  however maybe too many xmm registers are used,

> a frequency reduction issue is triggered(average frequency reduced by 13%).

> So totally it takes more time.

>

>

>

> --

> BR,

> Hongtao


Tested on skylake workstation.

-- 
BR,
Hongtao
Uros Bizjak Aug. 30, 2019, 6:18 a.m. | #3
On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu <crazylht@gmail.com> wrote:
>

> On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:

> >

> > 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>

> >

> >     * config/i386/i386.c (ix86_register_move_cost): Do not

> >     limit the cost of moves to/from XMM register to minimum 8.

> >

> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

> >

> > Actually committed as r274994 with the wrong ChangeLog.

> >

> > Uros.

>

> There is 11% regression in 548.exchange_r of SPEC2017.

>

> Reason for the regression:

> For 548.exchange_r, a lot of movements between gpr and xmm are

> generated as expected,

> and it reduced  clocksticks by 3%.


This is OK, and expected from the patch.

> But  however maybe too many xmm registers are used,

> a frequency reduction issue is triggered(average frequency reduced by 13%).

> So totally it takes more time.


This is a secondary effect that is currently not modelled by the compiler.

However, I expected that SSE <-> int moves in x86-tune-cost.h will
have to be retuned. Up to now, both directions were limited to minimum
8, so any value lower than 8 was ignored. However, minimum was set to
work-around certain limitation in reload, which is not needed anymore.

You can simply set the values of SSE <-> int moves to 8 (which is an
arbitrary value!) to restore the previous behaviour, but I think that
a more precise cost value should be determined, probably a different
one for each direction. But until register pressure effects are
modelled, any artificially higher value will represent a workaround
and not the true reg-reg move cost.

Uros.
Hongtao Liu Aug. 30, 2019, 7:24 a.m. | #4
On Fri, Aug 30, 2019 at 2:18 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>

> On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu <crazylht@gmail.com> wrote:

> >

> > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:

> > >

> > > 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>

> > >

> > >     * config/i386/i386.c (ix86_register_move_cost): Do not

> > >     limit the cost of moves to/from XMM register to minimum 8.

> > >

> > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

> > >

> > > Actually committed as r274994 with the wrong ChangeLog.

> > >

> > > Uros.

> >

> > There is 11% regression in 548.exchange_r of SPEC2017.

> >

> > Reason for the regression:

> > For 548.exchange_r, a lot of movements between gpr and xmm are

> > generated as expected,

> > and it reduced  clocksticks by 3%.

>

> This is OK, and expected from the patch.

>

> > But  however maybe too many xmm registers are used,

> > a frequency reduction issue is triggered(average frequency reduced by 13%).

> > So totally it takes more time.

>

> This is a secondary effect that is currently not modelled by the compiler.

>

> However, I expected that SSE <-> int moves in x86-tune-cost.h will

> have to be retuned. Up to now, both directions were limited to minimum

> 8, so any value lower than 8 was ignored. However, minimum was set to

> work-around certain limitation in reload, which is not needed anymore.

>

> You can simply set the values of SSE <-> int moves to 8 (which is an

> arbitrary value!) to restore the previous behaviour, but I think that

> a more precise cost value should be determined, probably a different

> one for each direction. But until register pressure effects are

> modelled, any artificially higher value will represent a workaround

> and not the true reg-reg move cost.

>

> Uros.


Yes,  we're still working on skylake_cost model.

-- 
BR,
Hongtao

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 49ab50ea41bf..11c75be113e0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18601,9 +18601,9 @@  ix86_register_move_cost (machine_mode mode, reg_class_t class1_i,
        where integer modes in SSE registers are not tieable
        because of missing QImode and HImode moves to, from or between
        MMX/SSE registers.  */
-    return MAX (8, SSE_CLASS_P (class1)
-		? ix86_cost->hard_register.sse_to_integer
-		: ix86_cost->hard_register.integer_to_sse);
+    return (SSE_CLASS_P (class1)
+	    ? ix86_cost->hard_register.sse_to_integer
+	    : ix86_cost->hard_register.integer_to_sse);
 
   if (MAYBE_FLOAT_CLASS_P (class1))
     return ix86_cost->hard_register.fp_move;