[v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.

Message ID 001001d7c00b$92174210$b645c630$@nextmovesoftware.com
State New
Headers show
Series
  • [v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
Related show

Commit Message

Roger Sayle Oct. 13, 2021, 8:23 a.m.
Good catch.  I agree with Hongtao that although my testing revealed
no problems with the previous version of this patch, it makes sense to
call gen_reg_rtx to generate an pseudo intermediate instead of attempting
to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
intermediate.  I've left the existing behaviour the same, so that
memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?


2021-10-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
	pseudo intermediate when moving a SUBREG into a hard register,
	by checking ix86_hardreg_mov_ok.
	(ix86_expand_vector_extract): Store zero-extended SImode
	intermediate in a pseudo, then set target using a SUBREG_PROMOTED
	annotated subreg.
	* config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
	complex (SUBREG) sets of (vector) hard registers before reload, by
	checking ix86_hardreg_mov_ok.

Thanks,
Roger

-----Original Message-----
From: Hongtao Liu <crazylht@gmail.com> 

Sent: 11 October 2021 12:29
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.

On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> gcc/ChangeLog

>         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a

>         pseudo intermediate when moving a SUBREG into a hard register,

>         by checking ix86_hardreg_mov_ok.


   /* Make operand1 a register if it isn't already.  */
   if (can_create_pseudo_p ()
-      && !register_operand (op0, mode)
-      && !register_operand (op1, mode))
+      && (!ix86_hardreg_mov_ok (op0, op1)  || (!register_operand (op0, 
+ mode)
+      && !register_operand (op1, mode))))
     {
       rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));

ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.

--
BR,
Hongtao

Comments

Jason Merrill via Gcc-patches Oct. 13, 2021, 9:07 a.m. | #1
On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>

>

> Good catch.  I agree with Hongtao that although my testing revealed

> no problems with the previous version of this patch, it makes sense to

> call gen_reg_rtx to generate an pseudo intermediate instead of attempting

> to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an

> intermediate.  I've left the existing behaviour the same, so that

> memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.

>

> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"

> and "make -k check" with no new failures.

>

> Ok for mainline?

>

>

> 2021-10-13  Roger Sayle  <roger@nextmovesoftware.com>

>

> gcc/ChangeLog

>         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a

>         pseudo intermediate when moving a SUBREG into a hard register,

>         by checking ix86_hardreg_mov_ok.

>         (ix86_expand_vector_extract): Store zero-extended SImode

>         intermediate in a pseudo, then set target using a SUBREG_PROMOTED

>         annotated subreg.

>         * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating

>         complex (SUBREG) sets of (vector) hard registers before reload, by

>         checking ix86_hardreg_mov_ok.


OK.

Thanks,
Uros.

>

> Thanks,

> Roger

>

> -----Original Message-----

> From: Hongtao Liu <crazylht@gmail.com>

> Sent: 11 October 2021 12:29

> To: Roger Sayle <roger@nextmovesoftware.com>

> Cc: GCC Patches <gcc-patches@gcc.gnu.org>

> Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.

>

> On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:

> > gcc/ChangeLog

> >         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a

> >         pseudo intermediate when moving a SUBREG into a hard register,

> >         by checking ix86_hardreg_mov_ok.

>

>    /* Make operand1 a register if it isn't already.  */

>    if (can_create_pseudo_p ()

> -      && !register_operand (op0, mode)

> -      && !register_operand (op1, mode))

> +      && (!ix86_hardreg_mov_ok (op0, op1)  || (!register_operand (op0,

> + mode)

> +      && !register_operand (op1, mode))))

>      {

>        rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));

>

> ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.

>

> --

> BR,

> Hongtao

>
Jason Merrill via Gcc-patches Oct. 13, 2021, 11:02 p.m. | #2
On Wed, Oct 13, 2021 at 2:08 AM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>

> On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle <roger@nextmovesoftware.com> wrote:

> >

> >

> > Good catch.  I agree with Hongtao that although my testing revealed

> > no problems with the previous version of this patch, it makes sense to

> > call gen_reg_rtx to generate an pseudo intermediate instead of attempting

> > to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an

> > intermediate.  I've left the existing behaviour the same, so that

> > memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.

> >

> > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"

> > and "make -k check" with no new failures.

> >

> > Ok for mainline?

> >

> >

> > 2021-10-13  Roger Sayle  <roger@nextmovesoftware.com>

> >

> > gcc/ChangeLog

> >         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a

> >         pseudo intermediate when moving a SUBREG into a hard register,

> >         by checking ix86_hardreg_mov_ok.

> >         (ix86_expand_vector_extract): Store zero-extended SImode

> >         intermediate in a pseudo, then set target using a SUBREG_PROMOTED

> >         annotated subreg.

> >         * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating

> >         complex (SUBREG) sets of (vector) hard registers before reload, by

> >         checking ix86_hardreg_mov_ok.

>

> OK.

>

> Thanks,

> Uros.

>

> >

> > Thanks,

> > Roger

> >

> > -----Original Message-----

> > From: Hongtao Liu <crazylht@gmail.com>

> > Sent: 11 October 2021 12:29

> > To: Roger Sayle <roger@nextmovesoftware.com>

> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>

> > Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.

> >

> > On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:

> > > gcc/ChangeLog

> > >         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a

> > >         pseudo intermediate when moving a SUBREG into a hard register,

> > >         by checking ix86_hardreg_mov_ok.

> >

> >    /* Make operand1 a register if it isn't already.  */

> >    if (can_create_pseudo_p ()

> > -      && !register_operand (op0, mode)

> > -      && !register_operand (op1, mode))

> > +      && (!ix86_hardreg_mov_ok (op0, op1)  || (!register_operand (op0,

> > + mode)

> > +      && !register_operand (op1, mode))))

> >      {

> >        rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));

> >

> > ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.

> >

> > --

> > BR,

> > Hongtao

> >


This caused:

https://gcc.gnu.org/pipermail/gcc-regression/2021-October/075498.html

FAIL: gcc.target/i386/avx-1.c (internal compiler error)
FAIL: gcc.target/i386/avx-1.c (test for excess errors)
FAIL: gcc.target/i386/avx-2.c (internal compiler error)
FAIL: gcc.target/i386/avx-2.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/sse-13.c (internal compiler error)
FAIL: gcc.target/i386/sse-13.c (test for excess errors)
FAIL: gcc.target/i386/sse-14.c (internal compiler error)
FAIL: gcc.target/i386/sse-14.c (test for excess errors)
FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
FAIL: gcc.target/i386/sse-22.c (internal compiler error)
FAIL: gcc.target/i386/sse-22.c (test for excess errors)
FAIL: gcc.target/i386/sse-23.c (internal compiler error)
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
FAIL: gcc.target/i386/sse-24.c (internal compiler error)
FAIL: gcc.target/i386/sse-24.c (test for excess errors)
FAIL: gcc.target/i386/sse-25.c (internal compiler error)
FAIL: gcc.target/i386/sse-25.c (test for excess errors)
FAIL: gcc.target/i386/sse-26.c (internal compiler error)
FAIL: gcc.target/i386/sse-26.c (test for excess errors)

You can reproduce them by adding -march=cascadelake to these tests.
-- 
H.J.

Patch

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 3e6f7d8e..4a8fa2f 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -615,6 +615,16 @@  ix86_expand_vector_move (machine_mode mode, rtx operands[])
       return;
     }
 
+  /* If operand0 is a hard register, make operand1 a pseudo.  */
+  if (can_create_pseudo_p ()
+      && !ix86_hardreg_mov_ok (op0, op1))
+    {
+      rtx tmp = gen_reg_rtx (GET_MODE (op0));
+      emit_move_insn (tmp, op1);
+      emit_move_insn (op0, tmp);
+      return;
+    }
+
   /* Make operand1 a register if it isn't already.  */
   if (can_create_pseudo_p ()
       && !register_operand (op0, mode)
@@ -16005,11 +16015,15 @@  ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
       /* Let the rtl optimizers know about the zero extension performed.  */
       if (inner_mode == QImode || inner_mode == HImode)
 	{
+	  rtx reg = gen_reg_rtx (SImode);
 	  tmp = gen_rtx_ZERO_EXTEND (SImode, tmp);
-	  target = gen_lowpart (SImode, target);
+	  emit_move_insn (reg, tmp);
+	  tmp = gen_lowpart (inner_mode, reg);
+	  SUBREG_PROMOTED_VAR_P (tmp) = 1;
+	  SUBREG_PROMOTED_SET (tmp, 1);
 	}
 
-      emit_insn (gen_rtx_SET (target, tmp));
+      emit_move_insn (target, tmp);
     }
   else
     {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4559b0c..e43f597 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1270,7 +1270,8 @@ 
 	 " C,<sseconstm1>,vm,v"))]
   "TARGET_SSE
    && (register_operand (operands[0], <MODE>mode)
-       || register_operand (operands[1], <MODE>mode))"
+       || register_operand (operands[1], <MODE>mode))
+   && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
     {