Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.

Message ID 20210910043632.1087666-1-hongtao.liu@intel.com
State New
Headers show
Series
  • Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.
Related show

Commit Message

Martin Sebor via Gcc-patches Sept. 10, 2021, 4:36 a.m.
Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
optimizer wouldn't simplify if op0 has different mode with M, but that's too
restrict which will prevent below optimization, the condition can be relaxed
to op0 must have same inner mode with M.

(set (reg:V2DF 87 [ xx ])
    (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 2 [0x2])
                ]))
        (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 3 [0x3])
                ]))))

  Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

gcc/ChangeLog:

	* simplify-rtx.c
	(simplify_context::simplify_binary_operation_1): Relax
	condition of simplifying (vec_concat:M (vec_select op0
	index0)(vec_select op1 index1)) to allow different modes
	between op0 and M, but have same inner mode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-rebuild.c:
	* gcc.target/i386/avx512f-vect-rebuild.c: New test.
---
 gcc/simplify-rtx.c                            |  3 ++-
 .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-
 3 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

-- 
2.27.0

Comments

Martin Sebor via Gcc-patches Sept. 13, 2021, 2:09 p.m. | #1
On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:
>    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),

> optimizer wouldn't simplify if op0 has different mode with M, but that's too

> restrict which will prevent below optimization, the condition can be relaxed

> to op0 must have same inner mode with M.

>

> (set (reg:V2DF 87 [ xx ])

>      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)

>              (parallel [

>                      (const_int 2 [0x2])

>                  ]))

>          (vec_select:DF (reg:V4DF 92)

>              (parallel [

>                      (const_int 3 [0x3])

>                  ]))))

>

>    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.

>    Ok for trunk?

>

> gcc/ChangeLog:

>

> 	* simplify-rtx.c

> 	(simplify_context::simplify_binary_operation_1): Relax

> 	condition of simplifying (vec_concat:M (vec_select op0

> 	index0)(vec_select op1 index1)) to allow different modes

> 	between op0 and M, but have same inner mode.

>

> gcc/testsuite/ChangeLog:

>

> 	* gcc.target/i386/vect-rebuild.c:

> 	* gcc.target/i386/avx512f-vect-rebuild.c: New test.

Funny, I was looking at something rather similar recently, but never 
pushed on it because we were going to need too many entries in the 
parallel selector.

I'm not convinced that we need the inner mode to match anything.  As 
long as the vec_concat's mode is twice the size of the vec_select modes 
and the vec_select mode is <= the mode of its operands ISTM this is 
fine.   We  might want the modes of the vec_select to match, but I don't 
think that's strictly necessary either, they just need to be the same 
size.  ie, we could have somethig like

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

I'm not sure if that level of generality is useful though.  If we want 
the modes of the vec_selects to match I think we could still support

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))

Thoughts?

jeff

Jeff


> ---

>   gcc/simplify-rtx.c                            |  3 ++-

>   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++

>   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-

>   3 files changed, 24 insertions(+), 2 deletions(-)

>   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

>

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c

> index ebad5cb5a79..16286befd79 100644

> --- a/gcc/simplify-rtx.c

> +++ b/gcc/simplify-rtx.c

> @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,

>   	if (GET_CODE (trueop0) == VEC_SELECT

>   	    && GET_CODE (trueop1) == VEC_SELECT

>   	    && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))

> -	    && GET_MODE (XEXP (trueop0, 0)) == mode)

> +	    && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))

> +	       == GET_MODE_INNER(mode))

>   	  {

>   	    rtx par0 = XEXP (trueop0, 1);

>   	    rtx par1 = XEXP (trueop1, 1);

> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> new file mode 100644

> index 00000000000..aef6855aa46

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> @@ -0,0 +1,21 @@

> +/* { dg-do compile } */

> +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */

> +

> +typedef double v2df __attribute__ ((__vector_size__ (16)));

> +typedef double v4df __attribute__ ((__vector_size__ (32)));

> +

> +v2df h (v4df x)

> +{

> +  v2df xx = { x[2], x[3] };

> +  return xx;

> +}

> +

> +v4df f2 (v4df x)

> +{

> +  v4df xx = { x[0], x[1], x[2], x[3] };

> +  return xx;

> +}

> +

> +/* { dg-final { scan-assembler-not "unpck" } } */

> +/* { dg-final { scan-assembler-not "valign" } } */

> +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */

> diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> index 570967f6b5c..8e85b98bf1d 100644

> --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> @@ -30,4 +30,4 @@ v2df h (v4df x)

>   

>   /* { dg-final { scan-assembler-not "unpck" } } */

>   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */

> -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */

> +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
Martin Sebor via Gcc-patches Sept. 13, 2021, 2:24 p.m. | #2
On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>

>

>

> On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:

> >    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),

> > optimizer wouldn't simplify if op0 has different mode with M, but that's too

> > restrict which will prevent below optimization, the condition can be relaxed

> > to op0 must have same inner mode with M.

> >

> > (set (reg:V2DF 87 [ xx ])

> >      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)

> >              (parallel [

> >                      (const_int 2 [0x2])

> >                  ]))

> >          (vec_select:DF (reg:V4DF 92)

> >              (parallel [

> >                      (const_int 3 [0x3])

> >                  ]))))

> >

> >    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.

> >    Ok for trunk?

> >

> > gcc/ChangeLog:

> >

> >       * simplify-rtx.c

> >       (simplify_context::simplify_binary_operation_1): Relax

> >       condition of simplifying (vec_concat:M (vec_select op0

> >       index0)(vec_select op1 index1)) to allow different modes

> >       between op0 and M, but have same inner mode.

> >

> > gcc/testsuite/ChangeLog:

> >

> >       * gcc.target/i386/vect-rebuild.c:

> >       * gcc.target/i386/avx512f-vect-rebuild.c: New test.

> Funny, I was looking at something rather similar recently, but never

> pushed on it because we were going to need too many entries in the

> parallel selector.

>

> I'm not convinced that we need the inner mode to match anything.  As

> long as the vec_concat's mode is twice the size of the vec_select modes

> and the vec_select mode is <= the mode of its operands ISTM this is

> fine.   We  might want the modes of the vec_select to match, but I don't

> think that's strictly necessary either, they just need to be the same

> size.  ie, we could have somethig like

>

> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

>

> I'm not sure if that level of generality is useful though.  If we want

> the modes of the vec_selects to match I think we could still support

>

> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))

>

> Thoughts?


I think the component or scalar modes of the elements to concat need to match
the component mode of the result.  I don't think you example involving
a cat of DF and DI is too useful - but you could use a subreg around the DI
value ;)

Richard.

>

> jeff

>

> Jeff

>

>

> > ---

> >   gcc/simplify-rtx.c                            |  3 ++-

> >   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++

> >   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-

> >   3 files changed, 24 insertions(+), 2 deletions(-)

> >   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> >

> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c

> > index ebad5cb5a79..16286befd79 100644

> > --- a/gcc/simplify-rtx.c

> > +++ b/gcc/simplify-rtx.c

> > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,

> >       if (GET_CODE (trueop0) == VEC_SELECT

> >           && GET_CODE (trueop1) == VEC_SELECT

> >           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))

> > -         && GET_MODE (XEXP (trueop0, 0)) == mode)

> > +         && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))

> > +            == GET_MODE_INNER(mode))

> >         {

> >           rtx par0 = XEXP (trueop0, 1);

> >           rtx par1 = XEXP (trueop1, 1);

> > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> > new file mode 100644

> > index 00000000000..aef6855aa46

> > --- /dev/null

> > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> > @@ -0,0 +1,21 @@

> > +/* { dg-do compile } */

> > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */

> > +

> > +typedef double v2df __attribute__ ((__vector_size__ (16)));

> > +typedef double v4df __attribute__ ((__vector_size__ (32)));

> > +

> > +v2df h (v4df x)

> > +{

> > +  v2df xx = { x[2], x[3] };

> > +  return xx;

> > +}

> > +

> > +v4df f2 (v4df x)

> > +{

> > +  v4df xx = { x[0], x[1], x[2], x[3] };

> > +  return xx;

> > +}

> > +

> > +/* { dg-final { scan-assembler-not "unpck" } } */

> > +/* { dg-final { scan-assembler-not "valign" } } */

> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */

> > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > index 570967f6b5c..8e85b98bf1d 100644

> > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > @@ -30,4 +30,4 @@ v2df h (v4df x)

> >

> >   /* { dg-final { scan-assembler-not "unpck" } } */

> >   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */

> > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */

> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */

>
Martin Sebor via Gcc-patches Sept. 13, 2021, 3:09 p.m. | #3
On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>

>

>

> On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:

> >    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),

> > optimizer wouldn't simplify if op0 has different mode with M, but that's too

> > restrict which will prevent below optimization, the condition can be relaxed

> > to op0 must have same inner mode with M.

> >

> > (set (reg:V2DF 87 [ xx ])

> >      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)

> >              (parallel [

> >                      (const_int 2 [0x2])

> >                  ]))

> >          (vec_select:DF (reg:V4DF 92)

> >              (parallel [

> >                      (const_int 3 [0x3])

> >                  ]))))

> >

> >    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.

> >    Ok for trunk?

> >

> > gcc/ChangeLog:

> >

> >       * simplify-rtx.c

> >       (simplify_context::simplify_binary_operation_1): Relax

> >       condition of simplifying (vec_concat:M (vec_select op0

> >       index0)(vec_select op1 index1)) to allow different modes

> >       between op0 and M, but have same inner mode.

> >

> > gcc/testsuite/ChangeLog:

> >

> >       * gcc.target/i386/vect-rebuild.c:

> >       * gcc.target/i386/avx512f-vect-rebuild.c: New test.

> Funny, I was looking at something rather similar recently, but never

> pushed on it because we were going to need too many entries in the

> parallel selector.

>

> I'm not convinced that we need the inner mode to match anything.  As

> long as the vec_concat's mode is twice the size of the vec_select modes

> and the vec_select mode is <= the mode of its operands ISTM this is

> fine.   We  might want the modes of the vec_select to match, but I don't

> think that's strictly necessary either, they just need to be the same

> size.  ie, we could have somethig like

>

> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

>

> I'm not sure if that level of generality is useful though.  If we want

> the modes of the vec_selects to match I think we could still support

>

> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))

The first operand of vec_select is required to be the same here. and I
guess (vec_select:DF (subreg:V4DF (reg:V8DF) 0) will be simplified to
(vec_select:DF (reg:V8DF))?
>

> Thoughts?

>

> jeff

>

> Jeff

>

>

> > ---

> >   gcc/simplify-rtx.c                            |  3 ++-

> >   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++

> >   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-

> >   3 files changed, 24 insertions(+), 2 deletions(-)

> >   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> >

> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c

> > index ebad5cb5a79..16286befd79 100644

> > --- a/gcc/simplify-rtx.c

> > +++ b/gcc/simplify-rtx.c

> > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,

> >       if (GET_CODE (trueop0) == VEC_SELECT

> >           && GET_CODE (trueop1) == VEC_SELECT

> >           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))

> > -         && GET_MODE (XEXP (trueop0, 0)) == mode)

> > +         && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))

> > +            == GET_MODE_INNER(mode))

> >         {

> >           rtx par0 = XEXP (trueop0, 1);

> >           rtx par1 = XEXP (trueop1, 1);

> > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> > new file mode 100644

> > index 00000000000..aef6855aa46

> > --- /dev/null

> > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> > @@ -0,0 +1,21 @@

> > +/* { dg-do compile } */

> > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */

> > +

> > +typedef double v2df __attribute__ ((__vector_size__ (16)));

> > +typedef double v4df __attribute__ ((__vector_size__ (32)));

> > +

> > +v2df h (v4df x)

> > +{

> > +  v2df xx = { x[2], x[3] };

> > +  return xx;

> > +}

> > +

> > +v4df f2 (v4df x)

> > +{

> > +  v4df xx = { x[0], x[1], x[2], x[3] };

> > +  return xx;

> > +}

> > +

> > +/* { dg-final { scan-assembler-not "unpck" } } */

> > +/* { dg-final { scan-assembler-not "valign" } } */

> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */

> > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > index 570967f6b5c..8e85b98bf1d 100644

> > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > @@ -30,4 +30,4 @@ v2df h (v4df x)

> >

> >   /* { dg-final { scan-assembler-not "unpck" } } */

> >   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */

> > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */

> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */

>



-- 
BR,
Hongtao
Martin Sebor via Gcc-patches Sept. 13, 2021, 3:19 p.m. | #4
On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>

>

>

> On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:

> >    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),

> > optimizer wouldn't simplify if op0 has different mode with M, but that's too

> > restrict which will prevent below optimization, the condition can be relaxed

> > to op0 must have same inner mode with M.

> >

> > (set (reg:V2DF 87 [ xx ])

> >      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)

> >              (parallel [

> >                      (const_int 2 [0x2])

> >                  ]))

> >          (vec_select:DF (reg:V4DF 92)

> >              (parallel [

> >                      (const_int 3 [0x3])

> >                  ]))))

> >

> >    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.

> >    Ok for trunk?

> >

> > gcc/ChangeLog:

> >

> >       * simplify-rtx.c

> >       (simplify_context::simplify_binary_operation_1): Relax

> >       condition of simplifying (vec_concat:M (vec_select op0

> >       index0)(vec_select op1 index1)) to allow different modes

> >       between op0 and M, but have same inner mode.

> >

> > gcc/testsuite/ChangeLog:

> >

> >       * gcc.target/i386/vect-rebuild.c:

> >       * gcc.target/i386/avx512f-vect-rebuild.c: New test.

> Funny, I was looking at something rather similar recently, but never

> pushed on it because we were going to need too many entries in the

> parallel selector.

>

> I'm not convinced that we need the inner mode to match anything.  As

> long as the vec_concat's mode is twice the size of the vec_select modes

> and the vec_select mode is <= the mode of its operands ISTM this is

> fine.   We  might want the modes of the vec_select to match, but I don't

> think that's strictly necessary either, they just need to be the same

> size.  ie, we could have somethig like

If they're different sizes, i.e, something like below should also be legal?
(vec_concat:V8SF (vec_select:V2SF (reg:V16SF)) (vec_select:V6SF (reg:V16SF)))
>

> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

>

> I'm not sure if that level of generality is useful though.  If we want

> the modes of the vec_selects to match I think we could still support

>

> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))

>

> Thoughts?

>

> jeff

>

> Jeff

>

>

> > ---

> >   gcc/simplify-rtx.c                            |  3 ++-

> >   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++

> >   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-

> >   3 files changed, 24 insertions(+), 2 deletions(-)

> >   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> >

> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c

> > index ebad5cb5a79..16286befd79 100644

> > --- a/gcc/simplify-rtx.c

> > +++ b/gcc/simplify-rtx.c

> > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,

> >       if (GET_CODE (trueop0) == VEC_SELECT

> >           && GET_CODE (trueop1) == VEC_SELECT

> >           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))

> > -         && GET_MODE (XEXP (trueop0, 0)) == mode)

> > +         && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))

> > +            == GET_MODE_INNER(mode))

> >         {

> >           rtx par0 = XEXP (trueop0, 1);

> >           rtx par1 = XEXP (trueop1, 1);

> > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> > new file mode 100644

> > index 00000000000..aef6855aa46

> > --- /dev/null

> > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

> > @@ -0,0 +1,21 @@

> > +/* { dg-do compile } */

> > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */

> > +

> > +typedef double v2df __attribute__ ((__vector_size__ (16)));

> > +typedef double v4df __attribute__ ((__vector_size__ (32)));

> > +

> > +v2df h (v4df x)

> > +{

> > +  v2df xx = { x[2], x[3] };

> > +  return xx;

> > +}

> > +

> > +v4df f2 (v4df x)

> > +{

> > +  v4df xx = { x[0], x[1], x[2], x[3] };

> > +  return xx;

> > +}

> > +

> > +/* { dg-final { scan-assembler-not "unpck" } } */

> > +/* { dg-final { scan-assembler-not "valign" } } */

> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */

> > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > index 570967f6b5c..8e85b98bf1d 100644

> > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c

> > @@ -30,4 +30,4 @@ v2df h (v4df x)

> >

> >   /* { dg-final { scan-assembler-not "unpck" } } */

> >   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */

> > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */

> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */

>



-- 
BR,
Hongtao

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ebad5cb5a79..16286befd79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4587,7 +4587,8 @@  simplify_context::simplify_binary_operation_1 (rtx_code code,
 	if (GET_CODE (trueop0) == VEC_SELECT
 	    && GET_CODE (trueop1) == VEC_SELECT
 	    && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
-	    && GET_MODE (XEXP (trueop0, 0)) == mode)
+	    && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
+	       == GET_MODE_INNER(mode))
 	  {
 	    rtx par0 = XEXP (trueop0, 1);
 	    rtx par1 = XEXP (trueop1, 1);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
new file mode 100644
index 00000000000..aef6855aa46
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
+
+typedef double v2df __attribute__ ((__vector_size__ (16)));
+typedef double v4df __attribute__ ((__vector_size__ (32)));
+
+v2df h (v4df x)
+{
+  v2df xx = { x[2], x[3] };
+  return xx;
+}
+
+v4df f2 (v4df x)
+{
+  v4df xx = { x[0], x[1], x[2], x[3] };
+  return xx;
+}
+
+/* { dg-final { scan-assembler-not "unpck" } } */
+/* { dg-final { scan-assembler-not "valign" } } */
+/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
index 570967f6b5c..8e85b98bf1d 100644
--- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
+++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
@@ -30,4 +30,4 @@  v2df h (v4df x)
 
 /* { dg-final { scan-assembler-not "unpck" } } */
 /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
-/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
+/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */