[v2,1/3] rtl: properly handle subreg (mem) in gen_highpart [PR102125]

Message ID 20210909110923.3171630-2-rearnsha@arm.com
State New
Headers show
Series
  • lower more cases of memcpy [PR102125]
Related show

Commit Message

Richard Biener via Gcc-patches Sept. 9, 2021, 11:09 a.m.
gen_lowpart_general handles forming a SUBREG of a MEM by using
adjust_address to rework and validate a new version of the MEM.
However, gen_highpart does not attempt this and simply returns (SUBREG
(MEM)) if the change is not 'obviously' safe.  Improve on that by
using a similar approach so that gen_lowpart and gen_highpart are
mostly symmetrical in this regard.

gcc/ChangeLog:

	PR target/102125
	* emit-rtl.c (gen_highpart): If simplify_gen_subreg returns
	SUBREG (MEM) for a MEM, use adjust_address to produce a new
	MEM.
---
 gcc/emit-rtl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Richard Biener via Gcc-patches Sept. 9, 2021, 12:23 p.m. | #1
On Thu, Sep 9, 2021 at 1:09 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>

>

> gen_lowpart_general handles forming a SUBREG of a MEM by using

> adjust_address to rework and validate a new version of the MEM.

> However, gen_highpart does not attempt this and simply returns (SUBREG

> (MEM)) if the change is not 'obviously' safe.  Improve on that by

> using a similar approach so that gen_lowpart and gen_highpart are

> mostly symmetrical in this regard.


When I decipher gen_lowpart correctly then it doesn't generate the
subreg of the mem in the first place so doing it like that in gen_highpart
would _not_ invoke simplify_gen_subreg on a MEM_P but instead
do what you now do directly?

I also wonder why gen_lowpart_general uses byte_lowpart_offset
while you use subreg_highpart_offset where subreg_lowpart_offset
is also available ... huh - and there's also
subreg_size_{lowpart,highpart}_offset.
So it looks like your case wouldn't handle the paradoxical highpart
(which better shouldn't be accessed?).

So like

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 77ea8948ee8..c3dae7d8075 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1585,6 +1585,13 @@ gen_highpart (machine_mode mode, rtx x)
   gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)
              || known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));

+  /* Offset MEMs.  */
+  if (MEM_P (x))
+    {
+      poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
+      return adjust_address (x, mode, offset);
+    }
+
   result = simplify_gen_subreg (mode, x, GET_MODE (x),
                                subreg_highpart_offset (mode, GET_MODE (x)));
   gcc_assert (result);

Testing

+  else if (GET_CODE (result) == SUBREG && MEM_P (SUBREG_REG (result))
+          && MEM_P (x))

looks a bit odd to me.

I'll note it leaves gen_highpart_mode "unfixed", some refactoring should
instead commonize the worker for both interfaces, making gen_highpart
invoke gen_highpart_mode or so.

> gcc/ChangeLog:

>

>         PR target/102125

>         * emit-rtl.c (gen_highpart): If simplify_gen_subreg returns

>         SUBREG (MEM) for a MEM, use adjust_address to produce a new

>         MEM.

> ---

>  gcc/emit-rtl.c | 8 ++++++++

>  1 file changed, 8 insertions(+)

>
Richard Biener via Gcc-patches Sept. 9, 2021, 2:39 p.m. | #2
On 09/09/2021 13:23, Richard Biener via Gcc-patches wrote:
> On Thu, Sep 9, 2021 at 1:09 PM Richard Earnshaw <rearnsha@arm.com> wrote:

>>

>>

>> gen_lowpart_general handles forming a SUBREG of a MEM by using

>> adjust_address to rework and validate a new version of the MEM.

>> However, gen_highpart does not attempt this and simply returns (SUBREG

>> (MEM)) if the change is not 'obviously' safe.  Improve on that by

>> using a similar approach so that gen_lowpart and gen_highpart are

>> mostly symmetrical in this regard.

> 

> When I decipher gen_lowpart correctly then it doesn't generate the

> subreg of the mem in the first place so doing it like that in gen_highpart

> would _not_ invoke simplify_gen_subreg on a MEM_P but instead

> do what you now do directly?

> 

> I also wonder why gen_lowpart_general uses byte_lowpart_offset

> while you use subreg_highpart_offset where subreg_lowpart_offset

> is also available ... huh - and there's also

> subreg_size_{lowpart,highpart}_offset.

> So it looks like your case wouldn't handle the paradoxical highpart

> (which better shouldn't be accessed?).

> 


Surely the highpart of a paradoxical subreg is meaningless... what's the 
highpart when the outer subreg is wider than the inner one?

And that's why there is subreg_lowpart_offset, subreg_highpart_offset 
and byte_lowpart_offset, but not byte_highpart_offset (because the 
latter is there to handle paradoxical cases, but decays to 
subreg_lowpart_offset for a non-paradoxical subreg case).

> So like

> 

> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c

> index 77ea8948ee8..c3dae7d8075 100644

> --- a/gcc/emit-rtl.c

> +++ b/gcc/emit-rtl.c

> @@ -1585,6 +1585,13 @@ gen_highpart (machine_mode mode, rtx x)

>     gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)

>                || known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));

> 

> +  /* Offset MEMs.  */

> +  if (MEM_P (x))

> +    {

> +      poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));

> +      return adjust_address (x, mode, offset);

> +    }

> +

>     result = simplify_gen_subreg (mode, x, GET_MODE (x),

>                                  subreg_highpart_offset (mode, GET_MODE (x)));

>     gcc_assert (result);

> 


In which case, I'm pretty certain the subsequent MEM_P (result) test can 
be removed, as I can't see how simplify_gen_subreg would return a MEM 
with such a change.

> Testing

> 

> +  else if (GET_CODE (result) == SUBREG && MEM_P (SUBREG_REG (result))

> +          && MEM_P (x))

> 

> looks a bit odd to me.

> 

> I'll note it leaves gen_highpart_mode "unfixed", some refactoring should

> instead commonize the worker for both interfaces, making gen_highpart

> invoke gen_highpart_mode or so.

> 


gen_highpart_mode invokes gen_highpart if the inner mode is not 
VOIDmode.  Perhaps the logic is somewhat backwards, or perhaps it's just 
a bit more efficient that way.

I'll try your suggested change.

R.

>> gcc/ChangeLog:

>>

>>          PR target/102125

>>          * emit-rtl.c (gen_highpart): If simplify_gen_subreg returns

>>          SUBREG (MEM) for a MEM, use adjust_address to produce a new

>>          MEM.

>> ---

>>   gcc/emit-rtl.c | 8 ++++++++

>>   1 file changed, 8 insertions(+)

>>

Patch

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 77ea8948ee8..bacf6fffa22 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1597,6 +1597,14 @@  gen_highpart (machine_mode mode, rtx x)
       result = validize_mem (result);
       gcc_assert (result);
     }
+  /* It may also just put a SUBREG wrapper around a MEM for the same
+     reason.  */
+  else if (GET_CODE (result) == SUBREG && MEM_P (SUBREG_REG (result))
+	   && MEM_P (x))
+    {
+      poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
+      result = adjust_address (x, mode, offset);
+    }
 
   return result;
 }