Fix PR rtl-optimization/84071

Message ID 17052335.8sCGlW3d3s@polaris
State New
Headers show
Series
  • Fix PR rtl-optimization/84071
Related show

Commit Message

Eric Botcazou Jan. 31, 2018, 10:09 a.m.
This is a regression present on mainline and 7 branch for big-endian ARM in 
the form of a wrong elimination of zero-extension after sign-extended load.

The problem is that reg_nonzero_bits_for_combine returns 0xffff for the 
register (reg:HI 121) when queried for SImode after:

(insn 10 7 11 2 (set (subreg:SI (reg:HI 121) 0)
        (sign_extend:SI (mem/c:HI (plus:SI (reg/f:SI 103 afp)
                    (const_int 4 [0x4])) [1 u+0 S2 A32]))) 171 
{*arm_extendhisi2_v6}

That's a valid query since the combiner keeps track of nonzero_bits in the 
largest possible mode:

/* Mode used to compute significance in reg_stat[].nonzero_bits.  It is the
   largest integer mode that can fit in HOST_BITS_PER_WIDE_INT.  */

static machine_mode nonzero_bits_mode;

The root cause of the issue is the SUBREG case of record_dead_and_set_regs_1:

  if (REG_P (dest))
    {
      /* If we are setting the whole register, we know its value.  Otherwise
	 show that we don't know the value.  We can handle SUBREG in
	 some cases.  */
      if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
	record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
      else if (GET_CODE (setter) == SET
	       && GET_CODE (SET_DEST (setter)) == SUBREG
	       && SUBREG_REG (SET_DEST (setter)) == dest
	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
	       && subreg_lowpart_p (SET_DEST (setter)))
	record_value_for_reg (dest, record_dead_insn,
			      gen_lowpart (GET_MODE (dest),
						       SET_SRC (setter)));
      else
	record_value_for_reg (dest, record_dead_insn, NULL_RTX);
    }

It's very old (r357) and effectively synthesizes a wrong SET operation here 
because it strips the sign-extension around the MEM; now rtlanal.c considers 
that loads are extended according to LOAD_EXTEND_OP on RISC platforms and ARM 
is a ZERO_EXTEND target in this configuration (ARMv4 and above), so the sign-
extension is effectively turned into a zero-extension.

This SUBREG case looks quite questionable, not only for paradoxical SUBREGs on 
RISC platforms, but also for regular SUBREGs, for which it's trying to infer 
information on the whole register from a SET of the lowpart.  But the fix only 
changes the first, problematic case.

Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.


2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

	PR rtl-optimization/84071
	* combine.c (record_dead_and_set_regs_1): Record the source unmodified
	for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.


2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

	* gcc.c-torture/execute/20180131-1.c: New test.

-- 
Eric Botcazou

Comments

Eric Botcazou Jan. 31, 2018, 3:02 p.m. | #1
> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.


As discussed in the audit trail, this beefs up the internal documentation 
about WORD_REGISTER_OPERATIONS.

Tested with 'make doc', applied on mainline and 7 branch.


2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

	PR rtl-optimization/84071
	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.
	* doc/tm.texi: Regenerate.

-- 
Eric Botcazou
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 257227)
+++ doc/tm.texi.in	(working copy)
@@ -7376,8 +7376,12 @@ is in effect.
 
 @defmac WORD_REGISTER_OPERATIONS
 Define this macro to 1 if operations between registers with integral mode
-smaller than a word are always performed on the entire register.
-Most RISC machines have this property and most CISC machines do not.
+smaller than a word are always performed on the entire register.  To be
+more explicit, if you start with a pair of @code{word_mode} registers with
+known values and you do a subword, for example @code{QImode}, addition on
+the low part of the registers, then the compiler may consider that the
+result has a known value in @code{word_mode} too if the macro is defined
+to 1.  Most RISC machines have this property and most CISC machines do not.
 @end defmac
 
 @hook TARGET_MIN_ARITHMETIC_PRECISION
Richard Sandiford Jan. 31, 2018, 7:04 p.m. | #2
Eric Botcazou <ebotcazou@adacore.com> writes:
>> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.

>

> As discussed in the audit trail, this beefs up the internal documentation 

> about WORD_REGISTER_OPERATIONS.

>

> Tested with 'make doc', applied on mainline and 7 branch.

>

>

> 2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

>

> 	PR rtl-optimization/84071

> 	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.

> 	* doc/tm.texi: Regenerate.

>

> Index: doc/tm.texi.in

> ===================================================================

> --- doc/tm.texi.in	(revision 257227)

> +++ doc/tm.texi.in	(working copy)

> @@ -7376,8 +7376,12 @@ is in effect.

>

>  @defmac WORD_REGISTER_OPERATIONS

>  Define this macro to 1 if operations between registers with integral mode

> -smaller than a word are always performed on the entire register.

> -Most RISC machines have this property and most CISC machines do not.

> +smaller than a word are always performed on the entire register.  To be

> +more explicit, if you start with a pair of @code{word_mode} registers with

> +known values and you do a subword, for example @code{QImode}, addition on

> +the low part of the registers, then the compiler may consider that the

> +result has a known value in @code{word_mode} too if the macro is defined

> +to 1.  Most RISC machines have this property and most CISC machines do not.


By QImode addition, do you mean:

   (set (subreg:QI (reg:SI X1) N)
   	(plus:QI (subreg:QI (reg:SI X2) N)
        	 (subreg:QI (reg:SI X3) N)))

?  I thought the point was instead that the target expected such ops
to be done on word_mode, even if the values involved are naturally QImode:

   (set (subreg:SI (reg:QI Y1) 0)
   	(plus:SI (subreg:SI (reg:QI Y2) 0)
        	 (subreg:SI (reg:QI Y3) 0)))

Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode
addition patterns, so the first insn seems unlikely.

Thanks,
Richard
Richard Earnshaw (lists) Feb. 1, 2018, 10:14 a.m. | #3
On 31/01/18 19:04, Richard Sandiford wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:

>>> Tested on SPARC64/Linux and ARM/EABI, applied on mainline and 7 branch.

>>

>> As discussed in the audit trail, this beefs up the internal documentation 

>> about WORD_REGISTER_OPERATIONS.

>>

>> Tested with 'make doc', applied on mainline and 7 branch.

>>

>>

>> 2018-01-31  Eric Botcazou  <ebotcazou@adacore.com>

>>

>> 	PR rtl-optimization/84071

>> 	* doc/tm.texi.in (WORD_REGISTER_OPERATIONS): Add explicit case.

>> 	* doc/tm.texi: Regenerate.

>>

>> Index: doc/tm.texi.in

>> ===================================================================

>> --- doc/tm.texi.in	(revision 257227)

>> +++ doc/tm.texi.in	(working copy)

>> @@ -7376,8 +7376,12 @@ is in effect.

>>

>>  @defmac WORD_REGISTER_OPERATIONS

>>  Define this macro to 1 if operations between registers with integral mode

>> -smaller than a word are always performed on the entire register.

>> -Most RISC machines have this property and most CISC machines do not.

>> +smaller than a word are always performed on the entire register.  To be

>> +more explicit, if you start with a pair of @code{word_mode} registers with

>> +known values and you do a subword, for example @code{QImode}, addition on

>> +the low part of the registers, then the compiler may consider that the

>> +result has a known value in @code{word_mode} too if the macro is defined

>> +to 1.  Most RISC machines have this property and most CISC machines do not.

> 

> By QImode addition, do you mean:

> 

>    (set (subreg:QI (reg:SI X1) N)

>    	(plus:QI (subreg:QI (reg:SI X2) N)

>         	 (subreg:QI (reg:SI X3) N)))

> 

> ?  I thought the point was instead that the target expected such ops

> to be done on word_mode, even if the values involved are naturally QImode:

> 

>    (set (subreg:SI (reg:QI Y1) 0)

>    	(plus:SI (subreg:SI (reg:QI Y2) 0)

>         	 (subreg:SI (reg:QI Y3) 0)))

> 

> Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode

> addition patterns, so the first insn seems unlikely.

> 

> Thanks,

> Richard

> 


That's always been my interpretation too.  Seems like we may be changing
the meaning of this macro...

R.
Eric Botcazou Feb. 2, 2018, 12:06 p.m. | #4
> By QImode addition, do you mean:

> 

>    (set (subreg:QI (reg:SI X1) N)

>    	(plus:QI (subreg:QI (reg:SI X2) N)

>         	 (subreg:QI (reg:SI X3) N)))

> 

> ?


Yes.

> I thought the point was instead that the target expected such ops

> to be done on word_mode, even if the values involved are naturally QImode:

> 

>    (set (subreg:SI (reg:QI Y1) 0)

>    	(plus:SI (subreg:SI (reg:QI Y2) 0)

>         	 (subreg:SI (reg:QI Y3) 0)))


But that's not what the implementation of WORD_REGISTER_OPERATIONS does; IOW 
you'll get the above from expand_binop without the macro if there is no addqi.

> Most RISC/WORD_REGISTER_OPERATIONS targets wouldn't provide QImode

> addition patterns, so the first insn seems unlikely.


Indeed, SImode on 64-bit RISC targets should have been more realistic, but I 
didn't want to give a 64-bit example.

-- 
Eric Botcazou
Eric Botcazou Feb. 2, 2018, 12:21 p.m. | #5
> That's always been my interpretation too.  Seems like we may be changing

> the meaning of this macro...


The main (and essentially only) effect of WORD_REGISTER_OPERATIONS in the 
compiler happens during combine and is explained by this comment taken from 
eliminate_regs_1 and written by Jim in 1998:

#ifdef WORD_REGISTER_OPERATIONS
	   /* On these machines, combine can create rtl of the form
		      (set (subreg:m1 (reg:m2 R) 0) ...)
	      where m1 < m2, and expects something interesting to 
	      happen to the entire word.  Moreover, it will use the
	      (reg:m2 R) later, expecting all bits to be preserved.
	      So if the number of words is the same, preserve the 
	      subreg so that push_reloads can see it.  */
	   && ! ((x_size-1)/UNITS_PER_WORD == (new_size-1) UNITS_PER_WORD)
#endif

-- 
Eric Botcazou
Richard Earnshaw (lists) Feb. 2, 2018, 1:46 p.m. | #6
On 02/02/18 12:21, Eric Botcazou wrote:
>> That's always been my interpretation too.  Seems like we may be changing

>> the meaning of this macro...

> 

> The main (and essentially only) effect of WORD_REGISTER_OPERATIONS in the 

> compiler happens during combine and is explained by this comment taken from 

> eliminate_regs_1 and written by Jim in 1998:

> 

> #ifdef WORD_REGISTER_OPERATIONS

> 	   /* On these machines, combine can create rtl of the form

> 		      (set (subreg:m1 (reg:m2 R) 0) ...)

> 	      where m1 < m2, and expects something interesting to 

> 	      happen to the entire word.  Moreover, it will use the

> 	      (reg:m2 R) later, expecting all bits to be preserved.

> 	      So if the number of words is the same, preserve the 

> 	      subreg so that push_reloads can see it.  */

> 	   && ! ((x_size-1)/UNITS_PER_WORD == (new_size-1) UNITS_PER_WORD)

> #endif

> 



interesting.  So I guess it all comes down to what 'something
interesting' means and whether that has to be consistent for all modes.

Patch

Index: combine.c
===================================================================
--- combine.c	(revision 257139)
+++ combine.c	(working copy)
@@ -13245,18 +13245,25 @@  record_dead_and_set_regs_1 (rtx dest, co
   if (REG_P (dest))
     {
       /* If we are setting the whole register, we know its value.  Otherwise
-	 show that we don't know the value.  We can handle SUBREG in
-	 some cases.  */
+	 show that we don't know the value.  We can handle a SUBREG if it's
+	 the low part, but we must be careful with paradoxical SUBREGs on
+	 RISC architectures because we cannot strip e.g. an extension around
+	 a load and record the naked load since the RTL middle-end considers
+	 that the upper bits are defined according to LOAD_EXTEND_OP.  */
       if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
 	record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
       else if (GET_CODE (setter) == SET
 	       && GET_CODE (SET_DEST (setter)) == SUBREG
 	       && SUBREG_REG (SET_DEST (setter)) == dest
-	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
+	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)),
+			    BITS_PER_WORD)
 	       && subreg_lowpart_p (SET_DEST (setter)))
 	record_value_for_reg (dest, record_dead_insn,
-			      gen_lowpart (GET_MODE (dest),
-						       SET_SRC (setter)));
+			      WORD_REGISTER_OPERATIONS
+			      && paradoxical_subreg_p (SET_DEST (setter))
+			      ? SET_SRC (setter)
+			      : gen_lowpart (GET_MODE (dest),
+					     SET_SRC (setter)));
       else
 	record_value_for_reg (dest, record_dead_insn, NULL_RTX);
     }