[committed] middle-end: Optimize (A&C)^(B&C) to (A^B)&C in simplify_rtx (take 3).

Message ID 00ca01d64e35$59cb3090$0d6191b0$@nextmovesoftware.com

Commit Message

Roger Sayle June 29, 2020, 4:50 p.m.
Hi Richard,

Many thanks for the review.  Many thanks also to Hans-Peter Nilsson, Joseph Myers
and overseers@gcc.gnu.org for helping get my ssh keys updated.  I've taken the opportunity
of committing this patch to check that everything is working.  For the record, here's the
final version as committed.  I've added the (xor (ashiftrt x c) (ashiftrt y c)) case as per your
suggestion, which fires 6 times during make -k check on x86_64-pc-linux-gnu.

Cheers,
Roger
--

-----Original Message-----
From: Richard Sandiford <richard.sandiford@arm.com> 

Sent: 22 June 2020 20:41
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH take 2] middle-end: Optimize (A&C)^(B&C) to (A^B)&C in simplify_rtx.

Hi Roger,

Thanks for the update and sorry for the slow reply.

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> As suggested by Richard Sandiford, this patch splits out the previous
> RTL simplification, of (X&C)^(Y&C) to (X^Y)&C, to its own function,
> simplify_distributive_operation, and calls it when appropriate for
> IOR, XOR and AND.
>
> Instrumenting a bootstrap reveals this optimization triggers
> 393358 times during stage2, stage3 and building the libraries, and a
> further 263659 times during make check.  By order of occurrence the
> RTL transformation opportunities are:
>
>  284447 01 and ior
>  156798 00 xor and
>  131110 11 and ior
>   47253 00 and ior
>   28035 00 ior and
>    2804 01 ior and
>    2698 11 ior and
>    2262 01 xor and
>     602 11 xor and
>     312 00 xor xor
>     298 00 xor rotate
>     120 00 and ashift
>     108 00 ior lshiftrt
>      60 00 and ashiftrt
>      54 00 ior ashift
>      18 00 and lshiftrt
>      12 10 xor and
>      12 00 xor rotatert
>       8 00 xor lshiftrt
>       4 10 ior and
>       2 00 ior ashiftrt


That's an impressive number of hits :-)

> where the two binary digits denote the surviving inner unique
> operands, so "00 xor and" corresponds to the original (xor (and X Z)
> (and Y Z)), and "01 and ior" corresponds to (and (ior X Y) (ior Y Z)).

>
> Many thanks also to Richard for pointing out simplify_rtx_c_tests, the
> self-testing framework in simplify-rtx.c, which is new since my day.
> This patch supplements the existing vector mode testing with a suite
> of scalar integer mode tests that confirm that many of the expected
> integer simplifications in simplify-rtx are being applied as expected.
> This includes three tests of the new simplify_distributive_operation.
>
> Before:
> xgcc -xc -fself-test: 59693 pass(es) in 0.820956 seconds
> xgcc -xc++ -fself-test: 59723 pass(es) in 0.786662 seconds
> After:
> xgcc -xc -fself-test: 60003 pass(es) in 0.860637 seconds
> xgcc -xc++ -fself-test: 60033 pass(es) in 0.794624 seconds
>
> I do have one thought/suggestion around test_scalar_ops for future
> generations.  These tests are extremely strict; rather than causing an
> unexpected failure in the testsuite, breaking a self-test stops the
> build.  Should anything go wrong in future (on a misbehaving platform),
> might I propose, instead of reverting this patch, simply commenting out
> the call to test_scalar_ops in simplify_rtx_c_tests as a mitigation
> strategy whilst the build is restored.  In fact, removing the "static"
> from test_scalar_ops would avoid the "defined but not used"
> complication in this disaster recovery plan.


Yeah, we can work around it rather than revert the patch.

> This patch has been tested with "make bootstrap" and "make -k check"
> on x86_64-pc-linux-gnu with no regressions.
>
> 2020-06-16  Roger Sayle  <roger@nextmovesoftware.com>
>             Richard Sandiford  <richard.sandiford@arm.com>


Thanks for the gesture, but I don't think I should be co-author here.
I didn't write anything :-)

> […]
> @@ -3064,6 +3112,21 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  	    }
>  	}
>  
> +      /* Convert (ior (and A C) (and B C)) into (and (ior A B) C).  */
> +      if (GET_CODE (op0) == GET_CODE (op1)
> +	  && (GET_CODE (op0) == AND
> +	      || GET_CODE (op0) == IOR
> +	      || GET_CODE (op0) == LSHIFTRT
> +	      || GET_CODE (op0) == ASHIFTRT
> +	      || GET_CODE (op0) == ASHIFT
> +	      || GET_CODE (op0) == ROTATE
> +	      || GET_CODE (op0) == ROTATERT))
> +	{
> +	  tem = simplify_distributive_operation (code, mode, op0, op1);
> +	  if (tem)
> +	    return tem;
> +	}
> +
>        tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>        if (tem)
>  	return tem;
> @@ -3302,6 +3365,20 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  	  && (reversed = reversed_comparison (op0, int_mode)))
>  	return reversed;
>  
> +      /* Convert (xor (and A C) (and B C)) into (and (xor A B) C).  */
> +      if (GET_CODE (op0) == GET_CODE (op1)
> +	  && (GET_CODE (op0) == AND
> +	      || GET_CODE (op0) == XOR
> +	      || GET_CODE (op0) == LSHIFTRT
> +	      || GET_CODE (op0) == ASHIFT
> +	      || GET_CODE (op0) == ROTATE
> +	      || GET_CODE (op0) == ROTATERT))
> +	{
> +	  tem = simplify_distributive_operation (code, mode, op0, op1);
> +	  if (tem)
> +	    return tem;
> +	}
> +
>        tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>        if (tem)
>  	return tem;
> @@ -3500,6 +3577,21 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  	  && rtx_equal_p (op1, XEXP (XEXP (op0, 1), 0)))
>  	return simplify_gen_binary (AND, mode, op1, XEXP (op0, 0));
>  
> +      /* Convert (and (ior A C) (ior B C)) into (ior (and A B) C).  */
> +      if (GET_CODE (op0) == GET_CODE (op1)
> +	  && (GET_CODE (op0) == AND
> +	      || GET_CODE (op0) == IOR
> +	      || GET_CODE (op0) == LSHIFTRT
> +	      || GET_CODE (op0) == ASHIFTRT
> +	      || GET_CODE (op0) == ASHIFT
> +	      || GET_CODE (op0) == ROTATE
> +	      || GET_CODE (op0) == ROTATERT))
> +	{
> +	  tem = simplify_distributive_operation (code, mode, op0, op1);
> +	  if (tem)
> +	    return tem;
> +	}
> +
>        tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>        if (tem)
>  	return tem;


Any reason to exclude ASHIFTRT from the XOR case?  AFAICT it distributes in the same way: for equal shifts, the ASHIFTRT reproduces the same number of sign bits, which are handled by all logical ops in the same way as the original sign bit.  Thus doing the logical OP first on the original sign bit and then shifting will have the same effect.

LGTM otherwise, thanks.  If you can update the patch then I'll push it.

Richard

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 28c2dc6..cfb338e 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2392,6 +2392,54 @@  simplify_binary_operation_series (rtx_code code, machine_mode mode,
   return gen_vec_series (mode, new_base, new_step);
 }
 
+/* Subroutine of simplify_binary_operation_1.  Un-distribute a binary
+   operation CODE with result mode MODE, operating on OP0 and OP1.
+   e.g. simplify (xor (and A C) (and B C)) to (and (xor A B) C).
+   Returns NULL_RTX if no simplification is possible.  */
+
+static rtx
+simplify_distributive_operation (enum rtx_code code, machine_mode mode,
+				 rtx op0, rtx op1)
+{
+  enum rtx_code op = GET_CODE (op0);
+  gcc_assert (GET_CODE (op1) == op);
+
+  if (rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))
+      && ! side_effects_p (XEXP (op0, 1)))
+    return simplify_gen_binary (op, mode,
+				simplify_gen_binary (code, mode,
+						     XEXP (op0, 0),
+						     XEXP (op1, 0)),
+				XEXP (op0, 1));
+
+  if (GET_RTX_CLASS (op) == RTX_COMM_ARITH)
+    {
+      if (rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+	  && ! side_effects_p (XEXP (op0, 0)))
+	return simplify_gen_binary (op, mode,
+				    simplify_gen_binary (code, mode,
+							 XEXP (op0, 1),
+							 XEXP (op1, 1)),
+				    XEXP (op0, 0));
+      if (rtx_equal_p (XEXP (op0, 0), XEXP (op1, 1))
+	  && ! side_effects_p (XEXP (op0, 0)))
+	return simplify_gen_binary (op, mode,
+				    simplify_gen_binary (code, mode,
+							 XEXP (op0, 1),
+							 XEXP (op1, 0)),
+				    XEXP (op0, 0));
+      if (rtx_equal_p (XEXP (op0, 1), XEXP (op1, 0))
+	  && ! side_effects_p (XEXP (op0, 1)))
+	return simplify_gen_binary (op, mode,
+				    simplify_gen_binary (code, mode,
+							 XEXP (op0, 0),
+							 XEXP (op1, 1)),
+				    XEXP (op0, 1));
+    }
+
+  return NULL_RTX;
+}
+
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation
    CODE with result mode MODE, operating on OP0 and OP1.  If OP0 and/or
    OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
@@ -3064,6 +3112,21 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	    }
 	}
 
+      /* Convert (ior (and A C) (and B C)) into (and (ior A B) C).  */
+      if (GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == AND
+	      || GET_CODE (op0) == IOR
+	      || GET_CODE (op0) == LSHIFTRT
+	      || GET_CODE (op0) == ASHIFTRT
+	      || GET_CODE (op0) == ASHIFT
+	      || GET_CODE (op0) == ROTATE
+	      || GET_CODE (op0) == ROTATERT))
+	{
+	  tem = simplify_distributive_operation (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
+
       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
@@ -3302,6 +3365,21 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	  && (reversed = reversed_comparison (op0, int_mode)))
 	return reversed;
 
+      /* Convert (xor (and A C) (and B C)) into (and (xor A B) C).  */
+      if (GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == AND
+	      || GET_CODE (op0) == XOR
+	      || GET_CODE (op0) == LSHIFTRT
+	      || GET_CODE (op0) == ASHIFTRT
+	      || GET_CODE (op0) == ASHIFT
+	      || GET_CODE (op0) == ROTATE
+	      || GET_CODE (op0) == ROTATERT))
+	{
+	  tem = simplify_distributive_operation (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
+
       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
@@ -3500,6 +3578,21 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	  && rtx_equal_p (op1, XEXP (XEXP (op0, 1), 0)))
 	return simplify_gen_binary (AND, mode, op1, XEXP (op0, 0));
 
+      /* Convert (and (ior A C) (ior B C)) into (ior (and A B) C).  */
+      if (GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == AND
+	      || GET_CODE (op0) == IOR
+	      || GET_CODE (op0) == LSHIFTRT
+	      || GET_CODE (op0) == ASHIFTRT
+	      || GET_CODE (op0) == ASHIFT
+	      || GET_CODE (op0) == ROTATE
+	      || GET_CODE (op0) == ROTATERT))
+	{
+	  tem = simplify_distributive_operation (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
+
       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
@@ -7218,6 +7311,81 @@  make_test_reg (machine_mode mode)
   return gen_rtx_REG (mode, test_reg_num++);
 }
 
+static void
+test_scalar_int_ops (machine_mode mode)
+{
+  rtx op0 = make_test_reg (mode);
+  rtx op1 = make_test_reg (mode);
+  rtx six = GEN_INT (6);
+
+  rtx neg_op0 = simplify_gen_unary (NEG, mode, op0, mode);
+  rtx not_op0 = simplify_gen_unary (NOT, mode, op0, mode);
+  rtx bswap_op0 = simplify_gen_unary (BSWAP, mode, op0, mode);
+
+  rtx and_op0_op1 = simplify_gen_binary (AND, mode, op0, op1);
+  rtx ior_op0_op1 = simplify_gen_binary (IOR, mode, op0, op1);
+  rtx xor_op0_op1 = simplify_gen_binary (XOR, mode, op0, op1);
+
+  rtx and_op0_6 = simplify_gen_binary (AND, mode, op0, six);
+  rtx and_op1_6 = simplify_gen_binary (AND, mode, op1, six);
+
+  /* Test some binary identities.  */
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (PLUS, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (PLUS, mode, const0_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (MINUS, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (MULT, mode, op0, const1_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (MULT, mode, const1_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (DIV, mode, op0, const1_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (AND, mode, op0, constm1_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (AND, mode, constm1_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (IOR, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (IOR, mode, const0_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (XOR, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (XOR, mode, const0_rtx, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ASHIFT, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ROTATE, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ASHIFTRT, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (LSHIFTRT, mode, op0, const0_rtx));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (ROTATERT, mode, op0, const0_rtx));
+
+  /* Test some self-inverse operations.  */
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (NEG, mode, neg_op0, mode));
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (NOT, mode, not_op0, mode));
+  ASSERT_RTX_EQ (op0, simplify_gen_unary (BSWAP, mode, bswap_op0, mode));
+
+  /* Test some reflexive operations.  */
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (AND, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (IOR, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (SMIN, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (SMAX, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (UMIN, mode, op0, op0));
+  ASSERT_RTX_EQ (op0, simplify_gen_binary (UMAX, mode, op0, op0));
+
+  ASSERT_RTX_EQ (const0_rtx, simplify_gen_binary (MINUS, mode, op0, op0));
+  ASSERT_RTX_EQ (const0_rtx, simplify_gen_binary (XOR, mode, op0, op0));
+
+  /* Test simplify_distributive_operation.  */
+  ASSERT_RTX_EQ (simplify_gen_binary (AND, mode, xor_op0_op1, six),
+		 simplify_gen_binary (XOR, mode, and_op0_6, and_op1_6));
+  ASSERT_RTX_EQ (simplify_gen_binary (AND, mode, ior_op0_op1, six),
+		 simplify_gen_binary (IOR, mode, and_op0_6, and_op1_6));
+  ASSERT_RTX_EQ (simplify_gen_binary (AND, mode, and_op0_op1, six),
+		 simplify_gen_binary (AND, mode, and_op0_6, and_op1_6));
+}
+
+/* Verify some simplifications involving scalar expressions.  */
+
+static void
+test_scalar_ops ()
+{
+  for (unsigned int i = 0; i < NUM_MACHINE_MODES; ++i)
+    {
+      machine_mode mode = (machine_mode) i;
+      if (SCALAR_INT_MODE_P (mode) && mode != BImode)
+	test_scalar_int_ops (mode);
+    }
+}
+
 /* Test vector simplifications involving VEC_DUPLICATE in which the
    operands and result have vector mode MODE.  SCALAR_REG is a pseudo
    register that holds one element of MODE.  */
@@ -7735,6 +7903,7 @@  simplify_const_poly_int_tests<N>::run ()
 void
 simplify_rtx_c_tests ()
 {
+  test_scalar_ops ();
   test_vector_ops ();
   simplify_const_poly_int_tests<NUM_POLY_INT_COEFFS>::run ();
 }