middle-end: Recognize/canonicalize MULT_HIGHPART_EXPR and expand it.

Message ID 000701d66a59$3e170d20$ba452760$@nextmovesoftware.com
State New
Headers show
Series
  • middle-end: Recognize/canonicalize MULT_HIGHPART_EXPR and expand it.
Related show

Commit Message

Roger Sayle Aug. 4, 2020, 12:17 p.m.
This middle-end patch teaches fold/match to recognize the idiom for
a highpart multiplication and represent it internally as a
MULT_HIGHPART_EXPR tree code.  At RTL expansion time, the compiler
will trying using an appropriate instruction (sequence) provided
by the backend, but if that fails, this patch now provides a fallback
by synthesizing a suitable sequence using either a widening multiply
or a multiplication in a wider mode [matching the original tree].

The benefit of this internal canonicalization is that it allows GCC
to generate muldi3_highpart instructions even on targets that require
a libcall to perform TImode multiplications.  Currently the RTL
optimizers can recognize highpart multiplications in combine, but
this matching fails when the multiplication requires a libcall.
Rather than attempt to do something via REG_EQUAL_NOTEs, a clever
solution is to make more use of the MULT_HIGHPART_EXPR tree code
in the tree optimizers.

This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check", and on nvptx-none with a "make"
and "make -k check", both with no few failures.  There's an
additional target-specific test in the nvptx patch to support
"mul.hi.s64" and "mul.hi.u64" that I'm just about to post, but
this code is already well exercised during bootstrap by libgcc.

Ok for mainline?


2020-08-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* match.pd (((wide)x * (wide)y)>>C -> mult_highpart): New
	simplification/canonicalization to recognize MULT_HIGHPART_EXPR.
	* optabs.c (expand_mult_highpart_1): New function to expand
	MULT_HIGHPART_EXPR as a widening or a wide multiplication
	followed by a right shift (or a gen_highpart subreg).
	(expand_mult_highpart): Call the above function if the target
	doesn't provide a suitable optab.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-mult-highpart-1.c: New test.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/testsuite/gcc.dg/fold-mult-highpart-1.c b/gcc/testsuite/gcc.dg/fold-mult-highpart-1.c
new file mode 100644
index 0000000..87daebd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-mult-highpart-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-long-long -fdump-tree-optimized" } */
+
+/* Not all platforms support TImode integers.  */
+#if (defined(__LP64__) && !defined(__hppa__)) || defined(_WIN64)
+typedef unsigned int __attribute ((mode(TI))) uti_t;
+typedef int __attribute__ ((mode (TI))) ti_t;
+#else
+typedef unsigned long uti_type;
+typedef long ti_type;
+#endif
+
+short smulhi3_highpart(short x, short y)
+{
+  return ((int)x * (int)y) >> (8*sizeof(short));
+}
+
+int smulsi3_highpart(int x, int y)
+{
+  return ((long)x * (long)y) >> (8*sizeof(int));
+}
+
+long smuldi3_highpart(long x, long y)
+{
+  return ((ti_t)x * (ti_t)y) >> (8*sizeof(long));
+}
+
+unsigned short umulhi3_highpart(unsigned short x, unsigned short y)
+{
+  return ((unsigned int)x * (unsigned int)y) >> (8*sizeof(unsigned short));
+}
+
+unsigned int umulsi3_highpart(unsigned int x, unsigned int y)
+{
+  return ((unsigned long)x * (unsigned long)y) >> (8*sizeof(unsigned int));
+}
+
+unsigned long umuldi3_highpart(unsigned long x, unsigned long y)
+{
+  return ((uti_t)x * (uti_t)y) >> (8*sizeof(unsigned long));
+}
+
+/* { dg-final { scan-tree-dump-times " h\\* " 6 "optimized" } } */
+

Comments

Jakub Jelinek via Gcc-patches Sept. 1, 2020, 7:54 a.m. | #1
On Tue, Aug 4, 2020 at 2:18 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>

>

> This middle-end patch teaches fold/match to recognize the idiom for

> a highpart multiplication and represent it internally as a

> MULT_HIGHPART_EXPR tree code.  At RTL expansion time, the compiler

> will trying using an appropriate instruction (sequence) provided

> by the backend, but if that fails, this patch now provides a fallback

> by synthesizing a suitable sequence using either a widening multiply

> or a multiplication in a wider mode [matching the original tree].

>

> The benefit of this internal canonicalization is that it allows GCC

> to generate muldi3_highpart instructions even on targets that require

> a libcall to perform TImode multiplications.  Currently the RTL

> optimizers can recognize highpart multiplications in combine, but

> this matching fails when the multiplication requires a libcall.

> Rather than attempt to do something via REG_EQUAL_NOTEs, a clever

> solution is to make more use of the MULT_HIGHPART_EXPR tree code

> in the tree optimizers.

>

> This patch has been tested on x86_64-pc-linux-gnu with a "make

> bootstrap" and "make -k check", and on nvptx-none with a "make"

> and "make -k check", both with no few failures.  There's an

> additional target-specific test in the nvptx patch to support

> "mul.hi.s64" and "mul.hi.u64" that I'm just about to post, but

> this code is already well exercised during bootstrap by libgcc.

>

> Ok for mainline?


So currently MULT_HIGHPART_EXPR is one of the operations
that only appear when there's suitable target support (it's
solely used by vectorization it seems).  Is there any benefit
of matching the pattern when the target doesn't support it?
That is, how about guarding it with can_mult_highpart_p ()?
Also note the usual issue of early introduction of such
MULT_HIGHPART_EXPR which isn't widely supported
in optimization passes - it's one of the things I'd rather
do as part of instruction selection before RTL expansion
(where it could also directly synthesize a direct internal
fn for the respective optab).

So do you see any advantage synthesizing those
a) early, b) for targets that do not natively support it?

We now have the ISEL pass and if you write the
pattern as

(match (highpart_multiply @0 @1)
  (convert (rshift (mult:s ( (convert@3 @0) (convert @1))
                  (INTEGER_CST@2)))
  (if (INTEGRAL_TYPE_P (type)
       && INTEGRAL_TYPE_P (TREE_TYPE (@3))
       && types_match (type, TREE_TYPE (@0))
       && types_match (type, TREE_TYPE (@1))
       && (TYPE_PRECISION (TREE_TYPE (@3))
          >= 2 * TYPE_PRECISION (type))

       && tree_fits_uhwi_p (@2)
       && tree_to_uhwi (@2) == TYPE_PRECISION (type)
       && TYPE_SIGN (TREE_TYPE (@3)) == TYPE_SIGN (type))))

you can match that in C++ code doing

 extern bool gimple_highpart_multiply (tree, tree *, tree (*)(tree));
 tree res_ops[2];
 if (gimple_highpart_multiply (gimple_assign_lhs (stmt), &res_ops, NULL)
   {
      res_ops[0] is now @0 and res_ops[1] is @1 and you know 'stmt' computes
      a highpart multiply
   }

Thanks,
Richard.

>

> 2020-08-04  Roger Sayle  <roger@nextmovesoftware.com>

>

> gcc/ChangeLog

>         * match.pd (((wide)x * (wide)y)>>C -> mult_highpart): New

>         simplification/canonicalization to recognize MULT_HIGHPART_EXPR.

>         * optabs.c (expand_mult_highpart_1): New function to expand

>         MULT_HIGHPART_EXPR as a widening or a wide multiplication

>         followed by a right shift (or a gen_highpart subreg).

>         (expand_mult_highpart): Call the above function if the target

>         doesn't provide a suitable optab.

>

> gcc/testsuite/ChangeLog

>         * gcc.dg/fold-mult-highpart-1.c: New test.

>

>

> Thanks in advance,

> Roger

> --

> Roger Sayle

> NextMove Software

> Cambridge, UK

>

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index a052c9e..15c33f2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6443,3 +6443,18 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    to the number of trailing zeroes.  */
 (match (ctz_table_index @1 @2 @3)
   (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))
+
+/* Recognize MULT_HIGHPART_EXPR.  */
+(simplify
+  (convert (rshift (mult:s (convert@3 @0) (convert @1))
+		   (INTEGER_CST@2)))
+  (if (INTEGRAL_TYPE_P (type)
+       && INTEGRAL_TYPE_P (TREE_TYPE (@3))
+       && types_match (type, TREE_TYPE (@0))
+       && types_match (type, TREE_TYPE (@1))
+       && (TYPE_PRECISION (TREE_TYPE (@3))
+	   >= 2 * TYPE_PRECISION (type))
+       && tree_fits_uhwi_p (@2)
+       && tree_to_uhwi (@2) == TYPE_PRECISION (type)
+       && TYPE_SIGN (TREE_TYPE (@3)) == TYPE_SIGN (type))
+    (mult_highpart @0 @1)))
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 184827f..2416a69 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5870,6 +5870,52 @@  expand_vec_cmp_expr (tree type, tree exp, rtx target)
   return ops[0].value;
 }
 
+/* Helper function of expand_mult_highpart.  Expand a highpart
+   multiplication using a widening or wider multiplication.  */
+
+static rtx
+expand_mult_highpart_1 (machine_mode mode, rtx op0, rtx op1, bool uns_p)
+{
+  scalar_int_mode narrow_mode;
+  rtx tem = NULL_RTX;
+  optab t;
+
+  if (!is_a <scalar_int_mode> (mode, &narrow_mode))
+    return NULL_RTX;
+
+  scalar_int_mode wide_mode = GET_MODE_WIDER_MODE (narrow_mode).require ();
+
+  /* Try a widening multiplication.  */
+  t = uns_p ? umul_widen_optab : smul_widen_optab;
+  if (convert_optab_handler (t, wide_mode, narrow_mode) != CODE_FOR_nothing)
+    tem = expand_binop (wide_mode, t, op0, op1, 0, uns_p, OPTAB_WIDEN);
+
+  /* If that fails, try a wider multiplication.  */
+  if (!tem)
+    {
+      rtx_insn *insns;
+      rtx wop0, wop1;
+      start_sequence();
+      wop0 = convert_modes (wide_mode, narrow_mode, op0, uns_p);
+      wop1 = convert_modes (wide_mode, narrow_mode, op1, uns_p);
+      tem = expand_binop (wide_mode, smul_optab, wop0, wop1, 0,
+			  uns_p, OPTAB_LIB_WIDEN);
+      insns = get_insns ();
+      end_sequence ();
+
+      if (!tem)
+	return NULL_RTX;
+
+      emit_insn (insns);
+    }
+
+  if (narrow_mode == word_mode)
+    return gen_highpart (narrow_mode, tem);
+  tem = expand_shift (RSHIFT_EXPR, wide_mode, tem,
+		     GET_MODE_BITSIZE (narrow_mode), 0, 1);
+  return convert_modes (narrow_mode, wide_mode, tem, 0);
+}
+
 /* Expand a highpart multiply.  */
 
 rtx
@@ -5887,7 +5933,8 @@  expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
   switch (method)
     {
     case 0:
-      return NULL_RTX;
+      /* We don't have an optab, try expanding this the hard way.  */
+      return expand_mult_highpart_1 (mode, op0, op1, uns_p);
     case 1:
       tab1 = uns_p ? umul_highpart_optab : smul_highpart_optab;
       return expand_binop (mode, tab1, op0, op1, target, uns_p,