match.pd: Restrict clz cmp 0 replacement by single_use, PR99142

Message ID 20210217233334.CDEA4203D2@pchp3.se.axis.com
State New
Headers show
Series
  • match.pd: Restrict clz cmp 0 replacement by single_use, PR99142
Related show

Commit Message

Jason Merrill via Gcc-patches Feb. 17, 2021, 11:33 p.m.
If we're not going to eliminate the clz, it's better for the
comparison to use that result than its input, so we don't
extend the lifetime of the input.  Also, an additional use
of the result is more likely cheaper than a compare of the
input, in particular considering that the clz may have made
available a non-zero condition matching the original use.
The "s" modifier doesn't stop this situation, as the
transformation wouldn't result in "an expression with more
than one operator"; a gating single_use condition on the
result is necessary.

Regtested cris-elf and x86_64-linux.
Ok to commit?

gcc:
	PR tree-optimization/99142
	* match.pd (clz cmp 0): Gate replacement on single_use of clz result.

gcc/testsuite:
	PR tree-optimization/99142
	* testsuite/gcc.dg/tree-ssa/pr99142.c: New test.
---
 gcc/match.pd                            |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/pr99142.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99142.c

-- 
2.11.0

Comments

Jason Merrill via Gcc-patches Feb. 18, 2021, 10:12 a.m. | #1
On Thu, Feb 18, 2021 at 12:35 AM Hans-Peter Nilsson via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>

> If we're not going to eliminate the clz, it's better for the

> comparison to use that result than its input, so we don't

> extend the lifetime of the input.  Also, an additional use

> of the result is more likely cheaper than a compare of the

> input, in particular considering that the clz may have made

> available a non-zero condition matching the original use.


Of course the non-zero compare can execute concurrently
with the clz if we use the input which can result in faster
operation.  It's one of the many cases where local folding
doesn't tell us enough.  One of the usual issues with
single_use checks is that the IL is not fully DCEd while
we fold it and thus dead uses can make sth !single_use
even though it really is single_use.

Also if there's a use of the input live at the point of the
compare we want to replace it's likely better to do the
transform.

> The "s" modifier doesn't stop this situation, as the

> transformation wouldn't result in "an expression with more

> than one operator"; a gating single_use condition on the

> result is necessary.


Note that :s was done this way for the use in VN where
we can this way CSE c != 0 with an earlier a >= 0.

It might be an interesting idea to allow match.pd consumers
to "switch" behavior of :s according to their needs.

> Regtested cris-elf and x86_64-linux.

> Ok to commit?


OK.

Thanks,
Richard.

> gcc:

>         PR tree-optimization/99142

>         * match.pd (clz cmp 0): Gate replacement on single_use of clz result.

>

> gcc/testsuite:

>         PR tree-optimization/99142

>         * testsuite/gcc.dg/tree-ssa/pr99142.c: New test.

> ---

>  gcc/match.pd                            |  4 ++--

>  gcc/testsuite/gcc.dg/tree-ssa/pr99142.c | 14 ++++++++++++++

>  2 files changed, 16 insertions(+), 2 deletions(-)

>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99142.c

>

> diff --git a/gcc/match.pd b/gcc/match.pd

> index e14f69744d7e..760f773cf1b1 100644

> --- a/gcc/match.pd

> +++ b/gcc/match.pd

> @@ -6315,8 +6315,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>   (for op (eq ne)

>        cmp (lt ge)

>    (simplify

> -   (op (clz:s @0) INTEGER_CST@1)

> -   (if (integer_zerop (@1))

> +   (op (clz:s@2 @0) INTEGER_CST@1)

> +   (if (integer_zerop (@1) && single_use (@2))

>      /* clz(X) == 0 is (int)X < 0 and clz(X) != 0 is (int)X >= 0.  */

>      (with { tree stype = signed_type_for (TREE_TYPE (@0));

>             HOST_WIDE_INT val = 0;

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c

> new file mode 100644

> index 000000000000..1781a89de32a

> --- /dev/null

> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c

> @@ -0,0 +1,14 @@

> +/* { dg-do compile } */

> +/* { dg-options "-O2 -fdump-tree-optimized" } */

> +/* { dg-final { scan-tree-dump-not " >= 0\\)" "optimized" } } */

> +int f(int a, int *b, int *d)

> +{

> +  int c = __builtin_clz(a);

> +

> +  *b = c;

> +

> +  if (c != 0)

> +    *d = c;

> +

> +  return c;

> +}

> --

> 2.11.0

>
Jason Merrill via Gcc-patches Feb. 18, 2021, 10:22 a.m. | #2
On Thu, Feb 18, 2021 at 11:12:26AM +0100, Richard Biener via Gcc-patches wrote:
> On Thu, Feb 18, 2021 at 12:35 AM Hans-Peter Nilsson via Gcc-patches

> <gcc-patches@gcc.gnu.org> wrote:

> >

> > If we're not going to eliminate the clz, it's better for the

> > comparison to use that result than its input, so we don't

> > extend the lifetime of the input.  Also, an additional use

> > of the result is more likely cheaper than a compare of the

> > input, in particular considering that the clz may have made

> > available a non-zero condition matching the original use.

> 

> Of course the non-zero compare can execute concurrently

> with the clz if we use the input which can result in faster

> operation.  It's one of the many cases where local folding

> doesn't tell us enough.  One of the usual issues with

> single_use checks is that the IL is not fully DCEd while

> we fold it and thus dead uses can make sth !single_use

> even though it really is single_use.


And on the other side, the IL might not be fully CSEd yet and
so we can see single use, but after SCCVN it wouldn't be single use.

	Jakub
Jason Merrill via Gcc-patches Feb. 18, 2021, 10:29 a.m. | #3
On Thu, Feb 18, 2021 at 11:22 AM Jakub Jelinek <jakub@redhat.com> wrote:
>

> On Thu, Feb 18, 2021 at 11:12:26AM +0100, Richard Biener via Gcc-patches wrote:

> > On Thu, Feb 18, 2021 at 12:35 AM Hans-Peter Nilsson via Gcc-patches

> > <gcc-patches@gcc.gnu.org> wrote:

> > >

> > > If we're not going to eliminate the clz, it's better for the

> > > comparison to use that result than its input, so we don't

> > > extend the lifetime of the input.  Also, an additional use

> > > of the result is more likely cheaper than a compare of the

> > > input, in particular considering that the clz may have made

> > > available a non-zero condition matching the original use.

> >

> > Of course the non-zero compare can execute concurrently

> > with the clz if we use the input which can result in faster

> > operation.  It's one of the many cases where local folding

> > doesn't tell us enough.  One of the usual issues with

> > single_use checks is that the IL is not fully DCEd while

> > we fold it and thus dead uses can make sth !single_use

> > even though it really is single_use.

>

> And on the other side, the IL might not be fully CSEd yet and

> so we can see single use, but after SCCVN it wouldn't be single use.


Yes, the issue is definitely complex.  The IL before SCCVN can also
be not fully DCEd and so VN could CSE an expression for one that
is dead (and now becomes live).

Richard.

>         Jakub

>
Jason Merrill via Gcc-patches Feb. 18, 2021, 12:21 p.m. | #4
> From: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>

> Date: Thu, 18 Feb 2021 11:12:26 +0100


> On Thu, Feb 18, 2021 at 12:35 AM Hans-Peter Nilsson via Gcc-patches

> <gcc-patches@gcc.gnu.org> wrote:

> >

> > If we're not going to eliminate the clz, it's better for the

> > comparison to use that result than its input, so we don't

> > extend the lifetime of the input.  Also, an additional use

> > of the result is more likely cheaper than a compare of the

> > input, in particular considering that the clz may have made

> > available a non-zero condition matching the original use.

> 

> Of course the non-zero compare can execute concurrently

> with the clz if we use the input which can result in faster

> operation.  It's one of the many cases where local folding

> doesn't tell us enough.  One of the usual issues with

> single_use checks is that the IL is not fully DCEd while

> we fold it and thus dead uses can make sth !single_use

> even though it really is single_use.

> 

> Also if there's a use of the input live at the point of the

> compare we want to replace it's likely better to do the

> transform.


Yeah, all depending on the code context.  I guess these
simple match.pd transformations will grow some conditions.

> > The "s" modifier doesn't stop this situation, as the

> > transformation wouldn't result in "an expression with more

> > than one operator"; a gating single_use condition on the

> > result is necessary.

> 

> Note that :s was done this way for the use in VN where

> we can this way CSE c != 0 with an earlier a >= 0.

> 

> It might be an interesting idea to allow match.pd consumers

> to "switch" behavior of :s according to their needs.


Like, as a heuristic for a non-ooo-target, to mean single_use?
Yes, good idea!

> > Regtested cris-elf and x86_64-linux.

> > Ok to commit?

> 

> OK.


Done as a2ef38b1f94d, thanks for the review.

> gcc/testsuite:

>         PR tree-optimization/99142

>         * testsuite/gcc.dg/tree-ssa/pr99142.c: New test.


JFTR; oops: ^ Spurious "testsuite/" removed.

brgds, H-P
Jason Merrill via Gcc-patches Feb. 18, 2021, 1:10 p.m. | #5
On Thu, Feb 18, 2021 at 1:21 PM Hans-Peter Nilsson <hp@axis.com> wrote:
>

> > From: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>

> > Date: Thu, 18 Feb 2021 11:12:26 +0100

>

> > On Thu, Feb 18, 2021 at 12:35 AM Hans-Peter Nilsson via Gcc-patches

> > <gcc-patches@gcc.gnu.org> wrote:

> > >

> > > If we're not going to eliminate the clz, it's better for the

> > > comparison to use that result than its input, so we don't

> > > extend the lifetime of the input.  Also, an additional use

> > > of the result is more likely cheaper than a compare of the

> > > input, in particular considering that the clz may have made

> > > available a non-zero condition matching the original use.

> >

> > Of course the non-zero compare can execute concurrently

> > with the clz if we use the input which can result in faster

> > operation.  It's one of the many cases where local folding

> > doesn't tell us enough.  One of the usual issues with

> > single_use checks is that the IL is not fully DCEd while

> > we fold it and thus dead uses can make sth !single_use

> > even though it really is single_use.

> >

> > Also if there's a use of the input live at the point of the

> > compare we want to replace it's likely better to do the

> > transform.

>

> Yeah, all depending on the code context.  I guess these

> simple match.pd transformations will grow some conditions.

>

> > > The "s" modifier doesn't stop this situation, as the

> > > transformation wouldn't result in "an expression with more

> > > than one operator"; a gating single_use condition on the

> > > result is necessary.

> >

> > Note that :s was done this way for the use in VN where

> > we can this way CSE c != 0 with an earlier a >= 0.

> >

> > It might be an interesting idea to allow match.pd consumers

> > to "switch" behavior of :s according to their needs.

>

> Like, as a heuristic for a non-ooo-target, to mean single_use?

> Yes, good idea!


More like have VN the existing semantics but fold_stmt treat
it as single_use for example.

> > > Regtested cris-elf and x86_64-linux.

> > > Ok to commit?

> >

> > OK.

>

> Done as a2ef38b1f94d, thanks for the review.

>

> > gcc/testsuite:

> >         PR tree-optimization/99142

> >         * testsuite/gcc.dg/tree-ssa/pr99142.c: New test.

>

> JFTR; oops: ^ Spurious "testsuite/" removed.

>

> brgds, H-P

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index e14f69744d7e..760f773cf1b1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6315,8 +6315,8 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (for op (eq ne)
       cmp (lt ge)
   (simplify
-   (op (clz:s @0) INTEGER_CST@1)
-   (if (integer_zerop (@1))
+   (op (clz:s@2 @0) INTEGER_CST@1)
+   (if (integer_zerop (@1) && single_use (@2))
     /* clz(X) == 0 is (int)X < 0 and clz(X) != 0 is (int)X >= 0.  */
     (with { tree stype = signed_type_for (TREE_TYPE (@0));
 	    HOST_WIDE_INT val = 0;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c
new file mode 100644
index 000000000000..1781a89de32a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not " >= 0\\)" "optimized" } } */
+int f(int a, int *b, int *d)
+{
+  int c = __builtin_clz(a);
+
+  *b = c;
+
+  if (c != 0)
+    *d = c;
+
+  return c;
+}