RFC: make combine do as advertised (cheaper-than)?

Message ID 202007060201.06621sH0032715@ignucius.se.axis.com
State New
Headers show
Series
  • RFC: make combine do as advertised (cheaper-than)?
Related show

Commit Message

Jason Merrill via Gcc-patches July 6, 2020, 2:01 a.m.
Most comments, including the second sentence in the head comment
of combine_validate_cost, the main decision-maker of the combine
pass, refer to the function as returning true if the new
insns(s) *cheaper* than the old insns, when in fact the function
returned true also if the cost was the same.  Returning true for
cheaper also seems more sane than as-cheap-as considering the
need to avoid oscillation between same-cost combinations.  Also,
it makes the job of later passes harder, having combine make
more complex combinations of the same cost.

Right, you can affect this with your target TARGET_RTX_COSTS and
TARGET_INSN_COST hooks, but only for trivial cases, and you have
increasingly more complex combinations (many-to-many
combinations) where you have to twist simple logic to appease
combine (stop it from combining) or give up.

Main-interest ports are unsurprisingly pretty tied to this
effect.  I'd love to install the following patch, adjusting the
function and the two opposing comments.  But...it causes
hundreds of regressions for each of x86_64-linux and
aarch64-linux (tens for ppc64le-linux), so I'm just not up to
the task, at least not without previous buy-in from reviewers.

It would need those targets to have their TARGET_INSN_COST
and/or TARGET_RTX_COSTS functions adjusted.

Alternatives from the top of my head, one of:

- With buy-in from global reviewers, installing this patch on a
development branch and let all target maintainers adjust their
target test-cases and cost-functions there, for merge when
first-class targets are done.  (I'm a dreamer.)

- A target combine hook for the decision (passing for inspection
tuples of from-insns and to-insns and costs) and just falling
back to the current addition of rtx costs.

- A simpler target combine decision hook that says which one of
"cheaper" or "as-cheap-as".

- Adjusting documentation and comments that are currently
untruthful about the cost decision to instead say (to the effect
of) "as cheap as" instead of "cheaper".

So, WDYT?

(Tested as above, causing massive pattern-match regressions.)

gcc:
	* combine.c (combine_validate_cost): Reject unless the new total
 	cost is cheaper than the original.  Adjust the minority of
 	comments that don't say "cheaper":

---
 gcc/combine.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.11.0

Comments

Jason Merrill via Gcc-patches July 6, 2020, 7:50 a.m. | #1
On Mon, Jul 6, 2020 at 4:03 AM Hans-Peter Nilsson via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>

> Most comments, including the second sentence in the head comment

> of combine_validate_cost, the main decision-maker of the combine

> pass, refer to the function as returning true if the new

> insns(s) *cheaper* than the old insns, when in fact the function

> returned true also if the cost was the same.  Returning true for

> cheaper also seems more sane than as-cheap-as considering the

> need to avoid oscillation between same-cost combinations.  Also,

> it makes the job of later passes harder, having combine make

> more complex combinations of the same cost.

>

> Right, you can affect this with your target TARGET_RTX_COSTS and

> TARGET_INSN_COST hooks, but only for trivial cases, and you have

> increasingly more complex combinations (many-to-many

> combinations) where you have to twist simple logic to appease

> combine (stop it from combining) or give up.

>

> Main-interest ports are unsurprisingly pretty tied to this

> effect.  I'd love to install the following patch, adjusting the

> function and the two opposing comments.  But...it causes

> hundreds of regressions for each of x86_64-linux and

> aarch64-linux (tens for ppc64le-linux), so I'm just not up to

> the task, at least not without previous buy-in from reviewers.

>

> It would need those targets to have their TARGET_INSN_COST

> and/or TARGET_RTX_COSTS functions adjusted.

>

> Alternatives from the top of my head, one of:

>

> - With buy-in from global reviewers, installing this patch on a

> development branch and let all target maintainers adjust their

> target test-cases and cost-functions there, for merge when

> first-class targets are done.  (I'm a dreamer.)

>

> - A target combine hook for the decision (passing for inspection

> tuples of from-insns and to-insns and costs) and just falling

> back to the current addition of rtx costs.

>

> - A simpler target combine decision hook that says which one of

> "cheaper" or "as-cheap-as".


A target combine decision hook that on old == new cost
decides between both and given the actual insns?  Might
lead to quite hackish target hook implementations...

> - Adjusting documentation and comments that are currently

> untruthful about the cost decision to instead say (to the effect

> of) "as cheap as" instead of "cheaper".


Well, adjusting the function comment to reflect reality is
kind-of obvious ;)

> So, WDYT?

>

> (Tested as above, causing massive pattern-match regressions.)

>

> gcc:

>         * combine.c (combine_validate_cost): Reject unless the new total

>         cost is cheaper than the original.  Adjust the minority of

>         comments that don't say "cheaper":

>

> ---

>  gcc/combine.c | 8 ++++----

>  1 file changed, 4 insertions(+), 4 deletions(-)

>

> diff --git a/gcc/combine.c b/gcc/combine.c

> index f69413a..7da144e 100644

> --- a/gcc/combine.c

> +++ b/gcc/combine.c

> @@ -846,8 +846,8 @@ do_SUBST_LINK (struct insn_link **into, struct insn_link *newval)

>     than the original sequence I0, I1, I2, I3 and undobuf.other_insn.  Note

>     that I0, I1 and/or NEWI2PAT may be NULL_RTX.  Similarly, NEWOTHERPAT and

>     undobuf.other_insn may also both be NULL_RTX.  Return false if the cost

> -   of all the instructions can be estimated and the replacements are more

> -   expensive than the original sequence.  */

> +   of all the instructions can be estimated and the replacements are not

> +   cheaper than the original sequence.  */

>

>  static bool

>  combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3,

> @@ -938,8 +938,8 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3,

>      }

>

>    /* Disallow this combination if both new_cost and old_cost are greater than

> -     zero, and new_cost is greater than old cost.  */

> -  int reject = old_cost > 0 && new_cost > old_cost;

> +     zero, and new_cost is greater than or equal to the old cost.  */

> +  int reject = old_cost > 0 && new_cost >= old_cost;

>

>    if (dump_file)

>      {

> --

> 2.11.0
Richard Sandiford July 6, 2020, 9:48 a.m. | #2
Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Most comments, including the second sentence in the head comment

> of combine_validate_cost, the main decision-maker of the combine

> pass, refer to the function as returning true if the new

> insns(s) *cheaper* than the old insns, when in fact the function

> returned true also if the cost was the same.  Returning true for

> cheaper also seems more sane than as-cheap-as considering the

> need to avoid oscillation between same-cost combinations.  Also,

> it makes the job of later passes harder, having combine make

> more complex combinations of the same cost.

>

> Right, you can affect this with your target TARGET_RTX_COSTS and

> TARGET_INSN_COST hooks, but only for trivial cases, and you have

> increasingly more complex combinations (many-to-many

> combinations) where you have to twist simple logic to appease

> combine (stop it from combining) or give up.

>

> Main-interest ports are unsurprisingly pretty tied to this

> effect.  I'd love to install the following patch, adjusting the

> function and the two opposing comments.  But...it causes

> hundreds of regressions for each of x86_64-linux and

> aarch64-linux (tens for ppc64le-linux), so I'm just not up to

> the task, at least not without previous buy-in from reviewers.


Out of interest, how do the results change if we still allow the
combination for equal costs if the new sequence is shorter than
the original?  I think that still counts as “cheaper than”,
but maybe I'm too RISC-centric. ;-)  (I'm not saying that's
what we should do, I'm just curious.)

Originally combine always produced shorter sequences, so by the
above reasoning, combining for == was correct.  These days we allow
N-to-N replacements too, which are obviously a good thing when
the new cost is lower, but are more of a wash when the costs
are the same.  But even then, the combination should have a
“canonicalisation” effect.  (Unfortunately that effect includes
the result of expand_compound_operation/make_compound_operation.)

Do you have specific examples of where things go wrong?

Thanks,
Richard
Segher Boessenkool July 14, 2020, 9:44 p.m. | #3
Hi!

On Mon, Jul 06, 2020 at 10:48:25AM +0100, Richard Sandiford wrote:
> Originally combine always produced shorter sequences, so by the


(Shorter in # insns, not # bytes).

> above reasoning, combining for == was correct.  These days we allow

> N-to-N replacements too, which are obviously a good thing when


Only 2-2 so far (not 1-1 yet, there are some target problems to
overcome first).

> the new cost is lower, but are more of a wash when the costs

> are the same.  But even then, the combination should have a

> “canonicalisation” effect.  (Unfortunately that effect includes

> the result of expand_compound_operation/make_compound_operation.)


2-2 is always reducing latency if the costs are equal (and sane ;-) ),
that is a large part of what makes 2-2 combinations useful.  Originally
the output of i2 is input to i3, but not anymore in the new insns.


Segher
Segher Boessenkool July 14, 2020, 9:58 p.m. | #4
Hi!

On Mon, Jul 06, 2020 at 04:01:54AM +0200, Hans-Peter Nilsson via Gcc-patches wrote:
> Most comments, including the second sentence in the head comment

> of combine_validate_cost, the main decision-maker of the combine

> pass, refer to the function as returning true if the new

> insns(s) *cheaper* than the old insns, when in fact the function

> returned true also if the cost was the same.  Returning true for

> cheaper also seems more sane than as-cheap-as considering the

> need to avoid oscillation between same-cost combinations.  Also,

> it makes the job of later passes harder, having combine make

> more complex combinations of the same cost.


All of this was added in https://gcc.gnu.org/g:64b8935d4809 , which also
adds the

+  /* Disallow this recombination if both new_cost and old_cost are
+     greater than zero, and new_cost is greater than old cost.  */

comment (which is what it actually does).  Before that change, combine
didn't look at costs at all.

Combine never has used a "strictly smaller than" cost thing.

> Right, you can affect this with your target TARGET_RTX_COSTS and

> TARGET_INSN_COST hooks, but only for trivial cases, and you have

> increasingly more complex combinations (many-to-many

> combinations) where you have to twist simple logic to appease

> combine (stop it from combining) or give up.


There are 2-1, 3-1, 4-1, 3-2, 4-2, which all reduce the number of insns,
and then there is 2-2, which gets rid of one log_link.  If the isnn_cost
stays the same, it always wins something else.

> Alternatives from the top of my head, one of:


...

5) Improve your target so that its insn_cost reflects ithe costs of
the insns better.

Can you share some typical examples where things are worse with the
current behaviour?


Segher
Segher Boessenkool Aug. 10, 2020, 9:24 p.m. | #5
Hi!

On Mon, Aug 10, 2020 at 05:47:53PM +0200, Hans-Peter Nilsson wrote:
> > All of this was added in https://gcc.gnu.org/g:64b8935d4809 , which also

> > adds the

> > 

> > +  /* Disallow this recombination if both new_cost and old_cost are

> > +     greater than zero, and new_cost is greater than old cost.  */

> > 

> > comment (which is what it actually does).  Before that change, combine

> > didn't look at costs at all.

> > 

> > Combine never has used a "strictly smaller than" cost thing.

> 

> (I suggest we use terms of generally greater/lower cost, instead

> of smaller/greater cost, to avoid confusion whether "smaller"

> refers to the general cost or just code size.


It refers to neither.  It refers to the concept of "cost" in combine,
which is a number (and some extra conditions).  Combine is a local
optimisation ("peephole", if you want), so it uses a local cost
function.

And yes, that cost is calculated differently if you use -Os.  The is
because it uses insn_cost to base on.

As it is a number, I (and most other people) use "smaller than".

> > > Right, you can affect this with your target TARGET_RTX_COSTS and

> > > TARGET_INSN_COST hooks, but only for trivial cases, and you have

> > > increasingly more complex combinations (many-to-many

> > > combinations) where you have to twist simple logic to appease

> > > combine (stop it from combining) or give up.

> > 

> > There are 2-1, 3-1, 4-1, 3-2, 4-2, which all reduce the number of insns,

> > and then there is 2-2, which gets rid of one log_link.  If the isnn_cost

> > stays the same, it always wins something else.

> 

> That "something else" is presumptious.


I wrote this code, I am maintainer of combine, I really do know.  This
is not presumptuous.  This is important to make sure combine never can
get into an infinite loop!

> A longer

> dependency-chain may sum up to faster execution considering

> parallel or OoO execution.  Someone adding another N-M

> combination will notice, after two more years of work. :)


Combine knows nothing about how any specific CPU will actually execute
the code.  It sees just a tiny fraction of code at any one time, anyway.

All it does is optimise for the lowest cost.  Whether that actually
performs best, is something the backend writer has to take care of.

(insn_cost is only the secondary "score", anyway: the number of RTL
insns is primary).

> > 5) Improve your target so that its insn_cost reflects ithe costs of

> > the insns better.

> 

> I see cheap gnawing, I hope I didn't add any of that myself.


I guess we both did.  Sorry.

> I already covered target costs before the 1..4 list and you

> actually quoted that (the quoted paragraph above your log_links

> comment).


Let me write this differently, then:

5) Write your target cost callbacks so that you work *with* combine, not
*against* it.  This gives better results.

> To recap, it all began with the observation of comments in

> combine.c saying "cheaper than", with the code implementing

> "same or cheaper", together with the flaw fixed by 84c5396d4bdbf

> which I belive is a sufficient conclusion for that specific

> instance.  A lower cost also seems more consistent with other

> decisions.  (BTW, the commit is a bit misleading; I believe all

> "case #2" cc0 convertees will benefit from an accurate

> is_just_move, not just CRIS.)


It makes a difference on many other targets, but it never results in
different code there.  I wonder why not (or, why it does for cris).

> The fixed 2-2 case is all the "typical examples" I had so far.

> Remaining suggestions are just fixing perceived inconsistencies.


All the other cases reduce the number of RTL insns, which is a clear
cost metric (and originally the only one!)  If you have RTL insns that
correspond to more than just single machine insns, this can hurt you;
otherwise, there are a few cases where you want to trick combine, but
not many (doing that tends to backfire anyway, better keep it to a
minimum).


Segher

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index f69413a..7da144e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -846,8 +846,8 @@  do_SUBST_LINK (struct insn_link **into, struct insn_link *newval)
    than the original sequence I0, I1, I2, I3 and undobuf.other_insn.  Note
    that I0, I1 and/or NEWI2PAT may be NULL_RTX.  Similarly, NEWOTHERPAT and
    undobuf.other_insn may also both be NULL_RTX.  Return false if the cost
-   of all the instructions can be estimated and the replacements are more
-   expensive than the original sequence.  */
+   of all the instructions can be estimated and the replacements are not
+   cheaper than the original sequence.  */
 
 static bool
 combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3,
@@ -938,8 +938,8 @@  combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3,
     }
 
   /* Disallow this combination if both new_cost and old_cost are greater than
-     zero, and new_cost is greater than old cost.  */
-  int reject = old_cost > 0 && new_cost > old_cost;
+     zero, and new_cost is greater than or equal to the old cost.  */
+  int reject = old_cost > 0 && new_cost >= old_cost;
 
   if (dump_file)
     {