Schedule by INSN_COST in case of tie

Message ID 5B97C539.1020009@arm.com
State New
Headers show
Series
  • Schedule by INSN_COST in case of tie
Related show

Commit Message

Vlad Lazar Sept. 11, 2018, 1:38 p.m.
Hi.

This patch makes the scheduler prefer instructions with higher cost if two given instructions are equally good.
Issuing more restricted instructions first is particularly useful on in-order cores because it increases the
number of dual issue opportunities.

For example, on AArch64, instead of:

   add     x11, x2, 96
   mov     x4, x2
   mov     w10, 1
   ldrh    w5, [x0]
   ldrh    w13, [x0, 2]
   ldrh    w9, [x0, 4]
   ldrh    w12, [x0, 6]
   b       .L759

Generate:

   ldrh    w5, [x0]
   add     x11, x2, 96
   ldrh    w13, [x0, 2]
   mov     x4, x2
   ldrh    w9, [x0, 4]
   mov     w10, 1
   ldrh    w12, [x0, 6]
   b       .L759

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no regressions.
Ok for trunk?

Thanks,
Vlad

gcc/
Changelog for gcc/Changelog
2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>

	* haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.
	(rfs_decision): New scheduling decision.

Comments

Ramana Radhakrishnan Sept. 11, 2018, 2:55 p.m. | #1
On Tue, 11 Sep 2018, 14:38 Vlad Lazar, <vlad.lazar@arm.com> wrote:

> Hi.

>

> This patch makes the scheduler prefer instructions with higher cost if two

> given instructions are equally good.

> Issuing more restricted instructions first is particularly useful on

> in-order cores because it increases the

> number of dual issue opportunities.

>

> For example, on AArch64, instead of:

>

>    add     x11, x2, 96

>    mov     x4, x2

>    mov     w10, 1

>    ldrh    w5, [x0]

>    ldrh    w13, [x0, 2]

>    ldrh    w9, [x0, 4]

>    ldrh    w12, [x0, 6]

>    b       .L759

>

> Generate:

>

>    ldrh    w5, [x0]

>    add     x11, x2, 96

>    ldrh    w13, [x0, 2]

>    mov     x4, x2

>    ldrh    w9, [x0, 4]

>    mov     w10, 1

>    ldrh    w12, [x0, 6]

>    b       .L759

>

> Bootstrapped and regtested on aarch64-none-linux-gnu and there are no

> regressions.

> Ok for trunk?

>


This to me feels like the wrong approach as it feels like you are assuming
INSN_COST is latency in some way ? Surely, we shouldn't be introducing
INSN_COST based stuff into the scheduler.

Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the
right name in the internals documents) and such hooks that come from the
scheduler rather than trying to massage INSN_COST into the target
independent parts of the scheduler ?

Ramana


>

> Thanks,

> Vlad

>

> gcc/

> Changelog for gcc/Changelog

> 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>

>

>         * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.

>         (rfs_decision): New scheduling decision.

>
Kyrill Tkachov Sept. 11, 2018, 3 p.m. | #2
Hi Ramana,

On 11/09/18 15:55, Ramana Radhakrishnan wrote:
> On Tue, 11 Sep 2018, 14:38 Vlad Lazar, <vlad.lazar@arm.com> wrote:

>

> > Hi.

> >

> > This patch makes the scheduler prefer instructions with higher cost if two

> > given instructions are equally good.

> > Issuing more restricted instructions first is particularly useful on

> > in-order cores because it increases the

> > number of dual issue opportunities.

> >

> > For example, on AArch64, instead of:

> >

> >    add     x11, x2, 96

> >    mov     x4, x2

> >    mov     w10, 1

> >    ldrh    w5, [x0]

> >    ldrh    w13, [x0, 2]

> >    ldrh    w9, [x0, 4]

> >    ldrh    w12, [x0, 6]

> >    b       .L759

> >

> > Generate:

> >

> >    ldrh    w5, [x0]

> >    add     x11, x2, 96

> >    ldrh    w13, [x0, 2]

> >    mov     x4, x2

> >    ldrh    w9, [x0, 4]

> >    mov     w10, 1

> >    ldrh    w12, [x0, 6]

> >    b       .L759

> >

> > Bootstrapped and regtested on aarch64-none-linux-gnu and there are no

> > regressions.

> > Ok for trunk?

> >

>

> This to me feels like the wrong approach as it feels like you are assuming

> INSN_COST is latency in some way ? Surely, we shouldn't be introducing

> INSN_COST based stuff into the scheduler.

>

> Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the

> right name in the internals documents) and such hooks that come from the

> scheduler rather than trying to massage INSN_COST into the target

> independent parts of the scheduler ?

>


In the context of haifa-sched.c, INSN_COST is the latency cost.
It is not the rtx_cost of the insn, as used by combine and others.
So this approach looks reasonable to me (though I haven't done a deep review).

Thanks,
Kyrill

> Ramana

>

>

> >

> > Thanks,

> > Vlad

> >

> > gcc/

> > Changelog for gcc/Changelog

> > 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>

> >

> >         * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.

> >         (rfs_decision): New scheduling decision.

> >
Ramana Radhakrishnan Sept. 11, 2018, 3:03 p.m. | #3
>

> > This to me feels like the wrong approach as it feels like you are assuming

> > INSN_COST is latency in some way ? Surely, we shouldn't be introducing

> > INSN_COST based stuff into the scheduler.

> >

> > Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the

> > right name in the internals documents) and such hooks that come from the

> > scheduler rather than trying to massage INSN_COST into the target

> > independent parts of the scheduler ?

> >

>

> In the context of haifa-sched.c, INSN_COST is the latency cost.

> It is not the rtx_cost of the insn, as used by combine and others.


Ah, I was conflating rtx_cost with INSN_COST, sorry about the noise.

Ramana

> So this approach looks reasonable to me (though I haven't done a deep review).

>

> Thanks,

> Kyrill

>

> > Ramana

> >

> >

> > >

> > > Thanks,

> > > Vlad

> > >

> > > gcc/

> > > Changelog for gcc/Changelog

> > > 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>

> > >

> > >         * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.

> > >         (rfs_decision): New scheduling decision.

> > >

>
Jeff Law Sept. 11, 2018, 8:46 p.m. | #4
On 9/11/18 9:00 AM, Kyrill Tkachov wrote:
> Hi Ramana,

> 

> On 11/09/18 15:55, Ramana Radhakrishnan wrote:

>> On Tue, 11 Sep 2018, 14:38 Vlad Lazar, <vlad.lazar@arm.com> wrote:

>>

>> > Hi.

>> >

>> > This patch makes the scheduler prefer instructions with higher cost

>> if two

>> > given instructions are equally good.

>> > Issuing more restricted instructions first is particularly useful on

>> > in-order cores because it increases the

>> > number of dual issue opportunities.

>> >

>> > For example, on AArch64, instead of:

>> >

>> >    add     x11, x2, 96

>> >    mov     x4, x2

>> >    mov     w10, 1

>> >    ldrh    w5, [x0]

>> >    ldrh    w13, [x0, 2]

>> >    ldrh    w9, [x0, 4]

>> >    ldrh    w12, [x0, 6]

>> >    b       .L759

>> >

>> > Generate:

>> >

>> >    ldrh    w5, [x0]

>> >    add     x11, x2, 96

>> >    ldrh    w13, [x0, 2]

>> >    mov     x4, x2

>> >    ldrh    w9, [x0, 4]

>> >    mov     w10, 1

>> >    ldrh    w12, [x0, 6]

>> >    b       .L759

>> >

>> > Bootstrapped and regtested on aarch64-none-linux-gnu and there are no

>> > regressions.

>> > Ok for trunk?

>> >

>>

>> This to me feels like the wrong approach as it feels like you are

>> assuming

>> INSN_COST is latency in some way ? Surely, we shouldn't be introducing

>> INSN_COST based stuff into the scheduler.

>>

>> Have you investigated  using TARGET_SCHED_ADJUST_COST (IIRC, look for the

>> right name in the internals documents) and such hooks that come from the

>> scheduler rather than trying to massage INSN_COST into the target

>> independent parts of the scheduler ?

>>

> 

> In the context of haifa-sched.c, INSN_COST is the latency cost.

> It is not the rtx_cost of the insn, as used by combine and others.

> So this approach looks reasonable to me (though I haven't done a deep

> review).

It looks reasonable to me as well.  Essentially it's a new tie-breaker
if everything else is equal.

jeff
Jeff Law Sept. 11, 2018, 8:46 p.m. | #5
On 9/11/18 7:38 AM, Vlad Lazar wrote:
> Hi.

> 

> This patch makes the scheduler prefer instructions with higher cost if

> two given instructions are equally good.

> Issuing more restricted instructions first is particularly useful on

> in-order cores because it increases the

> number of dual issue opportunities.

> 

> For example, on AArch64, instead of:

> 

>   add     x11, x2, 96

>   mov     x4, x2

>   mov     w10, 1

>   ldrh    w5, [x0]

>   ldrh    w13, [x0, 2]

>   ldrh    w9, [x0, 4]

>   ldrh    w12, [x0, 6]

>   b       .L759

> 

> Generate:

> 

>   ldrh    w5, [x0]

>   add     x11, x2, 96

>   ldrh    w13, [x0, 2]

>   mov     x4, x2

>   ldrh    w9, [x0, 4]

>   mov     w10, 1

>   ldrh    w12, [x0, 6]

>   b       .L759

> 

> Bootstrapped and regtested on aarch64-none-linux-gnu and there are no

> regressions.

> Ok for trunk?

> 

> Thanks,

> Vlad

> 

> gcc/

> Changelog for gcc/Changelog

> 2018-09-11  Vlad Lazar  <vlad.lazar@arm.com>

> 

>     * haifa-sched.c (rank_for_schedule): Schedule by INSN_COST.

>     (rfs_decision): New scheduling decision.

OK.
jeff

Patch

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 4f0221f6f43..3095e0375b5 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -2542,7 +2542,7 @@  enum rfs_decision {
   RFS_SCHED_GROUP, RFS_PRESSURE_DELAY, RFS_PRESSURE_TICK,
   RFS_FEEDS_BACKTRACK_INSN, RFS_PRIORITY, RFS_SPECULATION,
   RFS_SCHED_RANK, RFS_LAST_INSN, RFS_PRESSURE_INDEX,
-  RFS_DEP_COUNT, RFS_TIE, RFS_FUSION, RFS_N };
+  RFS_DEP_COUNT, RFS_TIE, RFS_FUSION, RFS_COST, RFS_N };
 
 /* Corresponding strings for print outs.  */
 static const char *rfs_str[RFS_N] = {
@@ -2550,7 +2550,7 @@  static const char *rfs_str[RFS_N] = {
   "RFS_SCHED_GROUP", "RFS_PRESSURE_DELAY", "RFS_PRESSURE_TICK",
   "RFS_FEEDS_BACKTRACK_INSN", "RFS_PRIORITY", "RFS_SPECULATION",
   "RFS_SCHED_RANK", "RFS_LAST_INSN", "RFS_PRESSURE_INDEX",
-  "RFS_DEP_COUNT", "RFS_TIE", "RFS_FUSION" };
+  "RFS_DEP_COUNT", "RFS_TIE", "RFS_FUSION", "RFS_COST" };
 
 /* Statistical breakdown of rank_for_schedule decisions.  */
 struct rank_for_schedule_stats_t { unsigned stats[RFS_N]; };
@@ -2803,6 +2803,14 @@  rank_for_schedule (const void *x, const void *y)
   if (flag_sched_dep_count_heuristic && val != 0)
     return rfs_result (RFS_DEP_COUNT, val, tmp, tmp2);
 
+  /* Sort by INSN_COST rather than INSN_LUID.  This means that instructions
+     which take longer to execute are prioritised and it leads to more
+     dual-issue opportunities on in-order cores which have this feature.  */
+
+  if (INSN_COST (tmp) != INSN_COST (tmp2))
+    return rfs_result (RFS_COST, INSN_COST (tmp2) - INSN_COST (tmp),
+		       tmp, tmp2);
+
   /* If insns are equally good, sort by INSN_LUID (original insn order),
      so that we make the sort stable.  This minimizes instruction movement,
      thus minimizing sched's effect on debugging and cross-jumping.  */