Use simple LRA algorithm at -O0

Message ID 1615258.mc8dMGPkXm@polaris
State New

Commit Message

Eric Botcazou Dec. 17, 2019, 6:02 p.m.
Hi,

LRA is getting measurably slower since GCC 8, at least on x86, and things are 
worsening since GCC 9.  While this might be legitimate when optimization is 
enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA 
over to using the simple algorithm when optimization is disabled.  The effect 
on code size is tiny (typically 0.2% on x86).

Tested on x86_64-suse-linux, OK for the mainline?


2019-12-17  Eric Botcazou  <ebotcazou@adacore.com>

	* ira.c (ira): Use simple LRA algorithm when not optimizing.

-- 
Eric Botcazou

Comments

Vladimir Makarov Dec. 18, 2019, 1:59 p.m. | #1
On 2019-12-17 1:02 p.m., Eric Botcazou wrote:
> Hi,
>
> LRA is getting measurably slower since GCC 8, at least on x86, and things are
> worsening since GCC 9.  While this might be legitimate when optimization is
> enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA
> over to using the simple algorithm when optimization is disabled.  The effect
> on code size is tiny (typically 0.2% on x86).
>
> Tested on x86_64-suse-linux, OK for the mainline?
>

Eric, thank you for reporting this issue and providing the patch.
Simple LRA algorithms switch off hard register splitting, so there might
be a slightly bigger chance of a "can't find reload register" error
(e.g. when -O0 -fschedule-insns is used).  But this error is still not
solved in the general case, and in my experience the chance of this error
is even bigger for optimized modes than for -O0 with simple LRA algorithms.

That said, I believe the patch is OK for the trunk.

> 2019-12-17  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	* ira.c (ira): Use simple LRA algorithm when not optimizing.
>
Eric Botcazou Dec. 19, 2019, 11:29 a.m. | #2
> Simple LRA algorithms switch off hard register splitting, so there might
> be a slightly bigger chance of a "can't find reload register" error
> (e.g. when -O0 -fschedule-insns is used).  But this error is still not
> solved in the general case, and in my experience the chance of this error
> is even bigger for optimized modes than for -O0 with simple LRA algorithms.


I see, thanks for the explanation.  So this could occur for register variables
or something along these lines?

> That said, I believe the patch is OK for the trunk.


OK, let's see how it fares.  We have been using it with a GCC 9 compiler for 
some time, without any problem so far.

-- 
Eric Botcazou
Vladimir Makarov Dec. 19, 2019, 10:33 p.m. | #3
On 12/19/19 6:29 AM, Eric Botcazou wrote:
>> Simple LRA algorithms switch off hard register splitting, so there might
>> be a slightly bigger chance of a "can't find reload register" error
>> (e.g. when -O0 -fschedule-insns is used).  But this error is still not
>> solved in the general case, and in my experience the chance of this error
>> is even bigger for optimized modes than for -O0 with simple LRA algorithms.
> I see, thanks for the explanation.  So this could occur for register variables
> or something along these lines?


It might occur when the liveness of hard registers explicitly present
in RTL is expanded.  A typical example is a move of a hard register (e.g.
x86-64 dx used as a function call argument) past an insn that always
requires this hard register (e.g. an x86-64 div insn using the ax/dx hard
registers).  There are also more complicated cases.  The reload pass never
tried to solve this problem.  LRA tries to solve it, but in the general
case the problem is still not solved.  Therefore the 1st insn scheduler is
switched off by default on some targets.  Still, GCC users can switch it
on and run into the problem with or without the patch.
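
To make the ax/dx example above concrete, here is a minimal source-level sketch of the shape being described.  It assumes the x86-64 System V calling convention, where the third integer argument is passed in rdx, and a signed division that the compiler lowers to idiv (which implicitly uses rax and rdx).  The function names combine and divide_then_call are hypothetical, and this code is not claimed to trigger the error; it only shows where a hard register set up for the call and the same hard register required by the division come close together.

```c
#include <assert.h>

/* Hypothetical callee.  Under the x86-64 SysV ABI (an assumption of
   this sketch) its third integer argument arrives in rdx.  */
long combine(long quotient, long unused, long third)
{
    (void) unused;
    return quotient + third;
}

/* Hypothetical caller.  The division is typically lowered to idiv,
   which reads and writes rax and rdx.  The argument setup for the call
   must place 'third' in rdx, so moving that setup above the division
   would make rdx live across an insn that itself needs rdx.  */
long divide_then_call(long num, long den, long third)
{
    assert(den != 0);              /* avoid division by zero */
    long q = num / den;            /* needs rax/rdx on x86-64 */
    return combine(q, 0, third);   /* 'third' must be in rdx here */
}
```

For instance, divide_then_call (20, 4, 7) computes 20 / 4 and then adds 7.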

>> That said, I believe the patch is OK for the trunk.
> OK, let's see how it fares.  We have been using it with a GCC 9 compiler for
> some time, without any problem so far.
>

As I wrote, for typical GCC use the patch will not create any problems.
But GCC users (or automatically generated tests run with artificial
option sets) can still run into the problem, just as they could before
the patch.

Patch

Index: ira.c
===================================================================
--- ira.c	(revision 279442)
+++ ira.c	(working copy)
@@ -5192,8 +5192,6 @@  ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
-  unsigned int i;
-  int num_used_regs = 0;
 
   clear_bb_flags ();
 
@@ -5207,18 +5205,28 @@  ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  ira_conflicts_p = optimize > 0;
+  if (optimize)
+    {
+      ira_conflicts_p = true;
 
-  /* Determine the number of pseudos actually requiring coloring.  */
-  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-     pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-     use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-    = (ira_use_lra_p
-       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+      /* Determine the number of pseudos actually requiring coloring.  */
+      unsigned int num_used_regs = 0;
+      for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+	if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+	  num_used_regs++;
+
+      /* If there are too many pseudos and/or basic blocks (e.g. 10K
+	 pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+	 use simplified and faster algorithms in LRA.  */
+      lra_simple_p
+	= ira_use_lra_p
+	  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+    }
+  else
+    {
+      ira_conflicts_p = false;
+      lra_simple_p = ira_use_lra_p;
+    }
 
   if (lra_simple_p)
     {
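
The decision made by the second hunk can be modelled outside GCC as a small standalone function.  This is only a sketch: model_lra_simple_p and its parameters are hypothetical stand-ins for optimize, ira_use_lra_p, num_used_regs and last_basic_block_for_fn (cfun), and it assumes the block count is positive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how the patched ira () picks lra_simple_p.
   Not GCC code: the arguments replace GCC's globals and macros.  */
bool model_lra_simple_p(int optimize, bool use_lra,
                        unsigned int num_used_regs,
                        unsigned int num_blocks)
{
    assert(num_blocks > 0);   /* assumption: at least one basic block */

    /* Without optimization, always take the simple algorithms when
       LRA is in use; the pseudo count is not even computed.  */
    if (!optimize)
        return use_lra;

    /* With optimization, fall back to the simple algorithms only for
       very large functions: pseudos * blocks of roughly 2^26 or more.  */
    return use_lra && num_used_regs >= (1U << 26) / num_blocks;
}
```

With these inputs, the two sizes named in the source comment (10K pseudos with 10K blocks, 100K pseudos with 1K blocks) both select the simple algorithms when optimizing, while a function with 1000 pseudos and 100 blocks does not.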