[ARM] Fix ICE during thunk generation with -mlong-calls

Message ID 1722777.YijAB52ccF@polaris
State New
Headers show
Series
  • [ARM] Fix ICE during thunk generation with -mlong-calls
Related show

Commit Message

Eric Botcazou Sept. 17, 2018, 11:19 a.m.
Hi,

this is a regression present on mainline, 8 and 7 branches.  The new, RTL 
implementation of arm32_output_mi_thunk breaks during the libstdc++ build if 
you configure the compiler with -mlong-calls by default:

0xdb57eb gen_reg_rtx(machine_mode)
        /home/eric/svn/gcc/gcc/emit-rtl.c:1155
0xde9ae7 force_reg(machine_mode, rtx_def*)
        /home/eric/svn/gcc/gcc/explow.c:654
0x1bf73bf gen_sibcall(rtx_def*, rtx_def*, rtx_def*)
        /home/eric/svn/gcc/gcc/config/arm/arm.md:8272
0x187d3b1 arm32_output_mi_thunk
        /home/eric/svn/gcc/gcc/config/arm/arm.c:26762
0x187d4af arm_output_mi_thunk
        /home/eric/svn/gcc/gcc/config/arm/arm.c:26783
0xcb9c94 cgraph_node::expand_thunk(bool, bool)
        /home/eric/svn/gcc/gcc/cgraphunit.c:1783

because the code is wired for a short call.  Moreover, in PIC mode you need to 
work harder and fix up the minipool too with -mlong-calls.

Tested on ARM/Linux, OK for mainline, 8 and 7 branches?


2018-09-17  Eric Botcazou  <ebotcazou@adacore.com>

	* config/arm/arm.c (arm_reorg): Skip Thumb reorg pass for thunks.
	(arm32_output_mi_thunk): Deal with long calls.


2018-09-17  Eric Botcazou  <ebotcazou@adacore.com>

	* g++.dg/other/thunk2a.C: New test.
	* g++.dg/other/thunk2b.C: Likewise.

-- 
Eric Botcazou

Comments

Richard Earnshaw (lists) Sept. 17, 2018, 12:54 p.m. | #1
On 17/09/18 12:19, Eric Botcazou wrote:
> Hi,

> 

> this is a regression present on mainline, 8 and 7 branches.  The new, RTL 

> implementation of arm32_output_mi_thunk breaks during the libstdc++ build if 

> you configure the compiler with -mlong-calls by default:

> 

> 0xdb57eb gen_reg_rtx(machine_mode)

>         /home/eric/svn/gcc/gcc/emit-rtl.c:1155

> 0xde9ae7 force_reg(machine_mode, rtx_def*)

>         /home/eric/svn/gcc/gcc/explow.c:654

> 0x1bf73bf gen_sibcall(rtx_def*, rtx_def*, rtx_def*)

>         /home/eric/svn/gcc/gcc/config/arm/arm.md:8272

> 0x187d3b1 arm32_output_mi_thunk

>         /home/eric/svn/gcc/gcc/config/arm/arm.c:26762

> 0x187d4af arm_output_mi_thunk

>         /home/eric/svn/gcc/gcc/config/arm/arm.c:26783

> 0xcb9c94 cgraph_node::expand_thunk(bool, bool)

>         /home/eric/svn/gcc/gcc/cgraphunit.c:1783

> 

> because the code is wired for a short call.  Moreover, in PIC mode you need to 

> work harder and fix up the minipool too with -mlong-calls.

> 

> Tested on ARM/Linux, OK for mainline, 8 and 7 branches?

> 

> 

> 2018-09-17  Eric Botcazou  <ebotcazou@adacore.com>

> 

> 	* config/arm/arm.c (arm_reorg): Skip Thumb reorg pass for thunks.


this seems to contradict your statement above about having to work
harder to fix up minipools.  why is it correct to skip this entirely?

> 	(arm32_output_mi_thunk): Deal with long calls.

> 

> 

> 2018-09-17  Eric Botcazou  <ebotcazou@adacore.com>

> 

> 	* g++.dg/other/thunk2a.C: New test.

> 	* g++.dg/other/thunk2b.C: Likewise.

> 

> 

> p.diff

> 

> 

> Index: config/arm/arm.c

> ===================================================================

> --- config/arm/arm.c	(revision 264342)

> +++ config/arm/arm.c	(working copy)

> @@ -17647,7 +17647,9 @@ arm_reorg (void)

>  

>    if (use_cmse)

>      cmse_nonsecure_call_clear_caller_saved ();

> -  if (TARGET_THUMB1)

> +  if (cfun->is_thunk)

> +    ;

> +  else if (TARGET_THUMB1)

>      thumb1_reorg ();

>    else if (TARGET_THUMB2)

>      thumb2_reorg ();

> @@ -26721,6 +26723,8 @@ static void

>  arm32_output_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta,

>  		       HOST_WIDE_INT vcall_offset, tree function)

>  {

> +  const bool long_call_p = arm_is_long_call_p (function);

> +

>    /* On ARM, this_regno is R0 or R1 depending on

>       whether the function returns an aggregate or not.

>    */

> @@ -26758,9 +26762,22 @@ arm32_output_mi_thunk (FILE *file, tree,

>        TREE_USED (function) = 1;

>      }

>    rtx funexp = XEXP (DECL_RTL (function), 0);

> +  if (long_call_p)

> +    {

> +      emit_move_insn (temp, funexp);

> +      funexp = temp;

> +    }

>    funexp = gen_rtx_MEM (FUNCTION_MODE, funexp);

> -  rtx_insn * insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, NULL_RTX));

> +  rtx_insn *insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, NULL_RTX));

>    SIBLING_CALL_P (insn) = 1;

> +  emit_barrier ();


Why do we need a barrier here unconditionally (ie in the non-longcall case)?

R.

> +

> +  /* Indirect calls require a bit of fixup in PIC mode.  */

> +  if (long_call_p)

> +    {

> +      split_all_insns_noflow ();

> +      arm_reorg ();

> +    }

>  

>    insn = get_insns ();

>    shorten_branches (insn);

> 

> 

> thunk2a.C

> 

> 

> // { dg-do compile { target arm*-*-* } }

> // { dg-options "-mlong-calls -ffunction-sections }

> 

> class a {

> public:

>   virtual ~a();

> };

> 

> class b : virtual a {};

> 

> class c : b {

>   ~c();

> };

> 

> c::~c() {}

> 

> 

> thunk2b.C

> 

> 

> // { dg-do compile { target arm*-*-* && fpic } }

> // { dg-options "-mlong-calls -ffunction-sections -fPIC }

> 

> class a {

> public:

>   virtual ~a();

> };

> 

> class b : virtual a {};

> 

> class c : b {

>   ~c();

> };

> 

> c::~c() {}

>
Eric Botcazou Sept. 18, 2018, 9 a.m. | #2
> this seems to contradict your statement above about having to work

> harder to fix up minipools.


Why?  Fixing up minipools is done in the generic ARM reorg pass, not in the 
Thumb reorg pass(es).

> Why do we need a barrier here unconditionally (ie in the non-longcall case)?


We don't, but it doesn't harm to put it either.  For example, the x86, PowerPC 
and SPARC ports always do it.

-- 
Eric Botcazou
Richard Earnshaw (lists) Sept. 18, 2018, 10:26 a.m. | #3
On 18/09/18 10:00, Eric Botcazou wrote:
>> this seems to contradict your statement above about having to work

>> harder to fix up minipools.

> 

> Why?  Fixing up minipools is done in the generic ARM reorg pass, not in the 

> Thumb reorg pass(es).

> 


Ah!  But that still doesn't explain why you want to skip these passes
when building thunks.

>> Why do we need a barrier here unconditionally (ie in the non-longcall case)?

> 

> We don't, but it doesn't harm to put it either.  For example, the x86, PowerPC 

> and SPARC ports always do it.

> 


So is the barrier correct, or isn't it?  There's really no two ways
about this.  I don't like arbitrary changes that are justified solely on
'that's what another port does'.

R.
Eric Botcazou Sept. 24, 2018, 9:19 a.m. | #4
> Ah!  But that still doesn't explain why you want to skip these passes

> when building thunks.


They simply don't work because there is no CFG for thunks; I can add a blurb 
about that.

> So is the barrier correct, or isn't it?  There's really no two ways

> about this.  I don't like arbitrary changes that are justified solely on

> 'that's what another port does'.


The barrier is required by the arm_reorg pass, but it is optional when the 
pass is not run.  I think that we can consider that it is also correct.

-- 
Eric Botcazou
Richard Earnshaw (lists) Sept. 25, 2018, 2:27 p.m. | #5
On 24/09/18 10:19, Eric Botcazou wrote:
>> Ah!  But that still doesn't explain why you want to skip these passes

>> when building thunks.

> 

> They simply don't work because there is no CFG for thunks; I can add a blurb 

> about that.


Yes, this needs a comment as it's far from obvious when looking at the code.

> 

>> So is the barrier correct, or isn't it?  There's really no two ways

>> about this.  I don't like arbitrary changes that are justified solely on

>> 'that's what another port does'.

> 

> The barrier is required by the arm_reorg pass, but it is optional when the 

> pass is not run.  I think that we can consider that it is also correct.

> 


Ah, because you're now calling arm_reorg directly and it needs the
barrier to drop the minipool in the right place.

So OK with the additional comment.

R.

Patch

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 264342)
+++ config/arm/arm.c	(working copy)
@@ -17647,7 +17647,9 @@  arm_reorg (void)
 
   if (use_cmse)
     cmse_nonsecure_call_clear_caller_saved ();
-  if (TARGET_THUMB1)
+  if (cfun->is_thunk)
+    ;
+  else if (TARGET_THUMB1)
     thumb1_reorg ();
   else if (TARGET_THUMB2)
     thumb2_reorg ();
@@ -26721,6 +26723,8 @@  static void
 arm32_output_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta,
 		       HOST_WIDE_INT vcall_offset, tree function)
 {
+  const bool long_call_p = arm_is_long_call_p (function);
+
   /* On ARM, this_regno is R0 or R1 depending on
      whether the function returns an aggregate or not.
   */
@@ -26758,9 +26762,22 @@  arm32_output_mi_thunk (FILE *file, tree,
       TREE_USED (function) = 1;
     }
   rtx funexp = XEXP (DECL_RTL (function), 0);
+  if (long_call_p)
+    {
+      emit_move_insn (temp, funexp);
+      funexp = temp;
+    }
   funexp = gen_rtx_MEM (FUNCTION_MODE, funexp);
-  rtx_insn * insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, NULL_RTX));
+  rtx_insn *insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, NULL_RTX));
   SIBLING_CALL_P (insn) = 1;
+  emit_barrier ();
+
+  /* Indirect calls require a bit of fixup in PIC mode.  */
+  if (long_call_p)
+    {
+      split_all_insns_noflow ();
+      arm_reorg ();
+    }
 
   insn = get_insns ();
   shorten_branches (insn);