Find tailcall frames before inline frames

Message ID 20200220155820.22809-1-tromey@adacore.com
State New

Commit Message

Tom Tromey Feb. 20, 2020, 3:58 p.m.
A customer reported a failure to unwind in a certain core dump.  A
lengthy investigation showed that the problem came from the
interaction between the tailcall and inline frame sniffers.

Normally, the regular DWARF unwinder may discover a chain of tail
calls ending in the current frame.  In this case, it sets a member on
the dwarf2_frame_cache object, so that a subsequent call into the
tailcall sniffer will create the tailcall frames.

However, in this scenario, what happened is that the DWARF unwinder
did find tailcall frames -- but then the PC of the first such frame
was recognized and claimed by the inline frame sniffer.

This then caused unwinding to go astray further up the stack.

This patch fixes the problem by arranging for the tailcall sniffer to
be called before the inline sniffer.  This way, if a DWARF frame has
tailcall information, the tailcalls will always be processed first.
This is safe to do, because the tailcall sniffer can only claim a
frame if the previous frame did in fact find this information.  (So,
for example, if no DWARF frame is ever found, then this sniffer will
never trigger.)
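
The effect of sniffer order can be illustrated with a toy model.  This is
only a sketch, not GDB code: every toy_* name below is hypothetical, the
frame is reduced to two flags, and the only assumption carried over from
the description above is that the first sniffer to claim a frame wins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the frame being sniffed (hypothetical, not GDB's
   frame_info).  */
struct toy_frame
{
  int newer_frame_found_tailcalls;  /* Newer DWARF frame saw a tail-call chain.  */
  int pc_in_inlined_function;       /* PC lies inside an inlined function.  */
};

enum toy_kind { TOY_NONE, TOY_TAILCALL, TOY_INLINE, TOY_DWARF };

struct toy_unwinder
{
  enum toy_kind kind;
  int (*sniffer) (const struct toy_frame *frame);
};

struct toy_entry
{
  const struct toy_unwinder *unwinder;
  struct toy_entry *next;
};

static int
toy_tailcall_sniffer (const struct toy_frame *frame)
{
  return frame->newer_frame_found_tailcalls;
}

static int
toy_inline_sniffer (const struct toy_frame *frame)
{
  return frame->pc_in_inlined_function;
}

static int
toy_dwarf_sniffer (const struct toy_frame *frame)
{
  (void) frame;
  return 1;  /* Fallback: claims whatever nobody else claimed.  */
}

static const struct toy_unwinder toy_tailcall_unwind
  = { TOY_TAILCALL, toy_tailcall_sniffer };
static const struct toy_unwinder toy_inline_unwind
  = { TOY_INLINE, toy_inline_sniffer };
static const struct toy_unwinder toy_dwarf_unwind
  = { TOY_DWARF, toy_dwarf_sniffer };

/* Install UNWINDER at *LINK and return the new tail link.  The entry
   is never freed in this sketch.  */
static struct toy_entry **
toy_add_unwinder (const struct toy_unwinder *unwinder, struct toy_entry **link)
{
  *link = calloc (1, sizeof (struct toy_entry));
  assert (*link != NULL);
  (*link)->unwinder = unwinder;
  return &(*link)->next;
}

/* Build a sniffer list with the tailcall sniffer either before or
   after the inline sniffer.  */
static struct toy_entry *
toy_build_list (int tailcall_first)
{
  struct toy_entry *list = NULL;
  struct toy_entry **link = &list;

  if (tailcall_first)
    {
      link = toy_add_unwinder (&toy_tailcall_unwind, link);
      link = toy_add_unwinder (&toy_inline_unwind, link);
    }
  else
    {
      link = toy_add_unwinder (&toy_inline_unwind, link);
      link = toy_add_unwinder (&toy_tailcall_unwind, link);
    }
  toy_add_unwinder (&toy_dwarf_unwind, link);
  return list;
}

/* Walk the list in order; the first sniffer that accepts FRAME wins.  */
static enum toy_kind
toy_find_unwinder (const struct toy_entry *list, const struct toy_frame *frame)
{
  for (; list != NULL; list = list->next)
    if (list->unwinder->sniffer (frame))
      return list->unwinder->kind;
  return TOY_NONE;
}
```

For a frame with both flags set, toy_build_list (0) (inline first)
selects TOY_INLINE, while toy_build_list (1) (tailcall first) selects
TOY_TAILCALL.  A frame with only pc_in_inlined_function set is still
claimed by the inline sniffer under either order.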

This patch also partially reverts:

    commit 1ec56e88aa9b052ab10b806d82fbdbc8d153d977
    Author: Pedro Alves <palves@redhat.com>
    Date:   Fri Nov 22 13:17:46 2013 +0000

	Eliminate dwarf2_frame_cache recursion, don't unwind from the dwarf2 sniffer (move dwarf2_tailcall_sniffer_first elsewhere).

That patch moved the call to dwarf2_tailcall_sniffer_first out of
dwarf2_frame_cache, and into dwarf2_frame_prev_register.  However, in
this situation, this is too late -- by the time
dwarf2_frame_prev_register is called, the frame in question is already
recognized by the inline frame sniffer.

Rather than fully revert that patch, though, this just arranges to
call dwarf2_tailcall_sniffer_first from dwarf2_frame_cache -- which is
called shortly after the DWARF frame sniffer succeeds, via
compute_frame_id.

I don't know how to write a test case for this.

gdb/ChangeLog
2020-02-20  Tom Tromey  <tromey@adacore.com>

	* dwarf2/frame.c (struct dwarf2_frame_cache)
	<checked_tailcall_bottom, entry_cfa_sp_offset,
	entry_cfa_sp_offset_p>: Remove members.
	(dwarf2_frame_cache): Call dwarf2_tailcall_sniffer_first.
	(dwarf2_frame_prev_register): Don't call
	dwarf2_tailcall_sniffer_first.
	(dwarf2_append_unwinders): Don't append tailcall unwinder.
	* frame-unwind.c (add_unwinder): New function.
	(frame_unwind_init): Use it.  Add tailcall unwinder.
---
 gdb/ChangeLog      | 12 ++++++++++++
 gdb/dwarf2/frame.c | 34 ++++++++--------------------------
 gdb/frame-unwind.c | 33 +++++++++++++++++++++++++++------
 3 files changed, 47 insertions(+), 32 deletions(-)

-- 
2.21.1

Comments

Tom Tromey March 3, 2020, 9:45 p.m. | #1
Tom> gdb/ChangeLog
Tom> 2020-02-20  Tom Tromey  <tromey@adacore.com>

Tom> 	* dwarf2/frame.c (struct dwarf2_frame_cache)
Tom> 	<checked_tailcall_bottom, entry_cfa_sp_offset,
Tom>    entry_cfa_sp_offset_p> : Remove members.
Tom> 	(dwarf2_frame_cache): Call dwarf2_tailcall_sniffer_first.
Tom> 	(dwarf2_frame_prev_register): Don't call
Tom> 	dwarf2_tailcall_sniffer_first.
Tom> 	(dwarf2_append_unwinders): Don't append tailcall unwinder.
Tom> 	* frame-unwind.c (add_unwinder): New function.
Tom> 	(frame_unwind_init): Use it.  Add tailcall unwinder.

I'm going to check this in now.

Tom
Luis Machado March 5, 2020, 10:21 a.m. | #2
Hi Tom,

On 3/3/20 6:45 PM, Tom Tromey wrote:
> Tom> gdb/ChangeLog
> Tom> 2020-02-20  Tom Tromey  <tromey@adacore.com>
>
> Tom> 	* dwarf2/frame.c (struct dwarf2_frame_cache)
> Tom> 	<checked_tailcall_bottom, entry_cfa_sp_offset,
> Tom>    entry_cfa_sp_offset_p> : Remove members.
> Tom> 	(dwarf2_frame_cache): Call dwarf2_tailcall_sniffer_first.
> Tom> 	(dwarf2_frame_prev_register): Don't call
> Tom> 	dwarf2_tailcall_sniffer_first.
> Tom> 	(dwarf2_append_unwinders): Don't append tailcall unwinder.
> Tom> 	* frame-unwind.c (add_unwinder): New function.
> Tom> 	(frame_unwind_init): Use it.  Add tailcall unwinder.
>
> I'm going to check this in now.
>
> Tom

This has caused quite a few failures in the following tests for 
aarch64-linux:

gdb.opt/inline-break.exp
gdb.opt/inline-cmds.exp
gdb.python/py-frame-inline.exp
gdb.reverse/insn-reverse.exp

I see the following:

info frame^M
../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id 
get_frame_id(frame_info*): Assertion `fi->level == 0' failed.^M
A problem internal to GDB has been detected,^M
further debugging may prove unreliable.^M
Quit this debugging session? (y or n) FAIL: 
gdb.python/py-frame-inline.exp: info frame (GDB internal error)
Resyncing due to internal error.
n^M
^M
This is a bug, please report it.  For instructions, see:^M
<http://www.gnu.org/software/gdb/bugs/>.^M


(gdb) up^M
../../../repos/binutils-gdb/gdb/inline-frame.c:172: internal-error: void 
inline_frame_this_id(frame_info*, void**, frame_id*): Assertion 
`frame_id_p (*this_id)' failed.^M
A problem internal to GDB has been detected,^M
further debugging may prove unreliable.^M
Quit this debugging session? (y or n) FAIL: 
gdb.python/py-frame-inline.exp: up (GDB internal error)
Resyncing due to internal error.
n^M
^M
This is a bug, please report it.  For instructions, see:^M
<http://www.gnu.org/software/gdb/bugs/>.^M


I can help get more information on it if you need.
Tom Tromey March 5, 2020, 4:56 p.m. | #3
Luis> This has caused quite a few failures in the following tests for
Luis> aarch64-linux:

Sorry about that.  I will take a look.

Tom
Luis Machado March 9, 2020, 5:55 p.m. | #4
On 3/5/20 7:21 AM, Luis Machado wrote:
> This has caused quite a few failures in the following tests for
> aarch64-linux:
>
> gdb.opt/inline-break.exp
> gdb.opt/inline-cmds.exp
> gdb.python/py-frame-inline.exp
> gdb.reverse/insn-reverse.exp
>
> [...]

Reported as https://sourceware.org/bugzilla/show_bug.cgi?id=25649 so we 
can track it.
Tom Tromey March 12, 2020, 9:34 p.m. | #5
>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:


Luis> This has caused quite a few failures in the following tests for
Luis> aarch64-linux:

I still haven't really tried to reproduce this yet.
I'll try tomorrow, I hope.

Luis> ../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id
Luis> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.

Meanwhile I wonder if this is the same as

https://sourceware.org/pipermail/gdb-patches/2020-February/165511.html

Tom
Luis Machado March 13, 2020, 1:31 p.m. | #6
On 3/12/20 6:34 PM, Tom Tromey wrote:
>>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:
>
> Luis> This has caused quite a few failures in the following tests for
> Luis> aarch64-linux:
>
> I still haven't really tried to reproduce this yet.
> I'll try tomorrow, I hope.
>
> Luis> ../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id
> Luis> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.
>
> Meanwhile I wonder if this is the same as
>
> https://sourceware.org/pipermail/gdb-patches/2020-February/165511.html
>
> Tom

The mention of fi->level looks the same, but I haven't looked into it yet.

I was planning to pinpoint the failure point in order to make this 
easier to solve.
Luis Machado March 24, 2020, 9:24 p.m. | #7
Hi Tom,

On 3/13/20 10:31 AM, Luis Machado wrote:
> On 3/12/20 6:34 PM, Tom Tromey wrote:
>>>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:
>>
>> Luis> This has caused quite a few failures in the following tests for
>> Luis> aarch64-linux:
>>
>> I still haven't really tried to reproduce this yet.
>> I'll try tomorrow, I hope.
>>
>> Luis> ../../../repos/binutils-gdb/gdb/frame.c:579: internal-error: frame_id
>> Luis> get_frame_id(frame_info*): Assertion `fi->level == 0' failed.
>>
>> Meanwhile I wonder if this is the same as
>>
>> https://sourceware.org/pipermail/gdb-patches/2020-February/165511.html
>>
>> Tom
>
> The mention of fi->level looks the same, but I haven't looked into it yet.
>
> I was planning to pinpoint the failure point in order to make this
> easier to solve.

Having spent a few days trying to understand this problem, it seems all 
of these fi->level assertions (including 
https://sourceware.org/bugzilla/show_bug.cgi?id=22748) are related to 
attempting to unwind from places not safe to do so. That is, we're 
trying to unwind some content (registers for example) before a given 
frame is assigned a frame id.

In some cases we can see compute_frame_id being invoked recursively
for the same frame, which would lead to infinite recursion. The
assertion catches this.

In my particular case, the call to dwarf2_tailcall_sniffer_first inside 
dwarf2_frame_cache leads to such infinite recursion, since no frame id 
has been assigned to the frame being used yet. It will be assigned by 
the time compute_frame_id is done.
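
The re-entry described above can be modelled in a few lines.  This is a
hedged sketch, not GDB's implementation: all toy_* names and the id value
are hypothetical, and the toy reports the re-entry with an error return
where GDB fails an assertion.

```c
#include <assert.h>
#include <stddef.h>

enum toy_id_state { TOY_ID_NOT_COMPUTED, TOY_ID_COMPUTING, TOY_ID_COMPUTED };

struct toy_frame;
typedef int (*toy_cache_builder) (struct toy_frame *frame);

/* Toy frame (hypothetical): just enough state to track id computation.  */
struct toy_frame
{
  enum toy_id_state state;
  int id;                         /* Valid only in TOY_ID_COMPUTED.  */
  toy_cache_builder build_cache;  /* Stand-in for the unwinder's cache setup.  */
  int reentered;                  /* Set if the id was requested too early.  */
};

static int toy_get_frame_id (struct toy_frame *frame, int *id);

/* Compute FRAME's id.  The cache must be built first, and the id only
   exists once this function has finished.  */
static int
toy_compute_frame_id (struct toy_frame *frame)
{
  assert (frame->state == TOY_ID_NOT_COMPUTED);
  frame->state = TOY_ID_COMPUTING;
  if (frame->build_cache (frame) != 0)
    {
      frame->state = TOY_ID_NOT_COMPUTED;
      return -1;
    }
  frame->id = 42;  /* Arbitrary placeholder id.  */
  frame->state = TOY_ID_COMPUTED;
  return 0;
}

/* Return 0 and store the id in *ID, or -1 if the id is unavailable.  */
static int
toy_get_frame_id (struct toy_frame *frame, int *id)
{
  if (frame->state == TOY_ID_COMPUTING)
    {
      /* Asked for the id while it is still being computed: without
	 this check the two functions would recurse forever.  */
      frame->reentered = 1;
      return -1;
    }
  if (frame->state == TOY_ID_NOT_COMPUTED
      && toy_compute_frame_id (frame) != 0)
    return -1;
  *id = frame->id;
  return 0;
}

/* A cache builder that does not need the frame's own id.  */
static int
toy_build_cache_plain (struct toy_frame *frame)
{
  (void) frame;
  return 0;
}

/* A cache builder that unwinds too early: it needs the id of the very
   frame whose id is being computed.  */
static int
toy_build_cache_early_unwind (struct toy_frame *frame)
{
  int id;
  return toy_get_frame_id (frame, &id);
}
```

With toy_build_cache_plain the id is computed normally.  With
toy_build_cache_early_unwind the request fails and the frame is left
without an id, which is the shape of the failure seen here.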

I think dwarf2_tailcall_sniffer_first would have to be called from 
somewhere else, or conditions put in place. But I'm afraid adding more 
conditions would complicate things further. And this code is already 
reasonably complicated.

Since this is causing a number of inlining test failures for aarch64
and, from what I saw, some other architectures like s390, should we
consider reverting this while we discuss/review a reworked version of
the patch?
Tom Tromey March 26, 2020, 1:59 a.m. | #8
>>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:


Luis> Having spent a few days trying to understand this problem, it seems
Luis> all of these fi->level assertions (including 
Luis> https://sourceware.org/bugzilla/show_bug.cgi?id=22748) are related to
Luis> attempting to unwind from places not safe to do so. That is, we're 
Luis> trying to unwind some content (registers for example) before a given
Luis> frame is assigned a frame id.

Yes, I agree.

Luis> I think dwarf2_tailcall_sniffer_first would have to be called from
Luis> somewhere else, or conditions put in place. But I'm afraid adding more 
Luis> conditions would complicate things further. And this code is already
Luis> reasonably complicated.

Luis> Since this is causing a number of inlining test failures for aarch64
Luis> and, from what I saw, some other architectures like s390, should we
Luis> consider reverting this while we discuss/review a reworked version of
Luis> the patch?

I think that would be fine.  I haven't found the time to really dig into it.

I suspect that maybe the architectures doing this aren't playing by the rules.
Even so, though, it doesn't change that this used to work and now doesn't.

Tom
Carl Love via Gdb-patches March 26, 2020, 2:47 a.m. | #9
On Wed, Mar 25, 2020, 22:59 Tom Tromey <tom@tromey.com> wrote:

> >>>>> "Luis" == Luis Machado <luis.machado@linaro.org> writes:
>
> Luis> Having spent a few days trying to understand this problem, it seems
> Luis> all of these fi->level assertions (including
> Luis> https://sourceware.org/bugzilla/show_bug.cgi?id=22748) are related to
> Luis> attempting to unwind from places not safe to do so. That is, we're
> Luis> trying to unwind some content (registers for example) before a given
> Luis> frame is assigned a frame id.
>
> Yes, I agree.
>
> Luis> I think dwarf2_tailcall_sniffer_first would have to be called from
> Luis> somewhere else, or conditions put in place. But I'm afraid adding more
> Luis> conditions would complicate things further. And this code is already
> Luis> reasonably complicated.
>
> Luis> Since this is causing a number of inlining test failures for aarch64
> Luis> and, from what I saw, some other architectures like s390, should we
> Luis> consider reverting this while we discuss/review a reworked version of
> Luis> the patch?
>
> I think that would be fine.  I haven't found the time to really dig into it.
>
> I suspect that maybe the architectures doing this aren't playing by the rules.
> Even so, though, it doesn't change that this used to work and now doesn't.

It could be. I noticed aarch64 doesn't implement gdbarch_unwind_pc. But
s390 does.

It is hard to tell what is wrong given different unwinding implementations
may give correct results, even with wrong assumptions.


> Tom
Andrew Burgess June 18, 2020, 6:25 p.m. | #10
* Tom Tromey <tromey@adacore.com> [2020-02-20 08:58:20 -0700]:

> A customer reported a failure to unwind in a certain core dump.  A
> lengthy investigation showed that the problem came from the
> interaction between the tailcall and inline frame sniffers.
>
> Normally, the regular DWARF unwinder may discover a chain of tail
> calls ending in the current frame.  In this case, it sets a member on
> the dwarf2_frame_cache object, so that a subsequent call into the
> tailcall sniffer will create the tailcall frames.
>
> However, in this scenario, what happened is that the DWARF unwinder
> did find tailcall frames -- but then the PC of the first such frame
> was recognized and claimed by the inline frame sniffer.

I'm trying to understand the setup you have here in the hope I might
be able to craft a test case for this - given that I'm not convinced
the new placement of the tail call sniffer is safe.

Was the setup something like:

                    ,-- f3 tail calls to f4.
                    |
                    |
  f1 --> f2 --> f3 --> f4 --> f5 --> f6

  |_______________|
  All inlined in f1

Was there anything else special about this case?  I feel like there
must have been, but I don't really understand the problem description.

Thanks,
Andrew

Tom Tromey June 18, 2020, 9:07 p.m. | #11
>>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:


>> However, in this scenario, what happened is that the DWARF unwinder
>> did find tailcall frames -- but then the PC of the first such frame
>> was recognized and claimed by the inline frame sniffer.


Andrew> I'm trying to understand the setup you have here in the hope I might
Andrew> be able to craft a test case for this - given that I'm not convinced
Andrew> the new placement of the tail call sniffer is safe.

Andrew> Was the setup something like:

Andrew>                     ,-- f3 tail calls to f4.
Andrew>                     |
Andrew>                     |
Andrew>   f1 --> f2 --> f3 --> f4 --> f5 --> f6

Andrew>   |_______________|
Andrew>   All inlined in f1

Andrew> Was there anything else special about this case?  I feel like there
Andrew> must have been, but I don't really understand the problem description.

Sorry about that.  While I do still have the test executable and core
file (and so I can easily try any patches), I can't send them out.  I'm
happy to test patches.

I only vaguely remember the details, and I can't really re-debug the
problem in the near term (I'm on PTO next week...).  However, I do also
have some notes, though not exhaustive ones.

Looking at the backtrace now, it does seem to be like the above.
Here's some heavily trimmed down output:

    (gdb) frame apply 7 info frame
    [...]
    Stack level 0, frame at 0x7ffffffb07c0:
     called by frame at 0x7ffffffb07c0
    Stack level 1, frame at 0x7ffffffb07c0:
     tail call frame, caller of frame at 0x7ffffffb07c0
    Stack level 2, frame at 0x7ffffffb07c0:
     tail call frame, caller of frame at 0x7ffffffb07c0
    Stack level 3, frame at 0x7ffffffb0ab0:
     called by frame at 0x7ffffffb8c80, caller of frame at 0x7ffffffb07c0
    Stack level 4, frame at 0x7ffffffb8c80:
     inlined into frame 5, caller of frame at 0x7ffffffb0ab0
    Stack level 5, frame at 0x7ffffffb8c80:
     called by frame at 0x7ffffffb8cc0, caller of frame at 0x7ffffffb8c80
    Stack level 6, frame at 0x7ffffffb8cc0:
     called by frame at 0x7ffffffba490, caller of frame at 0x7ffffffb8c80

All I recall is what I wrote: dwarf2_tailcall_sniffer_first would note
that a frame came from tailcalls -- so tailcall_frame_sniffer would be
expected to notice this.  However, the inline sniffer would grab it
first, causing confusion.

One thing that would have helped here was the ability to turn off inline
and/or tail-call unwinding.  Perhaps we should add settings for this.

I wonder if I erred in my analysis and there's a combination of
tail-call and inlining happening.  That certainly seems like a
possibility and would explain the confusion ... whereas the "info frame"
stuff above seems innocuous to me now.

Tom

Patch

diff --git a/gdb/dwarf2/frame.c b/gdb/dwarf2/frame.c
index b240a25e2d8..74488f9a8aa 100644
--- a/gdb/dwarf2/frame.c
+++ b/gdb/dwarf2/frame.c
@@ -959,22 +959,12 @@  struct dwarf2_frame_cache
   /* The .text offset.  */
   CORE_ADDR text_offset;
 
-  /* True if we already checked whether this frame is the bottom frame
-     of a virtual tail call frame chain.  */
-  int checked_tailcall_bottom;
-
   /* If not NULL then this frame is the bottom frame of a TAILCALL_FRAME
      sequence.  If NULL then it is a normal case with no TAILCALL_FRAME
      involved.  Non-bottom frames of a virtual tail call frames chain use
      dwarf2_tailcall_frame_unwind unwinder so this field does not apply for
      them.  */
   void *tailcall_cache;
-
-  /* The number of bytes to subtract from TAILCALL_FRAME frames frame
-     base to get the SP, to simulate the return address pushed on the
-     stack.  */
-  LONGEST entry_cfa_sp_offset;
-  int entry_cfa_sp_offset_p;
 };
 
 static struct dwarf2_frame_cache *
@@ -1037,6 +1027,8 @@  dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
      in an address that's within the range of FDE locations.  This
      is due to the possibility of the function occupying non-contiguous
      ranges.  */
+  LONGEST entry_cfa_sp_offset;
+  int entry_cfa_sp_offset_p = 0;
   if (get_frame_func_if_available (this_frame, &entry_pc)
       && fde->initial_location <= entry_pc
       && entry_pc < fde->initial_location + fde->address_range)
@@ -1049,8 +1041,8 @@  dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
 	  && (dwarf_reg_to_regnum (gdbarch, fs.regs.cfa_reg)
 	      == gdbarch_sp_regnum (gdbarch)))
 	{
-	  cache->entry_cfa_sp_offset = fs.regs.cfa_offset;
-	  cache->entry_cfa_sp_offset_p = 1;
+	  entry_cfa_sp_offset = fs.regs.cfa_offset;
+	  entry_cfa_sp_offset_p = 1;
 	}
     }
   else
@@ -1195,6 +1187,10 @@  incomplete CFI data; unspecified registers (e.g., %s) at %s"),
       && fs.regs.reg[fs.retaddr_column].how == DWARF2_FRAME_REG_UNDEFINED)
     cache->undefined_retaddr = 1;
 
+  dwarf2_tailcall_sniffer_first (this_frame, &cache->tailcall_cache,
+				 (entry_cfa_sp_offset_p
+				  ? &entry_cfa_sp_offset : NULL));
+
   return cache;
 }
 
@@ -1239,16 +1235,6 @@  dwarf2_frame_prev_register (struct frame_info *this_frame, void **this_cache,
   CORE_ADDR addr;
   int realnum;
 
-  /* Check whether THIS_FRAME is the bottom frame of a virtual tail
-     call frame chain.  */
-  if (!cache->checked_tailcall_bottom)
-    {
-      cache->checked_tailcall_bottom = 1;
-      dwarf2_tailcall_sniffer_first (this_frame, &cache->tailcall_cache,
-				     (cache->entry_cfa_sp_offset_p
-				      ? &cache->entry_cfa_sp_offset : NULL));
-    }
-
   /* Non-bottom frames of a virtual tail call frames chain use
      dwarf2_tailcall_frame_unwind unwinder so this code does not apply for
      them.  If dwarf2_tailcall_prev_register_first does not have specific value
@@ -1410,10 +1396,6 @@  static const struct frame_unwind dwarf2_signal_frame_unwind =
 void
 dwarf2_append_unwinders (struct gdbarch *gdbarch)
 {
-  /* TAILCALL_FRAME must be first to find the record by
-     dwarf2_tailcall_sniffer_first.  */
-  frame_unwind_append_unwinder (gdbarch, &dwarf2_tailcall_frame_unwind);
-
   frame_unwind_append_unwinder (gdbarch, &dwarf2_frame_unwind);
   frame_unwind_append_unwinder (gdbarch, &dwarf2_signal_frame_unwind);
 }
diff --git a/gdb/frame-unwind.c b/gdb/frame-unwind.c
index 35f2e82c57d..3334c472d02 100644
--- a/gdb/frame-unwind.c
+++ b/gdb/frame-unwind.c
@@ -27,6 +27,7 @@ 
 #include "gdb_obstack.h"
 #include "target.h"
 #include "gdbarch.h"
+#include "dwarf2/frame-tailcall.h"
 
 static struct gdbarch_data *frame_unwind_data;
 
@@ -43,6 +44,18 @@  struct frame_unwind_table
   struct frame_unwind_table_entry **osabi_head;
 };
 
+/* A helper function to add an unwinder to a list.  LINK says where to
+   install the new unwinder.  The new link is returned.  */
+
+static struct frame_unwind_table_entry **
+add_unwinder (struct obstack *obstack, const struct frame_unwind *unwinder,
+	      struct frame_unwind_table_entry **link)
+{
+  *link = OBSTACK_ZALLOC (obstack, struct frame_unwind_table_entry);
+  (*link)->unwinder = unwinder;
+  return &(*link)->next;
+}
+
 static void *
 frame_unwind_init (struct obstack *obstack)
 {
@@ -51,13 +64,21 @@  frame_unwind_init (struct obstack *obstack)
 
   /* Start the table out with a few default sniffers.  OSABI code
      can't override this.  */
-  table->list = OBSTACK_ZALLOC (obstack, struct frame_unwind_table_entry);
-  table->list->unwinder = &dummy_frame_unwind;
-  table->list->next = OBSTACK_ZALLOC (obstack,
-				      struct frame_unwind_table_entry);
-  table->list->next->unwinder = &inline_frame_unwind;
+  struct frame_unwind_table_entry **link = &table->list;
+
+  link = add_unwinder (obstack, &dummy_frame_unwind, link);
+  /* The DWARF tailcall sniffer must come before the inline sniffer.
+     Otherwise, we can end up in a situation where a DWARF frame finds
+     tailcall information, but then the inline sniffer claims a frame
+     before the tailcall sniffer, resulting in confusion.  This is
+     safe to do always because the tailcall sniffer can only ever be
+     activated if the newer frame was created using the DWARF
+     unwinder, and it also found tailcall information.  */
+  link = add_unwinder (obstack, &dwarf2_tailcall_frame_unwind, link);
+  link = add_unwinder (obstack, &inline_frame_unwind, link);
+
   /* The insertion point for OSABI sniffers.  */
-  table->osabi_head = &table->list->next->next;
+  table->osabi_head = link;
   return table;
 }