[PATCHv3,0/2] Fix for an assertion when unwinding with inline frames

Message ID cover.1626772086.git.andrew.burgess@embecosm.com
Headers show
Series
  • Fix for an assertion when unwinding with inline frames
Related show

Message

Andrew Burgess July 20, 2021, 9:10 a.m.
Thanks for the feedback on v2.

In v3 I have:

 - Addressed all of Pedro's feedback on the test in patch #1.
 
 - Rewritten how the problem in patch #1 is fixed based on Simon's
   exception based approach.

Thanks,
Andrew

---

Andrew Burgess (2):
  gdb: prevent an assertion when computing the frame_id for an inline
    frame
  gdb: remove VALUE_FRAME_ID

 gdb/frame.c                                   |  57 ++++---
 gdb/frame.h                                   |   4 -
 gdb/inline-frame.c                            |  31 +++-
 .../gdb.base/inline-frame-cycle-unwind.c      |  58 +++++++
 .../gdb.base/inline-frame-cycle-unwind.exp    | 145 ++++++++++++++++++
 .../gdb.base/inline-frame-cycle-unwind.py     |  85 ++++++++++
 gdb/valops.c                                  |  17 +-
 gdb/value.c                                   |   5 +-
 gdb/value.h                                   |   6 -
 gdbsupport/common-exceptions.h                |   5 +
 10 files changed, 373 insertions(+), 40 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c
 create mode 100644 gdb/testsuite/gdb.base/inline-frame-cycle-unwind.exp
 create mode 100644 gdb/testsuite/gdb.base/inline-frame-cycle-unwind.py

-- 
2.25.4

Comments

Tom Tromey via Gdb-patches July 20, 2021, 9:59 p.m. | #1
On 2021-07-20 5:10 a.m., Andrew Burgess wrote:
> Thanks for the feedback on v2.

> 

> In v3 I have:

> 

>  - Addressed all of Pedro's feedback on the test in patch #1.

>  

>  - Rewritten how the problem in patch #1 is fixed based on Simon's

>    exception based approach.

> 

> Thanks,

> Andrew


When speaking to Pedro off-line, it did sound like he had concerns with
what I proposed, so let's wait to hear what he has to say.

Simon
Andrew Burgess July 26, 2021, 11:11 a.m. | #2
* Simon Marchi <simon.marchi@polymtl.ca> [2021-07-20 17:59:45 -0400]:

> On 2021-07-20 5:10 a.m., Andrew Burgess wrote:

> > Thanks for the feedback on v2.

> > 

> > In v3 I have:

> > 

> >  - Addressed all of Pedro's feedback on the test in patch #1.

> >  

> >  - Rewritten how the problem in patch #1 is fixed based on Simon's

> >    exception based approach.

> > 

> > Thanks,

> > Andrew

> 

> When speaking to Pedro off-line, it did sound like he had concerns with

> what I proposed, so let's wait to hear what he has to say.


That's fine.  Hopefully Pedro will be able to offer some feedback
soon.

Before then, I've spun off patch #2 and an additional fix related to
'set debug frame on' into a separate patch (below).  I think all of
this code is unrelated to the whole should we use an exception, or
change the API of get_prev_frame.

What are your thoughts on this patch?

Additionally, this one might be a possible candidate for merging into
gdb-11-branch.  Here's a ChangeLog if we decide that's a good idea.

gdb/ChangeLog:
	* frame.c (get_prev_frame_always_1): Handle case where
	this_frame->prev is nullptr.
	(get_prev_frame_id_by_id): Delete.
	* frame.h (get_prev_frame_id_by_id): Delete declaration.
	* valops.c (value_assign): Remove use of VALUE_FRAME_ID.
	* vaule.c (value_fetch_lazy_register): Likewise.
	* value.h (VALUE_FRAME_ID): Delete.

gdb/testsuite/ChangeLog:
	* gdb.base/premature-dummy-frame-removal.exp: Repeat test with
	'set debug frame on' in effect.

Thanks,
Andrew

---

commit 4365cdedccda9eaacd84af84a89c981d28ac1ef7
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Wed May 26 15:50:05 2021 +0100

    gdb: remove VALUE_FRAME_ID and fix another frame debug issue
    
    This commit was originally part of this patch series:
    
      (v1): https://sourceware.org/pipermail/gdb-patches/2021-May/179357.html
      (v2): https://sourceware.org/pipermail/gdb-patches/2021-June/180208.html
      (v3): https://sourceware.org/pipermail/gdb-patches/2021-July/181028.html
    
    However, that series is being held up in review, so I wanted to break
    out some of the non-related fixes in order to get these merged.
    
    This commit addresses two semi-related issues, both of which are
    problems exposed by using 'set debug frame on'.
    
    The first issue is in frame.c in get_prev_frame_always_1, and was
    introduced by this commit:
    
      commit a05a883fbaba69d0f80806e46a9457727fcbe74c
      Date:   Tue Jun 29 12:03:50 2021 -0400
    
          gdb: introduce frame_debug_printf
    
    This commit replaced fprint_frame with frame_info::to_string.
    However, the former could handle taking a nullptr while the later, a
    member function, obviously requires a non-nullptr in order to make the
    function call.  In one place we are not-guaranteed to have a
    non-nullptr, and so, there is the possibility of triggering undefined
    behaviour.
    
    The second issue addressed in this commit has existed for a while in
    GDB, and would cause this assertion:
    
      gdb/frame.c:622: internal-error: frame_id get_frame_id(frame_info*): Assertion `fi->this_id.p != frame_id_status::COMPUTING' failed.
    
    We attempt to get the frame_id for a frame while we are computing the
    frame_id for that same frame.
    
    What happens is that when GDB stops we create a frame_info object for
    the sentinel frame (frame #-1) and then we attempt to unwind this
    frame to create a frame_info object for frame #0.
    
    In the test case used here to expose the issue we have created a
    Python frame unwinder.  In the Python unwinder we attemt to read the
    program counter register.
    
    Reading this register will initially create a lazy register value.
    The frame-id stored in the lazy register value will be for the
    sentinel frame (lazy register values hold the frame-id for the frame
    from which the register will be unwound).
    
    However, the Python unwinder does actually want to examine the value
    of the program counter, and so the lazy register value is resolved
    into a non-lazy value.  This sends GDB into value_fetch_lazy_register
    in value.c.
    
    Now, inside this function, if 'set debug frame on' is in effect, then
    we want to print something like:
    
      frame=%d, regnum=%d(%s), ....
    
    Where 'frame=%d' will be the relative frame level of the frame for
    which the register is being fetched, so, in this case we would expect
    to see 'frame=0', i.e. we are reading a register as it would be in
    frame #0.  But, remember, the lazy register value actually holds the
    frame-id for frame #-1 (the sentinel frame).
    
    So, to get the frame_info for frame #0 we used to call:
    
      frame = frame_find_by_id (VALUE_FRAME_ID (val));
    
    Where VALUE_FRAME_ID is:
    
      #define VALUE_FRAME_ID(val) (get_prev_frame_id_by_id (VALUE_NEXT_FRAME_ID (val)))
    
    That is, we start with the frame-id for the next frame as obtained by
    VALUE_NEXT_FRAME_ID, then call get_prev_frame_id_by_id to get the
    frame-id of the previous frame.
    
    The get_prev_frame_id_by_id function finds the frame_info for the
    given frame-id (in this case frame #-1), calls get_prev_frame to get
    the previous frame, and then calls get_frame_id.
    
    The problem here is that calling get_frame_id requires that we know
    the frame unwinder, so then have to try each frame unwinder in turn,
    which would include the Python unwinder.... which is where we started,
    and thus we have a loop!
    
    To prevent this loop GDB has an assertion in place, which is what
    actually triggers.
    
    Solving the assertion failure is pretty easy, if we consider the code
    in value_fetch_lazy_register and get_prev_frame_id_by_id then what we
    do is:
    
      1. Start with a frame_id taken from a value,
      2. Lookup the corresponding frame,
      3. Find the previous frame,
      4. Get the frame_id for that frame, and
      5. Lookup the corresponding frame
      6. Print the frame's level
    
    Notice that steps 3 and 5 give us the exact same result, step 4 is
    just wasted effort.  We could shorten this process such that we drop
    steps 4 and 5, thus:
    
      1. Start with a frame_id taken from a value,
      2. Lookup the corresponding frame,
      3. Find the previous frame,
      6. Print the frame's level
    
    This will give the exact same frame as a result, and this is what I
    have done in this patch by removing the use of VALUE_FRAME_ID from
    value_fetch_lazy_register.
    
    Out of curiosity I looked to see how widely VALUE_FRAME_ID was used,
    and saw it was only used in one other place in valops.c:value_assign,
    where, once again, we take the result of VALUE_FRAME_ID and pass it to
    frame_find_by_id, thus introducing a redundant frame_id lookup.
    
    I don't think the value_assign case risks triggering the assertion
    though, as we are unlikely to call value_assign while computing the
    frame_id for a frame, however, we could make value_assign slightly
    more efficient, with no real additional complexity, by removing the
    use of VALUE_FRAME_ID.
    
    So, in this commit, I completely remove VALUE_FRAME_ID, and replace it
    with a use of VALUE_NEXT_FRAME_ID, followed by a direct call to
    get_prev_frame_always, this should make no difference in either case,
    and resolves the assertion issue from value.c.
    
    As I said, this patch was originally part of another series, the
    original test relied on the fixes in that original series.  However, I
    was able to create an alternative test for this issue by enabling
    frame debug within an existing test script.

diff --git a/gdb/frame.c b/gdb/frame.c
index 3f2d2700541..2332418f347 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -2121,8 +2121,13 @@ get_prev_frame_always_1 (struct frame_info *this_frame)
   /* Only try to do the unwind once.  */
   if (this_frame->prev_p)
     {
-      frame_debug_printf ("  -> %s // cached",
-			  this_frame->prev->to_string ().c_str ());
+      if (this_frame->prev != nullptr)
+	frame_debug_printf ("  -> %s // cached",
+			    this_frame->prev->to_string ().c_str ());
+      else
+	frame_debug_printf
+	  ("  -> nullptr // %s // cached",
+	   frame_stop_reason_symbol_string (this_frame->stop_reason));
       return this_frame->prev;
     }
 
@@ -2515,22 +2520,6 @@ get_prev_frame (struct frame_info *this_frame)
   return get_prev_frame_always (this_frame);
 }
 
-struct frame_id
-get_prev_frame_id_by_id (struct frame_id id)
-{
-  struct frame_id prev_id;
-  struct frame_info *frame;
-
-  frame = frame_find_by_id (id);
-
-  if (frame != NULL)
-    prev_id = get_frame_id (get_prev_frame (frame));
-  else
-    prev_id = null_frame_id;
-
-  return prev_id;
-}
-
 CORE_ADDR
 get_frame_pc (struct frame_info *frame)
 {
diff --git a/gdb/frame.h b/gdb/frame.h
index 0d2bc08a47b..2548846c1ed 100644
--- a/gdb/frame.h
+++ b/gdb/frame.h
@@ -394,10 +394,6 @@ extern struct frame_info *get_prev_frame_always (struct frame_info *);
    is not found.  */
 extern struct frame_info *frame_find_by_id (struct frame_id id);
 
-/* Given a frame's ID, find the previous frame's ID.  Returns null_frame_id
-   if the frame is not found.  */
-extern struct frame_id get_prev_frame_id_by_id (struct frame_id id);
-
 /* Base attributes of a frame: */
 
 /* The frame's `resume' address.  Where the program will resume in
diff --git a/gdb/testsuite/gdb.base/premature-dummy-frame-removal.exp b/gdb/testsuite/gdb.base/premature-dummy-frame-removal.exp
index bf2a2a79756..1c0826201bb 100644
--- a/gdb/testsuite/gdb.base/premature-dummy-frame-removal.exp
+++ b/gdb/testsuite/gdb.base/premature-dummy-frame-removal.exp
@@ -51,3 +51,22 @@ set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
 gdb_test_no_output "source ${pyfile}" "load python file"
 
 gdb_test "p some_func ()" " = 0"
+
+# When frame debugging is turned on, this test has (previously)
+# revealed some crashes due to the Python frame unwinder trying to
+# read registers.
+#
+# Enable frame debug and rerun the test.  We don't bother checking the
+# output of calling 'p some_func ()' as the output will be full of
+# debug, to format of which isn't fixed.  All we care about is that
+# GDB is still running afterwards.
+#
+# All of the debug output makes this really slow when testing with the
+# special read1 version of expect, hence the timeout factor.
+with_read1_timeout_factor 10 {
+    gdb_test_no_output "set debug frame on"
+    gdb_test "p some_func ()" ".*" \
+	"repeat p some_func () with frame debug on"
+    gdb_test_no_output "set debug frame off"
+}
+gdb_test "p 1 + 2 + 3" " = 6"
diff --git a/gdb/valops.c b/gdb/valops.c
index bd547923496..50874a5f55d 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1197,14 +1197,15 @@ value_assign (struct value *toval, struct value *fromval)
 	struct gdbarch *gdbarch;
 	int value_reg;
 
-	/* Figure out which frame this is in currently.
-	
-	   We use VALUE_FRAME_ID for obtaining the value's frame id instead of
-	   VALUE_NEXT_FRAME_ID due to requiring a frame which may be passed to
-	   put_frame_register_bytes() below.  That function will (eventually)
-	   perform the necessary unwind operation by first obtaining the next
-	   frame.  */
-	frame = frame_find_by_id (VALUE_FRAME_ID (toval));
+	/* Figure out which frame this register value is in.  The value
+	   holds the frame_id for the next frame, that is the frame this
+	   register value was unwound from.
+
+	   Below we will call put_frame_register_bytes which requires that
+	   we pass it the actual frame in which the register value is
+	   valid, i.e. not the next frame.  */
+	frame = frame_find_by_id (VALUE_NEXT_FRAME_ID (toval));
+	frame = get_prev_frame_always (frame);
 
 	value_reg = VALUE_REGNUM (toval);
 
diff --git a/gdb/value.c b/gdb/value.c
index 6a07495d32b..91db66fa3be 100644
--- a/gdb/value.c
+++ b/gdb/value.c
@@ -3950,9 +3950,8 @@ value_fetch_lazy_register (struct value *val)
     {
       struct gdbarch *gdbarch;
       struct frame_info *frame;
-      /* VALUE_FRAME_ID is used here, instead of VALUE_NEXT_FRAME_ID,
-	 so that the frame level will be shown correctly.  */
-      frame = frame_find_by_id (VALUE_FRAME_ID (val));
+      frame = frame_find_by_id (VALUE_NEXT_FRAME_ID (val));
+      frame = get_prev_frame_always (frame);
       regnum = VALUE_REGNUM (val);
       gdbarch = get_frame_arch (frame);
 
diff --git a/gdb/value.h b/gdb/value.h
index 379cddafbe7..e1c6aabfa29 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -458,12 +458,6 @@ extern struct internalvar **deprecated_value_internalvar_hack (struct value *);
 extern struct frame_id *deprecated_value_next_frame_id_hack (struct value *);
 #define VALUE_NEXT_FRAME_ID(val) (*deprecated_value_next_frame_id_hack (val))
 
-/* Frame ID of frame to which a register value is relative.  This is
-   similar to VALUE_NEXT_FRAME_ID, above, but may not be assigned to. 
-   Note that VALUE_FRAME_ID effectively undoes the "next" operation
-   that was performed during the assignment to VALUE_NEXT_FRAME_ID.  */
-#define VALUE_FRAME_ID(val) (get_prev_frame_id_by_id (VALUE_NEXT_FRAME_ID (val)))
-
 /* Register number if the value is from a register.  */
 extern int *deprecated_value_regnum_hack (struct value *);
 #define VALUE_REGNUM(val) (*deprecated_value_regnum_hack (val))
Tom Tromey via Gdb-patches July 26, 2021, 1:57 p.m. | #3
On 2021-07-26 7:11 a.m., Andrew Burgess wrote:
> * Simon Marchi <simon.marchi@polymtl.ca> [2021-07-20 17:59:45 -0400]:

> 

>> On 2021-07-20 5:10 a.m., Andrew Burgess wrote:

>>> Thanks for the feedback on v2.

>>>

>>> In v3 I have:

>>>

>>>  - Addressed all of Pedro's feedback on the test in patch #1.

>>>  

>>>  - Rewritten how the problem in patch #1 is fixed based on Simon's

>>>    exception based approach.

>>>

>>> Thanks,

>>> Andrew

>>

>> When speaking to Pedro off-line, it did sound like he had concerns with

>> what I proposed, so let's wait to hear what he has to say.

> 

> That's fine.  Hopefully Pedro will be able to offer some feedback

> soon.

> 

> Before then, I've spun off patch #2 and an additional fix related to

> 'set debug frame on' into a separate patch (below).  I think all of

> this code is unrelated to the whole should we use an exception, or

> change the API of get_prev_frame.

> 

> What are your thoughts on this patch?

> 

> Additionally, this one might be a possible candidate for merging into

> gdb-11-branch.  Here's a ChangeLog if we decide that's a good idea.

> 

> gdb/ChangeLog:

> 	* frame.c (get_prev_frame_always_1): Handle case where

> 	this_frame->prev is nullptr.

> 	(get_prev_frame_id_by_id): Delete.

> 	* frame.h (get_prev_frame_id_by_id): Delete declaration.

> 	* valops.c (value_assign): Remove use of VALUE_FRAME_ID.

> 	* vaule.c (value_fetch_lazy_register): Likewise.

> 	* value.h (VALUE_FRAME_ID): Delete.

> 

> gdb/testsuite/ChangeLog:

> 	* gdb.base/premature-dummy-frame-removal.exp: Repeat test with

> 	'set debug frame on' in effect.

> 

> Thanks,

> Andrew


The patch below does LGTM, thanks.

Simon
Andrew Burgess July 27, 2021, 10:06 a.m. | #4
* Simon Marchi <simon.marchi@polymtl.ca> [2021-07-26 09:57:02 -0400]:

> On 2021-07-26 7:11 a.m., Andrew Burgess wrote:

> > * Simon Marchi <simon.marchi@polymtl.ca> [2021-07-20 17:59:45 -0400]:

> > 

> >> On 2021-07-20 5:10 a.m., Andrew Burgess wrote:

> >>> Thanks for the feedback on v2.

> >>>

> >>> In v3 I have:

> >>>

> >>>  - Addressed all of Pedro's feedback on the test in patch #1.

> >>>  

> >>>  - Rewritten how the problem in patch #1 is fixed based on Simon's

> >>>    exception based approach.

> >>>

> >>> Thanks,

> >>> Andrew

> >>

> >> When speaking to Pedro off-line, it did sound like he had concerns with

> >> what I proposed, so let's wait to hear what he has to say.

> > 

> > That's fine.  Hopefully Pedro will be able to offer some feedback

> > soon.

> > 

> > Before then, I've spun off patch #2 and an additional fix related to

> > 'set debug frame on' into a separate patch (below).  I think all of

> > this code is unrelated to the whole should we use an exception, or

> > change the API of get_prev_frame.

> > 

> > What are your thoughts on this patch?

> > 

> > Additionally, this one might be a possible candidate for merging into

> > gdb-11-branch.  Here's a ChangeLog if we decide that's a good idea.

> > 

> > gdb/ChangeLog:

> > 	* frame.c (get_prev_frame_always_1): Handle case where

> > 	this_frame->prev is nullptr.

> > 	(get_prev_frame_id_by_id): Delete.

> > 	* frame.h (get_prev_frame_id_by_id): Delete declaration.

> > 	* valops.c (value_assign): Remove use of VALUE_FRAME_ID.

> > 	* vaule.c (value_fetch_lazy_register): Likewise.

> > 	* value.h (VALUE_FRAME_ID): Delete.

> > 

> > gdb/testsuite/ChangeLog:

> > 	* gdb.base/premature-dummy-frame-removal.exp: Repeat test with

> > 	'set debug frame on' in effect.

> > 

> > Thanks,

> > Andrew

> 

> The patch below does LGTM, thanks.


Thanks,

I pushed this just to master for now.

Andrew