gdb/remote: handle target dying just before a stepi

Message ID 20210621195056.580105-1-andrew.burgess@embecosm.com
State New
Headers show
Series
  • gdb/remote: handle target dying just before a stepi
Related show

Commit Message

Andrew Burgess June 21, 2021, 7:50 p.m.
I randomly hit a situation where gdbserver crashed immediately before
I issued a 'stepi' to GDB, it turns out that this causes GDB itself to
crash.

What happens is that as part of the stepi we try to insert some
breakpoints into the inferior, so from insert_breakpoints we figure
out what we want to insert, then, eventually, try to send some packets
to the remote to get the breakpoints inserted.

It is only at this point that GDB realises that the target has gone
away.  This causes GDB to then enter this call stack:

  unpush_and_perror
    remote_unpush_target
      generic_mourn_inferior
        breakpoint_init_inferior
          delete_breakpoint
            update_global_location_list

So, we realise the target is gone and so delete the breakpoints
associated with that target.

GDB then throws a TARGET_CLOSE_ERROR from unpush_and_error.

This error is caught in insert_breakpoints where we then try to print
a nice error saying something like:

  Cannot insert breakpoint %d: some error text here...

To fill in the '%d' we try to read properties of the breakpoint
object.

Which was deleted due to the delete_breakpoint call above.

And so GDB dies...

My proposal in this commit is that, should we catch a
TARGET_CLOSE_ERROR in insert_breakpoints, then we just rethrow the
error.

This will cause the main event loop to print something like:

  Remote connection closed

Which I think is fine, I don't think the user will care much which
particular breakpoint GDB was operating on when the connection closed,
just knowing that the connection closed should be enough I think.

I initially added a test to 'gdb.server/server-kill.exp' for this
issue, however, my first attempt was not good enough, the test was
passing even without my fix.

Turns out that the server-kill.exp test actually kills the PID of the
inferior, not the PID of the server.  This means that gdbserver is
actually able to send a packet to GDB saying that the inferior has
exited prior to gdbserver itself shutting down.  This extra
information was enough to prevent the bug I was seeing manifest.

So, I have extended server-kill.exp to run all of the tests twice, the
first time we still kill the inferior.  On the second run we hard kill
the gdbserver itself, this prevents the server from sending anything
to GDB before it exits.

My new test is only expected to fail in this second mode of
operation (killing gdbserver itself), and without my fix, that is what
I see.

gdb/ChangeLog:

	* breakpoint.c (insert_bp_location): If we catch a
	TARGET_CLOSE_ERROR just rethrow it, the breakpoints might have
	been deleted.

gdb/testsuite/ChangeLog:

	* gdb.server/server-kill.exp: Introduce global kill_pid_of, and
	make use of this in prepare to select which pid we should kill.
	Run all the tests twice with a different kill_pid_of value.
	(prepare): Make use of kill_pid_of.
	(test_stepi): New proc.
---
 gdb/ChangeLog                            |  6 +++
 gdb/breakpoint.c                         |  8 +++
 gdb/testsuite/ChangeLog                  |  8 +++
 gdb/testsuite/gdb.server/server-kill.exp | 68 +++++++++++++++++++-----
 4 files changed, 76 insertions(+), 14 deletions(-)

-- 
2.25.4

Comments

Eli Zaretskii via Gdb-patches June 22, 2021, 1:27 a.m. | #1
On 2021-06-21 3:50 p.m., Andrew Burgess wrote:
> I randomly hit a situation where gdbserver crashed immediately before

> I issued a 'stepi' to GDB, it turns out that this causes GDB itself to

> crash.

> 

> What happens is that as part of the stepi we try to insert some

> breakpoints into the inferior, so from insert_breakpoints we figure

> out what we want to insert, then, eventually, try to send some packets

> to the remote to get the breakpoints inserted.

> 

> It is only at this point that GDB realises that the target has gone

> away.  This causes GDB to then enter this call stack:

> 

>   unpush_and_perror

>     remote_unpush_target

>       generic_mourn_inferior

>         breakpoint_init_inferior

>           delete_breakpoint

>             update_global_location_list

> 

> So, we realise the target is gone and so delete the breakpoints

> associated with that target.

> 

> GDB then throws a TARGET_CLOSE_ERROR from unpush_and_error.

> 

> This error is caught in insert_breakpoints where we then try to print

> a nice error saying something like:

> 

>   Cannot insert breakpoint %d: some error text here...

> 

> To fill in the '%d' we try to read properties of the breakpoint

> object.

> 

> Which was deleted due to the delete_breakpoint call above.

> 

> And so GDB dies...

> 

> My proposal in this commit is that, should we catch a

> TARGET_CLOSE_ERROR in insert_breakpoints, then we just rethrow the

> error.

> 

> This will cause the main event loop to print something like:

> 

>   Remote connection closed

> 

> Which I think is fine, I don't think the user will care much which

> particular breakpoint GDB was operating on when the connection closed,

> just knowing that the connection closed should be enough I think.


I think that makes sense.  It's a very different thing if the remote
replies "failed to insert the breakpoint" and if the connection closes.
In the first case, we can still try to insert other breakpoints, in the
later case everything is broken.  So it makes sense to propagate the
exception to interrupt whatever we were doing.  Whether GDB will be in a
good state after that... that's hard to tell.  But at least, not
crashing is a good first step.

> +# Global control variable used by the proc prepare.  Should be set to

> +# either 'inferior' or 'server'.

> +#

> +# In the proc prepare we start gdbserver and extract a pid, which will

> +# later be killed by calling the proc kill_server.

> +#

> +# When KILL_PID_OF is set to 'inferior' then the pid we kill is that

> +# of the inferior running under gdbserver, when this process dies

> +# gdbserver itself will exit.

> +#

> +# When KILL_PID_OF is set to 'server' then the pid we will is that of


will -> kill

> +	global server_spawn_id

> +

> +	set server_pid [exp_pid -i $server_spawn_id]


I started to use :: to refer to global variables ($::server_spawn_id), I
think it's a bit nicer because you see in the expression itself that
this is a global.  But if you prefer not, just tell me and I won't
mention it again :).

Simon
Andrew Burgess June 22, 2021, 9:04 a.m. | #2
* Simon Marchi <simon.marchi@polymtl.ca> [2021-06-21 21:27:50 -0400]:

> On 2021-06-21 3:50 p.m., Andrew Burgess wrote:

> > I randomly hit a situation where gdbserver crashed immediately before

> > I issued a 'stepi' to GDB, it turns out that this causes GDB itself to

> > crash.

> > 

> > What happens is that as part of the stepi we try to insert some

> > breakpoints into the inferior, so from insert_breakpoints we figure

> > out what we want to insert, then, eventually, try to send some packets

> > to the remote to get the breakpoints inserted.

> > 

> > It is only at this point that GDB realises that the target has gone

> > away.  This causes GDB to then enter this call stack:

> > 

> >   unpush_and_perror

> >     remote_unpush_target

> >       generic_mourn_inferior

> >         breakpoint_init_inferior

> >           delete_breakpoint

> >             update_global_location_list

> > 

> > So, we realise the target is gone and so delete the breakpoints

> > associated with that target.

> > 

> > GDB then throws a TARGET_CLOSE_ERROR from unpush_and_error.

> > 

> > This error is caught in insert_breakpoints where we then try to print

> > a nice error saying something like:

> > 

> >   Cannot insert breakpoint %d: some error text here...

> > 

> > To fill in the '%d' we try to read properties of the breakpoint

> > object.

> > 

> > Which was deleted due to the delete_breakpoint call above.

> > 

> > And so GDB dies...

> > 

> > My proposal in this commit is that, should we catch a

> > TARGET_CLOSE_ERROR in insert_breakpoints, then we just rethrow the

> > error.

> > 

> > This will cause the main event loop to print something like:

> > 

> >   Remote connection closed

> > 

> > Which I think is fine, I don't think the user will care much which

> > particular breakpoint GDB was operating on when the connection closed,

> > just knowing that the connection closed should be enough I think.

> 

> I think that makes sense.  It's a very different thing if the remote

> replies "failed to insert the breakpoint" and if the connection closes.

> In the first case, we can still try to insert other breakpoints, in the

> later case everything is broken.  So it makes sense to propagate the

> exception to interrupt whatever we were doing.  Whether GDB will be in a

> good state after that... that's hard to tell.  But at least, not

> crashing is a good first step.

> 

> > +# Global control variable used by the proc prepare.  Should be set to

> > +# either 'inferior' or 'server'.

> > +#

> > +# In the proc prepare we start gdbserver and extract a pid, which will

> > +# later be killed by calling the proc kill_server.

> > +#

> > +# When KILL_PID_OF is set to 'inferior' then the pid we kill is that

> > +# of the inferior running under gdbserver, when this process dies

> > +# gdbserver itself will exit.

> > +#

> > +# When KILL_PID_OF is set to 'server' then the pid we will is that of

> 

> will -> kill


Fixed.

> 

> > +	global server_spawn_id

> > +

> > +	set server_pid [exp_pid -i $server_spawn_id]

> 

> I started to use :: to refer to global variables ($::server_spawn_id), I

> think it's a bit nicer because you see in the expression itself that

> this is a global.  But if you prefer not, just tell me and I won't

> mention it again :).


No, that seems much nicer.  I fixed up the two globals I added to use
this syntax, and pushed this patch.

Thanks,
Andrew

Patch

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index fb011fc1e0f..0595c6f8cbd 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -2698,6 +2698,14 @@  insert_bp_location (struct bp_location *bl,
 	{
 	  /* Can't set the breakpoint.  */
 
+	  /* If the target has closed then it will have deleted any
+	     breakpoints inserted within the target inferior, as a result
+	     any further attempts to interact with the breakpoint objects
+	     is not possible.  Just rethrow the error.  */
+	  if (bp_excpt.error == TARGET_CLOSE_ERROR)
+	    throw bp_excpt;
+	  gdb_assert (bl->owner != nullptr);
+
 	  /* In some cases, we might not be able to insert a
 	     breakpoint in a shared library that has already been
 	     removed, but we have not yet processed the shlib unload
diff --git a/gdb/testsuite/gdb.server/server-kill.exp b/gdb/testsuite/gdb.server/server-kill.exp
index 80d78f89738..358a2003af8 100644
--- a/gdb/testsuite/gdb.server/server-kill.exp
+++ b/gdb/testsuite/gdb.server/server-kill.exp
@@ -30,6 +30,21 @@  if { [build_executable "failed to prepare" ${testfile}] } {
     return -1
 }
 
+# Global control variable used by the proc prepare.  Should be set to
+# either 'inferior' or 'server'.
+#
+# In the proc prepare we start gdbserver and extract a pid, which will
+# later be killed by calling the proc kill_server.
+#
+# When KILL_PID_OF is set to 'inferior' then the pid we kill is that
+# of the inferior running under gdbserver, when this process dies
+# gdbserver itself will exit.
+#
+# When KILL_PID_OF is set to 'server' then the pid we will is that of
+# the gdbserver itself, this is a much more aggressive strategy and
+# exposes different bugs within GDB.
+set kill_pid_of "inferior"
+
 # Spawn GDBserver, run to main, extract GDBserver's PID and save it in
 # the SERVER_PID global.
 
@@ -37,6 +52,7 @@  proc prepare {} {
     global binfile gdb_prompt srcfile decimal
     global server_pid
     global GDBFLAGS
+    global kill_pid_of
 
     save_vars { GDBFLAGS } {
 	# If GDB and GDBserver are both running locally, set the sysroot to avoid
@@ -54,18 +70,24 @@  proc prepare {} {
 
     gdbserver_run ""
 
-    # Continue past server_pid assignment.
-    gdb_breakpoint ${srcfile}:[gdb_get_line_number "i = 0;"]
-    gdb_continue_to_breakpoint "after server_pid assignment"
-
-    # Get the pid of GDBServer.
-    set test "p server_pid"
-    set server_pid 0
-    gdb_test_multiple $test $test {
-	-re " = ($decimal)\r\n$gdb_prompt $" {
-	    set server_pid $expect_out(1,string)
-	    pass $test
+    if { $kill_pid_of == "inferior" } {
+	# Continue past server_pid assignment.
+	gdb_breakpoint ${srcfile}:[gdb_get_line_number "i = 0;"]
+	gdb_continue_to_breakpoint "after server_pid assignment"
+
+	# Get the pid of GDBServer.
+	set test "p server_pid"
+	set server_pid 0
+	gdb_test_multiple $test $test {
+	    -re " = ($decimal)\r\n$gdb_prompt $" {
+		set server_pid $expect_out(1,string)
+		pass $test
+	    }
 	}
+    } else {
+	global server_spawn_id
+
+	set server_pid [exp_pid -i $server_spawn_id]
     }
 
     if {$server_pid == 0} {
@@ -132,6 +154,24 @@  proc_with_prefix test_unwind_syms {} {
     gdb_test "bt" "(Target disconnected|Remote connection closed|Remote communication error).*"
 }
 
-test_tstatus
-test_unwind_nosyms
-test_unwind_syms
+# Test performing a stepi right after the connection is dropped.
+
+proc_with_prefix test_stepi {} {
+    if ![prepare] {
+	return
+    }
+
+    kill_server
+
+    gdb_test "stepi" "(Target disconnected|Remote connection closed|Remote communication error).*"
+}
+
+# Run each test twice, see the description of KILL_PID_OF earlier in
+# this file for more details.
+
+foreach_with_prefix kill_pid_of { "inferior" "server" } {
+    test_tstatus
+    test_unwind_nosyms
+    test_unwind_syms
+    test_stepi
+}