[PATCHv2] gdb/remote: Restore support for 'S' stop reply packet

Message ID 20200227161704.16480-1-andrew.burgess@embecosm.com
State New
Headers show
Series
  • [PATCHv2] gdb/remote: Restore support for 'S' stop reply packet
Related show

Commit Message

Andrew Burgess Feb. 27, 2020, 4:17 p.m.
Pedro,

Thanks for your feedback.  I agree with all your points, this version
of the patch:

  1. Doesn't query the target for the current thread, but just uses
  the first non-exited thread.

  2. Gives a warning if the target has more than one non-exited
  thread.

  3. Commit message is revised to reflect new implementation.

How's this?

Thanks,
Andrew

---

With this commit:

  commit 5b6d1e4fa4fc6827c7b3f0e99ff120dfa14d65d2
  Date:   Fri Jan 10 20:06:08 2020 +0000

      Multi-target support

There was a regression in GDB's support for older aspects of the
remote protocol.  Specifically, when a target sends the 'S' stop reply
packet, which doesn't include a thread-id, then GDB has to figure out
which thread actually stopped.

Before the above commit GDB figured this out by using inferior_ptid in
process_stop_reply, which contained the ptid of the current
process/thread.  With the above commit the inferior_ptid now has the
value null_ptid inside process_stop_reply, this can be seen in
do_target_wait, where we call switch_to_inferior_no_thread before
calling do_target_wait_1.

The solution I propose in this commit is to use the first non exited
thread of the inferior.  Additionally, GDB will give a warning if a
multi-threaded target stops without passing a thread-id.

There's no test included with this commit, if anyone has any ideas for
how we could test this aspect of the remote protocol (short of writing
a new gdbserver) then I'd love to know.

It is possible to trigger this bug by attaching GDB to a running GDB,
place a breakpoint on remote_parse_stop_reply, and manually change the
contents of buf - when we get a 'T' based stop packet, replace it with
an 'S' based packet, like this:

  (gdb) call memset (buf, "S05\0", 4)

After this the GDB that is performing the remote debugging will crash
with this error:

  inferior.c:279: internal-error: inferior* find_inferior_pid(process_stratum_target*, int): Assertion `pid != 0' failed.

gdb/ChangeLog:

	* remote.c (remote_target::process_stop_reply): Use the first
	non-exited thread if the target didn't pass a thread-id.
---
 gdb/ChangeLog |  5 +++++
 gdb/remote.c  | 18 +++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

-- 
2.14.5

Comments

Pedro Alves Feb. 27, 2020, 7:46 p.m. | #1
On 2/27/20 4:17 PM, Andrew Burgess wrote:

> The solution I propose in this commit is to use the first non exited

> thread of the inferior.


s/of the inferior/of the target/

> Additionally, GDB will give a warning if a

> multi-threaded target stops without passing a thread-id.

> 

> There's no test included with this commit, if anyone has any ideas for

> how we could test this aspect of the remote protocol (short of writing

> a new gdbserver) then I'd love to know.


I forgot to mention this earlier, but I think we could test this by using
gdbserver's --disable-packet=Tthread option (or --disable-packet=threads
to disable vCont as well.).  This sets the disable_packet_Tthread global in
gdbserver, which makes it skip including the thread in the T stop reply.

$ gdbserver --disable-packet
Disableable packets:
  vCont         All vCont packets
  qC            Querying the current thread
  qfThreadInfo  Thread listing
  Tthread       Passing the thread specifier in the T stop reply packet
  threads       All of the above

I added these options several years ago exactly to exercise support
for old non-threaded protocol.  ISTR having documented them, ...
... ah, here:
  https://sourceware.org/ml/gdb-patches/2008-06/msg00501.html

doesn't look like that patch was ever merged.  :-/

>  

> -  /* If no thread/process was reported by the stub, assume the current

> -     inferior.  */

> +  /* If no thread/process was reported by the stub then use the first

> +     non-exited thread in the current inferior.  */


s/current inferior/current target/

>    if (ptid == null_ptid)

> -    ptid = inferior_ptid;

> +    {

> +      for (thread_info *thr : all_non_exited_threads (this))

> +	{

> +	  if (ptid != null_ptid)

> +	    {

> +	      warning (_("multi-threaded target stopped without sending "

> +			 "a thread-id, using first non-exited thread"));


I wonder about making this only warn once, to avoid a flood of warnings.
Like:

    if (!warned)
      {
        warned = true;
        warning (....);
      }

We use that same pattern in other places we only want to warn once.
E.g., parse_memory_map.

Otherwise LGTM.

Thanks,
Pedro Alves
Andrew Burgess Feb. 28, 2020, 2:02 p.m. | #2
* Pedro Alves <palves@redhat.com> [2020-02-27 19:46:12 +0000]:

> On 2/27/20 4:17 PM, Andrew Burgess wrote:

> 

> > The solution I propose in this commit is to use the first non exited

> > thread of the inferior.

> 

> s/of the inferior/of the target/

> 

> > Additionally, GDB will give a warning if a

> > multi-threaded target stops without passing a thread-id.

> > 

> > There's no test included with this commit, if anyone has any ideas for

> > how we could test this aspect of the remote protocol (short of writing

> > a new gdbserver) then I'd love to know.

> 

> I forgot to mention this earlier, but I think we could test this by using

> gdbserver's --disable-packet=Tthread option (or --disable-packet=threads

> to disable vCont as well.).  This sets the disable_packet_Tthread global in

> gdbserver, which makes it skip including the thread in the T stop reply.

> 

> $ gdbserver --disable-packet

> Disableable packets:

>   vCont         All vCont packets

>   qC            Querying the current thread

>   qfThreadInfo  Thread listing

>   Tthread       Passing the thread specifier in the T stop reply packet

>   threads       All of the above

> 

> I added these options several years ago exactly to exercise support

> for old non-threaded protocol.  ISTR having documented them, ...

> ... ah, here:

>   https://sourceware.org/ml/gdb-patches/2008-06/msg00501.html


Thanks, this looks super useful.  I thought I'd write a test using
this feature but it turns out there's already
gdb.server/stop-reply-no-thread.exp which does
--disable-packet=Tthreads.

So, I run the test (without my patch) and ... it passes.  The reason:

  commit 3cada74087687907311b52781354ff551e10a0ed
  Author: Pedro Alves <palves@redhat.com>
  Date:   Thu Jan 11 00:23:04 2018 +0000

      Fix backwards compatibility with old GDBservers (PR remote/22597)

Specifically, this hunk:

  diff --git a/gdb/remote.c b/gdb/remote.c
  index 81c772a5451..a1cd9ae1df3 100644
  --- a/gdb/remote.c
  +++ b/gdb/remote.c
  @@ -6940,7 +6940,13 @@ Packet: '%s'\n"),
                          event->ptid = read_ptid (thr + strlen (";thread:"),
                                                   NULL);
                        else
  -                       event->ptid = magic_null_ptid;
  +                       {
  +                         /* Either the current thread hasn't changed,
  +                            or the inferior is not multi-threaded.
  +                            The event must be for the thread we last
  +                            set as (or learned as being) current.  */
  +                         event->ptid = event->rs->general_thread;
  +                       }
                      }

                    if (rsa == NULL)

This code sets the stop_reply::ptid to the general_thread if we have
no other thread specified in a 'T' packet.  I haven't looked at the
gdbserver code yet, but from your description I don't think I can stop
gdbserver sending a 'T' and revert back to an 'S'.

It seems a little weird that we use general_thread here, I don't see
why that would be any more valid than the continue_thread, and you
might think that if we set a continue_thread just before we execute
the inferior then the stop event is possibly going to be for the
continue thread.

Still, like you say, the design of the old mechanism is horribly
broken anyway, so I suspect there's little point worrying too much
about changing this stuff.

So, my next thought was maybe I should be doing something similar,
instead of "fixing" a missing ptid in process_stop_event, fill in a
valid ptid earlier, so I did this:

  diff --git a/gdb/remote.c b/gdb/remote.c
  index 4ac359ad5d9..67c5f298ee6 100644
  --- a/gdb/remote.c
  +++ b/gdb/remote.c
  @@ -7492,6 +7492,8 @@ Packet: '%s'\n"),
            event->ws.value.sig = (enum gdb_signal) sig;
          else
            event->ws.value.sig = GDB_SIGNAL_UNKNOWN;
  +       if (event->ptid == null_ptid)
  +         event->ptid = event->rs->general_thread;
         }
         break;
       case 'w':          /* Thread exited.  */

And this seems to do the trick.

However, I'm still worried by the code in ::process_stop_event that
says:

  if (ptid == null_ptid)
     ptid = inferior_ptid;

After all, we know that as part of do_target_wait the inferior_ptid is
reset to null_ptid, so this code should be pointless now, right...

So I tried deleting it, and then both the stop-reply-no-thread.exp,
and my local testing with my minimal gdb stub failed because once
again we end up in ::process_stop_event with stop_reply->ptid being
null_ptid - which now is coming from general_thread.

Out of interest I tried replacing both the uses of general_thread (one
in the 'T' packet and one I just added in the 'S' packet processing)
with continue_thread, and this didn't do any better, though now the
assertion that triggers is complaining that we end up using
minus_one_ptid, and this leads to my next thought...

I think that general_thread and continue_thread are more
representative of what GDB has asked of the inferior, rather than what
the inferior is telling GDB (maybe?).  Let me explain my thinking:
GDB updates the general_thread to a valid (something other than
null_ptid, minus_one_ptid, etc) in only two places:

  (a) _AFTER_ processing a stop reply packet, to make the
  general_thread match the thread that stopped, this is based on the
  ptid that was parsed out of the stop reply, or

  (b) When we send a 'H' packet to the remote, setting either the
  general 'g' or continue 'c' thread variable, which is done by GDB
  just before an action.

Now relying on (a) to set general_thread so we can read it on the
_next_ stop clearly makes no sense - what stopped this time has no
relevance to what might stop next time, and similarly, relying on the
value of general_thread set via the 'H' packet is clearly just wrong,
as GDB must send a 'Hc' (and sets the continue_thread) to pick which
thread should continue.

I think what I'm arguing is that using general_thread as a fall-back
when processing 'T' (and 'S') is wrong, we should instead in these
cases deliberately leave the stop event ptid as null_ptid (indicating
that the stop didn't name a thread), and then use my patch (with your
fixes) to pick a non-exited thread.

I include my expanded proposed patch below for your consideration.
I've updated the code and ChangeLog in the patch below, but I haven't
yet updated the commit message - I want to see how this discussion
goes first - but my plan would be to roll some of the above
description into the commit message - if you agree with this change.

Thanks,
Andrew

---

From 478f7a25259a8cf2dbdcccb11612a8f586335438 Mon Sep 17 00:00:00 2001
From: Andrew Burgess <andrew.burgess@embecosm.com>

Date: Thu, 30 Jan 2020 14:35:40 +0000
Subject: [PATCH] gdb/remote: Restore support for 'S' stop reply packet

With this commit:

  commit 5b6d1e4fa4fc6827c7b3f0e99ff120dfa14d65d2
  Date:   Fri Jan 10 20:06:08 2020 +0000

      Multi-target support

There was a regression in GDB's support for older aspects of the
remote protocol.  Specifically, when a target sends the 'S' stop reply
packet, which doesn't include a thread-id, then GDB has to figure out
which thread actually stopped.

Before the above commit GDB figured this out by using inferior_ptid in
process_stop_reply, which contained the ptid of the current
process/thread.  With the above commit the inferior_ptid now has the
value null_ptid inside process_stop_reply, this can be seen in
do_target_wait, where we call switch_to_inferior_no_thread before
calling do_target_wait_1.

The solution I propose in this commit is to use the first non exited
thread of the inferior.  Additionally, GDB will give a warning if a
multi-threaded target stops without passing a thread-id.

There's no test included with this commit, if anyone has any ideas for
how we could test this aspect of the remote protocol (short of writing
a new gdbserver) then I'd love to know.

It is possible to trigger this bug by attaching GDB to a running GDB,
place a breakpoint on remote_parse_stop_reply, and manually change the
contents of buf - when we get a 'T' based stop packet, replace it with
an 'S' based packet, like this:

  (gdb) call memset (buf, "S05\0", 4)

After this the GDB that is performing the remote debugging will crash
with this error:

  inferior.c:279: internal-error: inferior* find_inferior_pid(process_stratum_target*, int): Assertion `pid != 0' failed.

gdb/ChangeLog:

	* remote.c (remote_target::remote_parse_stop_reply): Don't use the
	general_thread if the stop reply is missing a thread-id.
	* remote.c (remote_target::process_stop_reply): Use the first
	non-exited thread if the target didn't pass a thread-id.
---
 gdb/ChangeLog |  7 +++++++
 gdb/remote.c  | 43 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 4a70ab3fb0d..9c3e40fef16 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -7402,18 +7402,14 @@ Packet: '%s'\n"),
 		     reported expedited registers.  */
 		  if (event->ptid == null_ptid)
 		    {
+		      /* If there is no thread-id information then leave
+			 the event->ptid as null_ptid.  Later in
+			 process_stop_reply we will pick a suitable
+			 thread.  */
 		      const char *thr = strstr (p1 + 1, ";thread:");
 		      if (thr != NULL)
 			event->ptid = read_ptid (thr + strlen (";thread:"),
 						 NULL);
-		      else
-			{
-			  /* Either the current thread hasn't changed,
-			     or the inferior is not multi-threaded.
-			     The event must be for the thread we last
-			     set as (or learned as being) current.  */
-			  event->ptid = event->rs->general_thread;
-			}
 		    }
 
 		  if (rsa == NULL)
@@ -7668,10 +7664,35 @@ remote_target::process_stop_reply (struct stop_reply *stop_reply,
   *status = stop_reply->ws;
   ptid = stop_reply->ptid;
 
-  /* If no thread/process was reported by the stub, assume the current
-     inferior.  */
+  /* If no thread/process was reported by the stub then use the first
+     non-exited thread in the current target.  */
   if (ptid == null_ptid)
-    ptid = inferior_ptid;
+    {
+      for (thread_info *thr : all_non_exited_threads (this))
+	{
+	  if (ptid != null_ptid)
+	    {
+	      static bool warned = false;
+
+	      if (!warned)
+		{
+		  /* If you are seeing this warning then the remote target
+		     has multiple threads and either send an 'S' stop
+		     packet, or a 'T' stop packet without a thread-id.  In
+		     both of these cases GDB is unable to know which thread
+		     just stopped and is now having to guess.  The correct
+		     action is to fix the remote target to send the correct
+		     packet (a 'T' packet and include a thread-id).  */
+		  warning (_("multi-threaded target stopped without sending "
+			     "a thread-id, using first non-exited thread"));
+		  warned = true;
+		}
+	      break;
+	    }
+	  ptid = thr->ptid;
+	}
+      gdb_assert (ptid != null_ptid);
+    }
 
   if (status->kind != TARGET_WAITKIND_EXITED
       && status->kind != TARGET_WAITKIND_SIGNALLED
-- 
2.14.5
Pedro Alves Feb. 28, 2020, 5:45 p.m. | #3
On 2/28/20 2:02 PM, Andrew Burgess wrote:
> * Pedro Alves <palves@redhat.com> [2020-02-27 19:46:12 +0000]:

> 

>> On 2/27/20 4:17 PM, Andrew Burgess wrote:

>>

>>> The solution I propose in this commit is to use the first non exited

>>> thread of the inferior.

>>

>> s/of the inferior/of the target/

>>

>>> Additionally, GDB will give a warning if a

>>> multi-threaded target stops without passing a thread-id.

>>>

>>> There's no test included with this commit, if anyone has any ideas for

>>> how we could test this aspect of the remote protocol (short of writing

>>> a new gdbserver) then I'd love to know.

>>

>> I forgot to mention this earlier, but I think we could test this by using

>> gdbserver's --disable-packet=Tthread option (or --disable-packet=threads

>> to disable vCont as well.).  This sets the disable_packet_Tthread global in

>> gdbserver, which makes it skip including the thread in the T stop reply.

>>

>> $ gdbserver --disable-packet

>> Disableable packets:

>>   vCont         All vCont packets

>>   qC            Querying the current thread

>>   qfThreadInfo  Thread listing

>>   Tthread       Passing the thread specifier in the T stop reply packet

>>   threads       All of the above

>>

>> I added these options several years ago exactly to exercise support

>> for old non-threaded protocol.  ISTR having documented them, ...

>> ... ah, here:

>>   https://sourceware.org/ml/gdb-patches/2008-06/msg00501.html

> 

> Thanks, this looks super useful.  I thought I'd write a test using

> this feature but it turns out there's already

> gdb.server/stop-reply-no-thread.exp which does

> --disable-packet=Tthreads.

> 

> So, I run the test (without my patch) and ... it passes.  The reason:

> 

>   commit 3cada74087687907311b52781354ff551e10a0ed

>   Author: Pedro Alves <palves@redhat.com>

>   Date:   Thu Jan 11 00:23:04 2018 +0000

> 

>       Fix backwards compatibility with old GDBservers (PR remote/22597)


Ah...  I did have a feeling of deja vu...

Thanks for digging it out.

> 

> Specifically, this hunk:

> 

>   diff --git a/gdb/remote.c b/gdb/remote.c

>   index 81c772a5451..a1cd9ae1df3 100644

>   --- a/gdb/remote.c

>   +++ b/gdb/remote.c

>   @@ -6940,7 +6940,13 @@ Packet: '%s'\n"),

>                           event->ptid = read_ptid (thr + strlen (";thread:"),

>                                                    NULL);

>                         else

>   -                       event->ptid = magic_null_ptid;

>   +                       {

>   +                         /* Either the current thread hasn't changed,

>   +                            or the inferior is not multi-threaded.

>   +                            The event must be for the thread we last

>   +                            set as (or learned as being) current.  */

>   +                         event->ptid = event->rs->general_thread;

>   +                       }

>                       }

> 

>                     if (rsa == NULL)

> 

> This code sets the stop_reply::ptid to the general_thread if we have

> no other thread specified in a 'T' packet.  I haven't looked at the

> gdbserver code yet, but from your description I don't think I can stop

> gdbserver sending a 'T' and revert back to an 'S'.


Right, you can't.

> 

> It seems a little weird that we use general_thread here, I don't see

> why that would be any more valid than the continue_thread, and you

> might think that if we set a continue_thread just before we execute

> the inferior then the stop event is possibly going to be for the

> continue thread.

> 

> Still, like you say, the design of the old mechanism is horribly

> broken anyway, so I suspect there's little point worrying too much

> about changing this stuff.

> 

> So, my next thought was maybe I should be doing something similar,

> instead of "fixing" a missing ptid in process_stop_event, fill in a

> valid ptid earlier, so I did this:

> 

>   diff --git a/gdb/remote.c b/gdb/remote.c

>   index 4ac359ad5d9..67c5f298ee6 100644

>   --- a/gdb/remote.c

>   +++ b/gdb/remote.c

>   @@ -7492,6 +7492,8 @@ Packet: '%s'\n"),

>             event->ws.value.sig = (enum gdb_signal) sig;

>           else

>             event->ws.value.sig = GDB_SIGNAL_UNKNOWN;

>   +       if (event->ptid == null_ptid)

>   +         event->ptid = event->rs->general_thread;

>          }

>          break;

>        case 'w':          /* Thread exited.  */

> 

> And this seems to do the trick.

> 

> However, I'm still worried by the code in ::process_stop_event that

> says:

> 

>   if (ptid == null_ptid)

>      ptid = inferior_ptid;

> 

> After all, we know that as part of do_target_wait the inferior_ptid is

> reset to null_ptid, so this code should be pointless now, right...


Right.  Relying on inferior_ptid when processing events is just wrong
and should be removed.

> 

> So I tried deleting it, and then both the stop-reply-no-thread.exp,

> and my local testing with my minimal gdb stub failed because once

> again we end up in ::process_stop_event with stop_reply->ptid being

> null_ptid - which now is coming from general_thread.

> 

> Out of interest I tried replacing both the uses of general_thread (one

> in the 'T' packet and one I just added in the 'S' packet processing)

> with continue_thread, and this didn't do any better, though now the

> assertion that triggers is complaining that we end up using

> minus_one_ptid, and this leads to my next thought...


Right.

> 

> I think that general_thread and continue_thread are more

> representative of what GDB has asked of the inferior, rather than what

> the inferior is telling GDB (maybe?).  Let me explain my thinking:

> GDB updates the general_thread to a valid (something other than

> null_ptid, minus_one_ptid, etc) in only two places:

> 

>   (a) _AFTER_ processing a stop reply packet, to make the

>   general_thread match the thread that stopped, this is based on the

>   ptid that was parsed out of the stop reply, or

> 

>   (b) When we send a 'H' packet to the remote, setting either the

>   general 'g' or continue 'c' thread variable, which is done by GDB

>   just before an action.

> 

> Now relying on (a) to set general_thread so we can read it on the

> _next_ stop clearly makes no sense - what stopped this time has no

> relevance to what might stop next time, and similarly, relying on the

> value of general_thread set via the 'H' packet is clearly just wrong,

> as GDB must send a 'Hc' (and sets the continue_thread) to pick which

> thread should continue.


The theory of the original fix was that if the stub isn't sending
a thread id in the stop reply, then there's only one thread in
the remote target, so the thread recorded in general_thread is
going to be the right thread anyway.  As you noticed, continue_thread
is going to be minus_one_ptid when we tell the stub to resume
all threads, so continue_thread was a better choice.  I think.

> 

> I think what I'm arguing is that using general_thread as a fall-back

> when processing 'T' (and 'S') is wrong, we should instead in these

> cases deliberately leave the stop event ptid as null_ptid (indicating

> that the stop didn't name a thread), and then use my patch (with your

> fixes) to pick a non-exited thread.


That sounds good to me.  Seems safer / better than relying on
general_thread pointing to a thread, since in some cases it can
point to a while process, or to no process/thread.

>  

>  		  if (rsa == NULL)

> @@ -7668,10 +7664,35 @@ remote_target::process_stop_reply (struct stop_reply *stop_reply,

>    *status = stop_reply->ws;

>    ptid = stop_reply->ptid;

>  

> -  /* If no thread/process was reported by the stub, assume the current

> -     inferior.  */

> +  /* If no thread/process was reported by the stub then use the first

> +     non-exited thread in the current target.  */

>    if (ptid == null_ptid)

> -    ptid = inferior_ptid;

> +    {

> +      for (thread_info *thr : all_non_exited_threads (this))

> +	{

> +	  if (ptid != null_ptid)

> +	    {

> +	      static bool warned = false;

> +

> +	      if (!warned)

> +		{

> +		  /* If you are seeing this warning then the remote target

> +		     has multiple threads and either send an 'S' stop


You mean "send" -> "sent", I think.

Thanks,
Pedro Alves

Patch

diff --git a/gdb/remote.c b/gdb/remote.c
index 4a70ab3fb0d..4ac359ad5d9 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -7668,10 +7668,22 @@  remote_target::process_stop_reply (struct stop_reply *stop_reply,
   *status = stop_reply->ws;
   ptid = stop_reply->ptid;
 
-  /* If no thread/process was reported by the stub, assume the current
-     inferior.  */
+  /* If no thread/process was reported by the stub then use the first
+     non-exited thread in the current inferior.  */
   if (ptid == null_ptid)
-    ptid = inferior_ptid;
+    {
+      for (thread_info *thr : all_non_exited_threads (this))
+	{
+	  if (ptid != null_ptid)
+	    {
+	      warning (_("multi-threaded target stopped without sending "
+			 "a thread-id, using first non-exited thread"));
+	      break;
+	    }
+	  ptid = thr->ptid;
+	}
+      gdb_assert (ptid != null_ptid);
+    }
 
   if (status->kind != TARGET_WAITKIND_EXITED
       && status->kind != TARGET_WAITKIND_SIGNALLED