[gdb/testsuite] Fix check-libthread-db.exp FAILs with glibc 2.33

Message ID 20210707140950.GA2241@delia
State New
Headers show
Series
  • [gdb/testsuite] Fix check-libthread-db.exp FAILs with glibc 2.33
Related show

Commit Message

Tom de Vries July 7, 2021, 2:09 p.m.
Hi,

When running test-case gdb.threads/check-libthread-db.exp on openSUSE
Tumbleweed with glibc 2.33, I get:
...
(gdb) maint check libthread-db^M
Running libthread_db integrity checks:^M
  Got thread 0x7ffff7c79b80 => 9354 => 0x7ffff7c79b80; errno = 0 ... OK^M
libthread_db integrity checks passed.^M
(gdb) FAIL: gdb.threads/check-libthread-db.exp: user-initiated check: libpthread.so not initialized (pattern 2)
...

The test-case expects instead:
...
  Got thread 0x0 => 9354 => 0x0 ... OK^M
...
which is what I get on openSUSE Leap 15.2 with glibc 2.26, and what is
described in the test-case like this:
...
    # libthread_db should fake a single thread with th_unique == NULL.
...

Using a breakpoint on check_thread_db_callback we can compare the two
scenarios, and find that in the latter case we hit this code in glibc function
iterate_thread_list in nptl_db/td_ta_thr_iter.c:
...
  if (next == 0 && fake_empty)
    {
      /* __pthread_initialize_minimal has not run.  There is just the main
         thread to return.  We cannot rely on its thread register.  They
         sometimes contain garbage that would confuse us, left by the
         kernel at exec.  So if it looks like initialization is incomplete,
         we only fake a special descriptor for the initial thread.  */
      td_thrhandle_t th = { ta, 0 };
      return callback (&th, cbdata_p) != 0 ? TD_DBERR : TD_OK;
    }
...
while in the former case we don't because this preceding statement doesn't
result in next == 0:
...
  err = DB_GET_FIELD (next, ta, head, list_t, next, 0);
...

Note that the comment mentions __pthread_initialize_minimal, but in both cases
it has already run before we hit the callback, so it's possible the comment is
no longer accurate.

Anyway, the results do not look wrong, so fix this by updating the regexp
patterns to agree with what libthread-db is telling us.

Tested on x86_64-linux, both with glibc 2.33 and 2.26.

Any comments?

Thanks,
- Tom

[gdb/testsuite] Fix check-libthread-db.exp FAILs with glibc 2.33

gdb/testsuite/ChangeLog:

2021-07-07  Tom de Vries  <tdevries@suse.de>

	PR testsuite/27690
	* gdb.threads/check-libthread-db.exp: Update patterns for glibc 2.33.

---
 gdb/testsuite/gdb.threads/check-libthread-db.exp | 37 ++++++++++++++----------
 1 file changed, 21 insertions(+), 16 deletions(-)

Comments

will schmidt via Gdb-patches July 13, 2021, 4:49 a.m. | #1
On 2021-07-07 10:09 a.m., Tom de Vries wrote:
> Hi,

> 

> When running test-case gdb.threads/check-libthread-db.exp on openSUSE

> Tumbleweed with glibc 2.33, I get:

> ...

> (gdb) maint check libthread-db^M

> Running libthread_db integrity checks:^M

>   Got thread 0x7ffff7c79b80 => 9354 => 0x7ffff7c79b80; errno = 0 ... OK^M

> libthread_db integrity checks passed.^M

> (gdb) FAIL: gdb.threads/check-libthread-db.exp: user-initiated check: libpthread.so not initialized (pattern 2)

> ...

> 

> The test-case expects instead:

> ...

>   Got thread 0x0 => 9354 => 0x0 ... OK^M

> ...

> which is what I get on openSUSE Leap 15.2 with glibc 2.26, and what is

> described in the test-case like this:

> ...

>     # libthread_db should fake a single thread with th_unique == NULL.

> ...

> 

> Using a breakpoint on check_thread_db_callback we can compare the two

> scenarios, and find that in the latter case we hit this code in glibc function

> iterate_thread_list in nptl_db/td_ta_thr_iter.c:

> ...

>   if (next == 0 && fake_empty)

>     {

>       /* __pthread_initialize_minimal has not run.  There is just the main

>          thread to return.  We cannot rely on its thread register.  They

>          sometimes contain garbage that would confuse us, left by the

>          kernel at exec.  So if it looks like initialization is incomplete,

>          we only fake a special descriptor for the initial thread.  */

>       td_thrhandle_t th = { ta, 0 };

>       return callback (&th, cbdata_p) != 0 ? TD_DBERR : TD_OK;

>     }

> ...

> while in the former case we don't because this preceding statement doesn't

> result in next == 0:

> ...

>   err = DB_GET_FIELD (next, ta, head, list_t, next, 0);

> ...

> 

> Note that the comment mentions __pthread_initialize_minimal, but in both cases

> it has already run before we hit the callback, so it's possible the comment is

> no longer accurate.

> 

> Anyway, the results do not look wrong, so fix this by updating the regexp

> patterns to agree with what libthread-db is telling us.

> 

> Tested on x86_64-linux, both with glibc 2.33 and 2.26.

> 

> Any comments?


I got a bit lost in the glibc code, but I am wondering if this change in
behavior might have been caused by this glibc change:

  https://gitlab.com/gnutools/glibc/-/commit/1daccf403b1bd86370eb94edca794dc106d02039

Before this, the stack_user list was global variables.  This is one of
the two lists that get walked to enumerate the threads.  Since it isn't
initialized statically, it got placed in .bss and is zeroed-out by
default, hence why thread-db expected to read next == 0, I guess.

Or, it may be that the moment that this nptl minimal initialization is
done has changed, at least relative to where we do our checks.  It looks
like __pthread_initialize_minimal_internal is called as a constructor.
Has it always been this way?  By the time we stop on the solib load
event, have the constructors of the lib ran?

In any case, I agree that this looks benign, the regexp checks we have
exactly one thread.  But I would still be curious to know the real
reason for the change in behavior.

Simon
Tom de Vries July 13, 2021, 12:59 p.m. | #2
On 7/13/21 6:49 AM, Simon Marchi wrote:
> 

> 

> On 2021-07-07 10:09 a.m., Tom de Vries wrote:

>> Hi,

>>

>> When running test-case gdb.threads/check-libthread-db.exp on openSUSE

>> Tumbleweed with glibc 2.33, I get:

>> ...

>> (gdb) maint check libthread-db^M

>> Running libthread_db integrity checks:^M

>>   Got thread 0x7ffff7c79b80 => 9354 => 0x7ffff7c79b80; errno = 0 ... OK^M

>> libthread_db integrity checks passed.^M

>> (gdb) FAIL: gdb.threads/check-libthread-db.exp: user-initiated check: libpthread.so not initialized (pattern 2)

>> ...

>>

>> The test-case expects instead:

>> ...

>>   Got thread 0x0 => 9354 => 0x0 ... OK^M

>> ...

>> which is what I get on openSUSE Leap 15.2 with glibc 2.26, and what is

>> described in the test-case like this:

>> ...

>>     # libthread_db should fake a single thread with th_unique == NULL.

>> ...

>>

>> Using a breakpoint on check_thread_db_callback we can compare the two

>> scenarios, and find that in the latter case we hit this code in glibc function

>> iterate_thread_list in nptl_db/td_ta_thr_iter.c:

>> ...

>>   if (next == 0 && fake_empty)

>>     {

>>       /* __pthread_initialize_minimal has not run.  There is just the main

>>          thread to return.  We cannot rely on its thread register.  They

>>          sometimes contain garbage that would confuse us, left by the

>>          kernel at exec.  So if it looks like initialization is incomplete,

>>          we only fake a special descriptor for the initial thread.  */

>>       td_thrhandle_t th = { ta, 0 };

>>       return callback (&th, cbdata_p) != 0 ? TD_DBERR : TD_OK;

>>     }

>> ...

>> while in the former case we don't because this preceding statement doesn't

>> result in next == 0:

>> ...

>>   err = DB_GET_FIELD (next, ta, head, list_t, next, 0);

>> ...

>>

>> Note that the comment mentions __pthread_initialize_minimal, but in both cases

>> it has already run before we hit the callback, so it's possible the comment is

>> no longer accurate.

>>

>> Anyway, the results do not look wrong, so fix this by updating the regexp

>> patterns to agree with what libthread-db is telling us.

>>

>> Tested on x86_64-linux, both with glibc 2.33 and 2.26.

>>

>> Any comments?

> 

> I got a bit lost in the glibc code, but I am wondering if this change in

> behavior might have been caused by this glibc change:

> 

>   https://gitlab.com/gnutools/glibc/-/commit/1daccf403b1bd86370eb94edca794dc106d02039

> 


It is, I've bisect the change in behavior to precisely that commit.

> Before this, the stack_user list was global variables.  This is one of

> the two lists that get walked to enumerate the threads.  Since it isn't

> initialized statically, it got placed in .bss and is zeroed-out by

> default, hence why thread-db expected to read next == 0, I guess.

> 

> Or, it may be that the moment that this nptl minimal initialization is

> done has changed, at least relative to where we do our checks.  It looks

> like __pthread_initialize_minimal_internal is called as a constructor.

> Has it always been this way?  By the time we stop on the solib load

> event, have the constructors of the lib ran?

> 


Before the commit, we have:

1. Initial event, hitting _dl_debug_state:
...
Stopped due to shared library event (no libraries added or removed)
...

2. Loading some libs:
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/home/vries/glibc/build/nptl_db/libthread_db.so.1".
Stopped due to shared library event:
  Inferior loaded /home/vries/glibc/build/nptl/libpthread.so.0
    /home/vries/glibc/build/math/libm.so.6
    /home/vries/glibc/build/libc.so.6
...

3. __pthread_initialize_minimal_internal entry, at which point we still
have null __stack_user:
...
Continuing.

Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:227
227     {
(gdb) p __stack_user
$2 = {next = 0x0, prev = 0x0}
...

4. __pthread_initialize_minimal_internal exit, at which point we have
initialized stack_user:
...
(gdb) fin
Run till exit from #0  __pthread_initialize_minimal_internal () at
nptl-init.c:227
_init () at ../sysdeps/x86_64/crtn.S:40
40              addq $8, %rsp
(gdb) p __stack_user
$3 = {next = 0x7ffff7ff2e40, prev = 0x7ffff7ff2e40}
...

After the commit:

1. Initial event, hitting _dl_debug_state:
...
Stopped due to shared library event (no libraries added or removed)
...

2. We hit this list_add which adds the main thread to dl_stack_user:
...
(gdb) up
#1  init_tls (naudit=naudit@entry=0) at rtld.c:804
804       list_add (&THREAD_SELF->list, &GL (dl_stack_user));
...

3. Loading some libs:
...
(gdb) c
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/home/vries/glibc/build/nptl_db/libthread_db.so.1".
Stopped due to shared library event:
  Inferior loaded /home/vries/glibc/build/nptl/libpthread.so.0
    /home/vries/glibc/build/math/libm.so.6
    /home/vries/glibc/build/libc.so.6
...

4. __pthread_initialize_minimal_internal entry:
...
Continuing.

Breakpoint 2, __pthread_initialize_minimal_internal () at nptl-init.c:227
227     {
...

So, it doesn't look like the moment when
__pthread_initialize_minimal_internal is called changed, just the
location of where stack_user is initialized.

I'll try to update the log message a bit and resubmit.

Thanks,
- Tom

> In any case, I agree that this looks benign, the regexp checks we have

> exactly one thread.  But I would still be curious to know the real

> reason for the change in behavior.

> 

> Simon

>
Tom de Vries July 13, 2021, 1:15 p.m. | #3
On 7/13/21 2:59 PM, Tom de Vries wrote:
> I'll try to update the log message a bit and resubmit.


Any further comments?

Thanks,
- Tom
[gdb/testsuite] Fix check-libthread-db.exp FAILs with glibc 2.33

When running test-case gdb.threads/check-libthread-db.exp on openSUSE
Tumbleweed with glibc 2.33, I get:
...
(gdb) maint check libthread-db^M
Running libthread_db integrity checks:^M
  Got thread 0x7ffff7c79b80 => 9354 => 0x7ffff7c79b80; errno = 0 ... OK^M
libthread_db integrity checks passed.^M
(gdb) FAIL: gdb.threads/check-libthread-db.exp: user-initiated check: \
  libpthread.so not initialized (pattern 2)
...

The test-case expects instead:
...
  Got thread 0x0 => 9354 => 0x0 ... OK^M
...
which is what I get on openSUSE Leap 15.2 with glibc 2.26, and what is
described in the test-case like this:
...
    # libthread_db should fake a single thread with th_unique == NULL.
...

Using a breakpoint on check_thread_db_callback we can compare the two
scenarios, and find that in the latter case we hit this code in glibc function
iterate_thread_list in nptl_db/td_ta_thr_iter.c:
...
  if (next == 0 && fake_empty)
    {
      /* __pthread_initialize_minimal has not run.  There is just the main
         thread to return.  We cannot rely on its thread register.  They
         sometimes contain garbage that would confuse us, left by the
         kernel at exec.  So if it looks like initialization is incomplete,
         we only fake a special descriptor for the initial thread.  */
      td_thrhandle_t th = { ta, 0 };
      return callback (&th, cbdata_p) != 0 ? TD_DBERR : TD_OK;
    }
...
while in the former case we don't because this preceding statement doesn't
result in next == 0:
...
  err = DB_GET_FIELD (next, ta, head, list_t, next, 0);
...

Note that the comment mentions __pthread_initialize_minimal, but in both cases
it has already run before we hit the callback, so it's possible the comment is
no longer accurate.

The change in behaviour bisect to glibc commit 1daccf403b "nptl: Move stack
list variables into _rtld_global", which moves the initialization of stack
list variables such as __stack_user to an earlier moment, which explains well
enough the observed difference.

Fix this by updating the regexp patterns to agree with what libthread-db is
telling us.

Tested on x86_64-linux, both with glibc 2.33 and 2.26.

gdb/testsuite/ChangeLog:

2021-07-07  Tom de Vries  <tdevries@suse.de>

	PR testsuite/27690
	* gdb.threads/check-libthread-db.exp: Update patterns for glibc 2.33.

---
 gdb/testsuite/gdb.threads/check-libthread-db.exp | 37 ++++++++++++++----------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/check-libthread-db.exp b/gdb/testsuite/gdb.threads/check-libthread-db.exp
index dfb3bad3712..6ecf40db6a9 100644
--- a/gdb/testsuite/gdb.threads/check-libthread-db.exp
+++ b/gdb/testsuite/gdb.threads/check-libthread-db.exp
@@ -32,6 +32,13 @@ if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" \
     return -1
 }
 
+# Case 1: libthread_db fakes a single thread with th_unique == NULL.
+set thread_re1 "0 => $decimal => 0"
+# Case 2: libthread_db already initialized.
+set thread_re2 "$hex => $decimal => ${hex}(; errno = 0)?"
+# Match either of the two cases.
+set initial_thread_re "($thread_re1|$thread_re2)"
+
 with_test_prefix "user-initiated check" {
 
     # User-initiated check with libthread_db not loaded.
@@ -49,18 +56,16 @@ with_test_prefix "user-initiated check" {
 	"No libthread_db loaded" \
 	"no libpthread.so loaded"
 
-
-    # User-initiated check with NPTL uninitialized.
-    # libthread_db should fake a single thread with th_unique == NULL.
+    # User-initiated check with NPTL possibly uninitialized.
     gdb_test "continue" \
 	".*Stopped due to shared library event.*Inferior loaded .*libpthread.*"
 
     gdb_test_sequence "maint check libthread-db" \
-	"libpthread.so not initialized" {
-	    "\[\r\n\]+Running libthread_db integrity checks:"
-	    "\[\r\n\]+\[ \]+Got thread 0x0 => \[0-9\]+ => 0x0 ... OK"
-	    "\[\r\n\]+libthread_db integrity checks passed."
-	}
+	"libpthread.so possibly not initialized" \
+	[list \
+	     "\[\r\n\]+Running libthread_db integrity checks:" \
+	     "\[\r\n\]+\[ \]+Got thread $initial_thread_re ... OK" \
+	     "\[\r\n\]+libthread_db integrity checks passed."]
 
     # User-initiated check with NPTL fully operational.
     gdb_test_no_output "set stop-on-solib-events 0"
@@ -78,8 +83,8 @@ with_test_prefix "user-initiated check" {
 
 with_test_prefix "automated load-time check" {
 
-    # Automated load-time check with NPTL uninitialized.
-    with_test_prefix "libpthread.so not initialized" {
+    # Automated load-time check with NPTL possibly uninitialized.
+    with_test_prefix "libpthread.so possibly not initialized" {
 	clean_restart ${binfile}
 
 	gdb_test_no_output "maint set check-libthread-db 1"
@@ -88,12 +93,12 @@ with_test_prefix "automated load-time check" {
 	gdb_run_cmd
 
 	gdb_test_sequence "" \
-	    "check debug libthread-db output" {
-		"\[\r\n\]+Running libthread_db integrity checks:"
-		"\[\r\n\]+\[ \]+Got thread 0x0 => \[0-9\]+ => 0x0 ... OK"
-		"\[\r\n\]+libthread_db integrity checks passed."
-		"\[\r\n\]+[Thread debugging using libthread_db enabled]"
-	    }
+	    "check debug libthread-db output" \
+	    [list \
+		 "\[\r\n\]+Running libthread_db integrity checks:" \
+		 "\[\r\n\]+\[ \]+Got thread $initial_thread_re ... OK" \
+		 "\[\r\n\]+libthread_db integrity checks passed." \
+		 "\[\r\n\]+\\\[Thread debugging using libthread_db enabled\\\]"]
     }
 
     # Automated load-time check with NPTL fully operational.
will schmidt via Gdb-patches July 13, 2021, 1:31 p.m. | #4
On 2021-07-13 9:15 a.m., Tom de Vries wrote:
> On 7/13/21 2:59 PM, Tom de Vries wrote:

>> I'll try to update the log message a bit and resubmit.

> 

> Any further comments?

> 

> Thanks,

> - Tom


You explanation of the sequence of events was very clear, thanks.  This
LGTM.

Simon

Patch

diff --git a/gdb/testsuite/gdb.threads/check-libthread-db.exp b/gdb/testsuite/gdb.threads/check-libthread-db.exp
index dfb3bad3712..6ecf40db6a9 100644
--- a/gdb/testsuite/gdb.threads/check-libthread-db.exp
+++ b/gdb/testsuite/gdb.threads/check-libthread-db.exp
@@ -32,6 +32,13 @@  if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" \
     return -1
 }
 
+# Case 1: libthread_db fakes a single thread with th_unique == NULL.
+set thread_re1 "0 => $decimal => 0"
+# Case 2: libthread_db already initialized.
+set thread_re2 "$hex => $decimal => ${hex}(; errno = 0)?"
+# Match either of the two cases.
+set initial_thread_re "($thread_re1|$thread_re2)"
+
 with_test_prefix "user-initiated check" {
 
     # User-initiated check with libthread_db not loaded.
@@ -49,18 +56,16 @@  with_test_prefix "user-initiated check" {
 	"No libthread_db loaded" \
 	"no libpthread.so loaded"
 
-
-    # User-initiated check with NPTL uninitialized.
-    # libthread_db should fake a single thread with th_unique == NULL.
+    # User-initiated check with NPTL possibly uninitialized.
     gdb_test "continue" \
 	".*Stopped due to shared library event.*Inferior loaded .*libpthread.*"
 
     gdb_test_sequence "maint check libthread-db" \
-	"libpthread.so not initialized" {
-	    "\[\r\n\]+Running libthread_db integrity checks:"
-	    "\[\r\n\]+\[ \]+Got thread 0x0 => \[0-9\]+ => 0x0 ... OK"
-	    "\[\r\n\]+libthread_db integrity checks passed."
-	}
+	"libpthread.so possibly not initialized" \
+	[list \
+	     "\[\r\n\]+Running libthread_db integrity checks:" \
+	     "\[\r\n\]+\[ \]+Got thread $initial_thread_re ... OK" \
+	     "\[\r\n\]+libthread_db integrity checks passed."]
 
     # User-initiated check with NPTL fully operational.
     gdb_test_no_output "set stop-on-solib-events 0"
@@ -78,8 +83,8 @@  with_test_prefix "user-initiated check" {
 
 with_test_prefix "automated load-time check" {
 
-    # Automated load-time check with NPTL uninitialized.
-    with_test_prefix "libpthread.so not initialized" {
+    # Automated load-time check with NPTL possibly uninitialized.
+    with_test_prefix "libpthread.so possibly not initialized" {
 	clean_restart ${binfile}
 
 	gdb_test_no_output "maint set check-libthread-db 1"
@@ -88,12 +93,12 @@  with_test_prefix "automated load-time check" {
 	gdb_run_cmd
 
 	gdb_test_sequence "" \
-	    "check debug libthread-db output" {
-		"\[\r\n\]+Running libthread_db integrity checks:"
-		"\[\r\n\]+\[ \]+Got thread 0x0 => \[0-9\]+ => 0x0 ... OK"
-		"\[\r\n\]+libthread_db integrity checks passed."
-		"\[\r\n\]+[Thread debugging using libthread_db enabled]"
-	    }
+	    "check debug libthread-db output" \
+	    [list \
+		 "\[\r\n\]+Running libthread_db integrity checks:" \
+		 "\[\r\n\]+\[ \]+Got thread $initial_thread_re ... OK" \
+		 "\[\r\n\]+libthread_db integrity checks passed." \
+		 "\[\r\n\]+\\\[Thread debugging using libthread_db enabled\\\]"]
     }
 
     # Automated load-time check with NPTL fully operational.