[05/11] nptl: Deallocate the thread stack on setup failure (BZ #19511)

Message ID 20210526165728.1772546-6-adhemerval.zanella@linaro.org
State New
Headers show
Series
  • nptl: pthread cancellation refactor
Related show

Commit Message

Florian Weimer via Libc-alpha May 26, 2021, 4:57 p.m.
To setup either the thread scheduling parameters or affinity,
pthread_create enforce synchronization on created thread wait until
its parent either unlock the 'struct pthread' or send a cancellation
signal if a failure occurs.

However, cancelling the thread does not deallocate the newly created
stack since cancellation expects that a pthread_join to deallocate any
allocated thread resouces (threads stack or TLS).

I used a different strategy than the one proposed on BZ#19511
comment #4.  Making the parent responsible for the cleanup requires
additional synchronization similar to what pthread_join does.  Instead,
this patch reassigns an unused 'struct pthread' member,
parent_cancelhandling_unsed, to indicate whether the setup has failed
and set the thread itself to deallocate the allocated resouces (similar
to what detached mode does).

This strategy also simplifies by not using thread cancellation and
thus not running libgcc_so load in the signal handler (which is
avoided in thread cancellation since 'pthread_cancel' is the one
responsible to dlopen libgcc_s).

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
---
 nptl/allocatestack.c  |  1 +
 nptl/descr.h          |  5 ++--
 nptl/pthread_create.c | 56 +++++++++++++++++++------------------------
 3 files changed, 29 insertions(+), 33 deletions(-)

-- 
2.30.2

Comments

Florian Weimer via Libc-alpha May 26, 2021, 5:33 p.m. | #1
* Adhemerval Zanella via Libc-alpha:

> @@ -819,9 +809,13 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,

>  	   CONCURRENCY NOTES above).  We can assert that STOPPED_START

>  	   must have been true because thread creation didn't fail, but

>  	   thread attribute setting did.  */

> -	/* See bug 19511 which explains why doing nothing here is a

> -	   resource leak for a joinable thread.  */

> -	assert (stopped_start);

> +        {

> +	  assert (stopped_start);

> +          if (stopped_start)

> +  	    /* State (a), we own PD. The thread blocked on this lock will

> +               finish due setup_failed being set.  */

> +	    lll_unlock (pd->lock, LLL_PRIVATE);

> +        }

>        else

>  	{

>  	  /* State (e) and we have ownership of PD (see CONCURRENCY


The assert followed by the if doesn't look right.  One should check
setup_failed, I assume.  Is there any way to trigger the broken assert
or the hang from the missing wakeup?

I happend to have looked at this today for the sigaltstack stuff.

I felt that it would be simpler to perform the kernel-side setup
(affinity and scheduling policy) on the new thread.  Essentially use
stopped_start mode unconditionally.  And use a separate lock word for
the barrier/latch, instead of reusing the pd->lock.

The advantage is that we can pass an arbitrary temporary object into the
internal start function, such as the signal mask.  The creating thread
blocks until the initialization has happened and the immediate fate of
the thread has been decided.  After that point the creating thread
deallocates the temporary object, and if the thread couldn't start, any
thread-related data.  Otherwise the new thread takes ownership of that
data.

I think it will be much easier to explain what is going on in the
concurrency notes.  We have an additional synchronization step on all
created threads, but I doubt that this matters in practice.

What do you think?  I'm not sure if the change should happen as part of
this series.

Thanks,
Florian
Florian Weimer via Libc-alpha May 26, 2021, 5:51 p.m. | #2
On 26/05/2021 14:33, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:

> 

>> @@ -819,9 +809,13 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,

>>  	   CONCURRENCY NOTES above).  We can assert that STOPPED_START

>>  	   must have been true because thread creation didn't fail, but

>>  	   thread attribute setting did.  */

>> -	/* See bug 19511 which explains why doing nothing here is a

>> -	   resource leak for a joinable thread.  */

>> -	assert (stopped_start);

>> +        {

>> +	  assert (stopped_start);

>> +          if (stopped_start)

>> +  	    /* State (a), we own PD. The thread blocked on this lock will

>> +               finish due setup_failed being set.  */

>> +	    lll_unlock (pd->lock, LLL_PRIVATE);

>> +        }

>>        else

>>  	{

>>  	  /* State (e) and we have ownership of PD (see CONCURRENCY

> 

> The assert followed by the if doesn't look right.  One should check

> setup_failed, I assume.  Is there any way to trigger the broken assert

> or the hang from the missing wakeup?


This just check what is described in the CONCURRENCY NOTES:

204    Axioms:                                                                                          
205 
206    * The create_thread function can never set stopped_start to false.                               
207    * The created thread can read stopped_start but *never write to it*.    

> 

> I happend to have looked at this today for the sigaltstack stuff.

> 

> I felt that it would be simpler to perform the kernel-side setup

> (affinity and scheduling policy) on the new thread.  Essentially use

> stopped_start mode unconditionally.  And use a separate lock word for

> the barrier/latch, instead of reusing the pd->lock.

> 

> The advantage is that we can pass an arbitrary temporary object into the

> internal start function, such as the signal mask.  The creating thread

> blocks until the initialization has happened and the immediate fate of

> the thread has been decided.  After that point the creating thread

> deallocates the temporary object, and if the thread couldn't start, any

> thread-related data.  Otherwise the new thread takes ownership of that

> data.

> 

> I think it will be much easier to explain what is going on in the

> concurrency notes.  We have an additional synchronization step on all

> created threads, but I doubt that this matters in practice.

> 

> What do you think?  I'm not sure if the change should happen as part of

> this series.


I see two potential issues on this approach:

  1. The cpuset information is tied to the pthread_attr_t and can be
     potentially large, it means that if we want to make the created
     thread to consume we will need to either duplicate, add something
     like a reference count, or add more synchronization for the 
     consumer.  None is really simpler to implement and I don't think 
     it is really an improvement.

  2. For both schedule parameters and affinity ideally we want to inform
     right away to user if the operation has failed.  If we move it to
     thread itself, it still would require the synchronization to inform
     pthread_create the operation has failed.  I don't see much improvement,
     the pthread_create latency will be essentially the same.

After reading the CONCURRENCY NOTES (great work on this btw), I think the
main issue on the resource deallocation here is mixing pthread cancellation
and thread creation.
Florian Weimer via Libc-alpha May 26, 2021, 5:58 p.m. | #3
* Adhemerval Zanella:

> On 26/05/2021 14:33, Florian Weimer wrote:

>> * Adhemerval Zanella via Libc-alpha:

>> 

>>> @@ -819,9 +809,13 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,

>>>  	   CONCURRENCY NOTES above).  We can assert that STOPPED_START

>>>  	   must have been true because thread creation didn't fail, but

>>>  	   thread attribute setting did.  */

>>> -	/* See bug 19511 which explains why doing nothing here is a

>>> -	   resource leak for a joinable thread.  */

>>> -	assert (stopped_start);

>>> +        {

>>> +	  assert (stopped_start);

>>> +          if (stopped_start)

>>> +  	    /* State (a), we own PD. The thread blocked on this lock will

>>> +               finish due setup_failed being set.  */

>>> +	    lll_unlock (pd->lock, LLL_PRIVATE);

>>> +        }

>>>        else

>>>  	{

>>>  	  /* State (e) and we have ownership of PD (see CONCURRENCY

>> 

>> The assert followed by the if doesn't look right.  One should check

>> setup_failed, I assume.  Is there any way to trigger the broken assert

>> or the hang from the missing wakeup?

>

> This just check what is described in the CONCURRENCY NOTES:

>

> 204    Axioms:                                                                                          

> 205 

> 206    * The create_thread function can never set stopped_start to false.                               

> 207    * The created thread can read stopped_start but *never write to it*.   


I meant this:

+	  assert (stopped_start);
+         if (stopped_start)

It doesn't look right to me.

>> I happend to have looked at this today for the sigaltstack stuff.

>> 

>> I felt that it would be simpler to perform the kernel-side setup

>> (affinity and scheduling policy) on the new thread.  Essentially use

>> stopped_start mode unconditionally.  And use a separate lock word for

>> the barrier/latch, instead of reusing the pd->lock.

>> 

>> The advantage is that we can pass an arbitrary temporary object into the

>> internal start function, such as the signal mask.  The creating thread

>> blocks until the initialization has happened and the immediate fate of

>> the thread has been decided.  After that point the creating thread

>> deallocates the temporary object, and if the thread couldn't start, any

>> thread-related data.  Otherwise the new thread takes ownership of that

>> data.

>> 

>> I think it will be much easier to explain what is going on in the

>> concurrency notes.  We have an additional synchronization step on all

>> created threads, but I doubt that this matters in practice.

>> 

>> What do you think?  I'm not sure if the change should happen as part of

>> this series.

>

> I see two potential issues on this approach:

>

>   1. The cpuset information is tied to the pthread_attr_t and can be

>      potentially large, it means that if we want to make the created

>      thread to consume we will need to either duplicate, add something

>      like a reference count, or add more synchronization for the 

>      consumer.  None is really simpler to implement and I don't think 

>      it is really an improvement.


No, I think that's covered by the temporary object thing.  The creating
thread will block until it's guaranteed that the new thread won't need
this data anymore.

>   2. For both schedule parameters and affinity ideally we want to inform

>      right away to user if the operation has failed.  If we move it to

>      thread itself, it still would require the synchronization to inform

>      pthread_create the operation has failed.  I don't see much improvement,

>      the pthread_create latency will be essentially the same.


Correct.  But it will be much simpler because we can treat all
kernel-side initialization the same.  For example, sigaltstack might
fail with invalid flags, too.  And we'd definitely have to run that on
the new thread.

Thanks,
Florian
Andreas Schwab May 26, 2021, 6:21 p.m. | #4
Please use consistent indentation.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Florian Weimer via Libc-alpha May 26, 2021, 6:40 p.m. | #5
On 26/05/2021 15:21, Andreas Schwab wrote:
> Please use consistent indentation.

> 

> Andreas.

> 


I think I used whitespaces only, I will fix it.
Florian Weimer via Libc-alpha May 26, 2021, 7:19 p.m. | #6
On 26/05/2021 14:58, Florian Weimer wrote:
> * Adhemerval Zanella:

> 

>> On 26/05/2021 14:33, Florian Weimer wrote:

>>> * Adhemerval Zanella via Libc-alpha:

>>>

>>>> @@ -819,9 +809,13 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,

>>>>  	   CONCURRENCY NOTES above).  We can assert that STOPPED_START

>>>>  	   must have been true because thread creation didn't fail, but

>>>>  	   thread attribute setting did.  */

>>>> -	/* See bug 19511 which explains why doing nothing here is a

>>>> -	   resource leak for a joinable thread.  */

>>>> -	assert (stopped_start);

>>>> +        {

>>>> +	  assert (stopped_start);

>>>> +          if (stopped_start)

>>>> +  	    /* State (a), we own PD. The thread blocked on this lock will

>>>> +               finish due setup_failed being set.  */

>>>> +	    lll_unlock (pd->lock, LLL_PRIVATE);

>>>> +        }

>>>>        else

>>>>  	{

>>>>  	  /* State (e) and we have ownership of PD (see CONCURRENCY

>>>

>>> The assert followed by the if doesn't look right.  One should check

>>> setup_failed, I assume.  Is there any way to trigger the broken assert

>>> or the hang from the missing wakeup?

>>

>> This just check what is described in the CONCURRENCY NOTES:

>>

>> 204    Axioms:                                                                                          

>> 205 

>> 206    * The create_thread function can never set stopped_start to false.                               

>> 207    * The created thread can read stopped_start but *never write to it*.   

> 

> I meant this:

> 

> +	  assert (stopped_start);

> +         if (stopped_start)

> 

> It doesn't look right to me.


Yeah, the 'if' check is not really required.  It will hit this code path
only if clone has succeeded ('thread_ran' is true) but either 
sched_setaffinity or sched_setscheduler has failed (in this case
'stopped_start' will be always true).

I changed to:

  if (thread_ran)
    {
      /* State (c) or (d) and we may not have PD ownership (see
         CONCURRENCY NOTES above).  We can assert that STOPPED_START
         must have been true because thread creation didn't fail, but
         thread attribute setting did.  */
      assert (stopped_start);
      /* State (a), we own PD. The thread blocked on this lock will
         finish due setup_failed being set.
         The thread will be responsible to cleanup any allocated
         resource.  */
      lll_unlock (pd->lock, LLL_PRIVATE).
    }

> 

>>> I happend to have looked at this today for the sigaltstack stuff.

>>>

>>> I felt that it would be simpler to perform the kernel-side setup

>>> (affinity and scheduling policy) on the new thread.  Essentially use

>>> stopped_start mode unconditionally.  And use a separate lock word for

>>> the barrier/latch, instead of reusing the pd->lock.

>>>

>>> The advantage is that we can pass an arbitrary temporary object into the

>>> internal start function, such as the signal mask.  The creating thread

>>> blocks until the initialization has happened and the immediate fate of

>>> the thread has been decided.  After that point the creating thread

>>> deallocates the temporary object, and if the thread couldn't start, any

>>> thread-related data.  Otherwise the new thread takes ownership of that

>>> data.

>>>

>>> I think it will be much easier to explain what is going on in the

>>> concurrency notes.  We have an additional synchronization step on all

>>> created threads, but I doubt that this matters in practice.

>>>

>>> What do you think?  I'm not sure if the change should happen as part of

>>> this series.

>>

>> I see two potential issues on this approach:

>>

>>   1. The cpuset information is tied to the pthread_attr_t and can be

>>      potentially large, it means that if we want to make the created

>>      thread to consume we will need to either duplicate, add something

>>      like a reference count, or add more synchronization for the 

>>      consumer.  None is really simpler to implement and I don't think 

>>      it is really an improvement.

> 

> No, I think that's covered by the temporary object thing.  The creating

> thread will block until it's guaranteed that the new thread won't need

> this data anymore.

> 

>>   2. For both schedule parameters and affinity ideally we want to inform

>>      right away to user if the operation has failed.  If we move it to

>>      thread itself, it still would require the synchronization to inform

>>      pthread_create the operation has failed.  I don't see much improvement,

>>      the pthread_create latency will be essentially the same.

> 

> Correct.  But it will be much simpler because we can treat all

> kernel-side initialization the same.  For example, sigaltstack might

> fail with invalid flags, too.  And we'd definitely have to run that on

> the new thread.


The small downside is we will need to consume some of the thread own stack
to pass such information, but it can be only a pointer in any case.

But I still see setting the scheduling/affinity synchronous to have some
advantage: we can fast exit in the case of failure and join the thread
on pthread_create (instead of making it act a detached one).  This make
the failure case not 'leak' the thread context and resource out of
pthread_create.

It might be somewhat complex to code it right now, since we need to take
care of destructor for thread_local variables and thread-local data,
__libc_thread_freeres, etc.

That is also why I used the current unwind mechanism to exit in case of
setup failure (maybe instead of adding the setup_failed on the IS_DETACHED
check the patch could just setup the thread as detached, the pd->joinid
could be set non-atomically since it owns the pd).

In any case, since we do not implement this I don't have a preference.
Your arbitrary temporary object seems ok and should work on the
scheduler/affinity setup.
Florian Weimer via Libc-alpha May 26, 2021, 7:26 p.m. | #7
On 26/05/2021 13:57, Adhemerval Zanella wrote:
> @@ -435,10 +424,11 @@ start_thread (void *arg)

>  

>  	  /* And give it up right away.  */

>  	  lll_unlock (pd->lock, LLL_PRIVATE);

> -

> -	  LIBC_CANCEL_RESET (oldtype);

>  	}

>  

> +      if (pd->setup_failed == 1)

> +        __do_cancel ();

> +

>        LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg);

>  

>        /* Run the code the user provided.  */


And this is wrong, the created thread does not own the PD anymore
since for setup_failed == 1, pd->stopped_start will be true.

We need to call __do_cancel in the if above.
Florian Weimer via Libc-alpha May 27, 2021, 7:43 a.m. | #8
* Adhemerval Zanella:

> On 26/05/2021 13:57, Adhemerval Zanella wrote:

>> @@ -435,10 +424,11 @@ start_thread (void *arg)

>>  

>>  	  /* And give it up right away.  */

>>  	  lll_unlock (pd->lock, LLL_PRIVATE);

>> -

>> -	  LIBC_CANCEL_RESET (oldtype);

>>  	}

>>  

>> +      if (pd->setup_failed == 1)

>> +        __do_cancel ();

>> +

>>        LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg);

>>  

>>        /* Run the code the user provided.  */

>

> And this is wrong, the created thread does not own the PD anymore

> since for setup_failed == 1, pd->stopped_start will be true.

>

> We need to call __do_cancel in the if above.


Okay, I will wait for the new version before I start a real review.

Thanks,
Florian

Patch

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index dc81a2ca73..2114bd2e27 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -161,6 +161,7 @@  get_cached_stack (size_t *sizep, void **memp)
   /* Cancellation handling is back to the default.  */
   result->cancelhandling = 0;
   result->cleanup = NULL;
+  result->setup_failed = 0;
 
   /* No pending event.  */
   result->nextevent = NULL;
diff --git a/nptl/descr.h b/nptl/descr.h
index 3de9535449..9d8297b45f 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -340,8 +340,9 @@  struct pthread
   /* True if thread must stop at startup time.  */
   bool stopped_start;
 
-  /* Formerly used for dealing with cancellation.  */
-  int parent_cancelhandling_unsed;
+  /* Indicate that a thread creation setup has failed (for instance the
+     scheduler or affinity).  */
+  int setup_failed;
 
   /* Lock to synchronize access to the descriptor.  */
   int lock;
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index b53e3f30a0..d6b907827a 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -176,8 +176,6 @@  late_init (void)
        state.  Ownership cannot be released to the process via the
        return of pthread_create since a non-zero result entails PD is
        undefined and therefore cannot be joined to free the resources.
-       We privately call pthread_join on the thread to finish handling
-       the resource shutdown (Or at least we should, see bug 19511).
 
    (e) If the thread creation failed and THREAD_RAN is false, then the
        creating thread retains ownership of PD and must cleanup state.
@@ -313,28 +311,19 @@  static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
   /* Now we have the possibility to set scheduling parameters etc.  */
   if (attr != NULL)
     {
-      int res;
-
       /* Set the affinity mask if necessary.  */
       if (need_setaffinity)
 	{
 	  assert (*stopped_start);
 
-	  res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
-				       attr->extension->cpusetsize,
-				       attr->extension->cpuset);
-
+	  int res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
+				           attr->extension->cpusetsize,
+				           attr->extension->cpuset);
 	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
-	  err_out:
-	    {
-	      /* The operation failed.  We have to kill the thread.
-		 We let the normal cancellation mechanism do the work.  */
-
-	      pid_t pid = __getpid ();
-	      INTERNAL_SYSCALL_CALL (tgkill, pid, pd->tid, SIGCANCEL);
-
-	      return INTERNAL_SYSCALL_ERRNO (res);
-	    }
+            {
+              pd->setup_failed = 1;
+              return INTERNAL_SYSCALL_ERRNO (res);
+            }
 	}
 
       /* Set the scheduling parameters.  */
@@ -342,11 +331,13 @@  static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
 	{
 	  assert (*stopped_start);
 
-	  res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
-				       pd->schedpolicy, &pd->schedparam);
-
+	  int res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
+				           pd->schedpolicy, &pd->schedparam);
 	  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
-	    goto err_out;
+            {
+              pd->setup_failed = 1;
+              return INTERNAL_SYSCALL_ERRNO (res);
+            }
 	}
     }
 
@@ -426,8 +417,6 @@  start_thread (void *arg)
 	 have ownership (see CONCURRENCY NOTES above).  */
       if (__glibc_unlikely (pd->stopped_start))
 	{
-	  int oldtype = LIBC_CANCEL_ASYNC ();
-
 	  /* Get the lock the parent locked to force synchronization.  */
 	  lll_lock (pd->lock, LLL_PRIVATE);
 
@@ -435,10 +424,11 @@  start_thread (void *arg)
 
 	  /* And give it up right away.  */
 	  lll_unlock (pd->lock, LLL_PRIVATE);
-
-	  LIBC_CANCEL_RESET (oldtype);
 	}
 
+      if (pd->setup_failed == 1)
+        __do_cancel ();
+
       LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg);
 
       /* Run the code the user provided.  */
@@ -563,8 +553,8 @@  start_thread (void *arg)
       pd->setxid_futex = 0;
     }
 
-  /* If the thread is detached free the TCB.  */
-  if (IS_DETACHED (pd))
+  /* If the thread is detached or an setup error occurred, free the TCB.  */
+  if (IS_DETACHED (pd) || pd->setup_failed == 1)
     /* Free the TCB.  */
     __nptl_free_tcb (pd);
 
@@ -819,9 +809,13 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	   CONCURRENCY NOTES above).  We can assert that STOPPED_START
 	   must have been true because thread creation didn't fail, but
 	   thread attribute setting did.  */
-	/* See bug 19511 which explains why doing nothing here is a
-	   resource leak for a joinable thread.  */
-	assert (stopped_start);
+        {
+	  assert (stopped_start);
+          if (stopped_start)
+  	    /* State (a), we own PD. The thread blocked on this lock will
+               finish due setup_failed being set.  */
+	    lll_unlock (pd->lock, LLL_PRIVATE);
+        }
       else
 	{
 	  /* State (e) and we have ownership of PD (see CONCURRENCY