nscd: bump GC cycle during cache pruning (bug 26130)

Message ID mvmeeqd3hrd.fsf@suse.de
State Superseded
Headers show
Series
  • nscd: bump GC cycle during cache pruning (bug 26130)
Related show

Commit Message

Andreas Schwab June 17, 2020, 2:10 p.m.
While nscd prunes a cache it becomes inconsistent temporarily, which is
visible to clients if that cache is shared.  Bump the GC cycle counter so
that the clients notice the modification window.
---
 nscd/cache.c | 9 +++++++++
 1 file changed, 9 insertions(+)

-- 
2.26.2


-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Comments

Adhemerval Zanella via Libc-alpha June 29, 2020, 8:27 a.m. | #1
* Andreas Schwab:

> While nscd prunes a cache it becomes inconsistent temporarily, which is

> visible to clients if that cache is shared.  Bump the GC cycle counter so

> that the clients notice the modification window.

> ---

>  nscd/cache.c | 9 +++++++++

>  1 file changed, 9 insertions(+)

>

> diff --git a/nscd/cache.c b/nscd/cache.c

> index 94163d9ce3..2ceac94c23 100644

> --- a/nscd/cache.c

> +++ b/nscd/cache.c

> @@ -453,6 +453,11 @@ prune_cache (struct database_dyn *table, time_t now, int fd)

>  	  pthread_rwlock_wrlock (&table->lock);

>  	}

>  

> +      /* Now we start modifying the data.  Make sure all readers of the

> +	 data are aware of this and temporarily don't use the data.  */

> +      ++table->head->gc_cycle;

> +      assert ((table->head->gc_cycle & 1) == 1);

> +

>        while (first <= last)

>  	{

>  	  if (mark[first])

> @@ -493,6 +498,10 @@ prune_cache (struct database_dyn *table, time_t now, int fd)

>  	  ++first;

>  	}

>  

> +      /* Now we are done modifying the data.  */

> +      ++table->head->gc_cycle;

> +      assert ((table->head->gc_cycle & 1) == 0);

> +

>        /* It's all done.  */

>        pthread_rwlock_unlock (&table->lock);


I think this needs barriers after and before the increments.

Thanks,
Florian
Andreas Schwab June 29, 2020, 10:35 a.m. | #2
On Jun 29 2020, Florian Weimer wrote:

> I think this needs barriers after and before the increments.


Why doesn't gc need those barriers?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Adhemerval Zanella via Libc-alpha June 29, 2020, 10:48 a.m. | #3
* Andreas Schwab:

> On Jun 29 2020, Florian Weimer wrote:

>

>> I think this needs barriers after and before the increments.

>

> Why doesn't gc need those barriers?


I think it needs them as well.  I thought we had them there. 8-(

Thanks,
Florian
Andreas Schwab June 29, 2020, 2:09 p.m. | #4
On Jun 29 2020, Florian Weimer via Libc-alpha wrote:

> * Andreas Schwab:

>

>> On Jun 29 2020, Florian Weimer wrote:

>>

>>> I think this needs barriers after and before the increments.

>>

>> Why doesn't gc need those barriers?

>

> I think it needs them as well.  I thought we had them there. 8-(


Should they use atomic_increment?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Adhemerval Zanella via Libc-alpha June 29, 2020, 2:14 p.m. | #5
On 6/29/20 10:09 AM, Andreas Schwab wrote:
> On Jun 29 2020, Florian Weimer via Libc-alpha wrote:

> 

>> * Andreas Schwab:

>>

>>> On Jun 29 2020, Florian Weimer wrote:

>>>

>>>> I think this needs barriers after and before the increments.

>>>

>>> Why doesn't gc need those barriers?

>>

>> I think it needs them as well.  I thought we had them there. 8-(

> 

> Should they use atomic_increment?


There are many concurrency problems with the cache, and we do see
these issues with downstream customers. Torvald reviewed this at
one point and produced a patch that I've attached to bug 25888.

Our latest attempt to fix this is here:
https://sourceware.org/bugzilla/show_bug.cgi?id=25888
https://sourceware.org/bugzilla/attachment.cgi?id=12493

-- 
Cheers,
Carlos.
Adhemerval Zanella via Libc-alpha June 29, 2020, 2:16 p.m. | #6
* Andreas Schwab:

> On Jun 29 2020, Florian Weimer via Libc-alpha wrote:

>

>> * Andreas Schwab:

>>

>>> On Jun 29 2020, Florian Weimer wrote:

>>>

>>>> I think this needs barriers after and before the increments.

>>>

>>> Why doesn't gc need those barriers?

>>

>> I think it needs them as well.  I thought we had them there. 8-(

>

> Should they use atomic_increment?


Yes, that as well.  I don't know if the legacy atomic_increment function
implies a barrier.  I think we need the equivalent of __atomic_fetch_add
(or __atomic_add_fetch) with __ATOMIC_ACQ_REL, and our <atomic.h> does
not have this, and we are not supposed to use the compiler built-ins.

atomic_fetch_add_relaxed together with atomic_full_barrier should work,
though.

Thanks,
Florian
Andreas Schwab June 29, 2020, 2:20 p.m. | #7
On Jun 29 2020, Carlos O'Donell via Libc-alpha wrote:

> https://sourceware.org/bugzilla/attachment.cgi?id=12493


That doesn't fix this bug.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Patch

diff --git a/nscd/cache.c b/nscd/cache.c
index 94163d9ce3..2ceac94c23 100644
--- a/nscd/cache.c
+++ b/nscd/cache.c
@@ -453,6 +453,11 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 	  pthread_rwlock_wrlock (&table->lock);
 	}
 
+      /* Now we start modifying the data.  Make sure all readers of the
+	 data are aware of this and temporarily don't use the data.  */
+      ++table->head->gc_cycle;
+      assert ((table->head->gc_cycle & 1) == 1);
+
       while (first <= last)
 	{
 	  if (mark[first])
@@ -493,6 +498,10 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 	  ++first;
 	}
 
+      /* Now we are done modifying the data.  */
+      ++table->head->gc_cycle;
+      assert ((table->head->gc_cycle & 1) == 0);
+
       /* It's all done.  */
       pthread_rwlock_unlock (&table->lock);