OpenACC 'attach'/'detach' has no business affecting user-visible reference counting (was: [PATCH 07/13] OpenACC 2.6 deep copy: libgomp parts)

Message ID 87k10gr06m.fsf@euler.schwinge.homeip.net
State New
Headers show
Series
  • OpenACC 'attach'/'detach' has no business affecting user-visible reference counting (was: [PATCH 07/13] OpenACC 2.6 deep copy: libgomp parts)
Related show

Commit Message

Thomas Schwinge June 9, 2020, 10:41 a.m.
Hi Julian!

On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com> wrote:
> On Fri, 5 Jun 2020 13:17:09 +0200

> Thomas Schwinge <thomas@codesourcery.com> wrote:

>> On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com>

>> wrote:

>> > This part contains the libgomp runtime support for the

>> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds

>>

>> > --- a/libgomp/target.c

>> > +++ b/libgomp/target.c

>>

>> > @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,

>>

>> > +        case GOMP_MAP_ATTACH:

>> > +          {

>> > +            cur_node.host_start = (uintptr_t) hostaddrs[i];

>> > +            cur_node.host_end = cur_node.host_start + sizeof (void *);

>> > +            splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);

>> > +            if (n != NULL)

>> > +              {

>> > +                tgt->list[i].key = n;

>> > +                tgt->list[i].offset = cur_node.host_start - n->host_start;

>> > +                tgt->list[i].length = n->host_end - n->host_start;

>> > +                tgt->list[i].copy_from = false;

>> > +                tgt->list[i].always_copy_from = false;

>> > +                tgt->list[i].do_detach

>> > +                  = (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);

>> > +                n->refcount++;

>> > +              }

>> > +            else

>> > +              {

>> > +                gomp_mutex_unlock (&devicep->lock);

>> > +                gomp_fatal ("outer struct not mapped for attach");

>> > +              }

>> > +            gomp_attach_pointer (devicep, aq, mem_map, n,

>> > +                                 (uintptr_t) hostaddrs[i], sizes[i],

>> > +                                 cbufp);

>> > +            continue;

>> > +          }

>>

>> For the OpenACC runtime API 'acc_attach' etc. routines they don't, so

>> what's the conceptual reason that for the corresponding OpenACC

>> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in

>> reference counting ('n->refcount++' above)?  I understand OpenACC

>> 'attach'/'detach' clauses to be simple "executable clauses", which

>> just update some values somewhere (say, like

>> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,

>> thus wouldn't appear to need reference counting?

>

> IIUC, n->refcount is not directly the "structural reference count" as

> seen at source level, but rather counts the number of target_var_descs

> in the lists appended to each target_mem_desc -- and GOMP_MAP_ATTACH

> have variable entries in those lists.


That may be OK if that's purely an implementation detail that isn't
visible to the user, however:

> That's not the case for the API

> routines.


As I had mentioned, the problem is: in contrast to 'acc_attach', an
OpenACC 'enter data' directive with 'attach' clause currently uses this
same reference-counted code path, and thus such an 'attach' without
corresponding 'detach' inhibits unmapping; see
'libgomp.oacc-c-c++-common/mdc-refcount-1.c' in the attached patch
"OpenACC 'attach'/'detach' has no business affecting user-visible
reference counting".

That patch seemed to be the logical next step then, to unify the code
paths for 'acc_attach' and 'enter data' directive with 'attach' clause
(which have to act in the same way).  That's (conceptually) somewhat
similar to what you had proposed as part of
<http://mid.mail-archive.com/b23ea71697f77d8214411a3e1348e9dee496e5a6.1590182783.git.julian@codesourcery.com>.
(But all these things really need to be discussed individually...)

However, that patch regresses
'libgomp.oacc-fortran/deep-copy-6-no_finalize.F90', and also the
'deep-copy-7b2f-2.c', and 'deep-copy-7cf.c' that I'm attaching here.  I
have not yet made an attempts to understand these regressions.  It may be
that a Detach Action actually effects an (attached) device pointer being
copied back to the host, and then disturbing things -- and if that, then
it may be a bug in libgomp, or in the test case.  ;-)


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

Comments

Julian Brown June 9, 2020, 12:23 p.m. | #1
On Tue, 9 Jun 2020 12:41:21 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!

> 

> On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com>

> wrote:

> >> For the OpenACC runtime API 'acc_attach' etc. routines they don't,

> >> so what's the conceptual reason that for the corresponding OpenACC

> >> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in

> >> reference counting ('n->refcount++' above)?  I understand OpenACC

> >> 'attach'/'detach' clauses to be simple "executable clauses", which

> >> just update some values somewhere (say, like

> >> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,

> >> thus wouldn't appear to need reference counting?  

> >

> > IIUC, n->refcount is not directly the "structural reference count"

> > as seen at source level, but rather counts the number of

> > target_var_descs in the lists appended to each target_mem_desc --

> > and GOMP_MAP_ATTACH have variable entries in those lists.  

> 

> That may be OK if that's purely an implementation detail that isn't

> visible to the user, however:

> 

> > That's not the case for the API

> > routines.  

> 

> As I had mentioned, the problem is: in contrast to 'acc_attach', an

> OpenACC 'enter data' directive with 'attach' clause currently uses

> this same reference-counted code path, and thus such an 'attach'

> without corresponding 'detach' inhibits unmapping; see

> 'libgomp.oacc-c-c++-common/mdc-refcount-1.c' in the attached patch

> "OpenACC 'attach'/'detach' has no business affecting user-visible

> reference counting".


Hmm, right. That's quite a problem from an implementation perspective:
the "attach" clause in the target_mem_desc's var list is what triggers
the "detach" operation (for structured data lifetimes). Having those
references "not count" is quite an ugly wrinkle.

I'll think about that some more...

> That patch seemed to be the logical next step then, to unify the code

> paths for 'acc_attach' and 'enter data' directive with 'attach' clause

> (which have to act in the same way).  That's (conceptually) somewhat

> similar to what you had proposed as part of

> <http://mid.mail-archive.com/b23ea71697f77d8214411a3e1348e9dee496e5a6.1590182783.git.julian@codesourcery.com>.

> (But all these things really need to be discussed individually...)

> 

> However, that patch regresses

> 'libgomp.oacc-fortran/deep-copy-6-no_finalize.F90', and also the

> 'deep-copy-7b2f-2.c', and 'deep-copy-7cf.c' that I'm attaching here.

> I have not yet made an attempts to understand these regressions.  It

> may be that a Detach Action actually effects an (attached) device

> pointer being copied back to the host, and then disturbing things --

> and if that, then it may be a bug in libgomp, or in the test case.

> ;-)


I haven't (even) quite absorbed what you are trying to test with the "no
finalize" version of the deep-copy-6.f90 test case... I probably need
to go back and re-read the spec. IIRC, my understanding was that
copying out a data item that still has multiple attachments would *not*
automatically perform a detachment. Thus, attaches & detaches have to
balance (at least without "finalize"). But maybe I was wrong about that!

Thanks,

Julian
Julian Brown June 18, 2020, 6:21 p.m. | #2
Hi!

On Tue, 9 Jun 2020 12:41:21 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!

> 

> On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com>

> wrote:

> > On Fri, 5 Jun 2020 13:17:09 +0200

> > Thomas Schwinge <thomas@codesourcery.com> wrote:  

> >> On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com>

> >> wrote:  

> >> > This part contains the libgomp runtime support for the

> >> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds    

> >>   

> >> > --- a/libgomp/target.c

> >> > +++ b/libgomp/target.c    

> >>   

> >> > @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct

> >> > gomp_device_descr *devicep,    

> >>   

> >> > +	      case GOMP_MAP_ATTACH:

> >> > +		{

> >> > +		  cur_node.host_start = (uintptr_t)

> >> > hostaddrs[i];

> >> > +		  cur_node.host_end = cur_node.host_start +

> >> > sizeof (void *);

> >> > +		  splay_tree_key n = splay_tree_lookup

> >> > (mem_map, &cur_node);

> >> > +		  if (n != NULL)

> >> > +		    {

> >> > +		      tgt->list[i].key = n;

> >> > +		      tgt->list[i].offset = cur_node.host_start

> >> > - n->host_start;

> >> > +		      tgt->list[i].length = n->host_end -

> >> > n->host_start;

> >> > +		      tgt->list[i].copy_from = false;

> >> > +		      tgt->list[i].always_copy_from = false;

> >> > +		      tgt->list[i].do_detach

> >> > +			= (pragma_kind !=

> >> > GOMP_MAP_VARS_OPENACC_ENTER_DATA);

> >> > +		      n->refcount++;

> >> > +		    }

> >> > +		  else

> >> > +		    {

> >> > +		      gomp_mutex_unlock (&devicep->lock);

> >> > +		      gomp_fatal ("outer struct not mapped for

> >> > attach");

> >> > +		    }

> >> > +		  gomp_attach_pointer (devicep, aq, mem_map, n,

> >> > +				       (uintptr_t)

> >> > hostaddrs[i], sizes[i],

> >> > +				       cbufp);

> >> > +		  continue;

> >> > +		}    

> >> 

> >> For the OpenACC runtime API 'acc_attach' etc. routines they don't,

> >> so what's the conceptual reason that for the corresponding OpenACC

> >> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in

> >> reference counting ('n->refcount++' above)?  I understand OpenACC

> >> 'attach'/'detach' clauses to be simple "executable clauses", which

> >> just update some values somewhere (say, like

> >> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,

> >> thus wouldn't appear to need reference counting?  

> >

> > IIUC, n->refcount is not directly the "structural reference count"

> > as seen at source level, but rather counts the number of

> > target_var_descs in the lists appended to each target_mem_desc --

> > and GOMP_MAP_ATTACH have variable entries in those lists.  

> 

> That may be OK if that's purely an implementation detail that isn't

> visible to the user, however:

> 

> > That's not the case for the API

> > routines.  

> 

> As I had mentioned, the problem is: in contrast to 'acc_attach', an

> OpenACC 'enter data' directive with 'attach' clause currently uses

> this same reference-counted code path, and thus such an 'attach'

> without corresponding 'detach' inhibits unmapping; [...]


The attached patch stops attach/detach operations from affecting
reference counts (either structured or dynamic). This isn't as invasive
as I'd imagined: we can extend the use of the "do_detach" flag in
target_mem_descs' variable lists to mark mappings that correspond to
attach operations, then use that flag to avoid refcount
increment/decrements. (The flag should possibly be renamed now.)

I've modified the refcount self-testing code successfully to work with
this new scheme too, in case that's helpful. I'll send the patches for
that separately.

Tested with offloading to NVPTX. OK?

Thanks,

Julian

ChangeLog

	libgomp/
	* oacc-mem.c (goacc_enter_data_internal): Don't affect
	reference counts for attach mappings.
	(goacc_exit_data_internal): Don't affect reference counts for
	detach mappings.
	* target.c (gomp_map_vars_existing): Don't affect reference
	counts for attach mappings.
	(gomp_map_vars_internal): Set do_detach flag unconditionally to
	mark attach mappings.
	(gomp_unmap_vars_internal): Use above flag to prevent affecting
	reference count for attach mappings.
	* testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90:
	Mark test as shouldfail.
	* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
	gracefully in no-finalize mode.
commit e5fd3efda7f176f035c5ed2e3095d4a49a780864
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Jun 18 05:11:08 2020 -0700

    [OpenACC] Deep copy attach/detach should not affect reference counts
    
            libgomp/
            * oacc-mem.c (goacc_enter_data_internal): Don't affect reference counts
            for attach mappings.
            (goacc_exit_data_internal): Don't affect reference counts for detach
            mappings.
            * target.c (gomp_map_vars_existing): Don't affect reference counts for
            attach mappings.
            (gomp_map_vars_internal): Set do_detach flag unconditionally to mark
            attach mappings.
            (gomp_unmap_vars_internal): Use above flag to prevent affecting
            reference count for attach mappings.
            * testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c: New test.
            * testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c: Likewise.
            * testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark
            test as shouldfail.
            * testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
            gracefully in no-finalize mode.

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index f852652c048..9bb5887fc5e 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1095,8 +1095,11 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 
 	  /* A standalone attach clause.  */
 	  if ((kinds[i] & 0xff) == GOMP_MAP_ATTACH)
-	    gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n,
-				 (uintptr_t) h, s, NULL);
+	    {
+	      gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n,
+				   (uintptr_t) h, s, NULL);
+	      continue;
+	    }
 	  else if (h + s > (void *) n->host_end)
 	    {
 	      gomp_mutex_unlock (&acc_dev->lock);
@@ -1131,7 +1134,9 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	    if (tgt->list[j].key == n)
 	      {
 		for (size_t k = 0; k < groupnum; k++)
-		  if (j + k < tgt->list_count && tgt->list[j + k].key)
+		  if (j + k < tgt->list_count
+		      && tgt->list[j + k].key
+		      && !tgt->list[j + k].do_detach)
 		    {
 		      tgt->list[j + k].key->refcount++;
 		      tgt->list[j + k].key->dynamic_refcount++;
@@ -1156,7 +1161,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	  for (size_t j = 0; j < tgt->list_count; j++)
 	    {
 	      n = tgt->list[j].key;
-	      if (n)
+	      if (n && !tgt->list[j].do_detach)
 		n->dynamic_refcount++;
 	    }
 	}
@@ -1265,14 +1270,10 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	case GOMP_MAP_POINTER:
 	case GOMP_MAP_DELETE:
 	case GOMP_MAP_RELEASE:
-	case GOMP_MAP_DETACH:
-	case GOMP_MAP_FORCE_DETACH:
 	  {
 	    struct splay_tree_key_s cur_node;
 	    size_t size;
-	    if (kind == GOMP_MAP_POINTER
-		|| kind == GOMP_MAP_DETACH
-		|| kind == GOMP_MAP_FORCE_DETACH)
+	    if (kind == GOMP_MAP_POINTER)
 	      size = sizeof (void *);
 	    else
 	      size = sizes[i];
@@ -1339,6 +1340,11 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	     'GOMP_MAP_STRUCT's anymore.  */
 	  break;
 
+	case GOMP_MAP_DETACH:
+	case GOMP_MAP_FORCE_DETACH:
+	  /* These are no-ops here: handled above.  */
+	  break;
+
 	default:
 	  gomp_fatal (">>>> goacc_exit_data_internal UNHANDLED kind 0x%.2x",
 			  kind);
diff --git a/libgomp/target.c b/libgomp/target.c
index 3f2becdae0e..406a1e39d98 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -382,7 +382,7 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep,
 			(void *) newn->host_start,
 			newn->host_end - newn->host_start, cbuf);
 
-  if (oldn->refcount != REFCOUNT_INFINITY)
+  if (oldn->refcount != REFCOUNT_INFINITY && kind != GOMP_MAP_ATTACH)
     oldn->refcount++;
 }
 
@@ -1092,9 +1092,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		      tgt->list[i].length = n->host_end - n->host_start;
 		      tgt->list[i].copy_from = false;
 		      tgt->list[i].always_copy_from = false;
-		      tgt->list[i].do_detach
-			= (pragma_kind != GOMP_MAP_VARS_ENTER_DATA);
-		      n->refcount++;
+		      tgt->list[i].do_detach = true;
 		    }
 		  else
 		    {
@@ -1442,7 +1440,7 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
   for (i = 0; i < tgt->list_count; i++)
     {
       splay_tree_key k = tgt->list[i].key;
-      if (k == NULL)
+      if (k == NULL || tgt->list[i].do_detach)
 	continue;
 
       bool do_unmap = false;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c
new file mode 100644
index 00000000000..bb4d95310e6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c
@@ -0,0 +1,50 @@
+/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
+
+#include <openacc.h>
+#include <assert.h>
+
+#define N 1024
+
+struct mystr {
+  int pad;
+  int *data;
+};
+
+int
+main (int argc, char *argv[])
+{
+  int arr[N];
+  struct mystr s;
+
+  s.data = arr;
+
+  acc_copyin (&s, sizeof (s));
+  acc_create (s.data, N * sizeof (int));
+
+  for (int i = 0; i < 20; i++)
+    {
+#ifdef ATTACH_VIA_DIRECTIVE
+      #pragma acc enter data attach(s.data)
+
+      acc_detach ((void **) &s.data);
+#else
+      acc_attach ((void **) &s.data);
+
+      #pragma acc exit data detach(s.data)
+#endif
+    }
+
+  assert (acc_is_present (arr, N * sizeof (int)));
+  assert (acc_is_present (&s, sizeof (s)));
+
+  acc_delete (arr, N * sizeof (int));
+
+  assert (!acc_is_present (arr, N * sizeof (int)));
+
+  acc_copyout (&s, sizeof (s));
+
+  assert (!acc_is_present (&s, sizeof (s)));
+  assert (s.data == arr);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c
new file mode 100644
index 00000000000..6b5371f0e48
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c
@@ -0,0 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
+/* { dg-additional-options "-DATTACH_VIA_DIRECTIVE" } */
+
+#include "attach-detach-rc-1.c"
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90 b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
index ed4f10e7a3f..ad8da71d7c9 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
@@ -1,8 +1,14 @@
 ! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }
 
 /* Nullify the 'finalize' clause, which disturbs reference counting.  */
 #define finalize
 #include "deep-copy-6.f90"
 
 ! { dg-output ".*CheCKpOInT1(\n|\r\n|\r)" }
-! { dg-output ".CheCKpOInT2(\n|\r\n|\r)" }
+! { dg-output ".*CheCKpOInT2(\n|\r\n|\r)" }
+
+! Without the finalize, we do not detach properly so the host sees a device
+! pointer, and fails with this STOP code.
+! { dg-output "STOP 7(\n|\r\n|\r)+" }
+! { dg-shouldfail "" }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
index eb7d3ca160e..1a291c17241 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
@@ -12,11 +12,14 @@ program dtype
   end type mytype
   integer i
 
-  type(mytype) :: var
+  type(mytype), target :: var
+  integer, pointer :: hostptr(:)
 
   allocate(var%a(1:n))
   allocate(var%b(1:n))
 
+  hostptr => var%a
+
 !$acc data copy(var)
 
   do i = 1, n
@@ -53,6 +56,8 @@ program dtype
 
 !$acc end data
 
+  if (.not. associated(hostptr, var%a)) stop 7
+
   do i = 1,4
     if (var%a(i) .ne. 0) stop 1
     if (var%b(i) .ne. 0) stop 2
Thomas Schwinge July 16, 2020, 8:35 a.m. | #3
Hi Julian!

On 2020-06-18T19:21:57+0100, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 9 Jun 2020 12:41:21 +0200

> Thomas Schwinge <thomas@codesourcery.com> wrote:

>> On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com>

>> wrote:

>> > On Fri, 5 Jun 2020 13:17:09 +0200

>> > Thomas Schwinge <thomas@codesourcery.com> wrote:

>> >> On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com>

>> >> wrote:

>> >> > This part contains the libgomp runtime support for the

>> >> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds

>> >>

>> >> > --- a/libgomp/target.c

>> >> > +++ b/libgomp/target.c

>> >>

>> >> > @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,

>> >>

>> >> > +             case GOMP_MAP_ATTACH:

>> >> > +               {

>> >> > +                 cur_node.host_start = (uintptr_t) hostaddrs[i];

>> >> > +                 cur_node.host_end = cur_node.host_start + sizeof (void *);

>> >> > +                 splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);

>> >> > +                 if (n != NULL)

>> >> > +                   {

>> >> > +                     tgt->list[i].key = n;

>> >> > +                     tgt->list[i].offset = cur_node.host_start - n->host_start;

>> >> > +                     tgt->list[i].length = n->host_end - n->host_start;

>> >> > +                     tgt->list[i].copy_from = false;

>> >> > +                     tgt->list[i].always_copy_from = false;

>> >> > +                     tgt->list[i].do_detach

>> >> > +                       = (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);

>> >> > +                     n->refcount++;

>> >> > +                   }

>> >> > +                 else

>> >> > +                   {

>> >> > +                     gomp_mutex_unlock (&devicep->lock);

>> >> > +                     gomp_fatal ("outer struct not mapped for attach");

>> >> > +                   }

>> >> > +                 gomp_attach_pointer (devicep, aq, mem_map, n,

>> >> > +                                      (uintptr_t) hostaddrs[i], sizes[i],

>> >> > +                                      cbufp);

>> >> > +                 continue;

>> >> > +               }

>> >>

>> >> For the OpenACC runtime API 'acc_attach' etc. routines they don't,

>> >> so what's the conceptual reason that for the corresponding OpenACC

>> >> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in

>> >> reference counting ('n->refcount++' above)?  I understand OpenACC

>> >> 'attach'/'detach' clauses to be simple "executable clauses", which

>> >> just update some values somewhere (say, like

>> >> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,

>> >> thus wouldn't appear to need reference counting?

>> >

>> > IIUC, n->refcount is not directly the "structural reference count"

>> > as seen at source level, but rather counts the number of

>> > target_var_descs in the lists appended to each target_mem_desc --

>> > and GOMP_MAP_ATTACH have variable entries in those lists.

>>

>> That may be OK if that's purely an implementation detail that isn't

>> visible to the user, however:

>>

>> > That's not the case for the API

>> > routines.

>>

>> As I had mentioned, the problem is: in contrast to 'acc_attach', an

>> OpenACC 'enter data' directive with 'attach' clause currently uses

>> this same reference-counted code path, and thus such an 'attach'

>> without corresponding 'detach' inhibits unmapping; [...]

>

> The attached patch stops attach/detach operations from affecting

> reference counts (either structured or dynamic). This isn't as invasive

> as I'd imagined: we can extend the use of the "do_detach" flag in

> target_mem_descs' variable lists to mark mappings that correspond to

> attach operations, then use that flag to avoid refcount

> increment/decrements.


Thanks, ACK.

> (The flag should possibly be renamed now.)


How about:

    -  /* True if variable should be detached at end of region.  */
    -  bool do_detach;
    +  /* True if this is for OpenACC 'attach'.  */
    +  bool is_attach;

(Changing that similarly is obvious/pre-approved.)

> Tested with offloading to NVPTX. OK?


I've adjusted the patch for current GCC sources, and did some further
changes/cleanup; see below, and attached "[OpenACC] Deep copy
attach/detach should not affect reference counts".  If you're happy with
that, that's OK for master and releases/gcc-10 (once un-frozen) branches.

> --- a/libgomp/oacc-mem.c

> +++ b/libgomp/oacc-mem.c


> @@ -1131,7 +1134,9 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,

>           if (tgt->list[j].key == n)

>             {

>               for (size_t k = 0; k < groupnum; k++)

> -               if (j + k < tgt->list_count && tgt->list[j + k].key)

> +               if (j + k < tgt->list_count

> +                   && tgt->list[j + k].key

> +                   && !tgt->list[j + k].do_detach)

>                   {

>                     tgt->list[j + k].key->refcount++;

>                     tgt->list[j + k].key->dynamic_refcount++;

> @@ -1156,7 +1161,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,

>         for (size_t j = 0; j < tgt->list_count; j++)

>           {

>             n = tgt->list[j].key;

> -           if (n)

> +           if (n && !tgt->list[j].do_detach)

>               n->dynamic_refcount++;

>           }

>       }


If I understand correctly, relatedly, we can also "strengthen" the
'is_tgt_unmapped' checking (nowadays centralized in 'goacc_exit_datum_1')
by excluding any 'do_detach' ones from '++num_mappings'.  Done.

> --- a/libgomp/target.c

> +++ b/libgomp/target.c


> @@ -382,7 +382,7 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep,

>                       (void *) newn->host_start,

>                       newn->host_end - newn->host_start, cbuf);

>

> -  if (oldn->refcount != REFCOUNT_INFINITY)

> +  if (oldn->refcount != REFCOUNT_INFINITY && kind != GOMP_MAP_ATTACH)

>      oldn->refcount++;

>  }


That's always-true.  Removed.

> --- /dev/null

> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c

> @@ -0,0 +1,50 @@

> +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */

> +

> +#include <openacc.h>

> +#include <assert.h>

> +

> +#define N 1024

> +

> +struct mystr {

> +  int pad;

> +  int *data;

> +};


The 'pad' is no longer needed with PR95270 "OpenACC 'enter data attach'
looks up target memory object displaced by pointer size" fixed.

> +[...]


> --- /dev/null

> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c

> @@ -0,0 +1,4 @@

> +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */

> +/* { dg-additional-options "-DATTACH_VIA_DIRECTIVE" } */

> +

> +#include "attach-detach-rc-1.c"


I've merged/extended 'libgomp.oacc-c-c++-common/attach-detach-rc-1.c',
'libgomp.oacc-c-c++-common/attach-detach-rc-2.c' into
'libgomp.oacc-c-c++-common/mdc-refcount-1.c', and further added
'libgomp.oacc-c-c++-common/mdc-refcount-2.c', and
'libgomp.oacc-c-c++-common/mdc-refcount-3.c'.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From 3b1262da8922df1321ab982744ac48334b2279da Mon Sep 17 00:00:00 2001
From: Julian Brown <julian@codesourcery.com>

Date: Thu, 18 Jun 2020 05:11:08 -0700
Subject: [PATCH] [OpenACC] Deep copy attach/detach should not affect reference
 counts

TODO Some rationale.

TODO Update
	libgomp/
	* oacc-mem.c (goacc_enter_data_internal): Don't affect reference counts
	for attach mappings.
	(goacc_exit_data_internal): Don't affect reference counts for detach
	mappings.
	* target.c (gomp_map_vars_existing): Don't affect reference counts for
	attach mappings.
	(gomp_map_vars_internal): Set do_detach flag unconditionally to mark
	attach mappings.
	(gomp_unmap_vars_internal): Use above flag to prevent affecting
	reference count for attach mappings.
	* testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c: Likewise.
	* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark
	test as shouldfail.
	* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
	gracefully in no-finalize mode.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
---
 libgomp/oacc-mem.c                            |  40 +++---
 libgomp/target.c                              |  12 +-
 .../mdc-refcount-1.c                          |  60 +++++++++
 .../mdc-refcount-2.c                          | 123 ++++++++++++++++++
 .../mdc-refcount-3.c                          |  86 ++++++++++++
 .../deep-copy-6-no_finalize.F90               |   9 +-
 .../libgomp.oacc-fortran/deep-copy-6.f90      |   8 +-
 7 files changed, 318 insertions(+), 20 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-3.c

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 41548f75e72c..0fa6597aaf1b 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -669,6 +669,9 @@ static void
 goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
 		    unsigned short kind, splay_tree_key n, goacc_aq aq)
 {
+  assert (kind != GOMP_MAP_DETACH
+	  && kind != GOMP_MAP_FORCE_DETACH);
+
   if ((uintptr_t) h < n->host_start || (uintptr_t) h + s > n->host_end)
     {
       size_t host_size = n->host_end - n->host_start;
@@ -678,8 +681,7 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
     }
 
   bool finalize = (kind == GOMP_MAP_FORCE_FROM
-		   || kind == GOMP_MAP_DELETE
-		   || kind == GOMP_MAP_FORCE_DETACH);
+		   || kind == GOMP_MAP_DELETE);
 
   assert (n->refcount != REFCOUNT_LINK);
   if (n->refcount != REFCOUNT_INFINITY
@@ -727,7 +729,8 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
 	     zero.  Otherwise (e.g. for a 'GOMP_MAP_STRUCT' mapping with
 	     multiple members), fall back to skipping the test.  */
 	  for (size_t l_i = 0; l_i < n->tgt->list_count; ++l_i)
-	    if (n->tgt->list[l_i].key)
+	    if (n->tgt->list[l_i].key
+		&& !n->tgt->list[l_i].do_detach)
 	      ++num_mappings;
 	  bool is_tgt_unmapped = gomp_remove_var (acc_dev, n);
 	  assert (is_tgt_unmapped || num_mappings > 1);
@@ -1137,12 +1140,15 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	  void *h = hostaddrs[i];
 	  size_t s = sizes[i];
 
-	  /* A standalone attach clause.  */
 	  if ((kinds[i] & 0xff) == GOMP_MAP_ATTACH)
-	    gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n,
-				 (uintptr_t) h, s, NULL);
-
-	  goacc_map_var_existing (acc_dev, h, s, n);
+	    {
+	      gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n,
+				   (uintptr_t) h, s, NULL);
+	      /* OpenACC 'attach'/'detach' doesn't affect structured/dynamic
+		 reference counts ('n->refcount', 'n->dynamic_refcount').  */
+	    }
+	  else
+	    goacc_map_var_existing (acc_dev, h, s, n);
 	}
       else if (n && groupnum > 1)
 	{
@@ -1170,7 +1176,9 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 		   list, and increment the refcounts for each item in that
 		   group.  */
 		for (size_t k = 0; k < groupnum; k++)
-		  if (j + k < tgt->list_count && tgt->list[j + k].key)
+		  if (j + k < tgt->list_count
+		      && tgt->list[j + k].key
+		      && !tgt->list[j + k].do_detach)
 		    {
 		      tgt->list[j + k].key->refcount++;
 		      tgt->list[j + k].key->dynamic_refcount++;
@@ -1204,7 +1212,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	  for (size_t j = 0; j < tgt->list_count; j++)
 	    {
 	      n = tgt->list[j].key;
-	      if (n)
+	      if (n && !tgt->list[j].do_detach)
 		n->dynamic_refcount++;
 	    }
 	}
@@ -1270,14 +1278,10 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	case GOMP_MAP_POINTER:
 	case GOMP_MAP_DELETE:
 	case GOMP_MAP_RELEASE:
-	case GOMP_MAP_DETACH:
-	case GOMP_MAP_FORCE_DETACH:
 	  {
 	    struct splay_tree_key_s cur_node;
 	    size_t size;
-	    if (kind == GOMP_MAP_POINTER
-		|| kind == GOMP_MAP_DETACH
-		|| kind == GOMP_MAP_FORCE_DETACH)
+	    if (kind == GOMP_MAP_POINTER)
 	      size = sizeof (void *);
 	    else
 	      size = sizes[i];
@@ -1300,6 +1304,12 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	     'GOMP_MAP_STRUCT's anymore.  */
 	  break;
 
+	case GOMP_MAP_DETACH:
+	case GOMP_MAP_FORCE_DETACH:
+	  /* OpenACC 'attach'/'detach' doesn't affect structured/dynamic
+	     reference counts ('n->refcount', 'n->dynamic_refcount').  */
+	  break;
+
 	default:
 	  gomp_fatal (">>>> goacc_exit_data_internal UNHANDLED kind 0x%.2x",
 			  kind);
diff --git a/libgomp/target.c b/libgomp/target.c
index 478909e3b275..0358864608a2 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1095,9 +1095,10 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		      tgt->list[i].length = n->host_end - n->host_start;
 		      tgt->list[i].copy_from = false;
 		      tgt->list[i].always_copy_from = false;
-		      tgt->list[i].do_detach
-			= (pragma_kind != GOMP_MAP_VARS_ENTER_DATA);
-		      n->refcount++;
+		      tgt->list[i].do_detach = true;
+		      /* OpenACC 'attach'/'detach' doesn't affect
+			 structured/dynamic reference counts ('n->refcount',
+			 'n->dynamic_refcount').  */
 		    }
 		  else
 		    {
@@ -1448,6 +1449,11 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
       if (k == NULL)
 	continue;
 
+      /* OpenACC 'attach'/'detach' doesn't affect structured/dynamic reference
+	 counts ('n->refcount', 'n->dynamic_refcount').  */
+      if (tgt->list[i].do_detach)
+	continue;
+
       bool do_unmap = false;
       if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
 	k->refcount--;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c
new file mode 100644
index 000000000000..6170447e7d31
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c
@@ -0,0 +1,60 @@
+/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
+
+#include <openacc.h>
+#include <assert.h>
+
+#define N 1024
+
+struct mystr {
+  int *data;
+};
+
+static void
+test (unsigned variant)
+{
+  int arr[N];
+  struct mystr s;
+
+  s.data = arr;
+
+  acc_copyin (&s, sizeof (s));
+  acc_create (s.data, N * sizeof (int));
+
+  for (int i = 0; i < 20; i++)
+    {
+      if ((variant + i) % 1)
+	{
+#pragma acc enter data attach(s.data)
+	}
+      else
+	acc_attach ((void **) &s.data);
+
+      if ((variant + i) % 2)
+	{
+#pragma acc exit data detach(s.data)
+	}
+      else
+	acc_detach ((void **) &s.data);
+    }
+
+  assert (acc_is_present (arr, N * sizeof (int)));
+  assert (acc_is_present (&s, sizeof (s)));
+
+  acc_delete (arr, N * sizeof (int));
+
+  assert (!acc_is_present (arr, N * sizeof (int)));
+
+  acc_copyout (&s, sizeof (s));
+
+  assert (!acc_is_present (&s, sizeof (s)));
+  assert (s.data == arr);
+}
+
+int
+main (int argc, char *argv[])
+{
+  for (unsigned variant = 0; variant < 4; ++variant)
+    test (variant);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c
new file mode 100644
index 000000000000..2431a76a805c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c
@@ -0,0 +1,123 @@
+/* Verify that OpenACC 'attach'/'detach' doesn't interfere with reference
+   counting.  */
+
+#include <assert.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+/* Need to shared this (and, in particular, implicit '&data_work' in
+   'attach'/'detach' clauses) between 'test' and 'test_'.  */
+static unsigned char *data_work;
+
+static void test_(unsigned variant,
+		  unsigned char *data,
+		  void *data_d)
+{
+  assert(acc_is_present(&data_work, sizeof data_work));
+  assert(data_work == data);
+
+  acc_update_self(&data_work, sizeof data_work);
+  assert(data_work == data);
+
+  if (variant & 1)
+    {
+#pragma acc enter data attach(data_work)
+    }
+  else
+    acc_attach((void **) &data_work);
+  acc_update_self(&data_work, sizeof data_work);
+  assert(data_work == data_d);
+
+  if (variant & 4)
+    {
+      if (variant & 2)
+	{ // attach some more
+	  data_work = data;
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+#pragma acc enter data attach(data_work)
+#pragma acc enter data attach(data_work)
+	  acc_attach((void **) &data_work);
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+	}
+      else
+	{}
+    }
+  else
+    { // detach
+      data_work = data;
+      if (variant & 2)
+	{
+#pragma acc exit data detach(data_work)
+	}
+      else
+	acc_detach((void **) &data_work);
+      acc_update_self(&data_work, sizeof data_work);
+      assert(data_work == data);
+
+      // now not attached anymore
+
+#if 0
+      if (TODO)
+	{
+	  acc_detach(&data_work); //TODO PR95203 "libgomp: attach count underflow"
+	  acc_update_self(&data_work, sizeof data_work);
+	  assert(data_work == data);
+	}
+#endif
+    }
+
+  assert(acc_is_present(&data_work, sizeof data_work));
+}
+
+static void test(unsigned variant)
+{
+  const int size = sizeof (void *);
+  unsigned char *data = (unsigned char *) malloc(size);
+  assert(data);
+  void *data_d = acc_create(data, size);
+  assert(data_d);
+  assert(acc_is_present(data, size));
+
+  data_work = data;
+
+  if (variant & 8)
+    {
+#pragma acc data copyin(data_work)
+      test_(variant, data, data_d);
+    }
+  else
+    {
+      acc_copyin(&data_work, sizeof data_work);
+      test_(variant, data, data_d);
+      acc_delete(&data_work, sizeof data_work);
+    }
+#if ACC_MEM_SHARED
+  assert(acc_is_present(&data_work, sizeof data_work));
+#else
+  assert(!acc_is_present(&data_work, sizeof data_work));
+#endif
+  data_work = NULL;
+
+  assert(acc_is_present(data, size));
+  acc_delete(data, size);
+  data_d = NULL;
+#if ACC_MEM_SHARED
+  assert(acc_is_present(data, size));
+#else
+  assert(!acc_is_present(data, size));
+#endif
+  free(data);
+  data = NULL;
+}
+
+int main()
+{
+  for (size_t i = 0; i < 16; ++i)
+    test(i);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-3.c
new file mode 100644
index 000000000000..0f5e7becada8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-3.c
@@ -0,0 +1,86 @@
+/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
+
+/* Variant of 'deep-copy-7.c'.  */
+
+#include <stdlib.h>
+#include <assert.h>
+#include <openacc.h>
+
+struct dc
+{
+  int a;
+  int *b;
+};
+
+int
+main ()
+{
+  int n = 100, i, j, k;
+  struct dc v = { .a = 3 };
+
+  v.b = (int *) malloc (sizeof (int) * n);
+
+  for (k = 0; k < 16; k++)
+    {
+      /* Here, we do not explicitly copy the enclosing structure, but work
+	 with fields directly.  Make sure attachment counters and reference
+	 counters work properly in that case.  */
+#pragma acc enter data copyin(v.a, v.b[0:n]) // 1
+      assert (acc_is_present (&v.b, sizeof v.b));
+      assert (acc_is_present (v.b, sizeof (int) * n));
+#pragma acc enter data pcopyin(v.b[0:n]) // 2
+#pragma acc enter data pcopyin(v.b[0:n]) // 3
+
+#pragma acc parallel loop present(v.a, v.b)
+      for (i = 0; i < n; i++)
+	v.b[i] = k + v.a + i;
+
+      switch (k % 5)
+	{ // All optional.
+	case 0:
+	  break;
+	case 1:
+	  ; //TODO PR95901
+#pragma acc exit data detach(v.b) finalize
+	  break;
+	case 2:
+	  ; //TODO PR95901
+#pragma acc exit data detach(v.b)
+	  break;
+	case 3:
+	  acc_detach_finalize ((void **) &v.b);
+	  break;
+	case 4:
+	  acc_detach ((void **) &v.b);
+	  break;
+	}
+      assert (acc_is_present (&v.b, sizeof v.b));
+      assert (acc_is_present (v.b, sizeof (int) * n));
+      { // 3
+	acc_delete (&v.b, sizeof v.b);
+	assert (acc_is_present (&v.b, sizeof v.b));
+	acc_copyout (v.b, sizeof (int) * n);
+	assert (acc_is_present (v.b, sizeof (int) * n));
+      }
+      { // 2
+	acc_delete (&v.b, sizeof v.b);
+	assert (acc_is_present (&v.b, sizeof v.b));
+	acc_copyout (v.b, sizeof (int) * n);
+	assert (acc_is_present (v.b, sizeof (int) * n));
+      }
+      { // 1
+	acc_delete (&v.b, sizeof v.b);
+	assert (!acc_is_present (&v.b, sizeof v.b));
+	acc_copyout (v.b, sizeof (int) * n);
+	assert (!acc_is_present (v.b, sizeof (int) * n));
+      }
+#pragma acc exit data delete(v.a)
+
+      for (i = 0; i < n; i++)
+	assert (v.b[i] == k + v.a + i);
+
+      assert (!acc_is_present (&v, sizeof (v)));
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90 b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
index 038f04a3c37e..1daff2dadf11 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
@@ -1,5 +1,12 @@
 ! { dg-do run }
 
-/* Nullify the 'finalize' clause.  */
+/* Nullify the 'finalize' clause.
+
+   That means, we do not detach properly, the host sees a device pointer, and
+   we fail as follows.
+   { dg-output "STOP 30(\n|\r\n|\r)+" { target { ! openacc_host_selected } } }
+   { dg-shouldfail "" { ! openacc_host_selected } }
+*/
 #define finalize
 #include "deep-copy-6.f90"
+
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
index 6aab6a4a7633..94ddca3bce8e 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
@@ -12,11 +12,14 @@ program dtype
   end type mytype
   integer i
 
-  type(mytype) :: var
+  type(mytype), target :: var
+  integer, pointer :: hostptr(:)
 
   allocate(var%a(1:n))
   allocate(var%b(1:n))
 
+  hostptr => var%a
+
 !$acc data copy(var)
 
   do i = 1, n
@@ -49,6 +52,9 @@ program dtype
 
 !$acc end data
 
+  ! See 'deep-copy-6-no_finalize.F90'.
+  if (.not. associated(hostptr, var%a)) stop 30
+
   do i = 1,4
     if (var%a(i) .ne. 0) stop 1
     if (var%b(i) .ne. 0) stop 2
-- 
2.27.0

Patch

From d99a701387054259419292b95462f3646a00d6d9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 8 Jun 2020 21:35:32 +0200
Subject: [PATCH] OpenACC 'attach'/'detach' has no business affecting
 user-visible reference counting

In particular, an 'attach' without 'detach' must not inhibit unmapping.

	libgomp/
	* oacc-mem.c (goacc_attach_internal): New function, split out of
	'acc_attach_async'.
	(acc_attach, goacc_enter_data_internal): Use it.
	(goacc_exit_data_internal) <GOMP_MAP_DETACH,
	GOMP_MAP_FORCE_DETACH>: Skip unmapping.
	* target.c (gomp_map_vars_existing): Assert not 'GOMP_MAP_ATTACH'.
	(gomp_map_vars_internal) <GOMP_MAP_ATTACH>: Assert this
	is not an 'enter data'.
	* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New file.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust.
---
 libgomp/oacc-mem.c                            |  51 +++++---
 libgomp/target.c                              |  21 ++-
 .../mdc-refcount-1.c                          | 123 ++++++++++++++++++
 .../mdc-refcount-1-4-1.f90                    |   7 +-
 4 files changed, 176 insertions(+), 26 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 936ae649dd9..0758f59ec3c 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -881,12 +881,11 @@  acc_update_self_async (void *h, size_t s, int async)
   update_dev_host (0, h, s, async);
 }
 
-void
-acc_attach_async (void **hostaddr, int async)
+static void
+goacc_attach_internal (goacc_aq aq, void **hostaddr, size_t bias)
 {
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
-  goacc_aq aq = get_goacc_asyncqueue (async);
 
   struct splay_tree_key_s cur_node;
   splay_tree_key n;
@@ -907,15 +906,22 @@  acc_attach_async (void **hostaddr, int async)
     }
 
   gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n, (uintptr_t) hostaddr,
-		       0, NULL);
+		       bias, NULL);
 
   gomp_mutex_unlock (&acc_dev->lock);
 }
 
+void
+acc_attach_async (void **hostaddr, int async)
+{
+  goacc_aq aq = get_goacc_asyncqueue (async);
+  goacc_attach_internal (aq, hostaddr, 0);
+}
+
 void
 acc_attach (void **hostaddr)
 {
-  acc_attach_async (hostaddr, acc_async_sync);
+  goacc_attach_internal (NULL, hostaddr, 0);
 }
 
 static void
@@ -1034,11 +1040,22 @@  goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
     {
       int group_last = find_group_last (i, mapnum, sizes, kinds);
 
-      gomp_map_vars_async (acc_dev, aq,
-			   (group_last - i) + 1,
-			   &hostaddrs[i], NULL,
-			   &sizes[i], &kinds[i], true,
-			   GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+      unsigned char kind = kinds[i] & 0xff;
+      switch (kind)
+	{
+	case GOMP_MAP_ATTACH:
+	  assert (group_last == i);
+	  goacc_attach_internal (aq, /*TODO is that type cast alright? */ (void **) hostaddrs[i], sizes[i]);
+	  /* Doesn't use reference counting.  */
+	  break;
+	default:
+	  gomp_map_vars_async (acc_dev, aq,
+			       (group_last - i) + 1,
+			       &hostaddrs[i], NULL,
+			       &sizes[i], &kinds[i], true,
+			       GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+	  break;
+	}
 
       i = group_last;
     }
@@ -1094,12 +1111,16 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
       bool finalize = false;
 
       if (kind == GOMP_MAP_FORCE_FROM
-	  || kind == GOMP_MAP_DELETE
-	  || kind == GOMP_MAP_FORCE_DETACH)
+	  || kind == GOMP_MAP_DELETE)
 	finalize = true;
 
       switch (kind)
 	{
+	case GOMP_MAP_DETACH:
+	case GOMP_MAP_FORCE_DETACH:
+	  /* Handled above; doesn't use reference counting.  */
+	  break;
+
 	case GOMP_MAP_FROM:
 	case GOMP_MAP_FORCE_FROM:
 	case GOMP_MAP_ALWAYS_FROM:
@@ -1110,14 +1131,10 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	case GOMP_MAP_POINTER:
 	case GOMP_MAP_DELETE:
 	case GOMP_MAP_RELEASE:
-	case GOMP_MAP_DETACH:
-	case GOMP_MAP_FORCE_DETACH:
 	  {
 	    struct splay_tree_key_s cur_node;
 	    size_t size;
-	    if (kind == GOMP_MAP_POINTER
-		|| kind == GOMP_MAP_DETACH
-		|| kind == GOMP_MAP_FORCE_DETACH)
+	    if (kind == GOMP_MAP_POINTER)
 	      size = sizeof (void *);
 	    else
 	      size = sizes[i];
diff --git a/libgomp/target.c b/libgomp/target.c
index 36425477dcb..2197067a9a3 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -357,10 +357,12 @@  gomp_map_vars_existing (struct gomp_device_descr *devicep,
 			splay_tree_key newn, struct target_var_desc *tgt_var,
 			unsigned char kind, struct gomp_coalesce_buf *cbuf)
 {
+  assert (kind != GOMP_MAP_ATTACH);
+
   tgt_var->key = oldn;
   tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
   tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
-  tgt_var->do_detach = kind == GOMP_MAP_ATTACH;
+  tgt_var->do_detach = false; //TODO Not 'newn->do_detach', right?
   tgt_var->offset = newn->host_start - oldn->host_start;
   tgt_var->length = newn->host_end - newn->host_start;
 
@@ -810,13 +812,15 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	}
       else if ((kind & typemask) == GOMP_MAP_ATTACH)
 	{
+	  assert (pragma_kind != GOMP_MAP_VARS_ENTER_DATA
+		  && pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+
 	  tgt->list[i].key = NULL;
 	  has_firstprivate = true;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if (!GOMP_MAP_POINTER_P (kind & typemask)
-	  && (kind & typemask) != GOMP_MAP_ATTACH)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
@@ -1083,6 +1087,9 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		continue;
 	      case GOMP_MAP_ATTACH:
 		{
+		  assert (pragma_kind != GOMP_MAP_VARS_ENTER_DATA
+			  && pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+
 		  cur_node.host_start = (uintptr_t) hostaddrs[i];
 		  cur_node.host_end = cur_node.host_start + sizeof (void *);
 		  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
@@ -1093,8 +1100,12 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		      tgt->list[i].length = n->host_end - n->host_start;
 		      tgt->list[i].copy_from = false;
 		      tgt->list[i].always_copy_from = false;
-		      tgt->list[i].do_detach
-			= (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+		      tgt->list[i].do_detach = true;
+		      /* OpenACC 'attach'/'detach' has no business affecting
+			 user-visible reference counting, but the following
+			 adjustment of the structured reference counter ('data'
+			 construct), this is just an implementation detail,
+			 isn't visible to the user.  */
 		      n->refcount++;
 		    }
 		  else
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c
new file mode 100644
index 00000000000..d5eb167ca07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c
@@ -0,0 +1,123 @@ 
+/* Verify that OpenACC 'attach'/'detach' doesn't interfere with reference
+   counting.  */
+
+#include <assert.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+/* Need to shared this (and, in particular, implicit '&data_work' in
+   'attach'/'detach' clauses) between 'test' and 'test_'.  */
+static unsigned char *data_work;
+
+static void test_(unsigned variant,
+		  unsigned char *data,
+		  void *data_d)
+{
+  assert(acc_is_present(&data_work, sizeof data_work));
+  assert(data_work == data);
+
+  acc_update_self(&data_work, sizeof data_work);
+  assert(data_work == data);
+
+  if (variant & 1)
+    {
+#pragma acc enter data attach(data_work)
+    }
+  else
+    acc_attach((void **) &data_work);
+  acc_update_self(&data_work, sizeof data_work);
+  assert(data_work == data_d);
+
+  if (variant & 4)
+    {
+      if (variant & 2)
+	{ // attach some more
+	  data_work = data;
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+#pragma acc enter data attach(data_work)
+#pragma acc enter data attach(data_work)
+	  acc_attach((void **) &data_work);
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+	}
+      else
+	{}
+    }
+  else
+    { // detach
+      data_work = data;
+      if (variant & 2)
+	{
+#pragma acc exit data detach(data_work)
+	}
+      else
+	acc_detach((void **) &data_work);
+      acc_update_self(&data_work, sizeof data_work);
+      assert(data_work == data);
+
+      // now not attached anymore
+
+#if 0
+      if (TODO)
+	{
+	  acc_detach(&data_work); //TODO PR95203 "libgomp: attach count underflow"
+	  acc_update_self(&data_work, sizeof data_work);
+	  assert(data_work == data);
+	}
+#endif
+    }
+
+  assert(acc_is_present(&data_work, sizeof data_work));
+}
+
+static void test(unsigned variant)
+{
+  const int size = sizeof (void *) + 1; // In sweet memory of PR95270.
+  unsigned char *data = (unsigned char *) malloc(size);
+  assert(data);
+  void *data_d = acc_create(data, size);
+  assert(data_d);
+  assert(acc_is_present(data, size));
+
+  data_work = data;
+
+  if (variant & 8)
+    {
+#pragma acc data copyin(data_work)
+      test_(variant, data, data_d);
+    }
+  else
+    {
+      acc_copyin(&data_work, sizeof data_work);
+      test_(variant, data, data_d);
+      acc_delete(&data_work, sizeof data_work);
+    }
+#if ACC_MEM_SHARED
+  assert(acc_is_present(&data_work, sizeof data_work));
+#else
+  assert(!acc_is_present(&data_work, sizeof data_work));
+#endif
+  data_work = NULL;
+
+  assert(acc_is_present(data, size));
+  acc_delete(data, size);
+  data_d = NULL;
+#if ACC_MEM_SHARED
+  assert(acc_is_present(data, size));
+#else
+  assert(!acc_is_present(data, size));
+#endif
+  free(data);
+  data = NULL;
+}
+
+int main()
+{
+  for (size_t i = 0; i < 16; ++i)
+    test(i);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90
index b22e411567f..fbd52373946 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90
@@ -23,16 +23,15 @@  program main
   if (.not. acc_is_present(var%a)) stop 1
   if (.not. acc_is_present(var)) stop 2
 
+  !$acc exit data detach(var%a) finalize
   print *, "CheCKpOInT1"
   ! { dg-output ".*CheCKpOInT1(\n|\r\n|\r)" }
-  !$acc exit data detach(var%a) finalize
-  !TODO     goacc_exit_data_internal: Assertion `is_tgt_unmapped || num_mappings > 1' failed.
-  !TODO { dg-output ".*\[Aa\]ssert.*is_tgt_unmapped" { target { ! openacc_host_selected } } } ! Scan for what we expect in the "XFAILed" case (without actually XFAILing).
+  !$acc exit data delete(var%a)
+  !TODO { dg-output "(\n|\r\n|\r)libgomp: attach count underflow(\n|\r\n|\r)$" { target { ! openacc_host_selected } } } ! Scan for what we expect in the "XFAILed" case (without actually XFAILing).
   !TODO { dg-shouldfail "XFAILed" { ! openacc_host_selected } } ! ... instead of 'dg-xfail-run-if' so that 'dg-output' is evaluated at all.
   !TODO { dg-final { if { [dg-process-target { xfail { ! openacc_host_selected } }] == "F" } { xfail "[testname-for-summary] really is XFAILed" } } } ! ... so that we still get an XFAIL visible in the log.
   print *, "CheCKpOInT2"
   ! { dg-output ".CheCKpOInT2(\n|\r\n|\r)" { target { openacc_host_selected } } }
-  !$acc exit data delete(var%a)
   if (acc_is_present(var%a)) stop 3
   if (.not. acc_is_present(var)) stop 4
 
-- 
2.17.1