[openmp,simt] Error out for user-defined reduction

Message ID 20210503102409.GA20090@delia
State New
Headers show
Series
  • [openmp,simt] Error out for user-defined reduction
Related show

Commit Message

Tom de Vries May 3, 2021, 10:24 a.m.
Hi,

The test-case included in this patch contains this target region:
...
  for (int i0 = 0 ; i0 < N0 ; i0++ )
    counter_N0.i += 1;
...

When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32.  The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.

This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but appearantly not user-defined reductions.

For now, make this explicit by erroring out for nvptx, like this:
...
target-44.c: In function 'main':
target-44.c:20:9: error: SIMT reduction not fully implemented
...

Tested libgomp on x86_64-linux with and without nvptx accelerator.

Any comments?

Thanks,
- Tom

[openmp, simt] Error out for user-defined reduction

gcc/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* omp-low.c (lower_rec_input_clauses): Error out for user-defined reduction
	for SIMT.

libgomp/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* testsuite/libgomp.c/target-44.c: New test.

---
 gcc/omp-low.c                           |  2 ++
 libgomp/testsuite/libgomp.c/target-44.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Aldy Hernandez via Gcc-patches May 3, 2021, 10:31 a.m. | #1
On Mon, May 03, 2021 at 12:24:10PM +0200, Tom de Vries wrote:
> The test-case included in this patch contains this target region:

> ...

>   for (int i0 = 0 ; i0 < N0 ; i0++ )

>     counter_N0.i += 1;

> ...

> 

> When running with nvptx accelerator, the counter variable is expected to

> be N0 after the region, but instead is N0 / 32.  The problem is that rather

> than getting the result for all warp lanes, we get it for just one lane.

> 

> This is caused by the implementation of SIMT being incomplete.  It handles

> regular reductions, but appearantly not user-defined reductions.

> 

> For now, make this explicit by erroring out for nvptx, like this:

> ...

> target-44.c: In function 'main':

> target-44.c:20:9: error: SIMT reduction not fully implemented

> ...

> 

> Tested libgomp on x86_64-linux with and without nvptx accelerator.

> 

> Any comments?


If you want a workaround, the workaround should be to disable SIMT if
UDR reductions are seen, rather than erroring out.
So e.g. in lower_rec_simd_input_clauses for sctx->is_simt if sctx->max_vf
isn't 1 look for OMP_CLAUSE_REDUCTION with OMP_CLAUSE_REDUCTION_PLACEHOLDER
and punt (set max_vf = 1) in that case.

The right thing is to implement it properly of course.

	Jakub
Tom de Vries May 3, 2021, 5:03 p.m. | #2
On 5/3/21 12:31 PM, Jakub Jelinek wrote:
> On Mon, May 03, 2021 at 12:24:10PM +0200, Tom de Vries wrote:

>> The test-case included in this patch contains this target region:

>> ...

>>   for (int i0 = 0 ; i0 < N0 ; i0++ )

>>     counter_N0.i += 1;

>> ...

>>

>> When running with nvptx accelerator, the counter variable is expected to

>> be N0 after the region, but instead is N0 / 32.  The problem is that rather

>> than getting the result for all warp lanes, we get it for just one lane.

>>

>> This is caused by the implementation of SIMT being incomplete.  It handles

>> regular reductions, but appearantly not user-defined reductions.

>>

>> For now, make this explicit by erroring out for nvptx, like this:

>> ...

>> target-44.c: In function 'main':

>> target-44.c:20:9: error: SIMT reduction not fully implemented

>> ...

>>

>> Tested libgomp on x86_64-linux with and without nvptx accelerator.

>>

>> Any comments?

> 

> If you want a workaround, the workaround should be to disable SIMT if

> UDR reductions are seen, rather than erroring out.

> So e.g. in lower_rec_simd_input_clauses for sctx->is_simt if sctx->max_vf

> isn't 1 look for OMP_CLAUSE_REDUCTION with OMP_CLAUSE_REDUCTION_PLACEHOLDER

> and punt (set max_vf = 1) in that case.

> 


Thanks for the review, I've tried to implement this, see patch below.

> The right thing is to implement it properly of course.


Ack, I've taken a look, and for me itd doesn't look like a below-a-day
kind of task, so unfortunately I don't have the time for this right now.

Thanks,
- Tom
[openmp, simt] Disable SIMT for user-defined reduction

The test-case included in this patch contains this target region:
...
  for (int i0 = 0 ; i0 < N0 ; i0++ )
    counter_N0.i += 1;
...

When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32.  The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.

This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but appearantly not user-defined reductions.

For now, handle this by disabling SIMT in this case, specifically by setting
sctx->max_vf to 1.

Tested libgomp on x86_64-linux with nvptx accelerator.

gcc/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined
	reduction.

libgomp/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* testsuite/libgomp.c/target-44.c: New test.

---
 gcc/omp-low.c                           |  8 ++++++++
 libgomp/testsuite/libgomp.c/target-44.c | 27 +++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7b122059c6e..bb8d3188c26 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4385,6 +4385,14 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 		sctx->max_vf = lower_bound (sctx->max_vf, safe_len);
 	    }
 	}
+      if (sctx->is_simt && !known_eq (sctx->max_vf, 1U))
+	{
+	  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
+				    OMP_CLAUSE_REDUCTION);
+	  if (c && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+	    /* UDR reductions are not supported yet for SIMT, disable SIMT.  */
+	    sctx->max_vf = 1;
+	}
       if (maybe_gt (sctx->max_vf, 1U))
 	{
 	  sctx->idx = create_tmp_var (unsigned_type_node);
diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c
new file mode 100644
index 00000000000..13e0c757845
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-44.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+
+#include <stdlib.h>
+
+struct s
+{
+  int i;
+};
+
+#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i)
+
+int
+main (void)
+{
+  const int N0 = 32768;
+
+  struct s counter_N0 = { 0 };
+#pragma omp target
+#pragma omp for simd reduction(+: counter_N0)
+  for (int i0 = 0 ; i0 < N0 ; i0++ )
+    counter_N0.i += 1;
+
+  if (counter_N0.i != N0)
+    abort ();
+
+  return 0;
+}
Aldy Hernandez via Gcc-patches May 3, 2021, 5:14 p.m. | #3
On Mon, May 03, 2021 at 07:03:24PM +0200, Tom de Vries wrote:
> +      if (sctx->is_simt && !known_eq (sctx->max_vf, 1U))

> +	{

> +	  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),

> +				    OMP_CLAUSE_REDUCTION);

> +	  if (c && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))

> +	    /* UDR reductions are not supported yet for SIMT, disable SIMT.  */

> +	    sctx->max_vf = 1;


This isn't sufficient, you could have e.g. 2 reductions, the first non-UDR
one and the second one with UDR.
So it needs to be a for loop like:
	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
	       c = OMP_CLAUSE_CHAIN (c))
	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
	      {
		/* UDR reductions are not supported yet for SIMT,
		   disable SIMT.  */
		sctx->max_vf = 1;
		break;
	      }
(or with omp_find_clause used in two spots).

	Jakub
Thomas Schwinge May 18, 2021, 11:03 a.m. | #4
Hi!

On 2021-05-03T19:03:24+0200, Tom de Vries <tdevries@suse.de> wrote:
> --- /dev/null

> +++ b/libgomp/testsuite/libgomp.c/target-44.c

> @@ -0,0 +1,27 @@

> +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */


Causes issues if more than nvptx offloading compilation is enabled.  Thus
pushed "'libgomp.c/target-44.c': Restrict '-latomic' to nvptx offloading
compilation" to master branch in commit
abf937ac00e523576ca86957dfa9769281896ca5, see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
From abf937ac00e523576ca86957dfa9769281896ca5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>

Date: Mon, 17 May 2021 08:05:40 +0200
Subject: [PATCH] 'libgomp.c/target-44.c': Restrict '-latomic' to nvptx
 offloading compilation

Fix-up for recent commit f87990a2a8fc9e20d30462a0a4c9047582af0cd9
"[openmp, simt] Disable SIMT for user-defined reduction"; see commit
d42088e453042f4f8ba9190a7e29efd937ea2181 "Avoid -latomic for amdgcn
offloading".

	libgomp/
	* testsuite/libgomp.c/target-44.c: Restrict '-latomic' to nvptx
	offloading compilation.
---
 libgomp/testsuite/libgomp.c/target-44.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c
index 13e0c757845..b95e807a114 100644
--- a/libgomp/testsuite/libgomp.c/target-44.c
+++ b/libgomp/testsuite/libgomp.c/target-44.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=nvptx-none=-latomic" { target { offload_target_nvptx } } } */
 
 #include <stdlib.h>
 
-- 
2.30.2

Patch

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7b122059c6e..0f122857a3a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6005,6 +6005,8 @@  lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
 		  gimple *tseq;
 		  tree ptype = TREE_TYPE (placeholder);
+		  if (sctx.is_simt)
+		    error ("SIMT reduction not fully implemented");
 		  if (cond)
 		    {
 		      x = error_mark_node;
diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c
new file mode 100644
index 00000000000..497931cd14c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-44.c
@@ -0,0 +1,28 @@ 
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-error "SIMT reduction not fully implemented" "" { target { offload_target_nvptx } } 0 }  */
+#include <stdlib.h>
+
+struct s
+{
+  int i;
+};
+
+#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i)
+
+int
+main (void)
+{
+  const int N0 = 32768;
+
+  struct s counter_N0 = { 0 };
+#pragma omp target
+#pragma omp for simd reduction(+: counter_N0)
+  for (int i0 = 0 ; i0 < N0 ; i0++ )
+    counter_N0.i += 1;
+
+  if (counter_N0.i != N0)
+    abort ();
+
+  return 0;
+}