Move pass_oacc_device_lower after pass_graphite

Message ID xu8feela319r.fsf@harwath.name
State New
Series
  • Move pass_oacc_device_lower after pass_graphite

Commit Message

Frederik Harwath Nov. 3, 2020, 3:31 p.m.
Hi,

as a first step towards enabling the use of Graphite for optimizing
OpenACC loops this patch moves the OpenACC device lowering after the
Graphite pass.  This means that the device lowering now takes place
after some crucial optimization passes. Thus new instances of those
passes are added inside of a new pass pass_oacc_functions which ensures
that they run on OpenACC functions only. The choice of the new position
for pass_oacc_device_lower is further constrained by the need to
execute it before pass_vectorize.  This means that
pass_oacc_device_lower now runs inside of pass_tree_loop. A further
instance of the pass that handles functions without loops is added
inside of pass_tree_no_loop. Yet another pass instance that executes if
optimizations are disabled is included inside of a new
pass_no_optimizations.

The patch has been bootstrapped on x86_64-linux-gnu and tested with the
GCC testsuite and with the libgomp testsuite with nvptx and gcn
offloading.

The patch should have no impact on non-OpenACC user code. However, the
new pass instances have changed the pass instance numbering and hence
the dump scanning commands in several tests had to be adjusted. I hope
that I found all that needed adjustment, but it is quite possible that I
missed some tests that execute for particular targets or non-default
languages only. The resulting UNRESOLVED tests are usually easily fixed
by appending a pass number to the name of a pass that previously had no
number (e.g. "cunrolli" becomes "cunrolli1") or by incrementing the pass
number (e.g. "dce6" becomes "dce7") in a dump scanning command.

The patch leads to several new unresolved tests in the libgomp testsuite
which are caused by the combination of torture testing, missing cleanup
of the offload dump files, and the new pass numbering.  If a test that
uses, for instance, "-foffload=-fdump-tree-oaccdevlow" gets compiled with
"-O0" and afterwards with "-O2", each run of the test executes different
instances of pass_oacc_device_lower and produces dumps whose names
differ only in the pass instance number.  The dump scanning command in
the second run fails, because the dump files do not get removed after
the first run and the command consequently matches two different dump
files.  This seems to be a known issue.  I am going to submit a patch
that implements the cleanup of the offload dumps soon.

I have tried to rule out performance regressions by running different
benchmark suites with nvptx and gcn offloading. Nevertheless, I think
that it makes sense to keep an eye on OpenACC performance in the near
future and revisit the optimizations that run on the device lowered
function if necessary.

Ok to include the patch in master?

Best regards,
Frederik


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

Comments

apinski--- via Gcc-patches Nov. 4, 2020, 9:18 a.m. | #1
On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
<frederik@codesourcery.com> wrote:
>
> Hi,
>
> as a first step towards enabling the use of Graphite for optimizing
> OpenACC loops this patch moves the OpenACC device lowering after the
> Graphite pass.  This means that the device lowering now takes place
> after some crucial optimization passes. Thus new instances of those
> passes are added inside of a new pass pass_oacc_functions which ensures
> that they run on OpenACC functions only. The choice of the new position
> for pass_oacc_device_lower is further constrained by the need to
> execute it before pass_vectorize.  This means that
> pass_oacc_device_lower now runs inside of pass_tree_loop. A further
> instance of the pass that handles functions without loops is added
> inside of pass_tree_no_loop. Yet another pass instance that executes if
> optimizations are disabled is included inside of a new
> pass_no_optimizations.
>
> The patch has been bootstrapped on x86_64-linux-gnu and tested with the
> GCC testsuite and with the libgomp testsuite with nvptx and gcn
> offloading.
>
> The patch should have no impact on non-OpenACC user code. However, the
> new pass instances have changed the pass instance numbering and hence
> the dump scanning commands in several tests had to be adjusted. I hope

What's on my TODO list (or on the list of things to explore) is to make
the dump file names/suffixes explicit in passes.def like via

  NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")

and we'd get a dump named .ccp_oacc or so.  Or stick with explicit
numbers by specifying , 5.  If just the number is fixed this could
eventually be done with just tweaks to gen-pass-instances.awk

Now, what does oacc_device_lower actually do that you need to
re-run complex lowering?  What does cunrolli do at this point that
the complete_unroll pass later does not do?

What's special about oacc_device lower that doesn't also apply
to omp_device_lower?

Is all this targeted at code compiled exclusively for the offload
target?  Thus we're in lto1 here?  Does it make eventually more
sense to have a completely custom pass pipeline for the
offload compilation?  Maybe even per offload target?  See how
we have a custom pipeline for -Og (pass_all_optimizations_g).

> that I found all that needed adjustment, but it is quite possible that I
> missed some tests that execute for particular targets or non-default
> languages only. The resulting UNRESOLVED tests are usually easily fixed
> by appending a pass number to the name of a pass that previously had no
> number (e.g. "cunrolli" becomes "cunrolli1") or by incrementing the pass
> number (e.g. "dce6" becomes "dce7") in a dump scanning command.
>
> The patch leads to several new unresolved tests in the libgomp testsuite
> which are caused by the combination of torture testing, missing cleanup
> of the offload dump files, and the new pass numbering.  If a test that
> uses, for instance, "-foffload=-fdump-tree-oaccdevlow" gets compiled with
> "-O0" and afterwards with "-O2", each run of the test executes different
> instances of pass_oacc_device_lower and produces dumps whose names
> differ only in the pass instance number.  The dump scanning command in
> the second run fails, because the dump files do not get removed after
> the first run and the command consequently matches two different dump
> files.  This seems to be a known issue.  I am going to submit a patch
> that implements the cleanup of the offload dumps soon.
>
> I have tried to rule out performance regressions by running different
> benchmark suites with nvptx and gcn offloading. Nevertheless, I think
> that it makes sense to keep an eye on OpenACC performance in the near
> future and revisit the optimizations that run on the device lowered
> function if necessary.
>
> Ok to include the patch in master?
>
> Best regards,
> Frederik
Frederik Harwath Nov. 6, 2020, 11:18 a.m. | #2
Hi Richard,

Richard Biener <richard.guenther@gmail.com> writes:

> On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
>
> What's on my TODO list (or on the list of things to explore) is to make
> the dump file names/suffixes explicit in passes.def like via
>
>   NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
>
> and we'd get a dump named .ccp_oacc or so.


That would be very helpful for avoiding the drudgery of adapting those
pass numbers!

> Now, what does oacc_device_lower actually do that you need to
> re-run complex lowering?  What does cunrolli do at this point that
> the complete_unroll pass later does not do?


Good spot, "cunrolli" seems to be unnecessary.  The complex lowering is
necessary to handle the code that gets created by the OpenACC reduction
lowering during oaccdevlow.  I have attached a test case (a reduced
version of
libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
shows that the complex instructions are created by
pass_oacc_device_lower and which leads to an ICE if compiled without the
new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").
The problem is an unlowered addition. This is from a diff of the dump of
the pass following oaccdevlow1 (ccp4) with disabled and with enabled
tree-cplxlower2:

<   _91 = VIEW_CONVERT_EXPR<complex float>(_1);
<   _92 = reduction_var_2 + _91;
---
>   _104 = REALPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
>   _105 = IMAGPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
>   _91 = COMPLEX_EXPR <_104, _105>;
>   _106 = reduction_var$real_100 + _104;
>   _107 = reduction_var$imag_101 + _105;
>   _92 = COMPLEX_EXPR <_106, _107>;


> What's special about oacc_device lower that doesn't also apply
> to omp_device_lower?


The passes do different things. The goal is to optimize OpenACC
loops using Graphite. The relevant lowering of the internal OpenACC
function calls happens in pass_oacc_device_lower.

> Is all this targeted at code compiled exclusively for the offload
> target?  Thus we're in lto1 here?


The OpenACC outlined functions also get compiled for the host.

> Does it make eventually more sense to have a completely custom pass
> pipeline for the offload compilation?  Maybe even per offload target?
> See how we have a custom pipeline for -Og (pass_all_optimizations_g).

What would be the main benefits of a separate pipeline? Avoiding
(re-)running passes unnecessarily, fewer unwanted interactions
in the test suite (but your suggestion above regarding the fixed
pass names would also solve this)?

>> Ok to include the patch in master?


Best regards,
Frederik

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
new file mode 100644
index 00000000000..6879e5aaf25
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
@@ -0,0 +1,50 @@
+/* { dg-additional-options "-foffload=-fdump-tree-cplxlower2" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-do link } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } {""} } */
+
+#include <stdio.h>
+#if !defined(__hppa__) || !defined(__hpux__)
+#include <complex.h>
+#endif
+
+#define N 100
+
+static float _Complex __attribute__ ((noinline))
+sum (float _Complex ary[N])
+{
+  float _Complex reduction_var = 0;
+#pragma acc parallel loop gang reduction(+:reduction_var)
+  for (int ix = 0; ix < N; ix++)
+    reduction_var += ary[ix];
+
+ return reduction_var;
+}
+
+int main (void)
+{
+  float _Complex ary[N];
+  float _Complex result;
+
+  for (int ix = 0; ix < N;  ix++)
+    {
+      float frac = ix * (1.0f / 1024) + 1.0f;
+      ary[ix] = frac + frac * 2.0j - 1.0j;
+    }
+
+  result = sum (ary);
+  printf("%.1f%+.1fi\n", creal(result), cimag(result));
+  return 0;
+}
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR" 1 "oaccdevlow1" } }
+
+ There is just one COMPLEX_EXPR right before oaccdevlow1 ...*/
+
+/* { dg-final { scan-offload-tree-dump-times "GOACC_REDUCTION .*?reduction_var.*?;" 4 "oaccdevlow1" } }
+
+  ... but several IFN_GOACC_REDUCTION calls for the reduction variable which are subsequently lowered ... */
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR <reduction_var.real_\\d+, reduction_var.imag_\\d+>" 4  "cplxlower2" } }
+
+ ... which introduces new COMPLEX_EXPRs. */
apinski--- via Gcc-patches Nov. 6, 2020, 12:45 p.m. | #3
On Fri, Nov 6, 2020 at 12:18 PM Frederik Harwath
<frederik@codesourcery.com> wrote:
>
> Hi Richard,
>
> Richard Biener <richard.guenther@gmail.com> writes:
>
> > On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
>
> > What's on my TODO list (or on the list of things to explore) is to make
> > the dump file names/suffixes explicit in passes.def like via
> >
> >   NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
> >
> > and we'd get a dump named .ccp_oacc or so.
>
> That would be very helpful for avoiding the drudgery of adapting those
> pass numbers!
>
> > Now, what does oacc_device_lower actually do that you need to
> > re-run complex lowering?  What does cunrolli do at this point that
> > the complete_unroll pass later does not do?
>
> Good spot, "cunrolli" seems to be unnecessary.  The complex lowering is
> necessary to handle the code that gets created by the OpenACC reduction
> lowering during oaccdevlow.  I have attached a test case (a reduced
> version of
> libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
> shows that the complex instructions are created by
> pass_oacc_device_lower and which leads to an ICE if compiled without the
> new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").
> The problem is an unlowered addition. This is from a diff of the dump of
> the pass following oaccdevlow1 (ccp4) with disabled and with enabled
> tree-cplxlower2:
>
> <   _91 = VIEW_CONVERT_EXPR<complex float>(_1);
> <   _92 = reduction_var_2 + _91;
> ---
> >   _104 = REALPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
> >   _105 = IMAGPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
> >   _91 = COMPLEX_EXPR <_104, _105>;
> >   _106 = reduction_var$real_100 + _104;
> >   _107 = reduction_var$imag_101 + _105;
> >   _92 = COMPLEX_EXPR <_106, _107>;

I wonder if oacc device lowering could handle this itself rather than
requiring another cplxlower pass for presumably just complex add?

> > What's special about oacc_device lower that doesn't also apply
> > to omp_device_lower?
>
> The passes do different things. The goal is to optimize OpenACC
> loops using Graphite. The relevant lowering of the internal OpenACC
> function calls happens in pass_oacc_device_lower.
>
> > Is all this targeted at code compiled exclusively for the offload
> > target?  Thus we're in lto1 here?
>
> The OpenACC outlined functions also get compiled for the host.
>
> > Does it make eventually more sense to have a completely custom pass
> > pipeline for the offload compilation?  Maybe even per offload target?
> > See how we have a custom pipeline for -Og (pass_all_optimizations_g).
>
> What would be the main benefits of a separate pipeline? Avoiding
> (re-)running passes unnecessarily, fewer unwanted interactions
> in the test suite (but your suggestion above regarding the fixed
> pass names would also solve this)?

Mainly to avoid (re-)running passes unnecessarily and more
easily tuning towards offload targets without affecting non-offload
code too much.

Can I somehow make you work on that dump-file idea? ;)

Richard.

> >> Ok to include the patch in master?
>
> Best regards,
> Frederik

Patch

From 93fb166876a0540416e19c9428316d1370dd1e1b Mon Sep 17 00:00:00 2001
From: Frederik Harwath <frederik@codesourcery.com>
Date: Tue, 3 Nov 2020 12:58:37 +0100
Subject: [PATCH] Move pass_oacc_device_lower after pass_graphite

As a first step towards enabling the use of Graphite for optimizing
OpenACC loops, the OpenACC device lowering must be moved after the
Graphite pass.  This means that the device lowering now takes place
after some crucial optimization passes. Thus new instances of those
passes are added inside of a new pass pass_oacc_functions which
ensures that they execute on OpenACC functions only. The choice of the
new position for pass_oacc_device_lower is further constrained by the
need to execute it before pass_vectorize.  This means that
pass_oacc_device_lower now runs inside of pass_tree_loop. A further
instance of the pass that handles functions without loops is added
inside of pass_tree_no_loop. Yet another pass instance that executes
if optimizations are disabled is included inside of a new
pass_no_optimizations.

2020-11-03  Frederik Harwath  <frederik@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>

gcc/ChangeLog:

	* omp-general.c (oacc_get_fn_dim_size): Adapt.
	* omp-offload.c (pass_oacc_device_lower::clone): New method.
	* passes.c (class pass_no_optimizations): New pass.
	(make_pass_no_optimizations): New static function.
	* passes.def: Move pass_oacc_device_lower into pass_tree_loop
	and add further instances to pass_tree_no_loop and to new pass
	pass_no_optimizations. Add new instances of
	pass_lower_complex, pass_ccp, pass_sink_code,
	pass_complete_unrolli, pass_backprop, pass_phiprop,
	pass_forwprop, pass_vrp, pass_dce, pass_loop_done,
	pass_loop_init, pass_fix_loops supporting the
	pass_oacc_device_lower instance in pass_tree_loop.
	* tree-pass.h (make_pass_oacc_functions): New static function.
	* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone): New method.
	(pass_complete_unrolli::clone): New method.
	* tree-ssa-loop.c (pass_fix_loops::clone): New method.
	(pass_tree_loop_init::clone): New method.
	(pass_tree_loop_done::clone): New method.
	* tree-ssa-phiprop.c (pass_phiprop::clone): New method.
	* tree-ssa-sink.c (pass_sink_code::clone): New method.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adapt to
	changed pass instance numbering.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/classify-kernels-unparallelized.c: Adapt to changed
	pass instance numbering.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-routine.c: Likewise.
	* c-c++-common/unroll-1.c: Likewise.
	* c-c++-common/unroll-4.c: Likewise.
	* g++.dg/ext/unroll-1.C: Likewise.
	* g++.dg/ext/unroll-2.C: Likewise.
	* g++.dg/ext/unroll-3.C: Likewise.
	* g++.dg/tree-ssa/pr49911.C: Likewise.
	* g++.dg/vect/pr36648.cc: Likewise.
	* gcc.dg/goacc/loop-processing-1.c: Likewise.
	* gcc.dg/graphite/fuse-1.c: Likewise.
	* gcc.dg/tree-ssa/backprop-1.c: Likewise.
	* gcc.dg/tree-ssa/backprop-2.c: Likewise.
	* gcc.dg/tree-ssa/backprop-3.c: Likewise.
	* gcc.dg/tree-ssa/backprop-4.c: Likewise.
	* gcc.dg/tree-ssa/backprop-5.c: Likewise.
	* gcc.dg/tree-ssa/backprop-6.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
	* gcc.dg/tree-ssa/ldist-17.c: Likewise.
	* gcc.dg/tree-ssa/loop-38.c: Likewise.
	* gcc.dg/tree-ssa/pr21463.c: Likewise.
	* gcc.dg/tree-ssa/pr45427.c: Likewise.
	* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
	* gcc.dg/tree-ssa/pr68234.c: Likewise.
	* gcc.dg/tree-ssa/pr70232.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-1.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-10.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-13.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-14.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-16.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-17.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-2.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-3.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-4.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-5.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-6.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-7.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-8.c: Likewise.
	* gcc.dg/tree-ssa/ssa-sink-9.c: Likewise.
	* gcc.dg/tree-ssa/ssa-thread-11.c: Likewise.
	* gcc.dg/tree-ssa/vrp47.c: Likewise.
	* gcc.dg/tree-ssa/vrp91.c: Likewise.
	* gcc.dg/unroll-2.c: Likewise.
	* gcc.dg/unroll-3.c: Likewise.
	* gcc.dg/unroll-4.c: Likewise.
	* gcc.dg/unroll-5.c: Likewise.
	* gcc.dg/vect/bb-slp-59.c: Likewise.
	* gcc.dg/vect/pr26359.c: Likewise.
	* gcc.dg/vect/vect-profile-1.c: Likewise.
	* gcc.dg/vrp-min-max-2.c: Likewise.
	* gcc.dg/wrapped-binop-simplify.c: Likewise.
	* gfortran.dg/directive_unroll_1.f90: Likewise.
	* gfortran.dg/directive_unroll_4.f90: Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/classify-parallel.f95: Likewise.
	* gfortran.dg/goacc/classify-routine.f95: Likewise.
	* gnat.dg/unroll1.adb: Likewise.
	* gnat.dg/unroll2.adb: Likewise.
	* c-c++-common/goacc/device-lowering-no-optimizations.c: New test.
	* c-c++-common/goacc/device-lowering-with-optimizations.c: New test.
---
 gcc/omp-general.c                             |  8 ++-
 gcc/omp-offload.c                             |  1 +
 gcc/passes.c                                  | 35 ++++++++++++
 gcc/passes.def                                | 29 +++++++++-
 .../goacc/classify-kernels-unparallelized.c   |  6 +--
 .../c-c++-common/goacc/classify-kernels.c     |  6 +--
 .../c-c++-common/goacc/classify-parallel.c    |  6 +--
 .../c-c++-common/goacc/classify-routine.c     |  6 +--
 .../goacc/device-lowering-no-optimizations.c  | 25 +++++++++
 .../device-lowering-with-optimizations.c      | 30 +++++++++++
 gcc/testsuite/c-c++-common/unroll-1.c         |  8 +--
 gcc/testsuite/c-c++-common/unroll-4.c         |  4 +-
 gcc/testsuite/g++.dg/ext/unroll-1.C           |  2 +-
 gcc/testsuite/g++.dg/ext/unroll-2.C           |  2 +-
 gcc/testsuite/g++.dg/ext/unroll-3.C           |  2 +-
 gcc/testsuite/g++.dg/tree-ssa/pr49911.C       |  4 +-
 gcc/testsuite/g++.dg/vect/pr36648.cc          |  2 +-
 .../gcc.dg/goacc/loop-processing-1.c          |  3 +-
 gcc/testsuite/gcc.dg/graphite/fuse-1.c        |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c    |  6 +--
 gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c    |  6 +--
 gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c    |  6 +--
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c     |  6 +--
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c     |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c     |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c      |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-38.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21463.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr45427.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c     |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr68234.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr70232.c       |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-13.c   |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-14.c   |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-16.c   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-17.c   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-5.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-6.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-7.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-8.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-9.c    |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp47.c         |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp91.c         |  4 +-
 gcc/testsuite/gcc.dg/unroll-2.c               |  2 +-
 gcc/testsuite/gcc.dg/unroll-3.c               |  4 +-
 gcc/testsuite/gcc.dg/unroll-4.c               |  4 +-
 gcc/testsuite/gcc.dg/unroll-5.c               |  4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-59.c         |  2 +-
 gcc/testsuite/gcc.dg/vect/pr26359.c           |  4 +-
 gcc/testsuite/gcc.dg/vect/vect-profile-1.c    |  2 +-
 gcc/testsuite/gcc.dg/vrp-min-max-2.c          |  6 +--
 gcc/testsuite/gcc.dg/wrapped-binop-simplify.c |  4 +-
 .../gfortran.dg/directive_unroll_1.f90        |  2 +-
 .../gfortran.dg/directive_unroll_4.f90        |  2 +-
 .../goacc/classify-kernels-unparallelized.f95 |  6 +--
 .../gfortran.dg/goacc/classify-kernels.f95    |  6 +--
 .../gfortran.dg/goacc/classify-parallel.f95   |  6 +--
 .../gfortran.dg/goacc/classify-routine.f95    |  6 +--
 gcc/testsuite/gnat.dg/unroll1.adb             |  2 +-
 gcc/testsuite/gnat.dg/unroll2.adb             |  2 +-
 gcc/tree-pass.h                               |  1 +
 gcc/tree-ssa-loop-ivcanon.c                   |  2 +
 gcc/tree-ssa-loop.c                           | 54 +++++++++++++++++++
 gcc/tree-ssa-phiprop.c                        |  2 +
 gcc/tree-ssa-sink.c                           |  2 +
 .../libgomp.oacc-c-c++-common/pr84955-1.c     |  4 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c     |  2 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c     |  2 +-
 .../libgomp.oacc-c-c++-common/pr85486.c       |  2 +-
 .../vector-length-128-1.c                     |  2 +-
 .../vector-length-128-2.c                     |  2 +-
 .../vector-length-128-3.c                     |  2 +-
 .../vector-length-128-4.c                     |  2 +-
 .../vector-length-128-5.c                     |  2 +-
 .../vector-length-128-6.c                     |  2 +-
 .../vector-length-128-7.c                     |  2 +-
 86 files changed, 316 insertions(+), 130 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimizations.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-with-optimizations.c

diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index b66dfb58257..7b848d9b20c 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -2778,7 +2778,13 @@  oacc_get_fn_dim_size (tree fn, int axis)
   while (axis--)
     dims = TREE_CHAIN (dims);
 
-  int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+  tree v = TREE_VALUE (dims);
+  /* TODO With 'pass_oacc_device_lower' moved "later", this is necessary to
+     avoid ICE for some OpenACC 'kernels' ("parloops") constructs.  */
+  if (v == NULL_TREE)
+    return 0;
+
+  int size = TREE_INT_CST_LOW (v);
 
   return size;
 }
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 4490701147c..8ff4675153c 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2027,6 +2027,7 @@  public:
     {
       return execute_oacc_device_lower ();
     }
+  opt_pass * clone () { return new pass_oacc_device_lower (m_ctxt); }
 
 }; // class pass_oacc_device_lower
 
diff --git a/gcc/passes.c b/gcc/passes.c
index f71f63918f4..51fa39e94e8 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -620,6 +620,41 @@  make_pass_all_optimizations_g (gcc::context *ctxt)
 
 namespace {
 
+const pass_data pass_data_no_optimizations =
+{
+  GIMPLE_PASS, /* type */
+  "*no_optimizations", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_OPTIMIZE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_no_optimizations : public gimple_opt_pass
+{
+public:
+  pass_no_optimizations (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_no_optimizations, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return !optimize; }
+
+}; // class pass_no_optimizations
+
+} // anon namespace
+
+static gimple_opt_pass *
+make_pass_no_optimizations (gcc::context *ctxt)
+{
+  return new pass_no_optimizations (ctxt);
+}
+
+namespace {
+
 const pass_data pass_data_rest_of_compilation =
 {
   RTL_PASS, /* type */
diff --git a/gcc/passes.def b/gcc/passes.def
index c68231287b6..58f9be8f957 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -181,7 +181,6 @@  along with GCC; see the file COPYING3.  If not see
   INSERT_PASSES_AFTER (all_passes)
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
-  NEXT_PASS (pass_oacc_device_lower);
   NEXT_PASS (pass_omp_device_lower);
   NEXT_PASS (pass_omp_target_link);
   NEXT_PASS (pass_adjust_alignment);
@@ -284,6 +283,29 @@  along with GCC; see the file COPYING3.  If not see
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_parallelize_loops, false /* oacc_kernels_p */);
 	  NEXT_PASS (pass_expand_omp_ssa);
+	  /* Interrupt pass_tree_loop for OpenACC device lowering. */
+	  NEXT_PASS (pass_oacc_functions);
+	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_functions)
+	    NEXT_PASS (pass_tree_loop_done);
+	    NEXT_PASS (pass_oacc_device_lower);
+	    /* Passes that must run after OpenACC device lowering. */
+	    /* Lower complex number instructions arising from reductions. */
+	    NEXT_PASS (pass_lower_complex);
+	    /* Those optimizations are generally beneficial, but they are
+	    particularly important to help the vectorizer which is crucial
+	    for AMD GCN offloading. */
+	    NEXT_PASS (pass_ccp, true /* nonzero_p */);
+	    NEXT_PASS (pass_sink_code);
+	    NEXT_PASS (pass_complete_unrolli);
+	    NEXT_PASS (pass_backprop);
+	    NEXT_PASS (pass_phiprop);
+	    NEXT_PASS (pass_forwprop);
+	    NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
+	    NEXT_PASS (pass_dce);
+	    NEXT_PASS (pass_fix_loops);
+	    /* Continue pass_tree_loop after OpenACC device lowering. */
+	  NEXT_PASS (pass_tree_loop_init);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_ch_vect);
 	  NEXT_PASS (pass_if_conversion);
 	  /* pass_vectorize must immediately follow pass_if_conversion.
@@ -312,6 +334,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_tree_no_loop);
       PUSH_INSERT_PASSES_WITHIN (pass_tree_no_loop)
 	  NEXT_PASS (pass_slp_vectorize);
+	  NEXT_PASS (pass_oacc_device_lower);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_simduid_cleanup);
       NEXT_PASS (pass_lower_vector_ssa);
@@ -387,6 +410,10 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_local_pure_const);
       NEXT_PASS (pass_modref);
   POP_INSERT_PASSES ()
+  NEXT_PASS (pass_no_optimizations);
+  PUSH_INSERT_PASSES_WITHIN (pass_no_optimizations)
+      NEXT_PASS (pass_oacc_device_lower);
+  POP_INSERT_PASSES ()
   NEXT_PASS (pass_tm_init);
   PUSH_INSERT_PASSES_WITHIN (pass_tm_init)
       NEXT_PASS (pass_tm_mark);
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index d4c4b2ca237..df733954847 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -35,6 +35,6 @@  void KERNELS ()
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index 16e9b9e31d1..649ef317e93 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -31,6 +31,6 @@  void KERNELS ()
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
index 66a6d133663..3dc528fa099 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
@@ -24,6 +24,6 @@  void PARALLEL ()
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccdevlow1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-routine.c b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
index 0b9ba6ea69f..6509103c52e 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-routine.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
@@ -26,6 +26,6 @@  void ROUTINE ()
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccdevlow1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimizations.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimizations.c
new file mode 100644
index 00000000000..ce90891e342
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimizations.c
@@ -0,0 +1,25 @@ 
+/* Check that the instance of the OpenACC device lowering pass that is
+   supposed to run if optimizations are disabled does get executed. */
+
+/* { dg-additional-options "-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-O0" } */
+
+#pragma acc routine
+int test (int x)
+{
+  return x * x;
+}
+
+int test2 (int x)
+{
+#pragma acc parallel
+  {
+    for (int i = 1; i < 1000; ++i)
+      x += x;
+  }
+
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times "Function is OpenACC routine" 1 "oaccdevlow3" } } */
+/* { dg-final { scan-tree-dump-times "Function is OpenACC parallel" 1 "oaccdevlow3" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-with-optimizations.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-with-optimizations.c
new file mode 100644
index 00000000000..9b7cb625b35
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-with-optimizations.c
@@ -0,0 +1,30 @@ 
+/* Check that the different instances of the OpenACC device lowering
+   pass get executed on the types of functions they are supposed to
+   handle if optimizations are enabled. */
+
+/* { dg-additional-options "-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-O" } */
+
+#pragma acc routine
+int test (int x)
+{
+  return x * x;
+}
+
+int test2 (int x)
+{
+#pragma acc parallel
+  {
+    for (int i = 1; i < 1000; ++i)
+      x += x;
+  }
+
+  return x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Function is OpenACC routine" 1 "oaccdevlow2" } }
+   The acc routine should be handled by the pass instance for functions without loops. */
+/* { dg-final { scan-tree-dump-times "Function is OpenACC parallel" 1 "oaccdevlow1" } }
+   The function with the parallel region should be handled by the pass instance
+   for functions with loops. */
diff --git a/gcc/testsuite/c-c++-common/unroll-1.c b/gcc/testsuite/c-c++-common/unroll-1.c
index fe7f4f31912..8e57a44be23 100644
--- a/gcc/testsuite/c-c++-common/unroll-1.c
+++ b/gcc/testsuite/c-c++-common/unroll-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fdump-rtl-loop2_unroll-details" } */
 
 extern void bar (int);
 
@@ -10,12 +10,12 @@  void test (void)
   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 8; ++i)
     bar(i);
-  /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli1" } } */
 
   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 7; ++i)
     bar(i);
-  /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli1" } } */
 
   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 15; ++i)
diff --git a/gcc/testsuite/c-c++-common/unroll-4.c b/gcc/testsuite/c-c++-common/unroll-4.c
index 1c1988174ba..fe7f9e10626 100644
--- a/gcc/testsuite/c-c++-common/unroll-4.c
+++ b/gcc/testsuite/c-c++-common/unroll-4.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli1-details" } */
 
 extern void bar (int);
 
@@ -17,6 +17,6 @@  void test (void)
   for (unsigned long i = 1; i <= j; ++i)
     bar(i);
 
-  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli1" } } */
   /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */
 }
diff --git a/gcc/testsuite/g++.dg/ext/unroll-1.C b/gcc/testsuite/g++.dg/ext/unroll-1.C
index aa11b2e6ef7..0e087dfd251 100644
--- a/gcc/testsuite/g++.dg/ext/unroll-1.C
+++ b/gcc/testsuite/g++.dg/ext/unroll-1.C
@@ -16,4 +16,4 @@  bar (int *a, int *b, int *c)
   foo <int> (a, b, c);
 }
 
-// { dg-final { scan-tree-dump "loop with 8 iterations completely unrolled" "cunrolli" } }
+// { dg-final { scan-tree-dump "loop with 8 iterations completely unrolled" "cunrolli1" } }
diff --git a/gcc/testsuite/g++.dg/ext/unroll-2.C b/gcc/testsuite/g++.dg/ext/unroll-2.C
index f9ec892dbdd..4feb23bf565 100644
--- a/gcc/testsuite/g++.dg/ext/unroll-2.C
+++ b/gcc/testsuite/g++.dg/ext/unroll-2.C
@@ -10,4 +10,4 @@  foo (int (&a)[8], int *b, int *c)
     a[i] = b[i] * c[i];
 }
 
-// { dg-final { scan-tree-dump "loop with 8 iterations completely unrolled" "cunrolli" } }
+// { dg-final { scan-tree-dump "loop with 8 iterations completely unrolled" "cunrolli1" } }
diff --git a/gcc/testsuite/g++.dg/ext/unroll-3.C b/gcc/testsuite/g++.dg/ext/unroll-3.C
index dda94c56af2..3b772fa45c8 100644
--- a/gcc/testsuite/g++.dg/ext/unroll-3.C
+++ b/gcc/testsuite/g++.dg/ext/unroll-3.C
@@ -17,4 +17,4 @@  bar (int (&a)[8], int *b, int *c)
   foo <int> (a, b, c);
 }
 
-// { dg-final { scan-tree-dump "loop with 8 iterations completely unrolled" "cunrolli" } }
+// { dg-final { scan-tree-dump "loop with 8 iterations completely unrolled" "cunrolli1" } }
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr49911.C b/gcc/testsuite/g++.dg/tree-ssa/pr49911.C
index e31a3f4b1d9..5df6b6f9291 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr49911.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr49911.C
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fstrict-enums -fno-rtti -fno-exceptions -fno-strict-aliasing -fdump-tree-vrp2" } */
+/* { dg-options "-O2 -fstrict-enums -fno-rtti -fno-exceptions -fno-strict-aliasing -fdump-tree-vrp3" } */
 
 
 extern  void JS_Assert();
@@ -37,4 +37,4 @@  void jsop_setelem(bool y, int z) {
   x = frame.dataRematInfo2(y, z);
 }
 
-/* { dg-final { scan-tree-dump-times "Folding predicate.*45" 0 "vrp2"} } */
+/* { dg-final { scan-tree-dump-times "Folding predicate.*45" 0 "vrp3"} } */
diff --git a/gcc/testsuite/g++.dg/vect/pr36648.cc b/gcc/testsuite/g++.dg/vect/pr36648.cc
index 8d24d3d445d..8990041e4fa 100644
--- a/gcc/testsuite/g++.dg/vect/pr36648.cc
+++ b/gcc/testsuite/g++.dg/vect/pr36648.cc
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
-/* { dg-additional-options "-fdisable-tree-cunrolli" } */
+/* { dg-additional-options "-fdisable-tree-cunrolli1" } */
 
 struct vector
 {
diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
index bd4c07e7d81..4dc33241b78 100644
--- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -15,4 +15,5 @@  void vector_1 (int *ary, int size)
   }
 }
 
-/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccdevlow" } } */
+/* { dg-final { scan-tree-dump {
+OpenACC loops.*Loop 0\(0\).*Loop [0-9]{2}\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccdevlow*" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/fuse-1.c b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
index 204d3b20703..8a0ac433f92 100644
--- a/gcc/testsuite/gcc.dg/graphite/fuse-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
@@ -12,7 +12,7 @@  for (int c0 = 0; c0 <= 99; c0 += 1) {
 /* { dg-final { scan-tree-dump-times "AST generated by isl:.*for \\(int c0 = 0; c0 <= 99; c0 \\+= 1\\) \\{.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*\\}" 1 "graphite" } } */
 
 /* Check that after fusing the loops, the scalar computation is also fused.  */
-/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\\n\]*\\^ 12" 1 "forwprop4" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\\n\]*\\^ 12" 1 "forwprop5" } } */
 
 #define MAX 100
 int A[MAX];
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
index 302fdb570b6..b6b11bf30af 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */
 
 /* Test a simple case of non-looping code in which both uses ignore
    the sign and both definitions are sign ops.  */
@@ -18,5 +18,5 @@  TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <x} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <x} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
index d54fd36e2fb..bef921be500 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */
 
 /* Test a simple case of non-looping code in which both uses ignore
    the sign but only one definition is a sign op.  */
@@ -18,4 +18,4 @@  TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
index a244b4af2ac..1b76ce05cbe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */
 
 /* Test a simple case of non-looping code in which one use ignores
    the sign but another doesn't.  */
@@ -18,4 +18,4 @@  TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 0 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 0 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
index 54355009c74..02223fd9f23 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */
 
 /* Test a simple reduction loop in which all inputs are sign ops and
    the consumer doesn't care about the sign.  */
@@ -17,5 +17,5 @@  TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 3 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 3 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
index e4f0f856ff6..9dd04408b3a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */
 
 /* Test a loop that does both a multiplication and addition.  The addition
    should prevent any sign ops from being removed.  */
@@ -17,4 +17,4 @@  TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 0 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 0 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
index 31f05716f14..1d17c732803 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -fdump-tree-backprop1-details" }  */
 
 void start (void *);
 void end (void *);
@@ -26,5 +26,5 @@  TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
index bcafbfe86b5..110c6cd8635 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O3 -fdump-tree-cunrolli1-details" } */
 int a[2];
 void
 test(int c)
@@ -9,5 +9,5 @@  test(int c)
     a[i]=5;
 }
 /* Array bounds says the loop will not roll much.  */
-/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */
-/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli1"} } */
+/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
index e25c638ac51..f8ab47cebf0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */
 int a[1];
 void
 test(int c)
@@ -12,4 +12,4 @@  test(int c)
 }
 /* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
 
-/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
index 886dc147ad1..f93db92ab38 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdisable-tree-evrp" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fdisable-tree-evrp" } */
 void abort (void);
 int q (void);
 int a[10];
@@ -20,4 +20,4 @@  t (int n)
     }
   return sum;
 }
-/* { dg-final { scan-tree-dump-times "Removed pointless exit:" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "Removed pointless exit:" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
index b3617f685a1..86c84606ce5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details -fdisable-tree-cunroll -fdisable-tree-cunrolli" } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details -fdisable-tree-cunroll -fdisable-tree-cunrolli1" } */
 
 typedef int mad_fixed_t;
 struct mad_pcm
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
index 7ca1e470975..f8f04ffaa45 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */
 int a[10];
 int b[11];
 int q (void);
@@ -15,4 +15,4 @@  t(int n)
 	sum+=b[i];
   return sum;
 }
-/* { dg-final { scan-tree-dump "Loop 1 iterates at most 11 times" "cunrolli" } } */
+/* { dg-final { scan-tree-dump "Loop 1 iterates at most 11 times" "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
index ed0829a038c..c6f1226d683 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-phiprop-details" } */
+/* { dg-options "-O -fdump-tree-phiprop1-details" } */
 
 struct f
 {
@@ -16,4 +16,4 @@  int g(int i, int c, struct f *ff, int g)
   return *t;
 }
 
-/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop" } } */
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
index 2f86f02a30c..3e8a13cd40c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */
 
 extern void abort (void);
 int __attribute__((noinline,noclone))
@@ -25,4 +25,4 @@  int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "bounded by 0x0\[^0-9a-f\]" 0 "cunrolli"} } */
+/* { dg-final { scan-tree-dump-times "bounded by 0x0\[^0-9a-f\]" 0 "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
index 669d357045c..069df138bcb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
@@ -50,4 +50,4 @@  int foo1 (e_u8 a[4][N], int b1, int b2, e_u8 b[M+1][4][N])
 
 /* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 2 "cunroll" } } */
 /* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 2 "cunroll" } } */
-/* { dg-final { scan-tree-dump-not "completely unrolled" "cunrolli" } } */
+/* { dg-final { scan-tree-dump-not "completely unrolled" "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c
index e7c2a95aa4c..fae864936b5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp2" } */
+/* { dg-options "-O2 -fdump-tree-vrp3" } */
 
 extern int nc;
 void ff (unsigned long long);
@@ -21,4 +21,4 @@  f (void)
     }
 }
 
-/* { dg-final { scan-tree-dump ">> 6" "vrp2" } } */
+/* { dg-final { scan-tree-dump ">> 6" "vrp3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr70232.c b/gcc/testsuite/gcc.dg/tree-ssa/pr70232.c
index 6cc987a722a..672878d7bd1 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr70232.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr70232.c
@@ -1,12 +1,12 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -w -fdump-tree-vrp1-details -fdump-tree-vrp2-details -fdump-tree-dom2-details -fdump-tree-dom3-details" } */
+/* { dg-options "-O2 -w -fdump-tree-vrp1-details -fdump-tree-vrp3-details -fdump-tree-dom2-details -fdump-tree-dom3-details" } */
 
 /* All the threads found by the FSM threader should have too
    many statements to be profitable.  */
 /* { dg-final { scan-tree-dump-not "Registering FSM " "dom2"} } */
 /* { dg-final { scan-tree-dump-not "Registering FSM " "dom3"} } */
 /* { dg-final { scan-tree-dump-not "Registering FSM " "vrp1"} } */
-/* { dg-final { scan-tree-dump-not "Registering FSM " "vrp2"} } */
+/* { dg-final { scan-tree-dump-not "Registering FSM " "vrp3"} } */
 
 typedef _Bool bool;
 typedef unsigned char uint8_t;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index bad5bc1d003..cec2132ce65 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
+/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp3-stats -fno-guess-branch-probability" } */
 
 /* Here we have the same issue as was commented in ssa-dom-thread-6.c.
    The PHI coming into the threader has a lot more constants, so the
@@ -24,7 +24,7 @@  $ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities.  Skip the later tests on aarch64.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp3" { target { ! aarch64*-*-* } } } } */
 
 enum STATE {
   S0=0,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c
index 411585a6dc4..57b501681f3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c
@@ -7,4 +7,4 @@  foo (int a, int b, int c)
   return c ? x : a;
 }
 /* We should sink the x = a * b calculation into the branch that returns x. */
-/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c
index 37e4d2fe687..535cb3208f5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c
@@ -16,4 +16,4 @@  void foo (void)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Sinking # VUSE" 4 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sinking # VUSE" 4 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-13.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-13.c
index a65ba35d4ba..584fd91f43a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-13.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-13.c
@@ -21,5 +21,5 @@  void test ()
 
 /* We should sink/merge all stores and end up with a single BB.  */
 
-/* { dg-final { scan-tree-dump-times "MEM\[^\n\r\]* = 0;" 3 "sink" } } */
-/* { dg-final { scan-tree-dump-times "<bb " 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "MEM\[^\n\r\]* = 0;" 3 "sink1" } } */
+/* { dg-final { scan-tree-dump-times "<bb " 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-14.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-14.c
index 771cd4420c4..f5418b06deb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-14.c
@@ -13,5 +13,5 @@  void foo (int b)
 /* We should have sunk the store and inserted a PHI to merge the
    stored values.  */
 
-/* { dg-final { scan-tree-dump-times " = PHI" 1 "sink" } } */
-/* { dg-final { scan-tree-dump-times "x = " 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times " = PHI" 1 "sink1" } } */
+/* { dg-final { scan-tree-dump-times "x = " 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-16.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-16.c
index 610c8d60ebe..012b165fbab 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-16.c
@@ -10,5 +10,5 @@  int f(int n)
   return j;
 }
 
-/* { dg-final { scan-tree-dump "Sinking j_. = __builtin_ffs" "sink" } } */
+/* { dg-final { scan-tree-dump "Sinking j_. = __builtin_ffs" "sink1" } } */
 /* { dg-final { scan-tree-dump "return 2;" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-17.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-17.c
index cf2e2a0f766..d0aeeb312cc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-17.c
@@ -12,4 +12,4 @@  int my_f(int a, int b)
 }
 
 /* We should sink the call to pure_f to the if block.  */
-/* { dg-final { scan-tree-dump "Sinking # VUSE" "sink" } } */
+/* { dg-final { scan-tree-dump "Sinking # VUSE" "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c
index 6aa5a182a3a..a0b4734b1e0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c
@@ -9,4 +9,4 @@  bar (int a, int b, int c)
   return y;
 }
 /* We should sink the x = a * b calculation into the else branch  */
-/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
index 599997e0e6b..ad88ccc4f5b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
@@ -12,4 +12,4 @@  main (int argc)
     }
 }
 /* We should sink the a = argc + 1 calculation into the if branch  */
-/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c
index 784edd2fc87..1e3cfa93fa8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c
@@ -17,4 +17,4 @@  main (int argc)
   foo2 (a);
 }
 /* We should sink the first a = b + c calculation into the else branch  */
-/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sunk statements: 1" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-5.c
index dbdde39add6..f04da5da9b0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-5.c
@@ -44,4 +44,4 @@  void foo(int16_t runs[], uint8_t alpha[], int x, int count)
 }
 
 /* We should not sink the next_runs = runs + x calculation after the loop.  */
-/* { dg-final { scan-tree-dump-times "Sunk statements:" 0 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sunk statements:" 0 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-6.c
index 1abae9f7943..31f5af330f9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-6.c
@@ -14,4 +14,4 @@  int foo(int *a, int r)
 
 /* *a = 1 should be sunk to the else block.  */
 
-/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-7.c
index ec3288f4e69..bd748442edc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-7.c
@@ -15,4 +15,4 @@  int foo(int *a, int r, short *b)
 
 /* *a = 1 should be sunk to the else block.  */
 
-/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-8.c
index 48af4218fc0..4b23b567fd0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-8.c
@@ -24,4 +24,4 @@  int foo(int *a, int r, short *b)
 
 /* *a = 1 should be sunk into the default case.  */
 
-/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-9.c
index 509a76330a4..32bfc81741a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-9.c
@@ -15,4 +15,4 @@  int foo(int *a, int r, int *b)
 
 /* *a = 1 should be sunk to the else block.  */
 
-/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink" } } */
+/* { dg-final { scan-tree-dump-times "Sinking" 1 "sink1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
index 67e1e89ecd3..7bbcb79d7f0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-11.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp2-details --param logical-op-non-short-circuit=1" } */
-/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "vrp2" } } */
+/* { dg-options "-O2 -fdump-tree-vrp3-details --param logical-op-non-short-circuit=1" } */
+/* { dg-final { scan-tree-dump-not "IRREDUCIBLE_LOOP" "vrp3" } } */
 
 void abort (void);
 typedef struct bitmap_head_def *bitmap;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c
index eb7546c4873..cf4c02ad2d8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c
@@ -1,7 +1,7 @@ 
 /* Setting LOGICAL_OP_NON_SHORT_CIRCUIT to 0 inhibits the setcc
    optimizations that expose the VRP opportunity.  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-dom2 -fdump-tree-vrp2 --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-dom2 -fdump-tree-vrp3 --param logical-op-non-short-circuit=1" } */
 /* { dg-additional-options "-march=i586" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
 
 int h(int x, int y)
@@ -39,5 +39,5 @@  int f(int x)
 
 /* VRP2 gets rid of the remaining & 1 operations, x and y are always
    either 0 or 1.  */
-/* { dg-final { scan-tree-dump-times " & 1;" 0 "vrp2" } } */
+/* { dg-final { scan-tree-dump-times " & 1;" 0 "vrp3" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp91.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp91.c
index d1fea9804a3..c2bc3743909 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp91.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp91.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-S -O2 -fdump-tree-vrp2" } */
+/* { dg-options "-S -O2 -fdump-tree-vrp3" } */
 
 unsigned short data;
 void foo ()
@@ -18,4 +18,4 @@  void foo ()
     }
 }
 
-/* { dg-final { scan-tree-dump "\\\[0, 7\\\]" "vrp2" } } */
+/* { dg-final { scan-tree-dump "\\\[0, 7\\\]" "vrp3" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-2.c b/gcc/testsuite/gcc.dg/unroll-2.c
index 8baceaac169..f94174f0f1d 100644
--- a/gcc/testsuite/gcc.dg/unroll-2.c
+++ b/gcc/testsuite/gcc.dg/unroll-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details=stderr -fno-peel-loops -fno-tree-vrp  -fdisable-tree-cunroll -fenable-tree-cunrolli" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details=stderr -fno-peel-loops -fno-tree-vrp  -fdisable-tree-cunroll -fenable-tree-cunrolli1" } */
 
 /* Blank lines can occur in the output of
    -fdump-tree-cunrolli-details=stderr.  */
diff --git a/gcc/testsuite/gcc.dg/unroll-3.c b/gcc/testsuite/gcc.dg/unroll-3.c
index 10bf59b9a2e..0284378b9c5 100644
--- a/gcc/testsuite/gcc.dg/unroll-3.c
+++ b/gcc/testsuite/gcc.dg/unroll-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunrolli=foo -fenable-tree-cunrolli=foo" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunrolli1=foo -fenable-tree-cunrolli1=foo" } */
 
 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-4.c b/gcc/testsuite/gcc.dg/unroll-4.c
index 17f19421227..d62e2e7afa0 100644
--- a/gcc/testsuite/gcc.dg/unroll-4.c
+++ b/gcc/testsuite/gcc.dg/unroll-4.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli=foo -fdisable-tree-cunrolli=foo2" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli1=foo -fdisable-tree-cunrolli1=foo2" } */
 
 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-5.c b/gcc/testsuite/gcc.dg/unroll-5.c
index f3bdebe9882..c81467cd420 100644
--- a/gcc/testsuite/gcc.dg/unroll-5.c
+++ b/gcc/testsuite/gcc.dg/unroll-5.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli=foo2 -fdisable-tree-cunrolli=foo" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli1=foo2 -fdisable-tree-cunrolli1=foo" } */
 
 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
index 815b44e1f7c..2f7c17d803e 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
@@ -22,5 +22,5 @@  void foo (void)
 /* We should be able to vectorize the cycle in one SLP attempt including
    both load groups and do only one permutation.  */
 /* { dg-final { scan-tree-dump-times "transform load" 2 "slp1" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "loopdone" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "loopdone2" } } */
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr26359.c b/gcc/testsuite/gcc.dg/vect/pr26359.c
index 5b445a9bda7..f57535bff51 100644
--- a/gcc/testsuite/gcc.dg/vect/pr26359.c
+++ b/gcc/testsuite/gcc.dg/vect/pr26359.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
-/* { dg-additional-options "-fdump-tree-dce6-details" } */
+/* { dg-additional-options "-fdump-tree-dce7-details" } */
 
 int a[256], b[256], c[256];
 
@@ -13,4 +13,4 @@  foo () {
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Deleting : vect_" 0 "dce6" } } */
+/* { dg-final { scan-tree-dump-times "Deleting : vect_" 0 "dce7" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-profile-1.c b/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
index 922f965806f..a8b3ffb87d0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
-/* { dg-additional-options "-fdump-tree-vect-details-blocks -fdisable-tree-cunrolli" } */
+/* { dg-additional-options "-fdump-tree-vect-details-blocks -fdisable-tree-cunrolli1" } */
 
 /* At least one of these should correspond to a full vector.  */
 
diff --git a/gcc/testsuite/gcc.dg/vrp-min-max-2.c b/gcc/testsuite/gcc.dg/vrp-min-max-2.c
index 39360888823..17ef1795823 100644
--- a/gcc/testsuite/gcc.dg/vrp-min-max-2.c
+++ b/gcc/testsuite/gcc.dg/vrp-min-max-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp2" } */
+/* { dg-options "-O2 -fdump-tree-vrp3" } */
 /* { dg-require-effective-target int32plus } */
 
 int Foo (int X)
@@ -14,5 +14,5 @@  int Foo (int X)
 
 /* We expect this min/max pair to survive.  */
 
-/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "vrp2" } } */
-/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "vrp2" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "vrp3" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "vrp3" } } */
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c
index a5d953b46c7..6226579a19a 100644
--- a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile { target { { i?86-*-* x86_64-*-* s390*-*-* } && lp64 } } } */
-/* { dg-options "-O2 -fdump-tree-vrp2-details" } */
-/* { dg-final { scan-tree-dump-times "gimple_simplified to" 4 "vrp2" } } */
+/* { dg-options "-O2 -fdump-tree-vrp3-details" } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 4 "vrp3" } } */
 
 void v1 (unsigned long *in, unsigned long *out, unsigned int n)
 {
diff --git a/gcc/testsuite/gfortran.dg/directive_unroll_1.f90 b/gcc/testsuite/gfortran.dg/directive_unroll_1.f90
index d758ad74395..5b9316b909c 100644
--- a/gcc/testsuite/gfortran.dg/directive_unroll_1.f90
+++ b/gcc/testsuite/gfortran.dg/directive_unroll_1.f90
@@ -12,7 +12,7 @@  subroutine test1(a)
   DO i=1, 8, 1
     call dummy(a(i))
   ENDDO
-! { dg-final { scan-tree-dump "12:.*: loop with 8 iterations completely unrolled" "cunrolli" } } */
+! { dg-final { scan-tree-dump "12:.*: loop with 8 iterations completely unrolled" "cunrolli1" } } */
 end subroutine test1
 
 subroutine test2(a, n)
diff --git a/gcc/testsuite/gfortran.dg/directive_unroll_4.f90 b/gcc/testsuite/gfortran.dg/directive_unroll_4.f90
index fbb5f24e76f..6e6c78f6b20 100644
--- a/gcc/testsuite/gfortran.dg/directive_unroll_4.f90
+++ b/gcc/testsuite/gfortran.dg/directive_unroll_4.f90
@@ -25,5 +25,5 @@  subroutine test2(a, n)
   ENDDO
 end subroutine test2
 
-! { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */
+! { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli1" } } */
 ! { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 08772428c4c..7453bb53d7c 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -37,6 +37,6 @@  end program main
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index f2c4736e111..3138d0fea4e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -33,6 +33,6 @@  end program main
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
index a23ea81609b..e160713ab89 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
@@ -26,6 +26,6 @@  end program main
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccdevlow" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), omp target entrypoint\\)\\)" 1 "oaccdevlow1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95
index 401d5270391..aa40eb15b4a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95
@@ -25,6 +25,6 @@  end subroutine ROUTINE
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\)\\)\\)" 1 "oaccdevlow" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\)\\)\\)" 1 "oaccdevlow1" } }
diff --git a/gcc/testsuite/gnat.dg/unroll1.adb b/gcc/testsuite/gnat.dg/unroll1.adb
index 34d8a8f3f38..d7d70d76c81 100644
--- a/gcc/testsuite/gnat.dg/unroll1.adb
+++ b/gcc/testsuite/gnat.dg/unroll1.adb
@@ -23,5 +23,5 @@  package body Unroll1 is
 
 end Unroll1;
 
--- { dg-final { scan-tree-dump-times "Not unrolling loop .: user didn't want it unrolled completely" 2 "cunrolli" } }
+-- { dg-final { scan-tree-dump-times "Not unrolling loop .: user didn't want it unrolled completely" 2 "cunrolli1" } }
 -- { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } }
diff --git a/gcc/testsuite/gnat.dg/unroll2.adb b/gcc/testsuite/gnat.dg/unroll2.adb
index 1d3a75706de..13f05283f64 100644
--- a/gcc/testsuite/gnat.dg/unroll2.adb
+++ b/gcc/testsuite/gnat.dg/unroll2.adb
@@ -23,4 +23,4 @@  package body Unroll2 is
 
 end Unroll2;
 
--- { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 2 "cunrolli" } }
+-- { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 2 "cunrolli1" } }
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9cb22acc243..03ff4ebc8eb 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -478,6 +478,7 @@  extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_functions (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_nonnull_compare (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 33e15d448b2..b62611dd8a9 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1587,6 +1587,7 @@  public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_complete_unroll (m_ctxt); }
 
 }; // class pass_complete_unroll
 
@@ -1646,6 +1647,7 @@  public:
   /* opt_pass methods: */
   virtual bool gate (function *) { return optimize >= 2; }
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_complete_unrolli (m_ctxt); }
 
 }; // class pass_complete_unrolli
 
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 5e8365d4e83..79ece2c941f 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -70,6 +70,9 @@  public:
   virtual bool gate (function *) { return flag_tree_loop_optimize; }
 
   virtual unsigned int execute (function *fn);
+
+  opt_pass * clone () { return new pass_fix_loops (m_ctxt); }
+
 }; // class pass_fix_loops
 
 unsigned int
@@ -202,6 +205,53 @@  make_pass_oacc_kernels (gcc::context *ctxt)
   return new pass_oacc_kernels (ctxt);
 }
 
+/* A superpass that runs only on OpenACC functions.  */
+
+namespace {
+
+const pass_data pass_data_oacc_functions =
+{
+  GIMPLE_PASS, /* type */
+  "*oacc_functions", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_functions: public gimple_opt_pass
+{
+public:
+  pass_oacc_functions (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_functions, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) {
+    if (!flag_openacc)
+      return false;
+
+    if (!oacc_get_fn_attrib (fn->decl))
+      return false;
+
+    return true;
+  }
+
+}; // class pass_oacc_functions
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_functions (gcc::context *ctxt)
+{
+  return new pass_oacc_functions (ctxt);
+}
+
+
+
 /* The ipa oacc superpass.  */
 
 namespace {
@@ -344,6 +394,8 @@  public:
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
 
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
+
 }; // class pass_tree_loop_init
 
 unsigned int
@@ -558,6 +610,8 @@  public:
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
 
+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
+
 }; // class pass_tree_loop_done
 
 } // anon namespace
diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
index 024da8c408c..6c67e95c0b0 100644
--- a/gcc/tree-ssa-phiprop.c
+++ b/gcc/tree-ssa-phiprop.c
@@ -479,6 +479,8 @@  public:
   virtual bool gate (function *) { return flag_tree_phiprop; }
   virtual unsigned int execute (function *);
 
+  opt_pass * clone () { return new pass_phiprop (m_ctxt); }
+
 }; // class pass_phiprop
 
 unsigned int
diff --git a/gcc/tree-ssa-sink.c b/gcc/tree-ssa-sink.c
index 207aae2818a..e64fd077b84 100644
--- a/gcc/tree-ssa-sink.c
+++ b/gcc/tree-ssa-sink.c
@@ -816,6 +816,8 @@  public:
   virtual bool gate (function *) { return flag_tree_sink != 0; }
   virtual unsigned int execute (function *);
 
+  opt_pass * clone () { return new pass_sink_code (m_ctxt); }
+
 }; // class pass_sink_code
 
 unsigned int
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c
index 44767cd27c3..9fff1a35143 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile }  */
-/* { dg-options "-O2 -fdump-tree-cddce2 -ffinite-loops" } */
+/* { dg-options "-O2 -fdump-tree-cddce -ffinite-loops" } */
 
 int
 f1 (void)
@@ -28,4 +28,4 @@  f2 (void)
 
   return i + j;
 }
-/* { dg-final { scan-tree-dump-not "if" "cddce2"} } */
+/* { dg-final { scan-tree-dump-not "if" "cddce3"} } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
index d45326488cd..a04905eab2d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
@@ -7,5 +7,5 @@ 
 
 #include "pr85486.c"
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow{2,3}" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
index 33480a4ae68..abd36f93686 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
@@ -7,5 +7,5 @@ 
 
 #include "pr85486.c"
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow{2,3}" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
index 0d98b82f993..78df5b140ba 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
@@ -54,5 +54,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow{2,3}" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
index 18d77cc5ecb..085d7ffe287 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
@@ -34,5 +34,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
index 8b5b2a4a92d..391aa845f42 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
@@ -35,5 +35,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
index 59be37a7c27..3be8b21ef01 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
@@ -38,5 +38,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
index e5d1df09b8a..a9a00d1141b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
@@ -36,5 +36,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
index e60f1c28db4..1633a6ca81a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
@@ -37,5 +37,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
index a1f67622f84..16af1c9e6c6 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
@@ -37,5 +37,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
index c419f6499b5..57830542ad0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
@@ -36,5 +36,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow*" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=8, vectors=128" } */
-- 
2.17.1
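
Note on the renamed dump suffixes: most testsuite changes above follow from one pass-manager rule. A pass that occurs once in the pipeline dumps under its plain name (e.g. "sink"). Once a second instance is added, the instances are numbered from 1 (e.g. "sink1", "sink2"). The sketch below is a toy model of that naming rule only. It is not GCC's implementation, and the function name `dump_names` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (assumption: not GCC code) of how dump-name suffixes depend on
// the number of instances of a pass in the pipeline.
//   one instance   -> "name"
//   several        -> "name1", "name2", ...
std::vector<std::string> dump_names(const std::string &name, int instances) {
  assert(instances >= 1);  // a pass in the pipeline has at least one instance

  std::vector<std::string> names;
  if (instances == 1) {
    // A unique pass keeps its unnumbered name.
    names.push_back(name);
    return names;
  }

  // With duplicates, every instance carries its 1-based instance number.
  for (int i = 1; i <= instances; ++i)
    names.push_back(name + std::to_string(i));
  return names;
}
```

Under this model, `dump_names("cunrolli", 1)` yields "cunrolli", while `dump_names("cunrolli", 2)` yields "cunrolli1" and "cunrolli2". This matches the adjustments in the tests above, where a scan for "cunrolli" becomes a scan for "cunrolli1" once the patch adds a further instance of the pass.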