[libgomp,OpenMP,5.0] Implement omp_get_device_num

Message ID 32c3546a-49c1-bedf-3002-c4e7783bf312@codesourcery.com
State Superseded
Headers show
Series
  • [libgomp,OpenMP,5.0] Implement omp_get_device_num
Related show

Commit Message

Chung-Lin Tang July 23, 2021, 10:21 a.m.
Hi all,
this patch implements the omp_get_device_num API function, which appears
to be a missing piece in the library routines implementation.

The host-side implementation is simple, which by specification is equivalent
to omp_get_initial_device.

Inside offloaded regions, the preferred way to should be that the device
already has this information initialized (once) when the device is initialized.
And the function merely returns the stored value.

This implementation adds a convention for an additional entry (dubbed under 'others'
in the code) returned by the 'load_image' plugin hook. Basically we define
a variable name in libgomp-plugin.h, which the device libgomp defines, and the
offload plugin searches for, and returns the variable device location start/end for
gomp_load_image_from_device to initialize. The device-side omp_get_device_num
then just returns that value.

This patch implements for gcn and nvptx offload targets. The icv-device.c file is
starting to look like a file ready to consolidate away the target specific versions,
but that's for later.

Basic libgomp tests were added for C/C++ and Fortran. Tested without regressions
with offloading for amdgcn and nvptx on x86_64-linux host. Okay for trunk?

Thanks,
Chung-Lin

2021-07-23  Chung-Lin Tang  <cltang@codesourcery.com>

libgomp/ChangeLog

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.1): Add omp_get_device_num, omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* config/plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* config/plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.

Comments

Martin Sebor via Gcc-patches July 23, 2021, 10:39 a.m. | #1
On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:
> --- a/libgomp/icv-device.c

> +++ b/libgomp/icv-device.c

> @@ -61,8 +61,17 @@ omp_is_initial_device (void)

>    return 1;

>  }

>  

> +int

> +omp_get_device_num (void)

> +{

> +  /* By specification, this is equivalent to omp_get_initial_device

> +     on the host.  */

> +  return omp_get_initial_device ();

> +}

> +


I think this won't work properly with the intel micoffload, where the host
libgomp is used in the offloaded code.
For omp_is_initial_device, the plugin solves it by:
liboffloadmic/plugin/offload_target_main.cpp
overriding it:
/* Override the corresponding functions from libgomp.  */
extern "C" int
omp_is_initial_device (void) __GOMP_NOTHROW
{
  return 0;
}
   
extern "C" int32_t
omp_is_initial_device_ (void)
{
  return omp_is_initial_device ();
}
but guess it will need slightly more work because we need to copy the value
to the offloading device too.
It can be done incrementally though.

> --- a/libgomp/libgomp-plugin.h

> +++ b/libgomp/libgomp-plugin.h

> @@ -102,6 +102,12 @@ struct addr_pair

>    uintptr_t end;

>  };

>  

> +/* This symbol is to name a target side variable that holds the designated

> +   'device number' of the target device. The symbol needs to be available to

> +   libgomp code and the  offload plugin (which in the latter case must be

> +   stringified).  */

> +#define GOMP_DEVICE_NUM_VAR __gomp_device_num


For a single var it is acceptable (though, please avoid the double space
before offload plugin in the comment), but once we have more than one
variable, I think we should simply have a struct which will contain all the
parameters that need to be copied from the host to the offloading device at
image load time (and have eventually another struct that holds parameters
that we'll need to copy to the device on each kernel launch, I bet some ICVs
will be one category, other ICVs another one).

> diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map

> index 8ea27b5565f..ffcb98ae99e 100644

> --- a/libgomp/libgomp.map

> +++ b/libgomp/libgomp.map

> @@ -197,6 +197,8 @@ OMP_5.0.1 {

>  	omp_get_supported_active_levels_;

>  	omp_fulfill_event;

>  	omp_fulfill_event_;

> +	omp_get_device_num;

> +	omp_get_device_num_;

>  } OMP_5.0;


This is wrong.  We've already released GCC 11.1 with the OMP_5.0.1
symbol version, so we must not add any further symbols into that symbol
version.  OpenMP 5.0 routines added in GCC 12 should be OMP_5.0.2 symbol
version.

	Jakub
Tobias Burnus July 23, 2021, 11:01 a.m. | #2
On 23.07.21 12:21, Chung-Lin Tang wrote:
> Inside offloaded regions, the preferred way to should be that the device

> already has this information initialized (once) when the device is

> initialized.

> And the function merely returns the stored value.

...
> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c

> @@ -0,0 +1,23 @@

> +/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } } */

...
> +  int device_num;

> +  #pragma omp target map(from: device_num)

> +  {

> +    device_num = omp_get_device_num ();

> +  }

> +

> +  if (host_device_num == device_num)

> +    abort ();


I personally prefer having:
    int initial_dev;
and inside 'omp target' (with 'map(from:initial_dev)'):
    initial_device = omp_is_initial_device();

Then the check would be:
   if (initial_device && host_device_num != device_num)
     abort();
   if (!initial_device && host_device_num == device_num)
     abort();

(Likewise for Fortran.)

And instead of restricting the target to nvptx/gcn, we could just add
dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.

Additionally, offload_target_nvptx/...amdgcn only check whether
compilation support is available not whether a device exists
at run time.
(The device availability is checked by target_offload_device,
using omp_is_initial_device().)

Tobias

PS: For completeness, I want to note that OpenMP 5.1 supports
setting the per-device ICV as via the environment variables,
besides inheriting the generic ICV values, device-specific
settings are possible with:
   <ENVIRONMENT VARIABLE>_DEV[_<device>]
Thus, more data will be passed from libgomp to the plugins
in the future.

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Chung-Lin Tang Aug. 2, 2021, 1:09 p.m. | #3
On 2021/7/23 7:01 PM, Tobias Burnus wrote:
> I personally prefer having:

>     int initial_dev;

> and inside 'omp target' (with 'map(from:initial_dev)'):

>     initial_device = omp_is_initial_device();

> 

> Then the check would be:

>    if (initial_device && host_device_num != device_num)

>      abort();

>    if (!initial_device && host_device_num == device_num)

>      abort();

> 

> (Likewise for Fortran.)


Thanks, I've adjusted the new testcases to use this style.

> And instead of restricting the target to nvptx/gcn, we could just add

> dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.


I've added a 'offload_target_intelmic' to use on the new testcases.

> Additionally, offload_target_nvptx/...amdgcn only check whether

> compilation support is available not whether a device exists

> at run time.

> (The device availability is checked by target_offload_device,

> using omp_is_initial_device().)


I guess there is value in testing compilation as long as the compiler
is properly configured, and leaving the execution as an independent test.
OTOH, I think the OpenMP execution tests are not properly forcing offload
(or not) using the environment variables, unlike what we have for OpenACC.

Thanks,
Chung-Lin

Patch

diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7cff74..8f72028a6c8 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@  omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890f338..e586da1d3a8 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@  omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 4ec39c4e61b..2360582e32e 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -598,6 +598,12 @@  omp_get_initial_device_ (void)
   return omp_get_initial_device ();
 }
 
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 int32_t
 omp_get_max_task_priority_ (void)
 {
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf46647..f11bdfa85c4 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@  omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+     on the host.  */
+  return omp_get_initial_device ();
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 62645ce9954..3f0983c1ea4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@  struct addr_pair
   uintptr_t end;
 };
 
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device. The symbol needs to be available to
+   libgomp code and the  offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num
+
 /* Miscellaneous functions.  */
 extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
 extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 8ea27b5565f..ffcb98ae99e 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -197,6 +197,8 @@  OMP_5.0.1 {
 	omp_get_supported_active_levels_;
 	omp_fulfill_event;
 	omp_fulfill_event_;
+	omp_get_device_num;
+	omp_get_device_num_;
 } OMP_5.0;
 
 GOMP_1.0 {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 2c1f1b5968b..fc9e708a8d2 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -165,6 +165,7 @@  linkage, and do not throw exceptions.
 * omp_get_ancestor_thread_num:: Ancestor thread ID
 * omp_get_cancellation::        Whether cancellation support is enabled
 * omp_get_default_device::      Get the default device for target regions
+* omp_get_device_num::          Get device that current thread is running on
 * omp_get_dynamic::             Dynamic teams setting
 * omp_get_initial_device::      Device number of host device
 * omp_get_level::               Number of parallel regions
@@ -385,6 +386,34 @@  For OpenMP 5.1, this must be equal to the value returned by the
 
 
 
+@node omp_get_device_num
+@section @code{omp_get_device_num} -- Return device number of current device
+@table @asis
+@item @emph{Description}:
+This function returns a device number that represents the device that the
+current thread is executing on. For OpenMP 5.0, this must be equal to the
+value returned by the @code{omp_get_initial_device} function when called
+from the host.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_initial_device}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
+@end table
+
+
+
 @node omp_get_level
 @section @code{omp_get_level} -- Obtain the current nesting level
 @table @asis
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 69f96f09124..3bfd0aee6db 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -243,6 +243,7 @@  extern void omp_get_partition_place_nums (int *) __GOMP_NOTHROW;
 extern void omp_set_default_device (int) __GOMP_NOTHROW;
 extern int omp_get_default_device (void) __GOMP_NOTHROW;
 extern int omp_get_num_devices (void) __GOMP_NOTHROW;
+extern int omp_get_device_num (void) __GOMP_NOTHROW;
 extern int omp_get_num_teams (void) __GOMP_NOTHROW;
 extern int omp_get_team_num (void) __GOMP_NOTHROW;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 851f85f5316..adc20726296 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -550,6 +550,12 @@ 
           end function omp_get_initial_device
         end interface
 
+        interface
+          function omp_get_device_num ()
+            integer (4) :: omp_get_device_num
+          end function omp_get_device_num
+        end interface
+
         interface
           function omp_get_max_task_priority ()
             integer (4) :: omp_get_max_task_priority
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index 06d17b5fcdc..b006509d075 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -244,6 +244,9 @@ 
       external omp_get_initial_device
       integer(4) omp_get_initial_device
 
+      external omp_get_device_num
+      integer(4) omp_get_device_num
+
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index cfed42a2d4d..76906a52bc2 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -29,6 +29,7 @@ 
 /* {{{ Includes and defines  */
 
 #include "config.h"
+#include "symcat.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -3310,6 +3311,7 @@  GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   struct kernel_info *kernel;
   int kernel_count = image_desc->kernel_count;
   unsigned var_count = image_desc->global_variable_count;
+  int other_count = 1;
 
   agent = get_agent_info (ord);
   if (!agent)
@@ -3326,7 +3328,8 @@  GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
   GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
-  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count - 2)
+  GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
+  pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
 			     * sizeof (struct addr_pair));
   *target_table = pair;
   module = (struct module_info *)
@@ -3401,6 +3404,35 @@  GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       pair++;
     }
 
+  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+
+  hsa_status_t status;
+  hsa_executable_symbol_t var_symbol;
+  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						 STRINGX (GOMP_DEVICE_NUM_VAR),
+						 agent->id, 0, &var_symbol);
+  if (status == HSA_STATUS_SUCCESS)
+    {
+      uint64_t device_num_varptr;
+      uint32_t device_num_varsize;
+
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &device_num_varptr);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable from its symbol", status);
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+	 &device_num_varsize);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable size from its symbol", status);
+
+      pair->start = device_num_varptr;
+      pair->end = device_num_varptr + device_num_varsize;
+    }
+  else
+    pair->start = pair->end = 0;
+
   /* Ensure that constructors are run first.  */
   struct GOMP_kernel_launch_attributes kla =
     { 3,
@@ -3423,7 +3455,7 @@  GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (module->fini_array_func)
     kernel_count--;
 
-  return kernel_count + var_count;
+  return kernel_count + var_count + other_count;
 }
 
 /* Unload GCN object-code module described by struct gcn_image_desc in
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1215212d501..8b34895a88f 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -34,6 +34,7 @@ 
 #define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
+#include "symcat.h"
 #include "libgomp-plugin.h"
 #include "oacc-plugin.h"
 #include "gomp-constants.h"
@@ -1265,7 +1266,7 @@  GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   CUmodule module;
   const char *const *var_names;
   const struct targ_fn_launch *fn_descs;
-  unsigned int fn_entries, var_entries, i, j;
+  unsigned int fn_entries, var_entries, other_entries, i, j;
   struct targ_fn_descriptor *targ_fns;
   struct addr_pair *targ_tbl;
   const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
@@ -1295,8 +1296,11 @@  GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   fn_entries = img_header->fn_num;
   fn_descs = img_header->fn_descs;
 
+  /* Currently, the only other entry kind is 'device number'.  */
+  other_entries = 1;
+
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
-				 * (fn_entries + var_entries));
+				 * (fn_entries + var_entries + other_entries));
   targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
 				 * fn_entries);
 
@@ -1345,9 +1349,22 @@  GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
       targ_tbl->end = targ_tbl->start + bytes;
     }
 
+  CUdeviceptr device_num_varptr;
+  size_t device_num_varsize;
+  CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
+				  &device_num_varsize, module,
+				  STRINGX (GOMP_DEVICE_NUM_VAR));
+  if (r == CUDA_SUCCESS)
+    {
+      targ_tbl->start = (uintptr_t) device_num_varptr;
+      targ_tbl->end = (uintptr_t) (device_num_varptr + device_num_varsize);
+    }
+  else
+    targ_tbl->start = targ_tbl->end = 0;
+
   nvptx_set_clocktick (module, dev);
 
-  return fn_entries + var_entries;
+  return fn_entries + var_entries + other_entries;
 }
 
 /* Unload the program described by TARGET_DATA.  DEV_DATA is the
diff --git a/libgomp/target.c b/libgomp/target.c
index bb09d501dd6..118871269b6 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1920,6 +1920,9 @@  gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
 
+  /* Others currently is only 'device_num' */
+  int num_others = 1;
+
   /* Load image to device and get target addresses for the image.  */
   struct addr_pair *target_table = NULL;
   int i, num_target_entries;
@@ -1928,7 +1931,9 @@  gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
     = devicep->load_image_func (devicep->target_id, version,
 				target_data, &target_table);
 
-  if (num_target_entries != num_funcs + num_vars)
+  if (num_target_entries != num_funcs + num_vars
+      /* Others (device_num) are included as trailing entries in pair list.  */
+      && num_target_entries != num_funcs + num_vars + num_others)
     {
       gomp_mutex_unlock (&devicep->lock);
       if (is_register_lock)
@@ -2000,6 +2005,31 @@  gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       array++;
     }
 
+  /* Last entry is for the on-device 'device_num' variable. Tolerate case
+     where plugin does not return this entry.  */
+  if (num_funcs + num_vars < num_target_entries)
+    {
+      struct addr_pair *device_num_var = &target_table[num_funcs + num_vars];
+      if (device_num_var->start != 0)
+	{
+	  /* The index of the devicep within devices[] is regarded as its
+	     'device number', which is different from the per-device type
+	     devicep->target_id.  */
+	  int device_num_val = (int) (devicep - &devices[0]);
+	  if (device_num_var->end - device_num_var->start != sizeof (int))
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      gomp_fatal ("offload plugin managed 'device_num' not of expected "
+			  "format");
+	    }
+
+	  /* Copy device_num value to place on device memory, hereby actually
+	     designating its device number into effect.  */
+	  gomp_copy_host2dev (devicep, NULL, (void *) device_num_var->start,
+			      &device_num_val, sizeof (int), NULL);
+	}
+    }
+
   free (target_table);
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
new file mode 100644
index 00000000000..b39a85154a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,23 @@ 
+/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } } */
+
+#include <omp.h>
+#include <stdlib.h>
+
+int main (void)
+{
+  int host_device_num = omp_get_device_num ();
+
+  if (host_device_num != omp_get_initial_device ())
+    abort ();
+
+  int device_num;
+  #pragma omp target map(from: device_num)
+  {
+    device_num = omp_get_device_num ();
+  }
+
+  if (host_device_num == device_num)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
new file mode 100644
index 00000000000..329438c8afc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -0,0 +1,17 @@ 
+! { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } }
+
+program main
+  use omp_lib
+  implicit none
+  integer :: device_num, host_device_num
+
+  host_device_num = omp_get_device_num ()
+  if (host_device_num .ne. omp_get_initial_device ()) stop 1
+
+  !$omp target map(from: device_num)
+  device_num = omp_get_device_num ()
+  !$omp end target
+
+  if (host_device_num .eq. device_num) stop 2
+
+end program main