V2 [PATCH] x86: Use _rdtsc intrinsic for HP_TIMING_NOW

Message ID CAMe9rOpeMG7ES2q3iLt9AjZ04mynkWKm1BJs5taZi3B+a9=HsQ@mail.gmail.com
State New
Headers show
Series
  • V2 [PATCH] x86: Use _rdtsc intrinsic for HP_TIMING_NOW
Related show

Commit Message

H.J. Lu Oct. 3, 2018, 11:08 p.m.
On Tue, Oct 2, 2018 at 11:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>

> On Tue, Oct 2, 2018 at 10:23 AM Adhemerval Zanella

> <adhemerval.zanella@linaro.org> wrote:

> >

> >

> >

> > On 02/10/2018 13:56, H.J. Lu wrote:

> > > On Tue, Oct 2, 2018 at 6:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> > >>

> > >> On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella

> > >> <adhemerval.zanella@linaro.org> wrote:

> > >

> > >>> I am not very found of cross abi includes like this one, wouldn't be better

> > >>> to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff

> > >>> __i686__ is defined otherwise include generic header?

> > >>

> > >> I can do that.

> > >

> > > Here is the V2 patch.  OK for master?

> > >

> >

> > > NB: Checking if __i686__ isn't sufficient since __i686__ may not be

> > > defined when building for i686 class processors.

> >

> > Right, it seems gcc does not define it for -march newer than pentium4.

> >

> > > diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h

> > > similarity index 89%

> > > rename from sysdeps/i386/i586/init-arch.h

> > > rename to sysdeps/i386/i586/isa.h

> > > index 72fb46c61e..a2b5d585d4 100644

> > > --- a/sysdeps/i386/i586/init-arch.h

> > > +++ b/sysdeps/i386/i586/isa.h

> > > @@ -1,4 +1,4 @@

> > > -/* Copyright (C) 2015-2018 Free Software Foundation, Inc.

> > > +/* Copyright (C) 2018 Free Software Foundation, Inc.

> > >     This file is part of the GNU C Library.

> > >

> > >     The GNU C Library is free software; you can redistribute it and/or

> > > @@ -15,5 +15,9 @@

> > >     License along with the GNU C Library; if not, see

> > >     <http://www.gnu.org/licenses/>.  */

> > >

> > > +#ifndef _isa_h

> > > +#define _isa_h

> > > +

> >

> > Not sure if this is explicit on our coding rules, but usually we

> > use uppercase caps for preprocessor guards.  I think it requires

> > a one-line description as well (same for other isa.h files).

>

> I will change that.

>

> > > -#ifndef _HP_TIMING_H

> > > -#define _HP_TIMING_H 1

> > > +#include <isa.h>

> > > +

> > > +#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664

> > > +# ifndef _HP_TIMING_H

> > > +#  define _HP_TIMING_H       1

> > > +

> >

> > The include guard inside the MINIMUM_ISA check seems off, why do

> > you need it?

>

> <sysdeps/generic/hp-timing.h> will be skipped if _HP_TIMING_H is

> defined.  That is the draw back for the single x86 hp-timing.h.


Here is the updated patch.  OK for master?

-- 
H.J.

Comments

Adhemerval Zanella Oct. 17, 2018, 1:15 p.m. | #1
On 03/10/2018 20:08, H.J. Lu wrote:
> On Tue, Oct 2, 2018 at 11:06 AM H.J. Lu <hjl.tools@gmail.com> wrote:

>> On Tue, Oct 2, 2018 at 10:23 AM Adhemerval Zanella

>> <adhemerval.zanella@linaro.org> wrote:

>>>

>>>

>>> On 02/10/2018 13:56, H.J. Lu wrote:

>>>> On Tue, Oct 2, 2018 at 6:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:

>>>>> On Tue, Oct 2, 2018 at 6:15 AM Adhemerval Zanella

>>>>> <adhemerval.zanella@linaro.org> wrote:

>>>>>> I am not very found of cross abi includes like this one, wouldn't be better

>>>>>> to create a 'sysdeps/x86/hp-timing.h' and for 32-bit use _rdtsc iff

>>>>>> __i686__ is defined otherwise include generic header?

>>>>> I can do that.

>>>> Here is the V2 patch.  OK for master?

>>>>

>>>> NB: Checking if __i686__ isn't sufficient since __i686__ may not be

>>>> defined when building for i686 class processors.

>>> Right, it seems gcc does not define it for -march newer than pentium4.

>>>

>>>> diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h

>>>> similarity index 89%

>>>> rename from sysdeps/i386/i586/init-arch.h

>>>> rename to sysdeps/i386/i586/isa.h

>>>> index 72fb46c61e..a2b5d585d4 100644

>>>> --- a/sysdeps/i386/i586/init-arch.h

>>>> +++ b/sysdeps/i386/i586/isa.h

>>>> @@ -1,4 +1,4 @@

>>>> -/* Copyright (C) 2015-2018 Free Software Foundation, Inc.

>>>> +/* Copyright (C) 2018 Free Software Foundation, Inc.

>>>>     This file is part of the GNU C Library.

>>>>

>>>>     The GNU C Library is free software; you can redistribute it and/or

>>>> @@ -15,5 +15,9 @@

>>>>     License along with the GNU C Library; if not, see

>>>>     <http://www.gnu.org/licenses/>.  */

>>>>

>>>> +#ifndef _isa_h

>>>> +#define _isa_h

>>>> +

>>> Not sure if this is explicit on our coding rules, but usually we

>>> use uppercase caps for preprocessor guards.  I think it requires

>>> a one-line description as well (same for other isa.h files).

>> I will change that.

>>

>>>> -#ifndef _HP_TIMING_H

>>>> -#define _HP_TIMING_H 1

>>>> +#include <isa.h>

>>>> +

>>>> +#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664

>>>> +# ifndef _HP_TIMING_H

>>>> +#  define _HP_TIMING_H       1

>>>> +

>>> The include guard inside the MINIMUM_ISA check seems off, why do

>>> you need it?

>> <sysdeps/generic/hp-timing.h> will be skipped if _HP_TIMING_H is

>> defined.  That is the draw back for the single x86 hp-timing.h.

> Here is the updated patch.  OK for master?


LGTM.

> index 59af526fdb..1c20e9d828 100644

> --- a/sysdeps/i386/i686/hp-timing.h

> +++ b/sysdeps/x86/hp-timing.h

> @@ -1,7 +1,6 @@

> -/* High precision, low overhead timing functions.  i686 version.

> -   Copyright (C) 1998-2018 Free Software Foundation, Inc.

> +/* High precision, low overhead timing functions.  x86 version.

> +   Copyright (C) 2018 Free Software Foundation, Inc.

>     This file is part of the GNU C Library.

> -   Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.

>  

>     The GNU C Library is free software; you can redistribute it and/or

>     modify it under the terms of the GNU Lesser General Public

> @@ -20,12 +19,17 @@

>  #ifndef _HP_TIMING_H

>  #define _HP_TIMING_H	1


Another possibility is define _HP_TIMING_X86_1 and not undefine it below,
I don't have a strong opinion, so either way look good to me.
Florian Weimer Oct. 20, 2018, 4 p.m. | #2
* H. J. Lu:

> Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for

> HP_TIMING_NOW.  This patch

>

> 1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h.

> 2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h

> can check minimum x86 ISA to decide if _rdtsc can be used.

>

> NB: Checking if __i686__ isn't sufficient since __i686__ may not be

> defined when building for i686 class processors.

>

> 	* sysdeps/i386/init-arch.h: Removed.

> 	* sysdeps/i386/i586/init-arch.h: Likewise.

> 	* sysdeps/i386/i686/init-arch.h: Likewise.

> 	* sysdeps/i386/i686/hp-timing.h: Likewise.

> 	* sysdeps/x86_64/hp-timing.h: Likewise.

> 	* sysdeps/i386/isa.h: New file.

> 	* sysdeps/i386/i586/isa.h: Likewise.

> 	* sysdeps/i386/i686/isa.h: Likewise.

> 	* sysdeps/x86_64/isa.h: Likewise.

> 	* sysdeps/x86/hp-timing.h: New file.

> 	* sysdeps/x86/init-arch.h: Include <isa.h>.


This patch results in a substantial build time increase on Debian with
GCC 6: instead of less than two minutes I now see more than four
minutes for a full build (excluding testing).

I tried to fix this by avoiding the #undef in sysdeps/x86/hp-timing.h,
as Adhemerval suggested, but it did not help.  It turns out that
including <x86intrin.h> in the old header (sysdeps/x86_64/hp-timing.h)
is sufficient to trigger the regression in build performance.

I think the regression substantial enough to revert the patch.
Comments?

I'm trying to remove the #include <hp-timing.h> from
<libc-internal.h>, but I don't know how hard this will be.
Florian Weimer Oct. 20, 2018, 4:15 p.m. | #3
* Florian Weimer:


> I'm trying to remove the #include <hp-timing.h> from

> <libc-internal.h>, but I don't know how hard this will be.


That doesn't help.  It's pulled in via <tls.h> and <descr.h> as well.
We would need a separate <hp_timing_t.h> header file to avoid that
because hp_timing_t is required in a few central places (thread
descriptor and link map).
H.J. Lu Oct. 20, 2018, 10:27 p.m. | #4
On 10/20/18, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Florian Weimer:

>

>

>> I'm trying to remove the #include <hp-timing.h> from

>> <libc-internal.h>, but I don't know how hard this will be.

>

> That doesn't help.  It's pulled in via <tls.h> and <descr.h> as well.

> We would need a separate <hp_timing_t.h> header file to avoid that

> because hp_timing_t is required in a few central places (thread

> descriptor and link map).

>


Please try this.

-- 
H.J.
From a4fac17d8bc95fc405faa1fac0a383719b72bccf Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 20 Oct 2018 15:22:01 -0700
Subject: [PATCH] x86: Don't include <x86intrin.h>

Use __builtin_ia32_rdtsc directly since including <x86intrin.h> is very
slow with GCC 6.

	* sysdeps/x86/hp-timing.h: Don't include <x86intrin.h>.
	(HP_TIMING_NOW): Replace _rdtsc with __builtin_ia32_rdtsc.
---
 sysdeps/x86/hp-timing.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/sysdeps/x86/hp-timing.h b/sysdeps/x86/hp-timing.h
index 1c20e9d828..950a2feb6b 100644
--- a/sysdeps/x86/hp-timing.h
+++ b/sysdeps/x86/hp-timing.h
@@ -22,8 +22,6 @@
 #include <isa.h>
 
 #if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664
-# include <x86intrin.h>
-
 /* We always assume having the timestamp register.  */
 # define HP_TIMING_AVAIL	(1)
 # define HP_SMALL_TIMING_AVAIL	(1)
@@ -38,8 +36,11 @@ typedef unsigned long long int hp_timing_t;
    might not be 100% accurate since there might be some more instructions
    running in this moment.  This could be changed by using a barrier like
    'cpuid' right before the `rdtsc' instruciton.  But we are not interested
-   in accurate clock cycles here so we don't do this.  */
-# define HP_TIMING_NOW(Var)	((Var) = _rdtsc ())
+   in accurate clock cycles here so we don't do this.
+
+   NB: Use __builtin_ia32_rdtsc directly since including <x86intrin.h>
+   is very slow with GCC 6.  */
+# define HP_TIMING_NOW(Var)	((Var) = __builtin_ia32_rdtsc ())
 
 # include <hp-timing-common.h>
 #else
Florian Weimer Oct. 21, 2018, 6:57 a.m. | #5
* H. J. Lu:

> +   NB: Use __builtin_ia32_rdtsc directly since including <x86intrin.h>

> +   is very slow with GCC 6.  */


Thanks for the patch.

It fixes the issue, but the comment is wrong: I see the same slowdown
with GCC 8.

Patch

From 01974a82401e0facbefe46f5d5a1423314083d7d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 1 Oct 2018 15:05:34 -0700
Subject: [PATCH] x86: Use _rdtsc intrinsic for HP_TIMING_NOW

Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
HP_TIMING_NOW.  This patch

1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h.
2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h
can check minimum x86 ISA to decide if _rdtsc can be used.

NB: Checking if __i686__ isn't sufficient since __i686__ may not be
defined when building for i686 class processors.

	* sysdeps/i386/init-arch.h: Removed.
	* sysdeps/i386/i586/init-arch.h: Likewise.
	* sysdeps/i386/i686/init-arch.h: Likewise.
	* sysdeps/i386/i686/hp-timing.h: Likewise.
	* sysdeps/x86_64/hp-timing.h: Likewise.
	* sysdeps/i386/isa.h: New file.
	* sysdeps/i386/i586/isa.h: Likewise.
	* sysdeps/i386/i686/isa.h: Likewise.
	* sysdeps/x86_64/isa.h: Likewise.
	* sysdeps/x86/hp-timing.h: New file.
	* sysdeps/x86/init-arch.h: Include <isa.h>.
---
 sysdeps/i386/i586/{init-arch.h => isa.h} |  9 ++++--
 sysdeps/i386/i686/{init-arch.h => isa.h} |  9 ++++--
 sysdeps/i386/{init-arch.h => isa.h}      |  9 ++++--
 sysdeps/{i386/i686 => x86}/hp-timing.h   | 28 +++++++++++------
 sysdeps/x86/init-arch.h                  |  1 +
 sysdeps/x86_64/hp-timing.h               | 40 ------------------------
 sysdeps/x86_64/isa.h                     | 24 ++++++++++++++
 7 files changed, 65 insertions(+), 55 deletions(-)
 rename sysdeps/i386/i586/{init-arch.h => isa.h} (85%)
 rename sysdeps/i386/i686/{init-arch.h => isa.h} (85%)
 rename sysdeps/i386/{init-arch.h => isa.h} (85%)
 rename sysdeps/{i386/i686 => x86}/hp-timing.h (69%)
 delete mode 100644 sysdeps/x86_64/hp-timing.h
 create mode 100644 sysdeps/x86_64/isa.h

diff --git a/sysdeps/i386/i586/init-arch.h b/sysdeps/i386/i586/isa.h
similarity index 85%
rename from sysdeps/i386/i586/init-arch.h
rename to sysdeps/i386/i586/isa.h
index 72fb46c61e..79481ce680 100644
--- a/sysdeps/i386/i586/init-arch.h
+++ b/sysdeps/i386/i586/isa.h
@@ -1,4 +1,5 @@ 
-/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
+/* x86 ISA info.  i586 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -15,5 +16,9 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef _ISA_H
+#define _ISA_H
+
 #define MINIMUM_ISA 586
-#include <sysdeps/x86/init-arch.h>
+
+#endif
diff --git a/sysdeps/i386/i686/init-arch.h b/sysdeps/i386/i686/isa.h
similarity index 85%
rename from sysdeps/i386/i686/init-arch.h
rename to sysdeps/i386/i686/isa.h
index ab99392b58..584e26bd4f 100644
--- a/sysdeps/i386/i686/init-arch.h
+++ b/sysdeps/i386/i686/isa.h
@@ -1,4 +1,5 @@ 
-/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
+/* x86 ISA info.  i686 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -15,5 +16,9 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef _ISA_H
+#define _ISA_H
+
 #define MINIMUM_ISA 686
-#include <sysdeps/x86/init-arch.h>
+
+#endif
diff --git a/sysdeps/i386/init-arch.h b/sysdeps/i386/isa.h
similarity index 85%
rename from sysdeps/i386/init-arch.h
rename to sysdeps/i386/isa.h
index 043089ceb9..e0a1e9c84f 100644
--- a/sysdeps/i386/init-arch.h
+++ b/sysdeps/i386/isa.h
@@ -1,4 +1,5 @@ 
-/* Copyright (C) 2015-2018 Free Software Foundation, Inc.
+/* x86 ISA info.  i486 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -15,5 +16,9 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef _ISA_H
+#define _ISA_H
+
 #define MINIMUM_ISA 486
-#include <sysdeps/x86/init-arch.h>
+
+#endif
diff --git a/sysdeps/i386/i686/hp-timing.h b/sysdeps/x86/hp-timing.h
similarity index 69%
rename from sysdeps/i386/i686/hp-timing.h
rename to sysdeps/x86/hp-timing.h
index 59af526fdb..1c20e9d828 100644
--- a/sysdeps/i386/i686/hp-timing.h
+++ b/sysdeps/x86/hp-timing.h
@@ -1,7 +1,6 @@ 
-/* High precision, low overhead timing functions.  i686 version.
-   Copyright (C) 1998-2018 Free Software Foundation, Inc.
+/* High precision, low overhead timing functions.  x86 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -20,12 +19,17 @@ 
 #ifndef _HP_TIMING_H
 #define _HP_TIMING_H	1
 
+#include <isa.h>
+
+#if MINIMUM_ISA == 686 || MINIMUM_ISA == 8664
+# include <x86intrin.h>
+
 /* We always assume having the timestamp register.  */
-#define HP_TIMING_AVAIL		(1)
-#define HP_SMALL_TIMING_AVAIL	(1)
+# define HP_TIMING_AVAIL	(1)
+# define HP_SMALL_TIMING_AVAIL	(1)
 
 /* We indeed have inlined functions.  */
-#define HP_TIMING_INLINE	(1)
+# define HP_TIMING_INLINE	(1)
 
 /* We use 64bit values for the times.  */
 typedef unsigned long long int hp_timing_t;
@@ -35,8 +39,14 @@  typedef unsigned long long int hp_timing_t;
    running in this moment.  This could be changed by using a barrier like
    'cpuid' right before the `rdtsc' instruciton.  But we are not interested
    in accurate clock cycles here so we don't do this.  */
-#define HP_TIMING_NOW(Var)	__asm__ __volatile__ ("rdtsc" : "=A" (Var))
+# define HP_TIMING_NOW(Var)	((Var) = _rdtsc ())
 
-#include <hp-timing-common.h>
+# include <hp-timing-common.h>
+#else
+/* NB: Undefine _HP_TIMING_H so that <sysdeps/generic/hp-timing.h> will
+   be included.  */
+# undef _HP_TIMING_H
+# include <sysdeps/generic/hp-timing.h>
+#endif
 
-#endif	/* hp-timing.h */
+#endif /* hp-timing.h */
diff --git a/sysdeps/x86/init-arch.h b/sysdeps/x86/init-arch.h
index a81ca8a4eb..bc860fcd69 100644
--- a/sysdeps/x86/init-arch.h
+++ b/sysdeps/x86/init-arch.h
@@ -21,6 +21,7 @@ 
 # include <ldsodefs.h>
 #endif
 #include <ifunc-init.h>
+#include <isa.h>
 
 #ifndef __x86_64__
 /* Due to the reordering and the other nifty extensions in i686, it is
diff --git a/sysdeps/x86_64/hp-timing.h b/sysdeps/x86_64/hp-timing.h
deleted file mode 100644
index ec543bef03..0000000000
--- a/sysdeps/x86_64/hp-timing.h
+++ /dev/null
@@ -1,40 +0,0 @@ 
-/* High precision, low overhead timing functions.  x86-64 version.
-   Copyright (C) 2002-2018 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef _HP_TIMING_H
-#define _HP_TIMING_H	1
-
-/* We always assume having the timestamp register.  */
-#define HP_TIMING_AVAIL		(1)
-#define HP_SMALL_TIMING_AVAIL	(1)
-
-/* We indeed have inlined functions.  */
-#define HP_TIMING_INLINE	(1)
-
-/* We use 64bit values for the times.  */
-typedef unsigned long long int hp_timing_t;
-
-/* The "=A" constraint used in 32-bit mode does not work in 64-bit mode.  */
-#define HP_TIMING_NOW(Var) \
-  ({ unsigned int _hi, _lo; \
-     asm volatile ("rdtsc" : "=a" (_lo), "=d" (_hi)); \
-     (Var) = ((unsigned long long int) _hi << 32) | _lo; })
-
-#include <hp-timing-common.h>
-
-#endif /* hp-timing.h */
diff --git a/sysdeps/x86_64/isa.h b/sysdeps/x86_64/isa.h
new file mode 100644
index 0000000000..452bce75eb
--- /dev/null
+++ b/sysdeps/x86_64/isa.h
@@ -0,0 +1,24 @@ 
+/* x86 ISA info.  x86-64 version.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _ISA_H
+#define _ISA_H
+
+#define MINIMUM_ISA 8664
+
+#endif
-- 
2.17.1