[00/20] RFC: Add the CPU run-time library for C

Message ID 20180612221939.19545-1-hjl.tools@gmail.com

Message

H.J. Lu June 12, 2018, 10:19 p.m.
The current glibc has memory and string functions highly optimized for
the current processors on the market.  But it takes years for a released
glibc to be installed on end users' machines.  In 2018, many machines
with the latest Intel processors are still running glibc 2.17, which
was released in February, 2013.

This patch set introduces the CPU run-time library for C, libcpu-rt-c.
libcpu-rt-c contains a subset of the C library with the optimized
functions.  The resulting libcpu-rt-c.so is binary compatible with
older versions of libc.so so that libcpu-rt-c.so can be used with
LD_PRELOAD or linked directly with applications.  For some workloads,
LD_PRELOAD=libcpu-rt-c.so has been shown to improve performance by as much
as 20% on a Skylake machine.
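
To illustrate the LD_PRELOAD mechanism described above, here is a minimal
sketch of a preloadable shim that interposes a single function.  It is not
code from the patch set: a plain C memrchr stands in for the optimized
assembly, and the file and library names in the comments are hypothetical.

```c
/* preload_shim.c (hypothetical file name)
 *
 * Sketch of symbol interposition: a shared object that defines a function
 * with the same name and signature as one in libc.  When it is listed in
 * LD_PRELOAD, the dynamic linker resolves calls to this definition first.
 * The plain C loop below is only a stand-in for an optimized routine.
 *
 * Assumed build and use (names are illustrative):
 *   gcc -O2 -fPIC -shared preload_shim.c -o libshim.so
 *   LD_PRELOAD=./libshim.so ./app
 */
#include <stddef.h>

/* Return a pointer to the last occurrence of byte c in the first n bytes
   of s, or NULL if it does not occur.  */
void *memrchr(const void *s, int c, size_t n)
{
    const unsigned char *p = (const unsigned char *)s + n;
    const unsigned char target = (unsigned char)c;

    /* Walk backwards from the end of the buffer.  */
    while (n-- > 0) {
        if (*--p == target)
            return (void *)p;
    }
    return NULL;
}
```

Because the shim only has to match the name and calling convention of the
libc function, it does not depend on which glibc version the application
was linked against.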

H.J. Lu (20):
  Initial empty CPU run-time library for C: libcpu-rt-c
  libcpu-rt-c/x86: Add cacheinfo
  libcpu-rt-c/x86: Add cpu-rt-tunables.c
  libcpu-rt-c/x86-64: Add memchr
  libcpu-rt-c/x86-64: Add memcmp
  libcpu-rt-c/x86-64: Add memcpy, memmove and mempcpy
  libcpu-rt-c/x86-64: Add memrchr
  libcpu-rt-c/x86-64: Add memset and wmemset
  libcpu-rt-c/i386: Add memcmp
  libcpu-rt-c: Don't use IFUNC memcmp in init_cpu_features
  libcpu-rt-c/x86-64: Add strchr
  libcpu-rt-c/x86-64: Add strcmp
  libcpu-rt-c/x86-64: Add strcpy
  libcpu-rt-c/x86-64: Add strlen
  libcpu-rt-c/x86-64: Add strcat
  libcpu-rt-c/x86-64: Add strnlen
  libcpu-rt-c/x86-64: Add strncat
  libcpu-rt-c/x86-64: Add strncmp
  libcpu-rt-c/x86-64: Add strncpy
  libcpu-rt-c/x86-64: Add strrchr

 Makeconfig                                    |  4 +-
 configure                                     | 17 +++++
 configure.ac                                  | 11 ++++
 cpu-rt-c/Makefile                             | 42 ++++++++++++
 cpu-rt-c/cpu-rt-misc.c                        | 22 +++++++
 cpu-rt-c/cpu-rt-support.h                     | 38 +++++++++++
 cpu-rt-c/cpu-rt-tunables.c                    | 28 ++++++++
 cpu-rt-c/dl-tunables.h                        | 57 ++++++++++++++++
 elf/dl-misc.c                                 |  2 +
 elf/dl-tunables.c                             |  2 +-
 shlib-versions                                |  3 +
 sysdeps/i386/Makefile                         |  7 ++
 sysdeps/i386/dl-procinfo.c                    | 16 +++--
 sysdeps/i386/i686/multiarch/Makefile          |  4 ++
 sysdeps/i386/i686/multiarch/memcmp-ia32.S     |  8 ++-
 sysdeps/i386/i686/multiarch/memcmp-sse4.S     |  2 +-
 sysdeps/i386/i686/multiarch/memcmp-ssse3.S    |  2 +-
 sysdeps/i386/i686/multiarch/memcmp.c          |  2 +-
 sysdeps/unix/sysv/linux/i386/dl-procinfo.h    |  4 +-
 sysdeps/x86/Makefile                          | 13 ++++
 sysdeps/x86/cacheinfo.c                       | 19 +++++-
 sysdeps/x86/cpu-features.c                    | 46 ++++++++-----
 sysdeps/x86/cpu-features.h                    | 12 +++-
 sysdeps/x86/cpu-rt-misc.c                     | 65 +++++++++++++++++++
 sysdeps/x86/cpu-rt-support.h                  | 21 ++++++
 sysdeps/x86/cpu-tunables.c                    | 14 +++-
 sysdeps/x86/dl-procinfo.c                     | 41 +++++++-----
 sysdeps/x86/dl-procinfo.h                     | 15 +++--
 sysdeps/x86/ldsodefs.h                        | 17 ++++-
 sysdeps/x86_64/Makefile                       | 10 +++
 sysdeps/x86_64/memchr.S                       |  2 +-
 sysdeps/x86_64/memmove.S                      | 12 +++-
 sysdeps/x86_64/memrchr.S                      |  6 ++
 sysdeps/x86_64/memset.S                       |  6 +-
 sysdeps/x86_64/multiarch/Makefile             | 30 +++++++++
 sysdeps/x86_64/multiarch/memchr-avx2.S        |  2 +-
 sysdeps/x86_64/multiarch/memchr-sse2.S        |  2 +-
 sysdeps/x86_64/multiarch/memchr.c             |  6 +-
 sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S  |  2 +-
 sysdeps/x86_64/multiarch/memcmp-sse2.S        |  2 +-
 sysdeps/x86_64/multiarch/memcmp-sse4.S        |  2 +-
 sysdeps/x86_64/multiarch/memcmp-ssse3.S       |  2 +-
 sysdeps/x86_64/multiarch/memcmp.c             |  4 +-
 sysdeps/x86_64/multiarch/memcpy-ssse3-back.S  |  6 +-
 sysdeps/x86_64/multiarch/memcpy-ssse3.S       |  6 +-
 sysdeps/x86_64/multiarch/memcpy.c             | 14 ++--
 .../multiarch/memmove-avx-unaligned-erms.S    |  2 +-
 .../multiarch/memmove-avx512-no-vzeroupper.S  |  8 ++-
 .../multiarch/memmove-avx512-unaligned-erms.S |  2 +-
 .../multiarch/memmove-sse2-unaligned-erms.S   |  2 +-
 .../multiarch/memmove-vec-unaligned-erms.S    | 33 ++++++----
 sysdeps/x86_64/multiarch/memmove.c            | 10 ++-
 sysdeps/x86_64/multiarch/mempcpy.c            | 10 ++-
 sysdeps/x86_64/multiarch/memrchr-avx2.S       |  2 +-
 sysdeps/x86_64/multiarch/memrchr-sse2.S       |  2 +-
 sysdeps/x86_64/multiarch/memrchr.c            |  8 ++-
 .../multiarch/memset-avx2-unaligned-erms.S    |  2 +-
 .../multiarch/memset-avx512-no-vzeroupper.S   |  4 +-
 .../multiarch/memset-avx512-unaligned-erms.S  |  2 +-
 .../multiarch/memset-sse2-unaligned-erms.S    |  8 ++-
 .../multiarch/memset-vec-unaligned-erms.S     | 17 +++--
 sysdeps/x86_64/multiarch/memset.c             |  4 +-
 .../x86_64/multiarch/strcat-sse2-unaligned.S  |  2 +-
 sysdeps/x86_64/multiarch/strcat-sse2.S        |  2 +-
 sysdeps/x86_64/multiarch/strcat-ssse3.S       |  2 +-
 sysdeps/x86_64/multiarch/strcat.c             |  4 +-
 sysdeps/x86_64/multiarch/strchr-avx2.S        |  2 +-
 sysdeps/x86_64/multiarch/strchr-sse2-no-bsf.S |  2 +-
 sysdeps/x86_64/multiarch/strchr-sse2.S        |  2 +-
 sysdeps/x86_64/multiarch/strchr.c             |  4 +-
 sysdeps/x86_64/multiarch/strcmp-avx2.S        |  2 +-
 .../x86_64/multiarch/strcmp-sse2-unaligned.S  |  2 +-
 sysdeps/x86_64/multiarch/strcmp-sse2.S        |  2 +-
 sysdeps/x86_64/multiarch/strcmp-ssse3.S       |  2 +-
 sysdeps/x86_64/multiarch/strcmp.c             |  4 +-
 .../x86_64/multiarch/strcpy-sse2-unaligned.S  |  2 +-
 sysdeps/x86_64/multiarch/strcpy-sse2.S        |  2 +-
 sysdeps/x86_64/multiarch/strcpy-ssse3.S       |  2 +-
 sysdeps/x86_64/multiarch/strcpy.c             |  4 +-
 sysdeps/x86_64/multiarch/strlen-avx2.S        |  2 +-
 sysdeps/x86_64/multiarch/strlen-sse2.S        |  2 +-
 sysdeps/x86_64/multiarch/strlen.c             |  4 +-
 sysdeps/x86_64/multiarch/strncat-c.c          |  2 +-
 sysdeps/x86_64/multiarch/strncat.c            |  6 +-
 sysdeps/x86_64/multiarch/strncmp-sse2.S       |  2 +-
 sysdeps/x86_64/multiarch/strncmp.c            |  4 +-
 sysdeps/x86_64/multiarch/strncpy.c            |  4 +-
 sysdeps/x86_64/multiarch/strnlen-sse2.S       |  2 +-
 sysdeps/x86_64/multiarch/strnlen.c            | 20 +++++-
 sysdeps/x86_64/multiarch/strrchr-avx2.S       |  2 +-
 sysdeps/x86_64/multiarch/strrchr-sse2.S       |  2 +-
 sysdeps/x86_64/multiarch/strrchr.c            |  4 +-
 sysdeps/x86_64/multiarch/wmemset.c            | 10 ++-
 sysdeps/x86_64/strcat.S                       |  2 +
 sysdeps/x86_64/strncat.c                      |  9 +++
 95 files changed, 717 insertions(+), 166 deletions(-)
 create mode 100644 cpu-rt-c/Makefile
 create mode 100644 cpu-rt-c/cpu-rt-misc.c
 create mode 100644 cpu-rt-c/cpu-rt-support.h
 create mode 100644 cpu-rt-c/cpu-rt-tunables.c
 create mode 100644 cpu-rt-c/dl-tunables.h
 create mode 100644 sysdeps/x86/cpu-rt-misc.c
 create mode 100644 sysdeps/x86/cpu-rt-support.h
 create mode 100644 sysdeps/x86_64/strncat.c

-- 
2.17.1

Comments

Florian Weimer June 13, 2018, 6:50 a.m. | #1
On 06/13/2018 12:19 AM, H.J. Lu wrote:
> The current glibc has memory and string functions highly optimized for
> the current processors on the market.  But it takes years for released
> glibc to be installed on end-users machines.  In 2018, many machines
> with the latest Intel processors are still running glibc 2.17, which
> was released in February, 2013.
>
> This patch set introduces the CPU run-time library for C, libcpu-rt-c.
> libcpu-rt-c contains a subset of the C library with the optimized
> functions.  The resulting libcpu-rt-c.so is binary compatible with
> older versions of libc.so so that libcpu-rt-c.so can be used with
> LD_PRELOAD or linked directly with applications.  For some workloads,
> LD_PRELOAD=libcpu-rt-c.so has shown to improve performance by as much
> as 20% on Skylake machine.

What do you gain from adding this to glibc?

Old systems will not have sufficiently new compilers and linkers, and 
they will not be able to directly build glibc and this new library.

It seems that this library does not define its own ABI, so it does not 
have to be tied to the glibc release schedule, but putting it into the 
source tree implies such alignment.

I don't doubt that this can be useful (but I doubt it will completely 
stop string function backports), but building and shipping this library 
as part of glibc seems detrimental to its goals.

Thanks,
Florian
H.J. Lu June 13, 2018, 10:13 a.m. | #2
On Tue, Jun 12, 2018 at 11:50 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/13/2018 12:19 AM, H.J. Lu wrote:
>>
>> The current glibc has memory and string functions highly optimized for
>> the current processors on the market.  But it takes years for released
>> glibc to be installed on end-users machines.  In 2018, many machines
>> with the latest Intel processors are still running glibc 2.17, which
>> was released in February, 2013.
>>
>> This patch set introduces the CPU run-time library for C, libcpu-rt-c.
>> libcpu-rt-c contains a subset of the C library with the optimized
>> functions.  The resulting libcpu-rt-c.so is binary compatible with
>> older versions of libc.so so that libcpu-rt-c.so can be used with
>> LD_PRELOAD or linked directly with applications.  For some workloads,
>> LD_PRELOAD=libcpu-rt-c.so has shown to improve performance by as much
>> as 20% on Skylake machine.
>
> What do you gain from adding this to glibc?

It uses what we have in glibc.   There is no need for separate implementations.

> Old systems will not have sufficiently new compilers and linkers, and they
> will not be able to directly build glibc and this new library.

libcpu-rt-c can be built on any system with the required GCC and binutils.
The resulting libcpu-rt-c.so is binary compatible with ALL versions of
x86-64 glibc.

> It seems that this library does not define its own ABI, so it does not have
> to be tied to the glibc release schedule, but putting it into the source
> tree implies such alignment.

I don't expect there will be formal releases of libcpu-rt-c.  On the other
hand, libcpu-rt-c is ready to use at any time.  Initially, it has to be built
as part of glibc.  Depending on the feedback, it may be changed
to build just libcpu-rt-c without building the rest of glibc.

> I don't doubt that this can be useful (but I doubt it will completely stop
> string function backports), but building and shipping this library as part
> of glibc seems detrimental to its goals.

I don't expect it will be shipped as part of glibc.  But when people
ask for better string/memory functions on Skylake running RHEL 7, I
can point them to libcpu-rt-c or I can build one for them on Fedora 28.

-- 
H.J.
Adhemerval Zanella June 13, 2018, 12:03 p.m. | #3
On 13/06/2018 07:13, H.J. Lu wrote:
> On Tue, Jun 12, 2018 at 11:50 PM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 06/13/2018 12:19 AM, H.J. Lu wrote:
>>>
>>> The current glibc has memory and string functions highly optimized for
>>> the current processors on the market.  But it takes years for released
>>> glibc to be installed on end-users machines.  In 2018, many machines
>>> with the latest Intel processors are still running glibc 2.17, which
>>> was released in February, 2013.
>>>
>>> This patch set introduces the CPU run-time library for C, libcpu-rt-c.
>>> libcpu-rt-c contains a subset of the C library with the optimized
>>> functions.  The resulting libcpu-rt-c.so is binary compatible with
>>> older versions of libc.so so that libcpu-rt-c.so can be used with
>>> LD_PRELOAD or linked directly with applications.  For some workloads,
>>> LD_PRELOAD=libcpu-rt-c.so has shown to improve performance by as much
>>> as 20% on Skylake machine.
>>
>> What do you gain from adding this to glibc?
>
> It uses what we have in glibc.   There is no need for separate implementations.

At least for ARM we have cortex-strings [1], which has a more permissive license.
We usually try to update this repository first with ARM-related string
optimizations and sync later with other projects (glibc, Android, etc.).

It does not have ifunc support, since so far it was not required (it has
also not had any updates for different chips like falkor or thunder), but
I can't see why it can't add support for runtime selection.
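
As a rough illustration of what run-time selection in an external library
could look like without IFUNC, the sketch below dispatches through a
function pointer that is filled in on first use.  This is a portable
stand-in, not the glibc IFUNC mechanism and not code from cortex-strings;
all function names are hypothetical, and both "variants" are plain C
placeholders for CPU-specific routines.

```c
#include <stddef.h>

/* Baseline variant: one byte per iteration.  */
static size_t len_generic(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/* Stand-in for a CPU-specific variant: four bytes per iteration.
   Bytes are tested in order, so nothing past the terminator is read.  */
static size_t len_unrolled(const char *s)
{
    const char *p = s;
    for (;;) {
        if (p[0] == '\0') return (size_t)(p - s);
        if (p[1] == '\0') return (size_t)(p - s) + 1;
        if (p[2] == '\0') return (size_t)(p - s) + 2;
        if (p[3] == '\0') return (size_t)(p - s) + 3;
        p += 4;
    }
}

/* Hypothetical feature probe.  On x86 with GCC or Clang it uses the
   compiler's CPU builtins; elsewhere it reports "not available".  */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) \
    && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

typedef size_t (*len_fn)(const char *);

/* Chosen implementation; NULL until the first call.  */
static len_fn selected_len;

/* Public entry point (hypothetical name).  Lazy selection like this is
   not thread-safe; a real library would select in a constructor or rely
   on IFUNC so that the choice is made once at load time.  */
size_t rt_strlen(const char *s)
{
    if (selected_len == NULL)
        selected_len = cpu_has_avx2() ? len_unrolled : len_generic;
    return selected_len(s);
}
```

The extra indirect call on every invocation is the main cost of this
approach compared with IFUNC, where the dynamic linker binds the symbol
directly to the selected variant.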

While I see the advantage of adding it to glibc to avoid duplicated effort,
I really think this kind of framework is better served as an external project.
It adds even more complexity (IS_IN (libcpu_rt_c)) to the current glibc ifunc code
(which is somewhat convoluted), and it is not clear how easy it would be to
distribute it in the usual way (a distribution would need to create another package
that downloads a different glibc release than the system's to build and extract the
library itself).

[1] https://git.linaro.org/toolchain/cortex-strings.git

>> Old systems will not have sufficiently new compilers and linkers, and they
>> will not be able to directly build glibc and this new library.
>
> libcpu-rt-c can be built on any systems with required GCC and binutils.
> The resulting libcpu-rt-c.so is binary compatible with ALL versions of
> x86-64 glibcs.
>
>> It seems that this library does not define its own ABI, so it does not have
>> to be tied to the glibc release schedule, but putting it into the source
>> tree implies such alignment.
>
> I don't expect there will be formal releases of libcpu-rt-c.  On the other
> hand, libcpu-rt-c is ready to use at any time. Initially, it has to be built
> as the part of glibc.  Depending on its feedbacks, it may be changed
> to just build libcpu-rt-c without building the rest of glibc.
>
>> I don't doubt that this can be useful (but I doubt it will completely stop
>> string function backports), but building and shipping this library as part
>> of glibc seems detrimental to its goals.
>
> I don't expect it will be shipped as the part of glibc.  But when people
> ask for better string/memory functions on Skylake running RHEL 7, I
> can point them to libcpu-rt-c or I can build one for them on Fedora 28.

Given the expectation of no formal releases, no readily useful gain in a default
installation (it only makes sense if you download and build a more recent glibc
version than the system one), and the idea of ready availability, this really
indicates to me that it should be pushed as an external project.
H.J. Lu June 13, 2018, 12:31 p.m. | #4
On Wed, Jun 13, 2018 at 5:03 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> On 13/06/2018 07:13, H.J. Lu wrote:
>> On Tue, Jun 12, 2018 at 11:50 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>> On 06/13/2018 12:19 AM, H.J. Lu wrote:
>>>>
>>>> The current glibc has memory and string functions highly optimized for
>>>> the current processors on the market.  But it takes years for released
>>>> glibc to be installed on end-users machines.  In 2018, many machines
>>>> with the latest Intel processors are still running glibc 2.17, which
>>>> was released in February, 2013.
>>>>
>>>> This patch set introduces the CPU run-time library for C, libcpu-rt-c.
>>>> libcpu-rt-c contains a subset of the C library with the optimized
>>>> functions.  The resulting libcpu-rt-c.so is binary compatible with
>>>> older versions of libc.so so that libcpu-rt-c.so can be used with
>>>> LD_PRELOAD or linked directly with applications.  For some workloads,
>>>> LD_PRELOAD=libcpu-rt-c.so has shown to improve performance by as much
>>>> as 20% on Skylake machine.
>>>
>>> What do you gain from adding this to glibc?
>>
>> It uses what we have in glibc.   There is no need for separate implementations.
>
> At least for ARM we have cortex-strings [1] with a more permissive licensing.
> We usually try to update this repository first with arm related string
> optimizations and sync later with other projects (glibc, android, etc.).
>
> It does not have ifunc support, since so far it was not required (since it
> also did not had any update for different chips like falkor or thunder), but
> I can't see why it can't add support for runtime selection.
>
> While I see the advantages of adding on glibc to avoid the double effort,
> I really think this kind of framework is better served as an external project.
> It adds even more complexity (IS_IN (libcpu_rt_c)) on current GLIBC ifunc code

The frameworks for IFUNC and tunables are in glibc.  The optimized functions
are in glibc.  I can extract the relevant parts from glibc to create a separate
git repo.  Then I need to decide whether I should fork or keep updating it
from glibc.  It is duplicated effort for no good value.

> (which is somewhat convoluted) and it is not clear how easy it would be to
> distribute it on a usual way (a distribution would need to create another package
> with downloads a different glibc release than system to build and extract the
> library itself).
>
> [1] https://git.linaro.org/toolchain/cortex-strings.git
>
>>> Old systems will not have sufficiently new compilers and linkers, and they
>>> will not be able to directly build glibc and this new library.
>>
>> libcpu-rt-c can be built on any systems with required GCC and binutils.
>> The resulting libcpu-rt-c.so is binary compatible with ALL versions of
>> x86-64 glibcs.
>>
>>> It seems that this library does not define its own ABI, so it does not have
>>> to be tied to the glibc release schedule, but putting it into the source
>>> tree implies such alignment.
>>
>> I don't expect there will be formal releases of libcpu-rt-c.  On the other
>> hand, libcpu-rt-c is ready to use at any time. Initially, it has to be built
>> as the part of glibc.  Depending on its feedbacks, it may be changed
>> to just build libcpu-rt-c without building the rest of glibc.
>>
>>> I don't doubt that this can be useful (but I doubt it will completely stop
>>> string function backports), but building and shipping this library as part
>>> of glibc seems detrimental to its goals.
>>
>> I don't expect it will be shipped as the part of glibc.  But when people
>> ask for better string/memory functions on Skylake running RHEL 7, I
>> can point them to libcpu-rt-c or I can build one for them on Fedora 28.
>
> With the expectation on non formal releases, no ready useful gain in a default
> installation (it only makes sense if you download/build a more recent glibc
> version than system onde) and the idea of ready availability it really indicates
> me to push this an external project.

I have no problem maintaining it as a branch on GitHub and pointing people
to it when asked.

-- 
H.J.
Florian Weimer June 13, 2018, 1:25 p.m. | #5
On 06/13/2018 12:13 PM, H.J. Lu wrote:

> libcpu-rt-c can be built on any systems with required GCC and binutils.
> The resulting libcpu-rt-c.so is binary compatible with ALL versions of
> x86-64 glibcs.

Yes, but that doesn't work for every distribution.

>> I don't doubt that this can be useful (but I doubt it will completely stop
>> string function backports), but building and shipping this library as part
>> of glibc seems detrimental to its goals.
>
> I don't expect it will be shipped as the part of glibc.  But when people
> ask for better string/memory functions on Skylake running RHEL 7, I
> can point them to libcpu-rt-c or I can build one for them on Fedora 28.

We cannot produce supportable builds on Fedora 28.  Supported builds 
must come out of the build system.  It so happens that we have Developer 
Toolset (DTS) which matches upstream version requirements, and that is 
available in certain product build environments, so we could produce 
supportable builds with this hybrid approach.  But to give a different 
example: I don't think Debian has this capability, particularly when it 
comes to binutils.

All that wouldn't be a problem if the changes were fairly isolated and
there weren't tons of conditionals in generic code, but the way this is
currently implemented looks like quite a bit of additional burden for
each architecture that chooses to participate.

It would be another thing to build a static PIC string library from 
sources and link that both into libc.so and this new support library. 
Then the build process would at least be aligned.  But I don't think our 
current build system is up for that.

Thanks,
Florian
H.J. Lu June 13, 2018, 1:41 p.m. | #6
On Wed, Jun 13, 2018 at 6:25 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/13/2018 12:13 PM, H.J. Lu wrote:
>
>> libcpu-rt-c can be built on any systems with required GCC and binutils.
>> The resulting libcpu-rt-c.so is binary compatible with ALL versions of
>> x86-64 glibcs.
>
> Yes, but that doesn't work for every distribution.

True.

>>> I don't doubt that this can be useful (but I doubt it will completely stop
>>> string function backports), but building and shipping this library as part
>>> of glibc seems detrimental to its goals.
>>
>> I don't expect it will be shipped as the part of glibc.  But when people
>> ask for better string/memory functions on Skylake running RHEL 7, I
>> can point them to libcpu-rt-c or I can build one for them on Fedora 28.
>
> We cannot produce supportable builds on Fedora 28.  Supported builds must
> come out of the build system.  It so happens that we have Developer Toolset
> (DTS) which matches upstream version requirements, and that is available in
> certain product build environments, so we could produce supportable builds
> with this hybrid approach.  But to give a different example: I don't think
> Debian has this capability, particularly when it comes to binutils.

Each user needs to evaluate pros and cons.

> All that wouldn't be a problem if the changes are fairly isolated and there
> wouldn't be tons of conditionals in generic code, but the current approach

Excluding the cpu-rt-c directory and the x86 directories, my approach added
one IS_IN (libcpu_rt_c) check to elf.  That can hardly be called "tons".
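
For readers unfamiliar with this kind of check, the sketch below shows how
an IS_IN-style conditional selects code per library at compile time.  It is
a simplified imitation of the glibc convention, not the actual change to
elf: the numeric module IDs, the default IN_MODULE value, and the helper
name are assumptions made for illustration.

```c
/* Simplified imitation of glibc's IS_IN convention.  The numeric module
   IDs and the default IN_MODULE value are assumptions for this sketch; in
   a real build the build system would define IN_MODULE per library.  */
#define MODULE_libc         1
#define MODULE_libcpu_rt_c  2

#ifndef IN_MODULE
# define IN_MODULE MODULE_libcpu_rt_c
#endif

#define IS_IN(lib) (IN_MODULE == MODULE_##lib)

/* Hypothetical helper: reports which library this file was compiled for.
   The same source file yields different code depending on IN_MODULE.  */
const char *built_for(void)
{
#if IS_IN (libcpu_rt_c)
    return "libcpu-rt-c";
#else
    return "libc";
#endif
}
```

Each such conditional is resolved by the preprocessor, so it costs nothing
at run time; the burden under discussion is the maintenance cost of having
these branches in shared source files.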

> this is implemented looks like quite a bit of an additional burden, for each
> architecture that chooses to participate.

That is true.  My approach avoids duplicated efforts.

> It would be another thing to build a static PIC string library from sources
> and link that both into libc.so and this new support library. Then the build
> process would at least be aligned.  But I don't think our current build
> system is up for that.

libcpu-rt-c targets systems with older glibcs.   There is no need for it on
systems where glibc is up to date.

-- 
H.J.
Siddhesh Poyarekar June 13, 2018, 2:19 p.m. | #7
On 06/13/2018 06:55 PM, Florian Weimer wrote:
> It would be another thing to build a static PIC string library from
> sources and link that both into libc.so and this new support library.
> Then the build process would at least be aligned.  But I don't think our
> current build system is up for that.

Why so?  Are there dependencies that prevent splitting out a
libstring_pic.a that builds only the string routines and then linking that
into everything that needs it, i.e. libc.so, libc.a, libc-rt.so, etc.?

Ideally I'd like to see something like this happen but it is a crazy 
long term project:

1. Put sysdeps for specific routines into their respective directories. 
That is, elf/sysdeps, math/sysdeps, string/sysdeps, etc.

2. Support building all of these as separate lib*_pic.a that then gets 
integrated at the top

3. Add a new target in string/Makefile that builds a libstring-rt.so 
from its libstring_pic.a

4. Pull string out into a separate project and make it a submodule of 
the glibc repo.

This could even get used to replace libio in future, where 
libio-3.0_pic.a gets linked in by default and libio-2.x_pic.a is built 
into a libio_compat.so that can then be preloaded for programs that need it.

Siddhesh
Florian Weimer June 18, 2018, 1:41 p.m. | #8
On 06/13/2018 04:19 PM, Siddhesh Poyarekar wrote:
> On 06/13/2018 06:55 PM, Florian Weimer wrote:
>> It would be another thing to build a static PIC string library from
>> sources and link that both into libc.so and this new support library.
>> Then the build process would at least be aligned.  But I don't think
>> our current build system is up for that.
>
> Why so?  Are there dependencies that avoid splitting out a
> libstring_pic.a that builds only the string routines and then links that
> into everything that needs it, i.e. libc.so, libc.a, libc-rt.so, etc.?

I don't think there are any actual obstacles in the code, or at least 
not that many.  It's just that it would be difficult to express this in 
the current build system, in a concise manner.

> Ideally I'd like to see something like this happen but it is a crazy
> long term project:
>
> 1. Put sysdeps for specific routines into their respective directories.
> That is, elf/sysdeps, math/sysdeps, string/sysdeps, etc.
>
> 2. Support building all of these as separate lib*_pic.a that then gets
> integrated at the top
>
> 3. Add a new target in string/Makefile that builds a libstring-rt.so
> from its libstring_pic.a
>
> 4. Pull string out into a separate project and make it a submodule of
> the glibc repo.
>
> This could even get used to replace libio in future, where
> libio-3.0_pic.a gets linked in by default and libio-2.x_pic.a is built
> into a libio_compat.so that can then be preloaded for programs that need
> it.

Not sure if this is the right approach.  Another way of doing this would 
involve linking almost all of libc_pic.a (basically everything without 
initializers), while restricting symbol visibility to a subset of the 
symbols.  ld should discard all objects which define only unexported and 
otherwise unused symbols.
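
The suggestion above works at link time on libc_pic.a.  As a much smaller,
source-level illustration of the same idea of exporting only a subset of
symbols, the sketch below uses GCC/Clang visibility attributes.  This is a
stand-in for, not an implementation of, the linker-based approach, and all
names in it are hypothetical.

```c
#include <stddef.h>

#define EXPORTED __attribute__((visibility("default")))
#define INTERNAL __attribute__((visibility("hidden")))

/* Internal helper: usable inside the shared object, but absent from its
   dynamic symbol table.  */
INTERNAL size_t bounded_scan(const char *s, size_t max)
{
    size_t i = 0;
    while (i < max && s[i] != '\0')
        ++i;
    return i;
}

/* The only symbol this sketch exports (hypothetical name).  */
EXPORTED size_t rt_strnlen(const char *s, size_t max)
{
    return bounded_scan(s, max);
}
```

When this is built as an ELF shared object with -fPIC -shared, the dynamic
symbol table should contain rt_strnlen but not bounded_scan, which is the
property the restricted-visibility link relies on.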

Thanks,
Florian
Carlos O'Donell June 18, 2018, 5:09 p.m. | #9
On 06/12/2018 06:19 PM, H.J. Lu wrote:
> The current glibc has memory and string functions highly optimized for
> the current processors on the market.  But it takes years for released
> glibc to be installed on end-users machines.  In 2018, many machines
> with the latest Intel processors are still running glibc 2.17, which
> was released in February, 2013.
>
> This patch set introduces the CPU run-time library for C, libcpu-rt-c.
> libcpu-rt-c contains a subset of the C library with the optimized
> functions.  The resulting libcpu-rt-c.so is binary compatible with
> older versions of libc.so so that libcpu-rt-c.so can be used with
> LD_PRELOAD or linked directly with applications.  For some workloads,
> LD_PRELOAD=libcpu-rt-c.so has shown to improve performance by as much
> as 20% on Skylake machine.

How do you envision downstream distributions using this option?

Who does the building?

Who deploys it?

Who supports it? For how long?

You list one use so far.

Use Case 1:

- User installs RHEL7 on production Skylake systems.
- User complains to Intel about poor performance of RHEL 7 on Skylake.
- H.J. builds libcpu-rt.so on Fedora 28 and delivers to customer.
- Customer deploys unsupported build of libcpu-rt.so on
  production systems to get performance gain.

My worry here is that no customer will want to deploy unsupported
(I don't expect you to be signing up to support customers directly)
binaries on production systems.

So in this case you need to convince distributions to do the following:

* Build and ship libcpu-rt.so on a modern version of the distribution.
* Change established practice in distributions to allow newer distribution
  packages of libcpu-rt.so to be installed in an older distribution
  (often difficult because of RPM feature dependencies, scriptlet dependencies,
  and other such dependencies).

  or

  Enhance the distribution to ship newer alternative toolchains to allow
  libcpu-rt.so to be built from alternative newer glibc sources, and then
  support that result for customers to LD_PRELOAD.

It's entirely possible that the newer alternative toolchain is a viable
option for things like RHEL which have Developer Toolset (as Florian mentions),
but as far as I know it's the only distribution doing this in a fully supportable
way.

However, for Debian they would have to look at letting an unstable package build
of libcpu-rt.so be a supported install in stable. Which would be something new.

Either way this is similar in some sense to Florian's separate-libm work to allow
libm to be built distinctly from glibc. Note that I say "similar" because it's
not the same, but tackles a similar problem of replacing parts of glibc with
newer parts which can be preloaded for applications to use without risk of
significant ABI/API deviation. The string functions, and the math functions, both
have very well defined interfaces and are candidates for just this work.

This kind of approach has actually worked well for a number of internal projects
we've worked on in the past, and has become my #1 go-to strategy for this kind
of stuff... but... and here is the "but", we never asked upstream to do it because
it's all support burden with no reward.

In closing, I think you need to prototype this in a distinct project, see how
your customers use it, and then come back with a proposal derived from real
world use cases.

Cheers,
Carlos.
Siddhesh Poyarekar June 19, 2018, 8:19 a.m. | #10
On 06/18/2018 07:11 PM, Florian Weimer wrote:
>> Why so?  Are there dependencies that avoid splitting out a
>> libstring_pic.a that builds only the string routines and then links
>> that into everything that needs it, i.e. libc.so, libc.a, libc-rt.so,
>> etc.?
>
> I don't think there are any actual obstacles in the code, or at least
> not that many.  It's just that it would be difficult to express this in
> the current build system, in a concise manner.

That's generally true for anything we do with our build system; it is a
complex web of insanity.  That is also why I suggested a project to
rework it so that it is more modular and has fewer tentacles across
subdirectories than it currently does.

I am hand-waving of course because it is quite a big project.

>> This could even get used to replace libio in future, where
>> libio-3.0_pic.a gets linked in by default and libio-2.x_pic.a is built
>> into a libio_compat.so that can then be preloaded for programs that
>> need it.
>
> Not sure if this is the right approach.  Another way of doing this would
> involve linking almost all of libc_pic.a (basically everything without
> initializers), while restricting symbol visibility to a subset of the
> symbols.  ld should discard all objects which define only unexported and
> otherwise unused symbols.

Sure, that doesn't sound too different from what I'm suggesting,
which is to drop everything deprecated from the main library and have it
loaded on demand.

Siddhesh