[1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex

Message ID 20210913230506.546749-1-goldstein.w.n@gmail.com
State New

Commit Message

Noah Goldstein via Libc-alpha Sept. 13, 2021, 11:05 p.m.
No bug. This commit adds support for an optimized bcmp implementation.
Support is for sse2, sse4_1, avx2, and evex.

All string tests passing and build succeeding.
---
The motivation for this commit is that compilers will optimize the
idiomatic use of memcmp's return value as a boolean into a call to bcmp:

https://godbolt.org/z/Tbhefh6cv

so it seems reasonable to have an optimized bcmp implementation, as we
can get a ~0-25% improvement (generally a larger improvement for the
smaller size ranges, which ultimately are the most important to optimize
for).
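
For context, the idiom in question is a memcmp call whose result is only tested against zero. The following is a minimal sketch; `buffers_equal` is a hypothetical name chosen for illustration, and whether a given compiler actually emits a bcmp call for it depends on the compiler, target, and flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: only the truth value of memcmp is used, so the
   sign of the result never reaches the caller.  */
bool
buffers_equal (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}
```

Because the caller cannot observe ordering here, an equality-only routine is a valid replacement for the memcmp call in this function.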
    
Numbers for new implementations attached in reply.

Tests were run on the following CPUs:

Tigerlake: https://ark.intel.com/content/www/us/en/ark/products/208921/intel-core-i7-1165g7-processor-12m-cache-up-to-4-70-ghz-with-ipu.html
Skylake: https://ark.intel.com/content/www/us/en/ark/products/149091/intel-core-i7-8565u-processor-8m-cache-up-to-4-60-ghz.html

Some notes on the numbers.

There are some regressions in the sse2/sse4_1 versions. I didn't
optimize these versions beyond defining out obviously irrelevant code
for bcmp. My intuition is that the slowdowns are alignment related. I
am not sure if these issues would translate to architectures that
would actually use sse2/sse4_1.

I added the sse2/sse4_1 implementations mostly so that the ifunc would
have something to fall back on. With the lackluster numbers it may not
be worth it, especially factoring in code size costs. Thoughts?

On Tigerlake and Skylake, the evex and avx2 versions are basically
universal improvements. I opted to align bcmp to 64 bytes as opposed
to 16. The rationale is that a 16-byte alignment guarantee alone is
not enough to optimize for frontend behavior on either machine. I
think good frontend behavior is important in any function where
throughput might matter, which I think can be the case for bcmp.
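
As a rough C-level analogue of the 64-byte entry alignment described above, the sketch below uses the GCC/Clang `aligned` function attribute. This is an assumption made for illustration only: the patch itself works in assembly sources, and the names `equal_bytes_aligned64` and `entry_offset_in_line` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: request that the function's first instruction
   start on a 64-byte boundary instead of the default alignment.  */
__attribute__ ((aligned (64)))
int
equal_bytes_aligned64 (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}

/* Report the function's address modulo 64; 0 means the request held.  */
uintptr_t
entry_offset_in_line (void)
{
  return (uintptr_t) &equal_bytes_aligned64 % 64;
}
```

The trade-off is code size: padding every entry point to 64 bytes can waste space, which is why the choice is argued for above rather than taken as a default.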

    
 benchtests/Makefile                        |  2 +-
 benchtests/bench-bcmp.c                    | 20 ++++++++
 benchtests/bench-memcmp.c                  |  4 +-
 string/Makefile                            |  4 +-
 string/test-bcmp.c                         | 21 +++++++++
 string/test-memcmp.c                       | 27 +++++++----
 sysdeps/x86_64/memcmp.S                    |  2 -
 sysdeps/x86_64/multiarch/Makefile          |  3 ++
 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S   | 12 +++++
 sysdeps/x86_64/multiarch/bcmp-avx2.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-evex.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-sse2.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-sse4.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp.c            | 35 ++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-bcmp.h      | 53 ++++++++++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 23 ++++++++++
 sysdeps/x86_64/multiarch/memcmp-sse2.S     |  4 +-
 sysdeps/x86_64/multiarch/memcmp.c          |  2 -
 18 files changed, 286 insertions(+), 18 deletions(-)
 create mode 100644 benchtests/bench-bcmp.c
 create mode 100644 string/test-bcmp.c
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse2.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse4.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp.c
 create mode 100644 sysdeps/x86_64/multiarch/ifunc-bcmp.h

-- 
2.25.1

Comments

Noah Goldstein via Libc-alpha Sept. 13, 2021, 11:22 p.m. | #1
On Mon, Sep 13, 2021 at 6:21 PM Noah Goldstein <goldstein.w.n@gmail.com>
wrote:

> No bug. This commit adds support for an optimized bcmp implementation.
> Support is for sse2, sse4_1, avx2, and evex.
>
> All string tests passing and build succeeding.
> ---
> This commit is essentially because compilers will optimize the
> idiomatic use of memcmp return as a boolean:
>
> https://godbolt.org/z/Tbhefh6cv
>
> so it seems reasonable to have an optimized bcmp implementation as we
> can get ~0-25% improvement (generally larger improvement for the
> smaller size ranges which ultimately are the most important to optimize
> for).
>
> Numbers for new implementations attached in reply.
>

Numbers in this email.


> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 1530939a8c..5fc495eb57 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -47,7 +47,7 @@ bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}})
>  endif
>
>  # String function benchmarks.
> -string-benchset := memccpy memchr memcmp memcpy memmem memmove \
> +string-benchset := bcmp memccpy memchr memcmp memcpy memmem memmove \
>                    mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
>                    strcat strchr strchrnul strcmp strcpy strcspn strlen \
>                    strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
> diff --git a/benchtests/bench-bcmp.c b/benchtests/bench-bcmp.c
> new file mode 100644
> index 0000000000..1023639787
> --- /dev/null
> +++ b/benchtests/bench-bcmp.c
> @@ -0,0 +1,20 @@
> +/* Measure bcmp functions.
> +   Copyright (C) 2015-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define TEST_BCMP 1
> +#include "bench-memcmp.c"
> diff --git a/benchtests/bench-memcmp.c b/benchtests/bench-memcmp.c
> index 744c7ec5ba..4d5f8fb766 100644
> --- a/benchtests/bench-memcmp.c
> +++ b/benchtests/bench-memcmp.c
> @@ -17,7 +17,9 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
>  # define TEST_NAME "wmemcmp"
>  #else
>  # define TEST_NAME "memcmp"
> diff --git a/string/Makefile b/string/Makefile
> index f0fce2a0b8..f1f67ee157 100644
> --- a/string/Makefile
> +++ b/string/Makefile
> @@ -35,7 +35,7 @@ routines      := strcat strchr strcmp strcoll strcpy strcspn          \
>                    strncat strncmp strncpy                              \
>                    strrchr strpbrk strsignal strspn strstr strtok       \
>                    strtok_r strxfrm memchr memcmp memmove memset        \
> -                  mempcpy bcopy bzero ffs ffsll stpcpy stpncpy         \
> +                  mempcpy bcmp bcopy bzero ffs ffsll stpcpy stpncpy    \
>                    strcasecmp strncase strcasecmp_l strncase_l          \
>                    memccpy memcpy wordcopy strsep strcasestr            \
>                    swab strfry memfrob memmem rawmemchr strchrnul       \
> @@ -52,7 +52,7 @@ strop-tests   := memchr memcmp memcpy memmove mempcpy memset memccpy  \
>                    stpcpy stpncpy strcat strchr strcmp strcpy strcspn   \
>                    strlen strncmp strncpy strpbrk strrchr strspn memmem \
>                    strstr strcasestr strnlen strcasecmp strncasecmp     \
> -                  strncat rawmemchr strchrnul bcopy bzero memrchr      \
> +                  strncat rawmemchr strchrnul bcmp bcopy bzero memrchr \
>                    explicit_bzero
>  tests          := tester inl-tester noinl-tester testcopy test-ffs     \
>                    tst-strlen stratcliff tst-svc tst-inlcall            \
> diff --git a/string/test-bcmp.c b/string/test-bcmp.c
> new file mode 100644
> index 0000000000..6d19a4a87c
> --- /dev/null
> +++ b/string/test-bcmp.c
> @@ -0,0 +1,21 @@
> +/* Test and measure bcmp functions.
> +   Copyright (C) 2012-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define BAD_RESULT(result, expec) ((!(result)) != (!(expec)))
> +#define TEST_BCMP 1
> +#include "test-memcmp.c"
> diff --git a/string/test-memcmp.c b/string/test-memcmp.c
> index 6ddbc05d2f..c630e6799d 100644
> --- a/string/test-memcmp.c
> +++ b/string/test-memcmp.c
> @@ -17,11 +17,14 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
>  # define TEST_NAME "wmemcmp"
>  #else
>  # define TEST_NAME "memcmp"
>  #endif
> +
>  #include "test-string.h"
>  #ifdef WIDE
>  # include <inttypes.h>
> @@ -35,6 +38,7 @@
>  # define CHARBYTES 4
>  # define CHAR__MIN WCHAR_MIN
>  # define CHAR__MAX WCHAR_MAX
> +
>  int
>  simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
>  {
> @@ -48,8 +52,11 @@ simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
>  }
>  #else
>  # include <limits.h>
> -
> -# define MEMCMP memcmp
> +# ifdef TEST_BCMP
> +#  define MEMCMP bcmp
> +# else
> +#  define MEMCMP memcmp
> +# endif
>  # define MEMCPY memcpy
>  # define SIMPLE_MEMCMP simple_memcmp
>  # define CHAR char
> @@ -69,6 +76,12 @@ simple_memcmp (const char *s1, const char *s2, size_t n)
>  }
>  #endif
>
> +# ifndef BAD_RESULT
> +#  define BAD_RESULT(result, expec)                                     \
> +    (((result) == 0 && (expec)) || ((result) < 0 && (expec) >= 0) ||    \
> +     ((result) > 0 && (expec) <= 0))
> +#  endif
> +
>  typedef int (*proto_t) (const CHAR *, const CHAR *, size_t);
>
>  IMPL (SIMPLE_MEMCMP, 0)
> @@ -79,9 +92,7 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
>               int exp_result)
>  {
>    int result = CALL (impl, s1, s2, len);
> -  if ((exp_result == 0 && result != 0)
> -      || (exp_result < 0 && result >= 0)
> -      || (exp_result > 0 && result <= 0))
> +  if (BAD_RESULT(result, exp_result))
>      {
>        error (0, 0, "Wrong result in function %s %d %d", impl->name,
>              result, exp_result);
> @@ -186,9 +197,7 @@ do_random_tests (void)
>         {
>           r = CALL (impl, (CHAR *) p1 + align1, (const CHAR *) p2 + align2,
>                     len);
> -         if ((r == 0 && result)
> -             || (r < 0 && result >= 0)
> -             || (r > 0 && result <= 0))
> +         if (BAD_RESULT(r, result))
>             {
>               error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
>                      n, impl->name, align1 * CHARBYTES & 63,  align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
> diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
> index 870e15c5a0..dfd0269db2 100644
> --- a/sysdeps/x86_64/memcmp.S
> +++ b/sysdeps/x86_64/memcmp.S
> @@ -356,6 +356,4 @@ L(ATR32res):
>         .p2align 4,, 4
>  END(memcmp)
>
> -#undef bcmp
> -weak_alias (memcmp, bcmp)
>  libc_hidden_builtin_def (memcmp)
> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
> index 26be40959c..9dd0d8c3ff 100644
> --- a/sysdeps/x86_64/multiarch/Makefile
> +++ b/sysdeps/x86_64/multiarch/Makefile
> @@ -1,6 +1,7 @@
>  ifeq ($(subdir),string)
>
>  sysdep_routines += strncat-c stpncpy-c strncpy-c \
> +                  bcmp-sse2 bcmp-sse4 bcmp-avx2 \
>                    strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3  \
>                    strcmp-sse4_2 strcmp-avx2 \
>                    strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
> @@ -40,6 +41,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>                    memset-sse2-unaligned-erms \
>                    memset-avx2-unaligned-erms \
>                    memset-avx512-unaligned-erms \
> +                  bcmp-avx2-rtm \
>                    memchr-avx2-rtm \
>                    memcmp-avx2-movbe-rtm \
>                    memmove-avx-unaligned-erms-rtm \
> @@ -59,6 +61,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>                    strncpy-avx2-rtm \
>                    strnlen-avx2-rtm \
>                    strrchr-avx2-rtm \
> +                  bcmp-evex \
>                    memchr-evex \
>                    memcmp-evex-movbe \
>                    memmove-evex-unaligned-erms \
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> new file mode 100644
> index 0000000000..d742257e4e
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> @@ -0,0 +1,12 @@
> +#ifndef MEMCMP
> +# define MEMCMP __bcmp_avx2_rtm
> +#endif
> +
> +#define ZERO_UPPER_VEC_REGISTERS_RETURN \
> +  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
> +
> +#define VZEROUPPER_RETURN jmp   L(return_vzeroupper)
> +
> +#define SECTION(p) p##.avx.rtm
> +
> +#include "bcmp-avx2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> new file mode 100644
> index 0000000000..93a9a20b17
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with AVX2.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP        __bcmp_avx2
> +#endif
> +
> +#include "memcmp-avx2-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
> new file mode 100644
> index 0000000000..ade52e8c68
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with EVEX.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP        __bcmp_evex
> +#endif
> +
> +#include "memcmp-evex-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse2.S b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> new file mode 100644
> index 0000000000..b18d570386
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE2
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# ifndef memcmp
> +#  define memcmp       __bcmp_sse2
> +# endif
> +# define USE_AS_BCMP   1
> +#include "memcmp-sse2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse4.S b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> new file mode 100644
> index 0000000000..ed9804053f
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE4.1
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# ifndef MEMCMP
> +#  define MEMCMP       __bcmp_sse4_1
> +# endif
> +# define USE_AS_BCMP   1
> +#include "memcmp-sse4.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp.c b/sysdeps/x86_64/multiarch/bcmp.c
> new file mode 100644
> index 0000000000..6e26b73ecc
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp.c
> @@ -0,0 +1,35 @@
> +/* Multiple versions of bcmp.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +#if IS_IN (libc)
> +# define bcmp __redirect_bcmp
> +# include <string.h>
> +# undef bcmp
> +
> +# define SYMBOL_NAME bcmp
> +# include "ifunc-bcmp.h"
> +
> +libc_ifunc_redirected (__redirect_bcmp, bcmp, IFUNC_SELECTOR ());
> +
> +# ifdef SHARED
> +__hidden_ver1 (bcmp, __GI_bcmp, __redirect_bcmp)
> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (bcmp);
> +# endif
> +#endif
> diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> new file mode 100644
> index 0000000000..b0dacd8526
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> @@ -0,0 +1,53 @@
> +/* Common definition for bcmp ifunc selections.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# include <init-arch.h>
> +
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> +
> +static inline void *
> +IFUNC_SELECTOR (void)
> +{
> +  const struct cpu_features* cpu_features = __get_cpu_features ();
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
> +      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
> +    {
> +      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> +         && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> +       return OPTIMIZE (evex);
> +
> +      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> +       return OPTIMIZE (avx2_rtm);
> +
> +      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
> +       return OPTIMIZE (avx2);
> +    }
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
> +    return OPTIMIZE (sse4_1);
> +
> +  return OPTIMIZE (sse2);
> +}
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 39ab10613b..dd0c393c7d 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -38,6 +38,29 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
>    size_t i = 0;
>
> +  /* Support sysdeps/x86_64/multiarch/bcmp.c.  */
> +  IFUNC_IMPL (i, name, bcmp,
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                   && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (BMI2)),
> +                             __bcmp_avx2)
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)
> +                   && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (RTM)),
> +                             __bcmp_avx2_rtm)
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX512VL)
> +                              && CPU_FEATURE_USABLE (AVX512BW)
> +                   && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (BMI2)),
> +                             __bcmp_evex)
> +             IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
> +                             __bcmp_sse4_1)
> +             IFUNC_IMPL_ADD (array, i, bcmp, 1, __bcmp_sse2))
> +
>    /* Support sysdeps/x86_64/multiarch/memchr.c.  */
>    IFUNC_IMPL (i, name, memchr,
>               IFUNC_IMPL_ADD (array, i, memchr,
> diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> index b135fa2d40..2a4867ad18 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> @@ -17,7 +17,9 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #if IS_IN (libc)
> -# define memcmp __memcmp_sse2
> +# ifndef memcmp
> +#  define memcmp __memcmp_sse2
> +# endif
>
>  # ifdef SHARED
>  #  undef libc_hidden_builtin_def
> diff --git a/sysdeps/x86_64/multiarch/memcmp.c b/sysdeps/x86_64/multiarch/memcmp.c
> index fe725f3563..1760e045df 100644
> --- a/sysdeps/x86_64/multiarch/memcmp.c
> +++ b/sysdeps/x86_64/multiarch/memcmp.c
> @@ -27,8 +27,6 @@
>  # include "ifunc-memcmp.h"
>
>  libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
> -# undef bcmp
> -weak_alias (memcmp, bcmp)
>
>  # ifdef SHARED
>  __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
> --
> 2.25.1
>
>
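
The priority order encoded by IFUNC_SELECTOR in ifunc-bcmp.h from the patch above can be summarized as a small decision function. This is only a sketch: `struct cpu_flags`, `enum bcmp_variant`, and `select_bcmp_variant` are hypothetical stand-ins for illustration, not glibc's internal cpu_features API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for glibc's internal cpu_features; the fields
   mirror the checks made by the selector in the patch.  */
struct cpu_flags
{
  bool avx2, bmi2, movbe, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm, prefer_no_vzeroupper, sse4_1;
};

enum bcmp_variant
{
  BCMP_SSE2, BCMP_SSE4_1, BCMP_AVX2, BCMP_AVX2_RTM, BCMP_EVEX
};

/* Return the variant the selector in the patch would pick.  */
enum bcmp_variant
select_bcmp_variant (const struct cpu_flags *f)
{
  if (f->avx2 && f->bmi2 && f->movbe && f->avx_fast_unaligned_load)
    {
      if (f->avx512vl && f->avx512bw)
        return BCMP_EVEX;
      if (f->rtm)
        return BCMP_AVX2_RTM;
      if (!f->prefer_no_vzeroupper)
        return BCMP_AVX2;
    }
  /* Reached also when AVX2 is usable but VZEROUPPER is to be avoided.  */
  if (f->sse4_1)
    return BCMP_SSE4_1;
  return BCMP_SSE2;
}
```

One consequence worth noting is that a CPU with AVX2 but with Prefer_No_VZEROUPPER set, and without RTM or the AVX512 features, falls through to the sse4_1 or sse2 variant.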
Joseph Myers Sept. 15, 2021, midnight | #2
bcmp is an obsolescent function that no modern programs should be using, 
and it's not in the implementation namespace either so compilers shouldn't 
translate memcmp calls to bcmp.

If you want to define memcmp ABI variants optimized for particular usages, 
I suggest the following:

1. Add reserved-namespace names for such variants to the x86_64 psABI 
document (working with the ABI mailing list).  The 32-bit Arm RTABI 
<https://github.com/ARM-software/abi-aa/blob/main/rtabi32/rtabi32.rst> 
provides a precedent for defining such function variants in a psABI (it 
includes various __aeabi_mem*, though no memcmp variants).

2. Add those names to glibc, as well as teaching compilers to generate 
calls to them (with appropriate conditionals for whether the functions are 
known to be available in the target libc; in GCC, that would be based on 
GCC_GLIBC_VERSION_GTE_IFELSE configure tests for targets using glibc).


As a variant, you could define such names as architecture-independent GNU 
extensions rather than in a psABI, especially if there's nothing 
architecture-specific about the variants you think are useful (e.g. no use 
for having changes to calling conventions / call-clobbered registers for 
the variants).  But what should not be done in any case is tying an 
optimization to an obsolescent non-reserved name - any such optimized 
variants should use only implementation-namespace names.

-- 
Joseph S. Myers
joseph@codesourcery.com
Zack Weinberg via Libc-alpha Sept. 15, 2021, 1:37 p.m. | #3
On Tue, Sep 14, 2021, at 8:00 PM, Joseph Myers wrote:
> bcmp is an obsolescent function that no modern programs should be using,
> and it's not in the implementation namespace either so compilers shouldn't
> translate memcmp calls to bcmp.


I want to add that glibc has made bcmp an alias for memcmp for many years, which means that Linux- or Hurd-specific programs that are still using bcmp may have come to depend on its return value indicating ordering rather than just equality.  I myself had been under the impression that they were *specified* exactly the same, until this thread prompted me to double-check the specifications.  As such I don't think it's safe for *glibc* to accept patches that optimize bcmp separately from memcmp.
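
To make the concern concrete, here is a minimal sketch of a caller that is only correct while bcmp behaves like memcmp. `equality_only_bcmp` is a hypothetical stand-in for an implementation that honors just the zero/nonzero contract; it is not glibc code, and `first_sorts_before` is likewise an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bcmp that meets only the equality contract:
   zero when the buffers are equal, nonzero otherwise.  */
int
equality_only_bcmp (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) != 0;
}

typedef int (*cmp_fn) (const void *, const void *, size_t);

/* Caller that treats the result as an ordering, which is valid for
   memcmp but not guaranteed for bcmp.  */
int
first_sorts_before (cmp_fn cmp, const char *a, const char *b, size_t n)
{
  return cmp (a, b, n) < 0;
}
```

With memcmp as the comparator, "abc" sorts before "abd"; with the equality-only stand-in the same call reports that it does not, so code written this way would silently change behavior.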

I do rather like the idea of a __gnu_memeq() that compilers could optimize memcmp calls to, when they can prove that the result is used only for its truth value.
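
As a rough sketch of what such an equality-only primitive could look like in portable C: the name `memeq_sketch` and its signature are assumptions made for illustration, since no prototype for `__gnu_memeq` has been proposed here, and a real implementation would be vectorized per architecture.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the suggested __gnu_memeq: reports only
   whether the two regions are byte-for-byte equal.  */
bool
memeq_sketch (const void *s1, const void *s2, size_t n)
{
  const unsigned char *p1 = s1;
  const unsigned char *p2 = s2;

  for (size_t i = 0; i < n; i++)
    if (p1[i] != p2[i])
      return false;	/* No ordering is computed, only inequality.  */
  return true;
}
```

Returning a plain truth value makes it impossible for callers to come to depend on ordering, which is the property bcmp lacks in practice.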

zw
Florian Weimer via Libc-alpha Sept. 15, 2021, 2:01 p.m. | #4
* Zack Weinberg via Libc-alpha:

> On Tue, Sep 14, 2021, at 8:00 PM, Joseph Myers wrote:
>> bcmp is an obsolescent function that no modern programs should be using,
>> and it's not in the implementation namespace either so compilers shouldn't
>> translate memcmp calls to bcmp.
>
> I want to add that glibc has made bcmp an alias for memcmp for many
> years, which means that Linux- or Hurd-specific programs that are
> still using bcmp may have come to depend on its return value
> indicating ordering rather than just equality.  I myself had been
> under the impression that they were *specified* exactly the same,
> until this thread prompted me to double-check the specifications.  As
> such I don't think it's safe for *glibc* to accept patches that
> optimize bcmp separately from memcmp.


That's a very good point.

> I do rather like the idea of a __gnu_memeq() that compilers could
> optimize memcmp calls to, when they can prove that the result is used
> only for its truth value.


Yes, we should use a name in the implementation namespace because even
if we pick an obvious one like memequal, it will probably come back under a
different name from the C committee.

Thanks,
Florian
Noah Goldstein via Libc-alpha Sept. 15, 2021, 6:06 p.m. | #5
On Wed, Sep 15, 2021 at 9:02 AM Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> wrote:

> * Zack Weinberg via Libc-alpha:
>
> > On Tue, Sep 14, 2021, at 8:00 PM, Joseph Myers wrote:
> >> bcmp is an obsolescent function that no modern programs should be using,
> >> and it's not in the implementation namespace either so compilers shouldn't
> >> translate memcmp calls to bcmp.
> >
> > I want to add that glibc has made bcmp an alias for memcmp for many
> > years, which means that Linux- or Hurd-specific programs that are
> > still using bcmp may have come to depend on its return value
> > indicating ordering rather than just equality.  I myself had been
> > under the impression that they were *specified* exactly the same,
> > until this thread prompted me to double-check the specifications.  As
> > such I don't think it's safe for *glibc* to accept patches that
> > optimize bcmp separately from memcmp.
>
> That's a very good point.
>
> > I do rather like the idea of a __gnu_memeq() that compilers could
> > optimize memcmp calls to, when they can prove that the result is used
> > only for its truth value.
>
> Yes, we should use a name in the implementation namespace because even
> if we pick an obvious one like memequal, it will probably come back under a
> different name from the C committee.

+1

What would be the steps for getting that into GLIBC?


> Thanks,
> Florian
Joseph Myers Sept. 15, 2021, 6:30 p.m. | #6
On Wed, 15 Sep 2021, Noah Goldstein via Libc-alpha wrote:

> > > I do rather like the idea of a __gnu_memeq() that compilers could
> > > optimize memcmp calls to, when they can prove that the result is used
> > > only for its truth value.
> >
> > Yes, we should use a name in the implementation namespace because even
> > if we pick an obvious one like memequal, it will probably come back under a
> > different name from the C committee.
>
> +1
>
> What would be the steps for getting that into GLIBC?


Define the exact interface you want: the exact function type, the
(reserved) name, and the semantics of the return value and arguments,
explicitly including details such as whether the full n bytes of each
argument are required to be mapped into memory even if they compare
unequal before n bytes.

Discuss it on the libc-coord mailing list (probably include compiler 
mailing lists as well) to get agreement on semantics that are good for 
both libc implementations and for compilers to generate; it's best if this 
interface is acceptable to multiple libc implementations and suitable for 
multiple compilers to generate calls to (when available in libc).

Implement in glibc, across all glibc ports and including all the ABI test 
baseline updates.  If the semantics are such that an alias to memcmp is a 
valid implementation, that probably means adding such an alias to every 
memcmp implementation in glibc (and verifying with build-many-glibcs.py 
that they all build and pass the ABI tests), as well as allowing for 
architectures to add their own separate implementation of the new function 
if they wish.  There should also be execution tests that the new function 
works correctly at runtime (with different alignment, arguments just 
before unmapped pages, etc., as with other string function tests).  If the 
new function is purely an ABI, not an API, it doesn't need user manual 
documentation, however (although there will at least need to be a comment 
giving the detailed semantics that were agreed on libc-coord).

-- 
Joseph S. Myers
joseph@codesourcery.com

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 1530939a8c..5fc495eb57 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -47,7 +47,7 @@  bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}})
 endif
 
 # String function benchmarks.
-string-benchset := memccpy memchr memcmp memcpy memmem memmove \
+string-benchset := bcmp memccpy memchr memcmp memcpy memmem memmove \
 		   mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
 		   strcat strchr strchrnul strcmp strcpy strcspn strlen \
 		   strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
diff --git a/benchtests/bench-bcmp.c b/benchtests/bench-bcmp.c
new file mode 100644
index 0000000000..1023639787
--- /dev/null
+++ b/benchtests/bench-bcmp.c
@@ -0,0 +1,20 @@ 
+/* Measure bcmp functions.
+   Copyright (C) 2015-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define TEST_BCMP 1
+#include "bench-memcmp.c"
diff --git a/benchtests/bench-memcmp.c b/benchtests/bench-memcmp.c
index 744c7ec5ba..4d5f8fb766 100644
--- a/benchtests/bench-memcmp.c
+++ b/benchtests/bench-memcmp.c
@@ -17,7 +17,9 @@ 
    <https://www.gnu.org/licenses/>.  */
 
 #define TEST_MAIN
-#ifdef WIDE
+#ifdef TEST_BCMP
+# define TEST_NAME "bcmp"
+#elif defined WIDE
 # define TEST_NAME "wmemcmp"
 #else
 # define TEST_NAME "memcmp"
diff --git a/string/Makefile b/string/Makefile
index f0fce2a0b8..f1f67ee157 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -35,7 +35,7 @@  routines	:= strcat strchr strcmp strcoll strcpy strcspn		\
 		   strncat strncmp strncpy				\
 		   strrchr strpbrk strsignal strspn strstr strtok	\
 		   strtok_r strxfrm memchr memcmp memmove memset	\
-		   mempcpy bcopy bzero ffs ffsll stpcpy stpncpy		\
+		   mempcpy bcmp bcopy bzero ffs ffsll stpcpy stpncpy		\
 		   strcasecmp strncase strcasecmp_l strncase_l		\
 		   memccpy memcpy wordcopy strsep strcasestr		\
 		   swab strfry memfrob memmem rawmemchr strchrnul	\
@@ -52,7 +52,7 @@  strop-tests	:= memchr memcmp memcpy memmove mempcpy memset memccpy	\
 		   stpcpy stpncpy strcat strchr strcmp strcpy strcspn	\
 		   strlen strncmp strncpy strpbrk strrchr strspn memmem	\
 		   strstr strcasestr strnlen strcasecmp strncasecmp	\
-		   strncat rawmemchr strchrnul bcopy bzero memrchr	\
+		   strncat rawmemchr strchrnul bcmp bcopy bzero memrchr	\
 		   explicit_bzero
 tests		:= tester inl-tester noinl-tester testcopy test-ffs	\
 		   tst-strlen stratcliff tst-svc tst-inlcall		\
diff --git a/string/test-bcmp.c b/string/test-bcmp.c
new file mode 100644
index 0000000000..6d19a4a87c
--- /dev/null
+++ b/string/test-bcmp.c
@@ -0,0 +1,21 @@ 
+/* Test and measure bcmp functions.
+   Copyright (C) 2012-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define BAD_RESULT(result, expec) ((!(result)) != (!(expec)))
+#define TEST_BCMP 1
+#include "test-memcmp.c"
diff --git a/string/test-memcmp.c b/string/test-memcmp.c
index 6ddbc05d2f..c630e6799d 100644
--- a/string/test-memcmp.c
+++ b/string/test-memcmp.c
@@ -17,11 +17,14 @@ 
    <https://www.gnu.org/licenses/>.  */
 
 #define TEST_MAIN
-#ifdef WIDE
+#ifdef TEST_BCMP
+# define TEST_NAME "bcmp"
+#elif defined WIDE
 # define TEST_NAME "wmemcmp"
 #else
 # define TEST_NAME "memcmp"
 #endif
+
 #include "test-string.h"
 #ifdef WIDE
 # include <inttypes.h>
@@ -35,6 +38,7 @@ 
 # define CHARBYTES 4
 # define CHAR__MIN WCHAR_MIN
 # define CHAR__MAX WCHAR_MAX
+
 int
 simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
 {
@@ -48,8 +52,11 @@  simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
 }
 #else
 # include <limits.h>
-
-# define MEMCMP memcmp
+# ifdef TEST_BCMP
+#  define MEMCMP bcmp
+# else
+#  define MEMCMP memcmp
+# endif
 # define MEMCPY memcpy
 # define SIMPLE_MEMCMP simple_memcmp
 # define CHAR char
@@ -69,6 +76,12 @@  simple_memcmp (const char *s1, const char *s2, size_t n)
 }
 #endif
 
+# ifndef BAD_RESULT
+#  define BAD_RESULT(result, expec)                                     \
+    (((result) == 0 && (expec)) || ((result) < 0 && (expec) >= 0) ||    \
+     ((result) > 0 && (expec) <= 0))
+# endif
+
 typedef int (*proto_t) (const CHAR *, const CHAR *, size_t);
 
 IMPL (SIMPLE_MEMCMP, 0)
@@ -79,9 +92,7 @@  check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
 	      int exp_result)
 {
   int result = CALL (impl, s1, s2, len);
-  if ((exp_result == 0 && result != 0)
-      || (exp_result < 0 && result >= 0)
-      || (exp_result > 0 && result <= 0))
+  if (BAD_RESULT(result, exp_result))
     {
       error (0, 0, "Wrong result in function %s %d %d", impl->name,
 	     result, exp_result);
@@ -186,9 +197,7 @@  do_random_tests (void)
 	{
 	  r = CALL (impl, (CHAR *) p1 + align1, (const CHAR *) p2 + align2,
 		    len);
-	  if ((r == 0 && result)
-	      || (r < 0 && result >= 0)
-	      || (r > 0 && result <= 0))
+	  if (BAD_RESULT(r, result))
 	    {
 	      error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
 		     n, impl->name, align1 * CHARBYTES & 63,  align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
index 870e15c5a0..dfd0269db2 100644
--- a/sysdeps/x86_64/memcmp.S
+++ b/sysdeps/x86_64/memcmp.S
@@ -356,6 +356,4 @@  L(ATR32res):
 	.p2align 4,, 4
 END(memcmp)
 
-#undef bcmp
-weak_alias (memcmp, bcmp)
 libc_hidden_builtin_def (memcmp)
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 26be40959c..9dd0d8c3ff 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -1,6 +1,7 @@ 
 ifeq ($(subdir),string)
 
 sysdep_routines += strncat-c stpncpy-c strncpy-c \
+		   bcmp-sse2 bcmp-sse4 bcmp-avx2 \
 		   strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3  \
 		   strcmp-sse4_2 strcmp-avx2 \
 		   strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
@@ -40,6 +41,7 @@  sysdep_routines += strncat-c stpncpy-c strncpy-c \
 		   memset-sse2-unaligned-erms \
 		   memset-avx2-unaligned-erms \
 		   memset-avx512-unaligned-erms \
+		   bcmp-avx2-rtm \
 		   memchr-avx2-rtm \
 		   memcmp-avx2-movbe-rtm \
 		   memmove-avx-unaligned-erms-rtm \
@@ -59,6 +61,7 @@  sysdep_routines += strncat-c stpncpy-c strncpy-c \
 		   strncpy-avx2-rtm \
 		   strnlen-avx2-rtm \
 		   strrchr-avx2-rtm \
+		   bcmp-evex \
 		   memchr-evex \
 		   memcmp-evex-movbe \
 		   memmove-evex-unaligned-erms \
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
new file mode 100644
index 0000000000..d742257e4e
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
@@ -0,0 +1,12 @@ 
+#ifndef MEMCMP
+# define MEMCMP __bcmp_avx2_rtm
+#endif
+
+#define ZERO_UPPER_VEC_REGISTERS_RETURN \
+  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
+
+#define VZEROUPPER_RETURN jmp	 L(return_vzeroupper)
+
+#define SECTION(p) p##.avx.rtm
+
+#include "bcmp-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
new file mode 100644
index 0000000000..93a9a20b17
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
@@ -0,0 +1,23 @@ 
+/* bcmp optimized with AVX2.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MEMCMP
+# define MEMCMP	__bcmp_avx2
+#endif
+
+#include "memcmp-avx2-movbe.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
new file mode 100644
index 0000000000..ade52e8c68
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
@@ -0,0 +1,23 @@ 
+/* bcmp optimized with EVEX.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MEMCMP
+# define MEMCMP	__bcmp_evex
+#endif
+
+#include "memcmp-evex-movbe.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-sse2.S b/sysdeps/x86_64/multiarch/bcmp-sse2.S
new file mode 100644
index 0000000000..b18d570386
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-sse2.S
@@ -0,0 +1,23 @@ 
+/* bcmp optimized with SSE2
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# ifndef memcmp
+#  define memcmp	__bcmp_sse2
+# endif
+# define USE_AS_BCMP	1
+#include "memcmp-sse2.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-sse4.S b/sysdeps/x86_64/multiarch/bcmp-sse4.S
new file mode 100644
index 0000000000..ed9804053f
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-sse4.S
@@ -0,0 +1,23 @@ 
+/* bcmp optimized with SSE4.1
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# ifndef MEMCMP
+#  define MEMCMP	__bcmp_sse4_1
+# endif
+# define USE_AS_BCMP	1
+#include "memcmp-sse4.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp.c b/sysdeps/x86_64/multiarch/bcmp.c
new file mode 100644
index 0000000000..6e26b73ecc
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp.c
@@ -0,0 +1,35 @@ 
+/* Multiple versions of bcmp.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define bcmp __redirect_bcmp
+# include <string.h>
+# undef bcmp
+
+# define SYMBOL_NAME bcmp
+# include "ifunc-bcmp.h"
+
+libc_ifunc_redirected (__redirect_bcmp, bcmp, IFUNC_SELECTOR ());
+
+# ifdef SHARED
+__hidden_ver1 (bcmp, __GI_bcmp, __redirect_bcmp)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (bcmp);
+# endif
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
new file mode 100644
index 0000000000..b0dacd8526
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
@@ -0,0 +1,53 @@ 
+/* Common definition for bcmp ifunc selections.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# include <init-arch.h>
+
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+  const struct cpu_features* cpu_features = __get_cpu_features ();
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
+      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
+      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
+    {
+      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
+	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
+	return OPTIMIZE (evex);
+
+      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+	return OPTIMIZE (avx2_rtm);
+
+      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
+	return OPTIMIZE (avx2);
+    }
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
+    return OPTIMIZE (sse4_1);
+
+  return OPTIMIZE (sse2);
+}
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 39ab10613b..dd0c393c7d 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -38,6 +38,29 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   size_t i = 0;
 
+  /* Support sysdeps/x86_64/multiarch/bcmp.c.  */
+  IFUNC_IMPL (i, name, bcmp,
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX2)
+			       && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (BMI2)),
+			      __bcmp_avx2)
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX2)
+			       && CPU_FEATURE_USABLE (BMI2)
+			       && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (RTM)),
+			      __bcmp_avx2_rtm)
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX512VL)
+			       && CPU_FEATURE_USABLE (AVX512BW)
+			       && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (BMI2)),
+			      __bcmp_evex)
+	      IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
+			      __bcmp_sse4_1)
+	      IFUNC_IMPL_ADD (array, i, bcmp, 1, __bcmp_sse2))
+
   /* Support sysdeps/x86_64/multiarch/memchr.c.  */
   IFUNC_IMPL (i, name, memchr,
 	      IFUNC_IMPL_ADD (array, i, memchr,
diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S b/sysdeps/x86_64/multiarch/memcmp-sse2.S
index b135fa2d40..2a4867ad18 100644
--- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
+++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
@@ -17,7 +17,9 @@ 
    <https://www.gnu.org/licenses/>.  */
 
 #if IS_IN (libc)
-# define memcmp __memcmp_sse2
+# ifndef memcmp
+#  define memcmp __memcmp_sse2
+# endif
 
 # ifdef SHARED
 #  undef libc_hidden_builtin_def
diff --git a/sysdeps/x86_64/multiarch/memcmp.c b/sysdeps/x86_64/multiarch/memcmp.c
index fe725f3563..1760e045df 100644
--- a/sysdeps/x86_64/multiarch/memcmp.c
+++ b/sysdeps/x86_64/multiarch/memcmp.c
@@ -27,8 +27,6 @@ 
 # include "ifunc-memcmp.h"
 
 libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
-# undef bcmp
-weak_alias (memcmp, bcmp)
 
 # ifdef SHARED
 __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)