[updated] Use a proper C tokenizer to implement the obsolete typedefs test.

Message ID 20190218143920.20885-1-zackw@panix.com
State New
Headers show
Series
  • [updated] Use a proper C tokenizer to implement the obsolete typedefs test.
Related show

Commit Message

Zack Weinberg Feb. 18, 2019, 2:39 p.m.
The test for obsolete typedefs in installed headers was implemented
using “grep” and could therefore get false positives on e.g. “ulong”
in a comment.  It was also scanning all of the headers included by our
headers, and therefore testing headers we don’t control, e.g. Linux
kernel headers.

This patch splits the obsolete-typedef test from
scripts/check-installed-headers.sh to a separate program,
scripts/check-obsolete-constructs.py.  Being implemented in Python,
it is feasible to make it tokenize C accurately enough to avoid false
positives on the contents of comments and strings.  It also only
examines $(headers) in each subdirectory--all the headers we install,
but not any external dependencies of those headers.

It is also feasible to make the new test understand the difference
between _defining_ the obsolete typedefs and _using_ the obsolete
typedefs, which means posix/{bits,sys}/types.h no longer need to be
exempted.  This uncovered an actual bug in bits/types.h: __quad_t and
__u_quad_t were being used to define __S64_TYPE, __U64_TYPE,
__SQUAD_TYPE and __UQUAD_TYPE.  These are changed to __int64_t and
__uint64_t respectively.  (__SQUAD_TYPE and __UQUAD_TYPE should be
considered obsolete as well, but that is more of a change than I feel
is safe during the freeze.  Note that the comments in bits/types.h
claiming a difference between *QUAD_TYPE and *64_TYPE were also in
error: supposedly QUAD_TYPE is ‘long long’ in all ABIs whereas 64_TYPE
is ‘long’ in LP64 ABIs, but that appears never to have been true; both
are defined as ‘long’ in LP64 ABIs.  I made a minimal change to make
the comments not completely wrong and will revisit this whole area for
the next release.)

The change to sys/types.h removes a construct that was too complicated
for the new script (which lexes C but does not attempt to parse it) to
understand.  It should have absolutely no functional effect.  We might
want to consider limiting sys/types.h’s definition of intN_t and
u_intN_t to __USE_MISC, and we might also want to consider adding
register_t to the set of obsolete typedefs, but those changes are much
too risky during the freeze (even within our own headers there are
places where we assume sys/types.h defines intN_t unconditionally).

	* scripts/check-obsolete-constructs.py: New test script.
        * scripts/check-installed-headers.sh: Don’t test for obsolete
        typedefs.
        * Rules: Run scripts/check-obsolete-constructs.py over $(headers)
        as a special test.  Update commentary.
        * bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
        (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
        Update commentary.
        * sys/types.h (__u_intN_t): Remove.
        (u_int8_t): Typedef using __uint8_t.
        (u_int16_t): Typedef using __uint16_t.
        (u_int32_t): Typedef using __uint32_t.
        (u_int64_t): Typedef using __uint64_t.
---

This patch has been rebased on current master.  No other changes.
I do still intend to clean up the QUAD_TYPE vs 64_TYPE mess but
that's not going to be backportable to release branches, and I
suspect distro maintainers may want the test improvements on at
least the 2.29 branch, so I'll do that separately.

 Rules                                |  17 +-
 posix/bits/types.h                   |  10 +-
 posix/sys/types.h                    |  33 +---
 scripts/check-installed-headers.sh   |  37 +----
 scripts/check-obsolete-constructs.py | 225 +++++++++++++++++++++++++++
 5 files changed, 259 insertions(+), 63 deletions(-)
 create mode 100755 scripts/check-obsolete-constructs.py

-- 
2.20.1

Comments

Carlos O'Donell Feb. 20, 2019, 4:32 p.m. | #1
Zack,

This looks good to me.

OK for master if you:
* Fix commit message to remove freeze comments.
* Convincingly explain the gcc 2.7 testing worries I have.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>


On 2/18/19 9:39 AM, Zack Weinberg wrote:
> The test for obsolete typedefs in installed headers was implemented

> using “grep” and could therefore get false positives on e.g. “ulong”

> in a comment.  It was also scanning all of the headers included by our

> headers, and therefore testing headers we don’t control, e.g. Linux

> kernel headers.


Correct.

> This patch splits the obsolete-typedef test from

> scripts/check-installed-headers.sh to a separate program,

> scripts/check-obsolete-constructs.py.  Being implemented in Python,

> it is feasible to make it tokenize C accurately enough to avoid false

> positives on the contents of comments and strings.  It also only

> examines $(headers) in each subdirectory--all the headers we install,

> but not any external dependencies of those headers.


OK.

> It is also feasible to make the new test understand the difference

> between _defining_ the obsolete typedefs and _using_ the obsolete

> typedefs, which means posix/{bits,sys}/types.h no longer need to be

> exempted.  This uncovered an actual bug in bits/types.h: __quad_t and

> __u_quad_t were being used to define __S64_TYPE, __U64_TYPE,

> __SQUAD_TYPE and __UQUAD_TYPE.  These are changed to __int64_t and

> __uint64_t respectively.  (__SQUAD_TYPE and __UQUAD_TYPE should be

> considered obsolete as well, but that is more of a change than I feel

> is safe during the freeze.  Note that the comments in bits/types.h

> claiming a difference between *QUAD_TYPE and *64_TYPE were also in

> error: supposedly QUAD_TYPE is ‘long long’ in all ABIs whereas 64_TYPE

> is ‘long’ in LP64 ABIs, but that appears never to have been true; both

> are defined as ‘long’ in LP64 ABIs.  I made a minimal change to make

> the comments not completely wrong and will revisit this whole area for

> the next release.)


We are no longer in a freeze, please adjust commit comment as required.

OK. Reviewed *QUAD_TYPE for all machines and found it to be the appropriately
signed long type. Confirmed. However, in 2.12 I found traces that an arbitrary
'#define QUAD_TYPE _Quad' was used for IA64 where it did actually mean that
this was a 128-bit type, but it was never used for anything and seems to be
Intel compiler specific junk that was cargo-culted.

> The change to sys/types.h removes a construct that was too complicated

> for the new script (which lexes C but does not attempt to parse it) to

> understand.  It should have absolutely no functional effect.  We might

> want to consider limiting sys/types.h’s definition of intN_t and

> u_intN_t to __USE_MISC, and we might also want to consider adding

> register_t to the set of obsolete typedefs, but those changes are much

> too risky during the freeze (even within our own headers there are

> places where we assume sys/types.h defines intN_t unconditionally).


We are no longer in a freeze, please adjust commit comment as required.

> 

> 	* scripts/check-obsolete-constructs.py: New test script.

>         * scripts/check-installed-headers.sh: Don’t test for obsolete

>         typedefs.

>         * Rules: Run scripts/check-obsolete-constructs.py over $(headers)

>         as a special test.  Update commentary.

>         * bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.

>         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.

>         Update commentary.

>         * sys/types.h (__u_intN_t): Remove.

>         (u_int8_t): Typedef using __uint8_t.

>         (u_int16_t): Typedef using __uint16_t.

>         (u_int32_t): Typedef using __uint32_t.

>         (u_int64_t): Typedef using __uint64_t.

> ---

> 

> This patch has been rebased on current master.  No other changes.

> I do still intend to clean up the QUAD_TYPE vs 64_TYPE mess but

> that's not going to be backportable to release branches, and I

> suspect distro maintainers may want the test improvements on at

> least the 2.29 branch, so I'll do that separately.


Agreed.

> 

>  Rules                                |  17 +-

>  posix/bits/types.h                   |  10 +-

>  posix/sys/types.h                    |  33 +---

>  scripts/check-installed-headers.sh   |  37 +----

>  scripts/check-obsolete-constructs.py | 225 +++++++++++++++++++++++++++

>  5 files changed, 259 insertions(+), 63 deletions(-)

>  create mode 100755 scripts/check-obsolete-constructs.py

> 

> diff --git a/Rules b/Rules

> index e08a28d9f3..222dba6dcb 100644

> --- a/Rules

> +++ b/Rules

> @@ -82,7 +82,8 @@ $(common-objpfx)dummy.c:

>  common-generated += dummy.o dummy.c

>  

>  ifneq "$(headers)" ""

> -# Special test of all the installed headers in this directory.

> +# Test that all of the headers installed by this directory can be compiled

> +# in isolation.


OK.

>  tests-special += $(objpfx)check-installed-headers-c.out

>  libof-check-installed-headers-c := testsuite

>  $(objpfx)check-installed-headers-c.out: \

> @@ -93,6 +94,8 @@ $(objpfx)check-installed-headers-c.out: \

>  	$(evaluate-test)

>  

>  ifneq "$(CXX)" ""

> +# If a C++ compiler is available, also test that they can be compiled

> +# in isolation as C++.


OK.

>  tests-special += $(objpfx)check-installed-headers-cxx.out

>  libof-check-installed-headers-cxx := testsuite

>  $(objpfx)check-installed-headers-cxx.out: \

> @@ -103,12 +106,24 @@ $(objpfx)check-installed-headers-cxx.out: \

>  	$(evaluate-test)

>  endif # $(CXX)

>  

> +# Test that a wrapper header exists in include/ for each non-sysdeps header.

> +# This script does not need $(py-env).


OK.

>  tests-special += $(objpfx)check-wrapper-headers.out

>  $(objpfx)check-wrapper-headers.out: \

>    $(..)scripts/check-wrapper-headers.py $(headers)

>  	$(PYTHON) $< --root=$(..) --subdir=$(subdir) $(headers) > $@; \

>  	  $(evaluate-test)

>  

> +# Test that none of the headers installed by this directory use certain

> +# obsolete constructs (e.g. legacy BSD typedefs superseded by stdint.h).

> +# This script does not need $(py-env).

> +tests-special += $(objpfx)check-obsolete-constructs.out

> +libof-check-obsolete-constructs := testsuite

> +$(objpfx)check-obsolete-constructs.out: \

> +    $(..)scripts/check-obsolete-constructs.py $(headers)

> +	$(PYTHON) $^ > $@ 2>&1; \

> +	$(evaluate-test)


OK.

> +

>  endif # $(headers)

>  

>  # This makes all the auxiliary and test programs.

> diff --git a/posix/bits/types.h b/posix/bits/types.h

> index 27e065c3be..0de6c59bb4 100644

> --- a/posix/bits/types.h

> +++ b/posix/bits/types.h

> @@ -87,7 +87,7 @@ __extension__ typedef unsigned long long int __uintmax_t;

>  	32		-- "natural" 32-bit type (always int)

>  	64		-- "natural" 64-bit type (long or long long)

>  	LONG32		-- 32-bit type, traditionally long

> -	QUAD		-- 64-bit type, always long long

> +	QUAD		-- 64-bit type, traditionally long long


OK.

>  	WORD		-- natural type of __WORDSIZE bits (int or long)

>  	LONGWORD	-- type of __WORDSIZE bits, traditionally long

>  

> @@ -113,14 +113,14 @@ __extension__ typedef unsigned long long int __uintmax_t;

>  #define __SLONGWORD_TYPE	long int

>  #define __ULONGWORD_TYPE	unsigned long int

>  #if __WORDSIZE == 32

> -# define __SQUAD_TYPE		__quad_t

> -# define __UQUAD_TYPE		__u_quad_t

> +# define __SQUAD_TYPE		__int64_t

> +# define __UQUAD_TYPE		__uint64_t


OK.

>  # define __SWORD_TYPE		int

>  # define __UWORD_TYPE		unsigned int

>  # define __SLONG32_TYPE		long int

>  # define __ULONG32_TYPE		unsigned long int

> -# define __S64_TYPE		__quad_t

> -# define __U64_TYPE		__u_quad_t

> +# define __S64_TYPE		__int64_t

> +# define __U64_TYPE		__uint64_t


OK.

>  /* We want __extension__ before typedef's that use nonstandard base types

>     such as `long long' in C89 mode.  */

>  # define __STD_TYPE		__extension__ typedef

> diff --git a/posix/sys/types.h b/posix/sys/types.h

> index 27129c5c23..0e37b1ce6a 100644

> --- a/posix/sys/types.h

> +++ b/posix/sys/types.h

> @@ -154,37 +154,20 @@ typedef unsigned int uint;

>  

>  #include <bits/stdint-intn.h>

>  

> -#if !__GNUC_PREREQ (2, 7)

> -

>  /* These were defined by ISO C without the first `_'.  */

> -typedef	unsigned char u_int8_t;

> -typedef	unsigned short int u_int16_t;

> -typedef	unsigned int u_int32_t;

> -# if __WORDSIZE == 64

> -typedef unsigned long int u_int64_t;

> -# else

> -__extension__ typedef unsigned long long int u_int64_t;

> -# endif

> -

> -typedef int register_t;

> -

> -#else

> -

> -/* For GCC 2.7 and later, we can use specific type-size attributes.  */

> -# define __u_intN_t(N, MODE) \

> -  typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE)))

> -

> -__u_intN_t (8, __QI__);

> -__u_intN_t (16, __HI__);

> -__u_intN_t (32, __SI__);

> -__u_intN_t (64, __DI__);

> +typedef __uint8_t u_int8_t;

> +typedef __uint16_t u_int16_t;

> +typedef __uint32_t u_int32_t;

> +typedef __uint64_t u_int64_t;


This is a very old compiler check in a public header.

Did you verify gcc 2.7 can actually handle the new constructs?

It looks OK to me, and I think the u_int*_t types have existed forever,
but I'm always nervous about touching these defines in public headers.

It's not like anyone actually uses gcc 2.7, so I don't know why I'm worried,
I bet lots of stuff is broken in a modern day set of headers that if you tried
to slot in a new glibc it would just blow up (we don't test this combination
regularly enough).

So either I have to think about the theoretical wrongs this patch commits,
or just admit that it doesn't matter.

Thoughts?

>  

> +#if __GNUC_PREREQ (2, 7)

>  typedef int register_t __attribute__ ((__mode__ (__word__)));

> -

> +#else

> +typedef int register_t;

> +#endif

>  

>  /* Some code from BIND tests this macro to see if the types above are

>     defined.  */

> -#endif

>  #define __BIT_TYPES_DEFINED__	1

>  

>  

> diff --git a/scripts/check-installed-headers.sh b/scripts/check-installed-headers.sh

> index 8e7beffd82..63bc8d4fa6 100644

> --- a/scripts/check-installed-headers.sh

> +++ b/scripts/check-installed-headers.sh

> @@ -16,11 +16,9 @@

>  # License along with the GNU C Library; if not, see

>  # <http://www.gnu.org/licenses/>.

>  

> -# Check installed headers for cleanliness.  For each header, confirm

> -# that it's possible to compile a file that includes that header and

> -# does nothing else, in several different compilation modes.  Also,

> -# scan the header for a set of obsolete typedefs that should no longer

> -# appear.

> +# For each installed header, confirm that it's possible to compile a

> +# file that includes that header and does nothing else, in several

> +# different compilation modes.


OK.

>  

>  # These compilation switches assume GCC or compatible, which is probably

>  # fine since we also assume that when _building_ glibc.

> @@ -31,13 +29,6 @@ cxx_modes="-std=c++98 -std=gnu++98 -std=c++11 -std=gnu++11"

>  # These are probably the most commonly used three.

>  lib_modes="-D_DEFAULT_SOURCE=1 -D_GNU_SOURCE=1 -D_XOPEN_SOURCE=700"

>  

> -# sys/types.h+bits/types.h have to define the obsolete types.

> -# rpc(svc)/* have the obsolete types too deeply embedded in their API

> -# to remove.

> -skip_obsolete_type_check='*/sys/types.h|*/bits/types.h|*/rpc/*|*/rpcsvc/*'

> -obsolete_type_re=\

> -'\<((__)?(quad_t|u(short|int|long|_(char|short|int([0-9]+_t)?|long|quad_t))))\>'

> -


OK.

>  if [ $# -lt 3 ]; then

>      echo "usage: $0 c|c++ \"compile command\" header header header..." >&2

>      exit 2

> @@ -46,14 +37,10 @@ case "$1" in

>      (c)

>          lang_modes="$c_modes"

>          cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.c)

> -        already="$skip_obsolete_type_check"

>      ;;

>      (c++)

>          lang_modes="$cxx_modes"

>          cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.cc)

> -        # The obsolete-type check can be skipped for C++; it is

> -        # sufficient to do it for C.

> -        already="*"

>      ;;

>      (*)

>          echo "usage: $0 c|c++ \"compile command\" header header header..." >&2

> @@ -151,22 +138,8 @@ $expanded_lib_mode

>  int avoid_empty_translation_unit;

>  EOF

>              if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1

> -            then

> -                includes=$($cc_cmd -fsyntax-only -H $lang_mode \

> -                              "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')

> -                for h in $includes; do

> -                    # Don't repeat work.

> -                    eval 'case "$h" in ('"$already"') continue;; esac'

> -

> -                    if grep -qE "$obsolete_type_re" "$h"; then

> -                        echo "*** Obsolete types detected:"

> -                        grep -HE "$obsolete_type_re" "$h"

> -                        failed=1

> -                    fi

> -                    already="$already|$h"

> -                done

> -            else

> -                failed=1

> +            then :

> +            else failed=1


OK.

>              fi

>          done

>      done

> diff --git a/scripts/check-obsolete-constructs.py b/scripts/check-obsolete-constructs.py

> new file mode 100755

> index 0000000000..a0af229dc8

> --- /dev/null

> +++ b/scripts/check-obsolete-constructs.py

> @@ -0,0 +1,225 @@

> +#! /usr/bin/python3

> +# Copyright (C) 2019 Free Software Foundation, Inc.

> +# This file is part of the GNU C Library.


OK.

> +#

> +# The GNU C Library is free software; you can redistribute it and/or

> +# modify it under the terms of the GNU Lesser General Public

> +# License as published by the Free Software Foundation; either

> +# version 2.1 of the License, or (at your option) any later version.

> +#

> +# The GNU C Library is distributed in the hope that it will be useful,

> +# but WITHOUT ANY WARRANTY; without even the implied warranty of

> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU

> +# Lesser General Public License for more details.

> +#

> +# You should have received a copy of the GNU Lesser General Public

> +# License along with the GNU C Library; if not, see

> +# <http://www.gnu.org/licenses/>.

> +

> +"""Verifies that installed headers do not use any obsolete constructs:

> + * legacy BSD typedefs superseded by <stdint.h>:

> +   ushort uint ulong u_short u_int u_long u_intNN_t quad_t u_quad_t

> +   (sys/types.h is allowed to _define_ these types, but not to use them

> +    to define anything else).


OK.

> +"""

> +

> +import argparse

> +import collections

> +import os

> +import re

> +import sys

> +

> +# Simplified lexical analyzer for C preprocessing tokens.

> +# Does not implement trigraphs.

> +# Does not implement backslash-newline in the middle of any lexical

> +#   item other than a string literal.

> +# Does not implement universal-character-names in identifiers.

> +# Does not implement header-names.

> +# Treats prefixed strings (e.g. L"...") as two tokens (L and "...")

> +# Accepts non-ASCII characters only within comments and strings.


OK.

> +

> +# Caution: The order of the outermost alternation matters.

> +# STRING must be before BAD_STRING, CHARCONST before BAD_CHARCONST,

> +# BLOCK_COMMENT before BAD_BLOCK_COM before PUNCTUATOR, and OTHER must

> +# be last.

> +# Caution: There should be no capturing groups other than the named

> +# captures in the outermost alternation.


OK.

> +

> +# For reference, these are all of the C punctuators as of C11:

> +#   [ ] ( ) { } , ; ? ~

> +#   ! != * *= / /= ^ ^= = ==

> +#   # ##

> +#   % %= %> %: %:%:

> +#   & &= &&

> +#   | |= ||

> +#   + += ++

> +#   - -= -- ->

> +#   . ...

> +#   : :>

> +#   < <% <: << <<= <=

> +#   > >= >> >>=

> +

> +# The BAD_* tokens are not part of the official definition of pp-tokens;

> +# they match unclosed strings, character constants, and block comments,

> +# so that the regex engine doesn't have to backtrack all the way to the

> +# beginning of a broken construct and then emit dozens of junk tokens.

> +

> +PP_TOKEN_RE_ = re.compile(r'''

> +    (?P<STRING>        \"(?:[^\"\\\r\n]|\\(?:[\r\n -~]|\r\n))*\")

> +   |(?P<BAD_STRING>    \"(?:[^\"\\\r\n]|\\[ -~])*)

> +   |(?P<CHARCONST>     \'(?:[^\'\\\r\n]|\\(?:[\r\n -~]|\r\n))*\')

> +   |(?P<BAD_CHARCONST> \'(?:[^\'\\\r\n]|\\[ -~])*)

> +   |(?P<BLOCK_COMMENT> /\*(?:\*(?!/)|[^*])*\*/)

> +   |(?P<BAD_BLOCK_COM> /\*(?:\*(?!/)|[^*])*\*?)

> +   |(?P<LINE_COMMENT>  //[^\r\n]*)

> +   |(?P<IDENT>         [_a-zA-Z][_a-zA-Z0-9]*)

> +   |(?P<PP_NUMBER>     \.?[0-9](?:[0-9a-df-oq-zA-DF-OQ-Z_.]|[eEpP][+-]?)*)

> +   |(?P<PUNCTUATOR>

> +       [,;?~(){}\[\]]

> +     | [!*/^=]=?

> +     | \#\#?

> +     | %(?:[=>]|:(?:%:)?)?

> +     | &[=&]?

> +     |\|[=|]?

> +     |\+[=+]?

> +     | -[=->]?

> +     |\.(?:\.\.)?

> +     | :>?

> +     | <(?:[%:]|<(?:=|<=?)?)?

> +     | >(?:=|>=?)?)

> +   |(?P<ESCNL>         \\(?=[\r\n]))

> +   |(?P<WHITESPACE>    [ \t\n\r\v\f]+)

> +   |(?P<OTHER>         .)

> +''', re.DOTALL | re.VERBOSE)

> +ENDLINE_RE_ = re.compile(r'\r|\n|\r\n')


OK.

> +

> +# based on the sample code in the Python re documentation

> +Token_ = collections.namedtuple('Token', (

> +    'kind', 'text', 'line', 'column'))

> +def tokenize_c(file_contents):

> +    Token = Token_

> +    PP_TOKEN_RE = PP_TOKEN_RE_

> +    ENDLINE_RE = ENDLINE_RE_

> +

> +    line_num = 1

> +    line_start = 0

> +    for mo in PP_TOKEN_RE.finditer(file_contents):

> +        kind = mo.lastgroup

> +        text = mo.group()

> +        line = line_num

> +        column = mo.start() - line_start

> +        # only these kinds can contain a newline

> +        if kind in ('WHITESPACE', 'BLOCK_COMMENT', 'LINE_COMMENT',

> +                    'STRING', 'CHARCONST', 'BAD_BLOCK_COM'):

> +            adj_line_start = 0

> +            for tmo in ENDLINE_RE.finditer(text):

> +                line_num += 1

> +                adj_line_start = tmo.end()

> +            if adj_line_start:

> +                line_start = mo.start() + adj_line_start

> +        yield Token(kind, text, line, column + 1)


OK.

> +

> +

> +class Checker:

> +    def __init__(self):

> +        self.status = 0

> +

> +    def error(self, fname, tok, message):

> +        self.status = 1

> +        if '{!r}' in message:

> +            message = message.format(tok.text)

> +        sys.stderr.write("{}:{}:{}: error: {}\n".format(

> +            fname, tok.line, tok.column, message))


OK.

> +

> +    # The obsolete type names we're looking for:

> +    OBSOLETE_TYPE_RE_ = re.compile(r'''\A

> +      (__)?

> +      (?: quad_t

> +        | u(?: short | int | long

> +             | _(?: char | short | int([0-9]+_t)? | long | quad_t )))

> +    \Z''', re.VERBOSE)


OK.

> +

> +    def check_file(self, fname):

> +        OBSOLETE_TYPE_RE = self.OBSOLETE_TYPE_RE_

> +

> +        try:

> +            with open(fname, "rt") as fp:

> +                contents = fp.read()

> +        except OSError as e:

> +            sys.stderr.write("{}: {}\n".format(fname, e.strerror))

> +            self.status = 1

> +            return


OK.

> +

> +        # The obsolete rpc/ and rpcsvc/ headers are allowed to use the

> +        # obsolete types, because it would be more trouble than it's

> +        # worth to remove them from headers that we intend to stop

> +        # installing eventually anyway.

> +        uses_ok = (fname.startswith("rpc/") or

> +                   fname.startswith("rpcsvc/") or

> +                   "/rpc/" in fname or

> +                   "/rpcsvc/" in fname)


OK.

> +

> +        # sys/types.h and bits/types.h are allowed to define the

> +        # obsolete types, and they're allowed to use the __-versions

> +        # of the obsolete types to define the unprefixed versions of

> +        # the obsolete types.

> +        typedefs_ok = (fname == "sys/types.h" or

> +                       fname == "bits/types.h" or

> +                       fname.endswith("/sys/types.h") or

> +                       fname.endswith("/bits/types.h"))


OK.

> +        in_typedef = False

> +        delayed_error_token = None

> +

> +        for tok in tokenize_c(contents):


OK. Start tokenization loop.

> +            kind = tok.kind

> +            if kind == 'BAD_STRING':

> +                self.error(fname, tok, "unclosed string")

> +            elif kind == 'BAD_CHARCONST':

> +                self.error(fname, tok, "unclosed char constant")

> +            elif kind == 'BAD_BLOCK_COM':

> +                self.error(fname, tok, "unclosed block comment")

> +            elif kind == 'OTHER':

> +                self.error(fname, tok, "stray {!r} in program")


OK.

> +

> +            elif kind == 'IDENT':

> +                if tok.text == 'typedef':

> +                    in_typedef = True

> +                    continue

> +

> +                mo = OBSOLETE_TYPE_RE.match(tok.text)


OK. Match the token to the obsolete types.

> +                if mo:

> +                    if uses_ok:

> +                        continue

> +

> +                    if typedefs_ok and in_typedef:

> +                        # inside a typedef, the underscore versions

> +                        # are allowed unconditionally, and the

> +                        # non-underscore versions are allowed as the

> +                        # final identifier (the one being defined).

> +                        if mo.group(1) != '__':

> +                            delayed_error_token = tok

> +                        continue


OK.

> +

> +                    self.error(fname, tok, "use of {!r}")

> +

> +            elif kind == 'PUNCTUATOR':

> +                if in_typedef and tok.text == ';':

> +                    delayed_error_token = None

> +                    in_typedef = False

> +

> +            if delayed_error_token is not None:

> +                self.error(fname, delayed_error_token, "use of {!r}")

> +                delayed_error_token = None

> +

> +def main():

> +    ap = argparse.ArgumentParser(description=__doc__)

> +    ap.add_argument("headers", metavar="header", nargs="+",

> +                    help="one or more headers to scan for obsolete constructs")

> +    args = ap.parse_args()

> +

> +    checker = Checker()

> +    for fname in args.headers:

> +        checker.check_file(fname)

> +    sys.exit(checker.status)

> +

> +main()


OK.

> 



-- 
Cheers,
Carlos.
Florian Weimer Feb. 20, 2019, 4:44 p.m. | #2
* Zack Weinberg:

>         * sys/types.h (__u_intN_t): Remove.

>         (u_int8_t): Typedef using __uint8_t.

>         (u_int16_t): Typedef using __uint16_t.

>         (u_int32_t): Typedef using __uint32_t.

>         (u_int64_t): Typedef using __uint64_t.


Why doesn't this change C++ name mangling?

The ABI doesn't have names for the mode attribute, and at least
uint32_t, there are two conceivable representations on 32-bit (unsigned
int and unsigned long int).  See the architecture variance in the
manging of size_t.

Thanks,
Florian
Andreas Schwab Feb. 20, 2019, 4:51 p.m. | #3
On Feb 18 2019, Zack Weinberg <zackw@panix.com> wrote:

>         * bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.

>         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.

>         Update commentary.

>         * sys/types.h (__u_intN_t): Remove.


These files are both in posix/.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Joseph Myers Feb. 20, 2019, 5:29 p.m. | #4
On Wed, 20 Feb 2019, Florian Weimer wrote:

> * Zack Weinberg:

> 

> >         * sys/types.h (__u_intN_t): Remove.

> >         (u_int8_t): Typedef using __uint8_t.

> >         (u_int16_t): Typedef using __uint16_t.

> >         (u_int32_t): Typedef using __uint32_t.

> >         (u_int64_t): Typedef using __uint64_t.

> 

> Why doesn't this change C++ name mangling?


Because the logic in c_common_type_for_mode prefers unsigned int over long 
unsigned int, and long unsigned int over long long unsigned int, if they 
have the same mode. which matches the logic used in bits/types.h.  (Also, 
the mangling of those types is verified in c++-types.data so the testsuite 
would show if it changed.)

-- 
Joseph S. Myers
joseph@codesourcery.com
Zack Weinberg Feb. 20, 2019, 6:03 p.m. | #5
On Wed, Feb 20, 2019 at 11:32 AM Carlos O'Donell <carlos@redhat.com> wrote:
>

> This looks good to me.


Thanks for review.  A build-many-glibcs run found problems with
arch-specific headers which I believe I have corrected, but they
involve complex changes to the test script so I would appreciate
another round of review.  (No further changes to installed headers are
required, though.) Will post a revised patch either late tonight or
tomorrow morning, depending how long another b-m-g run takes to
complete.

> * Fix commit message to remove freeze comments.


Done.

> * Convincingly explain the gcc 2.7 testing worries I have.


I believe you have misread the patch.  There are several 'typedef int
foo_t __attribute__ ((__mode__ (bar)))' constructs currently present
in sys/types.h; my patch removes some of them.  I'm confident that it
does not introduce any new ones, and I think (but wouldn't mind a
second pair of eyes verifying that) it does not change the
preprocessor conditionals controlling the remaining ones.

I have a follow-up patch under development that removes the remaining
uses of __attribute__ ((__mode__ (xxx))).

zw
Zack Weinberg Feb. 20, 2019, 6:05 p.m. | #6
On Wed, Feb 20, 2019 at 11:51 AM Andreas Schwab <schwab@suse.de> wrote:
> On Feb 18 2019, Zack Weinberg <zackw@panix.com> wrote:

>

> >         * bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.

> >         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.

> >         Update commentary.

> >         * sys/types.h (__u_intN_t): Remove.

>

> These files are both in posix/.


I understand what you are trying to communicate here, but someone with
less experience would not necessarily realize that you were pointing
out that these change log entries refer to the files by the wrong
pathnames.  Would it kill you to be a little more specific?

Anyway, corrected.

zw
Zack Weinberg Feb. 20, 2019, 6:07 p.m. | #7
On Wed, Feb 20, 2019 at 12:29 PM Joseph Myers <joseph@codesourcery.com> wrote:
> On Wed, 20 Feb 2019, Florian Weimer wrote:

>

> > * Zack Weinberg:

> >

> > >         * sys/types.h (__u_intN_t): Remove.

> > >         (u_int8_t): Typedef using __uint8_t.

> > >         (u_int16_t): Typedef using __uint16_t.

> > >         (u_int32_t): Typedef using __uint32_t.

> > >         (u_int64_t): Typedef using __uint64_t.

> >

> > Why doesn't this change C++ name mangling?

>

> Because the logic in c_common_type_for_mode prefers unsigned int over long

> unsigned int, and long unsigned int over long long unsigned int, if they

> have the same mode. which matches the logic used in bits/types.h.  (Also,

> the mangling of those types is verified in c++-types.data so the testsuite

> would show if it changed.)


Also also, bits/stdint-uintn.h uses __uintN_t to define uintN_t, and
u_intN_t and uintN_t were formerly used interchangeably throughout
inet/ (before I cleaned all of that up), so if there was a mismatch we
would have been getting incompatible type errors all over the place.

zw
Andreas Schwab Feb. 21, 2019, 9:53 a.m. | #8
On Feb 20 2019, Zack Weinberg <zackw@panix.com> wrote:

> On Wed, Feb 20, 2019 at 11:51 AM Andreas Schwab <schwab@suse.de> wrote:

>> On Feb 18 2019, Zack Weinberg <zackw@panix.com> wrote:

>>

>> >         * bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.

>> >         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.

>> >         Update commentary.

>> >         * sys/types.h (__u_intN_t): Remove.

>>

>> These files are both in posix/.

>

> I understand what you are trying to communicate here, but someone with

> less experience would not necessarily realize that you were pointing

> out that these change log entries refer to the files by the wrong

> pathnames.  Would it kill you to be a little more specific?


That's exactly what I did.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Zack Weinberg Feb. 21, 2019, 11:38 a.m. | #9
On Thu, Feb 21, 2019 at 4:53 AM Andreas Schwab <schwab@suse.de> wrote:
> On Feb 20 2019, Zack Weinberg <zackw@panix.com> wrote:

> > On Wed, Feb 20, 2019 at 11:51 AM Andreas Schwab <schwab@suse.de> wrote:

> >> On Feb 18 2019, Zack Weinberg <zackw@panix.com> wrote:

> >>

> >> >         * bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.

> >> >         (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.

> >> >         Update commentary.

> >> >         * sys/types.h (__u_intN_t): Remove.

> >>

> >> These files are both in posix/.

> >

> > I understand what you are trying to communicate here, but someone with

> > less experience would not necessarily realize that you were pointing

> > out that these change log entries refer to the files by the wrong

> > pathnames.  Would it kill you to be a little more specific?

>

> That's exactly what I did.


Are you being deliberately obtuse?

zw
Andreas Schwab Feb. 21, 2019, 11:49 a.m. | #10
Stop your harassment.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Zack Weinberg Feb. 21, 2019, 12:13 p.m. | #11
On Thu, Feb 21, 2019 at 6:49 AM Andreas Schwab <schwab@suse.de> wrote:
> Stop your harassment.


???
Carlos O'Donell Feb. 21, 2019, 1:31 p.m. | #12
On 2/21/19 6:49 AM, Andreas Schwab wrote:
> Stop your harassment.


I suggest that each of you, Andreas, and Zack, assume that the
opposite party has only the best intentions and that they are
trying to the best of their abilities to support you.

Andreas, I thank you for your review.

Zack, I thank you for your patch.

Zack, If you want something specific from Andreas, please ask
for it directly and in clear and plain language.

-- 
Cheers,
Carlos.
Zack Weinberg Feb. 26, 2019, 4:01 p.m. | #13
On Thu, Feb 21, 2019 at 7:13 AM Zack Weinberg <zackw@panix.com> wrote:
> On Thu, Feb 21, 2019 at 6:49 AM Andreas Schwab <schwab@suse.de> wrote:

> > Stop your harassment.

>

> ???


I want to publicly apologize for this message.  It was not an
appropriate reaction to a request to leave someone alone and I regret
posting it.

zw

Patch

diff --git a/Rules b/Rules
index e08a28d9f3..222dba6dcb 100644
--- a/Rules
+++ b/Rules
@@ -82,7 +82,8 @@  $(common-objpfx)dummy.c:
 common-generated += dummy.o dummy.c
 
 ifneq "$(headers)" ""
-# Special test of all the installed headers in this directory.
+# Test that all of the headers installed by this directory can be compiled
+# in isolation.
 tests-special += $(objpfx)check-installed-headers-c.out
 libof-check-installed-headers-c := testsuite
 $(objpfx)check-installed-headers-c.out: \
@@ -93,6 +94,8 @@  $(objpfx)check-installed-headers-c.out: \
 	$(evaluate-test)
 
 ifneq "$(CXX)" ""
+# If a C++ compiler is available, also test that they can be compiled
+# in isolation as C++.
 tests-special += $(objpfx)check-installed-headers-cxx.out
 libof-check-installed-headers-cxx := testsuite
 $(objpfx)check-installed-headers-cxx.out: \
@@ -103,12 +106,24 @@  $(objpfx)check-installed-headers-cxx.out: \
 	$(evaluate-test)
 endif # $(CXX)
 
+# Test that a wrapper header exists in include/ for each non-sysdeps header.
+# This script does not need $(py-env).
 tests-special += $(objpfx)check-wrapper-headers.out
 $(objpfx)check-wrapper-headers.out: \
   $(..)scripts/check-wrapper-headers.py $(headers)
 	$(PYTHON) $< --root=$(..) --subdir=$(subdir) $(headers) > $@; \
 	  $(evaluate-test)
 
+# Test that none of the headers installed by this directory use certain
+# obsolete constructs (e.g. legacy BSD typedefs superseded by stdint.h).
+# This script does not need $(py-env).
+tests-special += $(objpfx)check-obsolete-constructs.out
+libof-check-obsolete-constructs := testsuite
+$(objpfx)check-obsolete-constructs.out: \
+    $(..)scripts/check-obsolete-constructs.py $(headers)
+	$(PYTHON) $^ > $@ 2>&1; \
+	$(evaluate-test)
+
 endif # $(headers)
 
 # This makes all the auxiliary and test programs.
diff --git a/posix/bits/types.h b/posix/bits/types.h
index 27e065c3be..0de6c59bb4 100644
--- a/posix/bits/types.h
+++ b/posix/bits/types.h
@@ -87,7 +87,7 @@  __extension__ typedef unsigned long long int __uintmax_t;
 	32		-- "natural" 32-bit type (always int)
 	64		-- "natural" 64-bit type (long or long long)
 	LONG32		-- 32-bit type, traditionally long
-	QUAD		-- 64-bit type, always long long
+	QUAD		-- 64-bit type, traditionally long long
 	WORD		-- natural type of __WORDSIZE bits (int or long)
 	LONGWORD	-- type of __WORDSIZE bits, traditionally long
 
@@ -113,14 +113,14 @@  __extension__ typedef unsigned long long int __uintmax_t;
 #define __SLONGWORD_TYPE	long int
 #define __ULONGWORD_TYPE	unsigned long int
 #if __WORDSIZE == 32
-# define __SQUAD_TYPE		__quad_t
-# define __UQUAD_TYPE		__u_quad_t
+# define __SQUAD_TYPE		__int64_t
+# define __UQUAD_TYPE		__uint64_t
 # define __SWORD_TYPE		int
 # define __UWORD_TYPE		unsigned int
 # define __SLONG32_TYPE		long int
 # define __ULONG32_TYPE		unsigned long int
-# define __S64_TYPE		__quad_t
-# define __U64_TYPE		__u_quad_t
+# define __S64_TYPE		__int64_t
+# define __U64_TYPE		__uint64_t
 /* We want __extension__ before typedef's that use nonstandard base types
    such as `long long' in C89 mode.  */
 # define __STD_TYPE		__extension__ typedef
diff --git a/posix/sys/types.h b/posix/sys/types.h
index 27129c5c23..0e37b1ce6a 100644
--- a/posix/sys/types.h
+++ b/posix/sys/types.h
@@ -154,37 +154,20 @@  typedef unsigned int uint;
 
 #include <bits/stdint-intn.h>
 
-#if !__GNUC_PREREQ (2, 7)
-
 /* These were defined by ISO C without the first `_'.  */
-typedef	unsigned char u_int8_t;
-typedef	unsigned short int u_int16_t;
-typedef	unsigned int u_int32_t;
-# if __WORDSIZE == 64
-typedef unsigned long int u_int64_t;
-# else
-__extension__ typedef unsigned long long int u_int64_t;
-# endif
-
-typedef int register_t;
-
-#else
-
-/* For GCC 2.7 and later, we can use specific type-size attributes.  */
-# define __u_intN_t(N, MODE) \
-  typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE)))
-
-__u_intN_t (8, __QI__);
-__u_intN_t (16, __HI__);
-__u_intN_t (32, __SI__);
-__u_intN_t (64, __DI__);
+typedef __uint8_t u_int8_t;
+typedef __uint16_t u_int16_t;
+typedef __uint32_t u_int32_t;
+typedef __uint64_t u_int64_t;
 
+#if __GNUC_PREREQ (2, 7)
 typedef int register_t __attribute__ ((__mode__ (__word__)));
-
+#else
+typedef int register_t;
+#endif
 
 /* Some code from BIND tests this macro to see if the types above are
    defined.  */
-#endif
 #define __BIT_TYPES_DEFINED__	1
 
 
diff --git a/scripts/check-installed-headers.sh b/scripts/check-installed-headers.sh
index 8e7beffd82..63bc8d4fa6 100644
--- a/scripts/check-installed-headers.sh
+++ b/scripts/check-installed-headers.sh
@@ -16,11 +16,9 @@ 
 # License along with the GNU C Library; if not, see
 # <http://www.gnu.org/licenses/>.
 
-# Check installed headers for cleanliness.  For each header, confirm
-# that it's possible to compile a file that includes that header and
-# does nothing else, in several different compilation modes.  Also,
-# scan the header for a set of obsolete typedefs that should no longer
-# appear.
+# For each installed header, confirm that it's possible to compile a
+# file that includes that header and does nothing else, in several
+# different compilation modes.
 
 # These compilation switches assume GCC or compatible, which is probably
 # fine since we also assume that when _building_ glibc.
@@ -31,13 +29,6 @@  cxx_modes="-std=c++98 -std=gnu++98 -std=c++11 -std=gnu++11"
 # These are probably the most commonly used three.
 lib_modes="-D_DEFAULT_SOURCE=1 -D_GNU_SOURCE=1 -D_XOPEN_SOURCE=700"
 
-# sys/types.h+bits/types.h have to define the obsolete types.
-# rpc(svc)/* have the obsolete types too deeply embedded in their API
-# to remove.
-skip_obsolete_type_check='*/sys/types.h|*/bits/types.h|*/rpc/*|*/rpcsvc/*'
-obsolete_type_re=\
-'\<((__)?(quad_t|u(short|int|long|_(char|short|int([0-9]+_t)?|long|quad_t))))\>'
-
 if [ $# -lt 3 ]; then
     echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
     exit 2
@@ -46,14 +37,10 @@  case "$1" in
     (c)
         lang_modes="$c_modes"
         cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.c)
-        already="$skip_obsolete_type_check"
     ;;
     (c++)
         lang_modes="$cxx_modes"
         cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.cc)
-        # The obsolete-type check can be skipped for C++; it is
-        # sufficient to do it for C.
-        already="*"
     ;;
     (*)
         echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
@@ -151,22 +138,8 @@  $expanded_lib_mode
 int avoid_empty_translation_unit;
 EOF
             if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1
-            then
-                includes=$($cc_cmd -fsyntax-only -H $lang_mode \
-                              "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')
-                for h in $includes; do
-                    # Don't repeat work.
-                    eval 'case "$h" in ('"$already"') continue;; esac'
-
-                    if grep -qE "$obsolete_type_re" "$h"; then
-                        echo "*** Obsolete types detected:"
-                        grep -HE "$obsolete_type_re" "$h"
-                        failed=1
-                    fi
-                    already="$already|$h"
-                done
-            else
-                failed=1
+            then :
+            else failed=1
             fi
         done
     done
diff --git a/scripts/check-obsolete-constructs.py b/scripts/check-obsolete-constructs.py
new file mode 100755
index 0000000000..a0af229dc8
--- /dev/null
+++ b/scripts/check-obsolete-constructs.py
@@ -0,0 +1,225 @@ 
+#! /usr/bin/python3
+# Copyright (C) 2019 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+"""Verifies that installed headers do not use any obsolete constructs:
+ * legacy BSD typedefs superseded by <stdint.h>:
+   ushort uint ulong u_short u_int u_long u_intNN_t quad_t u_quad_t
+   (sys/types.h is allowed to _define_ these types, but not to use them
+    to define anything else).
+"""
+
+import argparse
+import collections
+import os
+import re
+import sys
+
+# Simplified lexical analyzer for C preprocessing tokens.
+# Does not implement trigraphs.
+# Does not implement backslash-newline in the middle of any lexical
+#   item other than a string literal.
+# Does not implement universal-character-names in identifiers.
+# Does not implement header-names.
+# Treats prefixed strings (e.g. L"...") as two tokens (L and "...")
+# Accepts non-ASCII characters only within comments and strings.
+
+# Caution: The order of the outermost alternation matters.
+# STRING must be before BAD_STRING, CHARCONST before BAD_CHARCONST,
+# BLOCK_COMMENT before BAD_BLOCK_COM before PUNCTUATOR, and OTHER must
+# be last.
+# Caution: There should be no capturing groups other than the named
+# captures in the outermost alternation.
+
+# For reference, these are all of the C punctuators as of C11:
+#   [ ] ( ) { } , ; ? ~
+#   ! != * *= / /= ^ ^= = ==
+#   # ##
+#   % %= %> %: %:%:
+#   & &= &&
+#   | |= ||
+#   + += ++
+#   - -= -- ->
+#   . ...
+#   : :>
+#   < <% <: << <<= <=
+#   > >= >> >>=
+
+# The BAD_* tokens are not part of the official definition of pp-tokens;
+# they match unclosed strings, character constants, and block comments,
+# so that the regex engine doesn't have to backtrack all the way to the
+# beginning of a broken construct and then emit dozens of junk tokens.
+
+PP_TOKEN_RE_ = re.compile(r'''
+    (?P<STRING>        \"(?:[^\"\\\r\n]|\\(?:[\r\n -~]|\r\n))*\")
+   |(?P<BAD_STRING>    \"(?:[^\"\\\r\n]|\\[ -~])*)
+   |(?P<CHARCONST>     \'(?:[^\'\\\r\n]|\\(?:[\r\n -~]|\r\n))*\')
+   |(?P<BAD_CHARCONST> \'(?:[^\'\\\r\n]|\\[ -~])*)
+   |(?P<BLOCK_COMMENT> /\*(?:\*(?!/)|[^*])*\*/)
+   |(?P<BAD_BLOCK_COM> /\*(?:\*(?!/)|[^*])*\*?)
+   |(?P<LINE_COMMENT>  //[^\r\n]*)
+   |(?P<IDENT>         [_a-zA-Z][_a-zA-Z0-9]*)
+   |(?P<PP_NUMBER>     \.?[0-9](?:[0-9a-df-oq-zA-DF-OQ-Z_.]|[eEpP][+-]?)*)
+   |(?P<PUNCTUATOR>
+       [,;?~(){}\[\]]
+     | [!*/^=]=?
+     | \#\#?
+     | %(?:[=>]|:(?:%:)?)?
+     | &[=&]?
+     |\|[=|]?
+     |\+[=+]?
+     | -[=->]?
+     |\.(?:\.\.)?
+     | :>?
+     | <(?:[%:]|<(?:=|<=?)?)?
+     | >(?:=|>=?)?)
+   |(?P<ESCNL>         \\(?=[\r\n]))
+   |(?P<WHITESPACE>    [ \t\n\r\v\f]+)
+   |(?P<OTHER>         .)
+''', re.DOTALL | re.VERBOSE)
+ENDLINE_RE_ = re.compile(r'\r|\n|\r\n')
+
+# based on the sample code in the Python re documentation
+Token_ = collections.namedtuple('Token', (
+    'kind', 'text', 'line', 'column'))
+def tokenize_c(file_contents):
+    Token = Token_
+    PP_TOKEN_RE = PP_TOKEN_RE_
+    ENDLINE_RE = ENDLINE_RE_
+
+    line_num = 1
+    line_start = 0
+    for mo in PP_TOKEN_RE.finditer(file_contents):
+        kind = mo.lastgroup
+        text = mo.group()
+        line = line_num
+        column = mo.start() - line_start
+        # only these kinds can contain a newline
+        if kind in ('WHITESPACE', 'BLOCK_COMMENT', 'LINE_COMMENT',
+                    'STRING', 'CHARCONST', 'BAD_BLOCK_COM'):
+            adj_line_start = 0
+            for tmo in ENDLINE_RE.finditer(text):
+                line_num += 1
+                adj_line_start = tmo.end()
+            if adj_line_start:
+                line_start = mo.start() + adj_line_start
+        yield Token(kind, text, line, column + 1)
+
+
+class Checker:
+    def __init__(self):
+        self.status = 0
+
+    def error(self, fname, tok, message):
+        self.status = 1
+        if '{!r}' in message:
+            message = message.format(tok.text)
+        sys.stderr.write("{}:{}:{}: error: {}\n".format(
+            fname, tok.line, tok.column, message))
+
+    # The obsolete type names we're looking for:
+    OBSOLETE_TYPE_RE_ = re.compile(r'''\A
+      (__)?
+      (?: quad_t
+        | u(?: short | int | long
+             | _(?: char | short | int([0-9]+_t)? | long | quad_t )))
+    \Z''', re.VERBOSE)
+
+    def check_file(self, fname):
+        OBSOLETE_TYPE_RE = self.OBSOLETE_TYPE_RE_
+
+        try:
+            with open(fname, "rt") as fp:
+                contents = fp.read()
+        except OSError as e:
+            sys.stderr.write("{}: {}\n".format(fname, e.strerror))
+            self.status = 1
+            return
+
+        # The obsolete rpc/ and rpcsvc/ headers are allowed to use the
+        # obsolete types, because it would be more trouble than it's
+        # worth to remove them from headers that we intend to stop
+        # installing eventually anyway.
+        uses_ok = (fname.startswith("rpc/") or
+                   fname.startswith("rpcsvc/") or
+                   "/rpc/" in fname or
+                   "/rpcsvc/" in fname)
+
+        # sys/types.h and bits/types.h are allowed to define the
+        # obsolete types, and they're allowed to use the __-versions
+        # of the obsolete types to define the unprefixed versions of
+        # the obsolete types.
+        typedefs_ok = (fname == "sys/types.h" or
+                       fname == "bits/types.h" or
+                       fname.endswith("/sys/types.h") or
+                       fname.endswith("/bits/types.h"))
+        in_typedef = False
+        delayed_error_token = None
+
+        for tok in tokenize_c(contents):
+            kind = tok.kind
+            if kind == 'BAD_STRING':
+                self.error(fname, tok, "unclosed string")
+            elif kind == 'BAD_CHARCONST':
+                self.error(fname, tok, "unclosed char constant")
+            elif kind == 'BAD_BLOCK_COM':
+                self.error(fname, tok, "unclosed block comment")
+            elif kind == 'OTHER':
+                self.error(fname, tok, "stray {!r} in program")
+
+            elif kind == 'IDENT':
+                if tok.text == 'typedef':
+                    in_typedef = True
+                    continue
+
+                mo = OBSOLETE_TYPE_RE.match(tok.text)
+                if mo:
+                    if uses_ok:
+                        continue
+
+                    if typedefs_ok and in_typedef:
+                        # inside a typedef, the underscore versions
+                        # are allowed unconditionally, and the
+                        # non-underscore versions are allowed as the
+                        # final identifier (the one being defined).
+                        if mo.group(1) != '__':
+                            delayed_error_token = tok
+                        continue
+
+                    self.error(fname, tok, "use of {!r}")
+
+            elif kind == 'PUNCTUATOR':
+                if in_typedef and tok.text == ';':
+                    delayed_error_token = None
+                    in_typedef = False
+
+            if delayed_error_token is not None:
+                self.error(fname, delayed_error_token, "use of {!r}")
+                delayed_error_token = None
+
+def main():
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("headers", metavar="header", nargs="+",
+                    help="one or more headers to scan for obsolete constructs")
+    args = ap.parse_args()
+
+    checker = Checker()
+    for fname in args.headers:
+        checker.check_file(fname)
+    sys.exit(checker.status)
+
+main()