[v2,00/10] Undefined Behavior Sanitizer, this time with docs

Message ID 20181002044420.17628-1-tom@tromey.com
Headers show
Series
  • Undefined Behavior Sanitizer, this time with docs
Related show

Message

Tom Tromey Oct. 2, 2018, 4:44 a.m.
This is a new version of the series to add -fsanitize=undefined to the
build.

It's only added to gdb, though it occurred to me later that it would
probably be better to add it to all the libraries as well.

This version addresses the review comments, and in particular adds
documentation in patch #10 about performance.  It also fixes a bug
observed on the S390 builds in patch #2.

Regression tested by the buildbot.

Tom

Comments

Pedro Alves Oct. 3, 2018, 5:57 p.m. | #1
On 10/02/2018 05:44 AM, Tom Tromey wrote:
> This is a new version of the series to add -fsanitize=undefined to the

> build.

> 

> It's only added to gdb, though it occurred to me later that it would

> probably be better to add it to all the libraries as well.


Yeah.  I'm fine with starting with gdb only first, though.

Thanks,
Pedro Alves
Tom Tromey Oct. 3, 2018, 9:09 p.m. | #2
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:


Pedro> On 10/02/2018 05:44 AM, Tom Tromey wrote:
>> This is a new version of the series to add -fsanitize=undefined to the

>> build.

>> 

>> It's only added to gdb, though it occurred to me later that it would

>> probably be better to add it to all the libraries as well.


Pedro> Yeah.  I'm fine with starting with gdb only first, though.

Thanks everyone for the reviews.
I'm going to check this in now.

Tom
John Baldwin Oct. 8, 2018, 7:14 p.m. | #3
On 10/1/18 9:44 PM, Tom Tromey wrote:
> This is a new version of the series to add -fsanitize=undefined to the

> build.

> 

> It's only added to gdb, though it occurred to me later that it would

> probably be better to add it to all the libraries as well.

> 

> This version addresses the review comments, and in particular adds

> documentation in patch #10 about performance.  It also fixes a bug

> observed on the S390 builds in patch #2.

> 

> Regression tested by the buildbot.


FWIW, I built GDB master today and ubsan (from LLVM, not GCC) flagged a
use of obstack_blank_fast() in minsyms.c with a negative offset (used to
shrink an obstack) when trying to do 'start' on /bin/ls:

(gdb) start
Temporary breakpoint 1 at 0x402674: file /usr/src/bin/ls/ls.c, line 161.
Starting program: /bin/ls 
../../gdb/minsyms.c:1378:7: runtime error: addition of unsigned offset to 0x00080907cd10 overflowed to 0x00080907cc38

This corresponds to the invocation of obstack_blank_fast here:

      /* Compact out any duplicates, and free up whatever space we are
         no longer using.  */

      mcount = compact_minimal_symbols (msymbols, mcount, m_objfile);

      obstack_blank_fast (&m_objfile->per_bfd->storage_obstack,
	       (mcount + 1 - alloc_count) * sizeof (struct minimal_symbol));

The case that triggered the failure for me had these values that resulted
in a negative offset:

(top-gdb) p mcount
$23 = 5740
(top-gdb) p alloc_count
$24 = 5744
...
(top-gdb) p mcount + 1 - alloc_count
$26 = -3

I guess since sizeof's return type is size_t, then the promotion rules on
LP64 mean that the resulting value is unsigned?  Anyway, we might consider
making ubsan only default to on for GCC for now?

-- 
John Baldwin
Joel Brobecker Oct. 8, 2018, 8:22 p.m. | #4
> FWIW, I built GDB master today and ubsan (from LLVM, not GCC) flagged a

> use of obstack_blank_fast() in minsyms.c with a negative offset (used to

> shrink an obstack) when trying to do 'start' on /bin/ls:


On my end, I am working on a resync of AdaCore's "head" version,
which would include this change as well. My testing shows that
it revealed some issues; I haven't had a chance to look into them yet
(next on my list today, hopefully), but this is making me think
there might be bona fide issues. Getting the location where things
are happening is pretty nice :).

-- 
Joel
Pedro Alves Oct. 9, 2018, 10:44 a.m. | #5
On 10/08/2018 09:22 PM, Joel Brobecker wrote:
>> FWIW, I built GDB master today and ubsan (from LLVM, not GCC) flagged a

>> use of obstack_blank_fast() in minsyms.c with a negative offset (used to

>> shrink an obstack) when trying to do 'start' on /bin/ls:

> 

> On my end, I am working on a resync of AdaCore's "head" version,

> which would include this change as well. My testing shows that

> it revealed some issues; I haven't had a chance to look into them yet

> (next on my list today, hopefully), but this is making me think

> there might be bona fide issues. Getting the location where things

> are happening is pretty nice :).


This also revealed PR 23743:
  https://sourceware.org/bugzilla/show_bug.cgi?id=23743

I think the side-effects are blocking/disturbing enough (GDB dies) that I
think the best course of action is to temporarily disable the ubsan by
default until these issues are fixed.  Then try re-enabling again, rinse/repeat.
Otherwise, as is, this is sort of imposing that the community prioritizes
fixing these ubsan-exposed bugs, which I think is unreasonable.

Thanks,
Pedro Alves
Joel Brobecker Oct. 12, 2018, 9:07 p.m. | #6
> This also revealed PR 23743:

>   https://sourceware.org/bugzilla/show_bug.cgi?id=23743


For the record, I've also discovered something I didn't realize
about struct memory allocations, which is a bit disturbing, because
I think we have a lot of areas where we malloc struct objects...
I reported it under https://sourceware.org/bugzilla/show_bug.cgi?id=23768

For now, it seems like the lessons learned is that it seems we should
be very careful when using XNEW/XCNEW with struct types?

----------------------------------------------------------------------

on ppc-linux, a GDB built with --enable-ubsan=yes crashes immediately as soon as one tries to enter anything at the prompt:

$ /path/to/gdb
GNU gdb (GDB) 8.2.50.20181012-git (with AdaCore local changes)
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
See your support agreement for details of warranty and support.
If you do not have a current support agreement, then there is absolutely
no warranty for this version of GDB.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc-generic-linux-gnu".
Type "show configuration" for configuration details.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) /homes/brobecke/act/gdb/gdb-head/gdb/common/common-exceptions.c:84:26: runtime error: member access within misaligned address 0x124af518 for type 'struct catcher', which requires 16 byte alignment

The code that triggers the error is fairly straightforward:

| jmp_buf *
| exceptions_state_mc_init (void)
| {
|   struct catcher *new_catcher = XCNEW (struct catcher);
|
|   /* Start with no exception.  */
|   new_catcher->exception = exception_none;

The problem happens because "struct catcher" is a structure which
requires a 16 bytes alignment, due to one of its members (the jmpbuf)
having itself a 16bytes alignment:

From bits/setjmp.h on ppc-linux:

| /* The current powerpc 32-bit Altivec ABI specifies for SVR4 ABI and EABI
|    the vrsave must be at byte 248 & v20 at byte 256.  So we must pad this
|    correctly on 32 bit.  It also insists that vecregs are only gauranteed
|    4 byte alignment so we need to use vperm in the setjmp/longjmp routines.
|    We have to version the code because members like  int __mask_was_saved
|    in the jmp_buf will move as jmp_buf is now larger than 248 bytes.  We
|    cannot keep the altivec jmp_buf backward compatible with the jmp_buf.  */
| #ifndef _ASM
| # if __WORDSIZE == 64
| typedef long int __jmp_buf[64] __attribute__ ((__aligned__ (16)));
| # else
| /* The alignment is not essential, i.e.the buffer can be copied to a 4 byte
|    aligned buffer as per the ABI it is just added for performance reasons.  */
| typedef long int __jmp_buf[64 + (12 * 4)] __attribute__ ((__aligned__ (16)));
| # endif

XCNEW performs calls to malloc without worrying about alignment,
so there is indeed no guaranty that the returned address is
aligned on 16 bytes. We looked at the GNU libc documentation,
for instance, and they guaranty 8 bytes only on 32bit machines

| https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html
|
| 3.2.3.6 Allocating Aligned Memory Blocks
|
| The address of a block returned by malloc or realloc in GNU systems is
| always a multiple of eight (or sixteen on 64-bit systems). If you need a
| block whose address is a multiple of a higher power of two than that,
| use aligned_alloc or posix_memalign. aligned_alloc and posix_memalign
| are declared in stdlib.h.

We can certainly start by plugging the hole in this particular case,
and swap the call to XCNEW with a call to an aligned malloc. But
this made me realize we have a general issue, because potentially,
all struct allocations are potentially problematic, if they happen
to have alignment requirements that are stronger than what malloc
itself follows. I hope that, in practice, aligment requirements
beyond 4 or 8 bytes are rare.

Perhaps one possible idea was to have XNEW/XCNEW et all take the
alignment into account when performing the memory allocation
(we pass the type, so it could use alignof); however, that is
a major change affecting everyone. And looking closer at those
macros, one notices the following important comment...

    /* Scalar allocators.  */

... which tells me the macros may not have been meant to be used
in the context of allocating structures.

It's unclear to me what we would want to do...

-- 
Joel