[v2,0/8] Non-contiguous address range support

Message ID 20180813165002.29ddb8dd@pinnacle.lan
Headers show
Series
  • Non-contiguous address range support
Related show

Message

Kevin Buettner Aug. 13, 2018, 11:50 p.m.
This is version 2 of an eight part patch series which adds further
support for non-contiguous address ranges to GDB.

In this v2 series, I've addressed the concerns from Simon Marchi's
review of the earlier patch set.  I've also changed my mind about how
return values *ADDRESS and *ENDADDR ought to be set for
find_pc_partial_function.  I'll discuss this matter in the remarks
preceding the relevant patches.

Everything below this point was copy/pasted from the introductory
message for the v1 patch set...

This sequence of patches was motivated by GCC bug 84550:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84550

There is a test case posted to that bug along with some analysis of
the underlying problem.

There is also a GDB bug for the same issue; it's 23021, but at the
moment, there is little there aside from a link to the GCC bug
mentioned above.  But here's a link anyway:

https://sourceware.org/bugzilla/show_bug.cgi?id=23021

A quick synopsis of the problem is as follows...

Recent versions of gcc can generate code in which a function is split
into at least two non-contiguous address ranges.  As I understand it,
the idea here is to separate code which gcc does not expect to execute
in normal operation from the rest of the code.  Doing this may result
in better cache locality for the normal case.  The generated code for
the example in GCC bug 84550 separated a call to abort() from the rest
of the code comprising the function.

In the course of my investigation, I identified at least four
problems:

1) Stepping into a function from a function which occupies non-contiguous
   address ranges does not always work.  It was not uncommon to see the
   program run to completion when attempting to do a step.

2) Setting a breakpoint on a function with non-contiguous address ranges
   causes a breakpoint to be placed on more than one location.  When a
   breakpoint is set on the "cold" address range, this is almost certainly
   incorrect.  The breakpoint should instead be set only on code near the
   entry point(s).

3) The disassemble command did not work correctly.  E.g. here is what I
   found during my analysis of 84550:

	(gdb) x/i 'main.cold.0'
	   0x4010e0 <main()>:   mov    %rax,%rdi
	(gdb) x/i main
	   0x4011a0 <main>:     push   %r12
	(gdb) disassemble main
	Dump of assembler code for function main():
	   0x00000000004010e0 <+0>:     mov    %rax,%rdi
	   ...
        [No addresses starting at 0x4011a0 are shown]

4) Display of addresses associated with the non-contiguous function are
   confusing.  E.g. in the above example, note that GDB thinks that
   the address associated with main.cold.0 is <main()>, but that there's
   also a minsym called main which is displayed as <main>.

There are probably several other problems which are related those
identified above.

I discovered that the stepping problem could be "fixed" by disabling
the find_pc_partial_function cache.  This cache keeps track of the
most recent result (of calling find_pc_partial_function).  If
find_pc_partial_function is called with an address which falls within
the cache range, then that's considered to be a cache hit and the most
recent result is returned.  Obviously, this won't work correctly for
functions which occupy non-contiguous (disjoint) address ranges where
other functions might be placed in the gap.

So one of the problems that needed to be solved was to make the
caching code work correctly.  It is interesting to note that stepping
_did_ work when the cache was disabled.  This is/was due to GDB
already having some (albeit incomplete) support for non-contiguous
addresses in the form of blockvector address maps.  Code responsible
for mapping addresses to blocks (which form the lower levels of
find_pc_partial_function) handle this case correctly.

To solve the problem of incorrect disassembly, we need to be able
to iterate over all of the ranges associated with a block.

Finally, we need to distinguish between the entry pc and the lowest
address in a block.  I discovered that this lack of distinction was
the cause of the remainder of the bugs including some which seemed to
be introduced by fixing the problems noted above.  Once this
distinction is made, it will be straightforward to add full support for
DW_AT_entry_pc.  I considered adding this support as part of this
patch series, but decided to wait until the community weighs in on my
work thus far...

Comments

Simon Marchi Aug. 19, 2018, 1:35 p.m. | #1
On 2018-08-13 7:50 p.m., Kevin Buettner wrote:
> This is version 2 of an eight part patch series which adds further

> support for non-contiguous address ranges to GDB.

> 

> In this v2 series, I've addressed the concerns from Simon Marchi's

> review of the earlier patch set.  I've also changed my mind about how

> return values *ADDRESS and *ENDADDR ought to be set for

> find_pc_partial_function.  I'll discuss this matter in the remarks

> preceding the relevant patches.

> 

> Everything below this point was copy/pasted from the introductory

> message for the v1 patch set...

> 

> This sequence of patches was motivated by GCC bug 84550:

> 

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84550

> 

> There is a test case posted to that bug along with some analysis of

> the underlying problem.

> 

> There is also a GDB bug for the same issue; it's 23021, but at the

> moment, there is little there aside from a link to the GCC bug

> mentioned above.  But here's a link anyway:

> 

> https://sourceware.org/bugzilla/show_bug.cgi?id=23021

> 

> A quick synopsis of the problem is as follows...

> 

> Recent versions of gcc can generate code in which a function is split

> into at least two non-contiguous address ranges.  As I understand it,

> the idea here is to separate code which gcc does not expect to execute

> in normal operation from the rest of the code.  Doing this may result

> in better cache locality for the normal case.  The generated code for

> the example in GCC bug 84550 separated a call to abort() from the rest

> of the code comprising the function.

> 

> In the course of my investigation, I identified at least four

> problems:

> 

> 1) Stepping into a function from a function which occupies non-contiguous

>    address ranges does not always work.  It was not uncommon to see the

>    program run to completion when attempting to do a step.

> 

> 2) Setting a breakpoint on a function with non-contiguous address ranges

>    causes a breakpoint to be placed on more than one location.  When a

>    breakpoint is set on the "cold" address range, this is almost certainly

>    incorrect.  The breakpoint should instead be set only on code near the

>    entry point(s).

> 

> 3) The disassemble command did not work correctly.  E.g. here is what I

>    found during my analysis of 84550:

> 

> 	(gdb) x/i 'main.cold.0'

> 	   0x4010e0 <main()>:   mov    %rax,%rdi

> 	(gdb) x/i main

> 	   0x4011a0 <main>:     push   %r12

> 	(gdb) disassemble main

> 	Dump of assembler code for function main():

> 	   0x00000000004010e0 <+0>:     mov    %rax,%rdi

> 	   ...

>         [No addresses starting at 0x4011a0 are shown]

> 

> 4) Display of addresses associated with the non-contiguous function are

>    confusing.  E.g. in the above example, note that GDB thinks that

>    the address associated with main.cold.0 is <main()>, but that there's

>    also a minsym called main which is displayed as <main>.

> 

> There are probably several other problems which are related those

> identified above.

> 

> I discovered that the stepping problem could be "fixed" by disabling

> the find_pc_partial_function cache.  This cache keeps track of the

> most recent result (of calling find_pc_partial_function).  If

> find_pc_partial_function is called with an address which falls within

> the cache range, then that's considered to be a cache hit and the most

> recent result is returned.  Obviously, this won't work correctly for

> functions which occupy non-contiguous (disjoint) address ranges where

> other functions might be placed in the gap.

> 

> So one of the problems that needed to be solved was to make the

> caching code work correctly.  It is interesting to note that stepping

> _did_ work when the cache was disabled.  This is/was due to GDB

> already having some (albeit incomplete) support for non-contiguous

> addresses in the form of blockvector address maps.  Code responsible

> for mapping addresses to blocks (which form the lower levels of

> find_pc_partial_function) handle this case correctly.

> 

> To solve the problem of incorrect disassembly, we need to be able

> to iterate over all of the ranges associated with a block.

> 

> Finally, we need to distinguish between the entry pc and the lowest

> address in a block.  I discovered that this lack of distinction was

> the cause of the remainder of the bugs including some which seemed to

> be introduced by fixing the problems noted above.  Once this

> distinction is made, it will be straightforward to add full support for

> DW_AT_entry_pc.  I considered adding this support as part of this

> patch series, but decided to wait until the community weighs in on my

> work thus far...

> 


Hi Kevin,

The patchset seems quite outdated (based on an old commit).  Could you
send a rebased version?

Thanks,

Simon
Kevin Buettner Aug. 20, 2018, 10:53 p.m. | #2
> The patchset seems quite outdated (based on an old commit).  Could you

> send a rebased version?


Done.  The v3 0/8 patch is here:

https://sourceware.org/ml/gdb-patches/2018-08/msg00467.html

(I thought that I had rebased v2 prior to posting, but perhaps I hadn't...)

Kevin