[1/8] Add block range data structure for blocks with non-contiguous address ranges

Message ID 20180625234210.1dbf3a9a@pinnacle.lan
State New
Headers show
Series
  • Non-contiguous address range support
Related show

Commit Message

Kevin Buettner June 26, 2018, 6:42 a.m.
This patch does the following:

- Introduces a block range data structure which is accessed via
  a new field in struct block.
- Defines several macros for accessing block ranges.
- Defines a new function, make_blockrange, which is responsible for
  creating the new data structure.
- Defines a predicate (function) which checks to see if an address
  is present in a block.

It should be noted that some support for non-contiguous ranges already
existed in GDB in the form of blockvector addrmaps.  This support
allowed GDB to quickly find a block containing a particular address
even when the block consists of non-contiguous addresses.  See
find_block_in_blockvector() in block.c, dwarf2_record_block_ranges()
in dwarf2read.c, and record_block_range() in buildsym.c.

Addrmaps do not provide a convenient way to examine address ranges
associated with a particular block.  This data structure (and its
interface) is set up for quickly finding the value (which in this case
is a block) associated with a particular address.  The interface
does not include a method for doing a reverse mapping from blocks to
addresses.  A linear time mapping might be attempted via use of the
addrmap's foreach method, but this is not as straightforward as it
might first appear due to the fact that blocks corresponding to inline
function instances and lexical blocks w/ variables end up getting
interspersed in in the set of transitions.

Note:  If this approach is deemed to be too expensive in terms of
space, an alternate approach might be to attempt the linear time
mapping noted above.  find_pc_partial_function() needs to be able to
quickly know whether there are discontiguous ranges (for determining
cache validity), so a flag for this property would have to be added to
struct block.  Also integral to this set of changes is the concept of
an "entry pc" which might be different than the block's start address.
An entry_pc field would also need to be added to struct block.  This
does not result in any space savings in struct block though since the
space for the flag and entry_pc use more space than the blockranges
struct pointer that I've added.  There would, however, be some space
savings due to the fact that the new data structures that I've added
for this patch would not need to be allocated.  (I happen to like the
approach I've come up with, but I wanted to mention another possibility
just in case someone does not.)

gdb/ChangeLog:
    
    	* block.h (blockrange, blockranges): New struct declarations.
    	(struct block): Add new field named `ranges'.
    	(BLOCK_RANGES, BLOCK_NRANGES, BLOCK_RANGE, BLOCK_CONTIGUOUS_P)
    	(BLOCK_RANGE_START, BLOCK_RANGE_END, BLOCK_ENTRY_PC): New
    	macros for accessing ranges in struct block.
    	(make_blockranges): New declaration.
    	(block_contains_pc): New declaration.
    	block.c (make_blockranges): New function.
    	(block_contains_pc): New function.
---
 gdb/block.c | 39 ++++++++++++++++++++++++++++
 gdb/block.h | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+)

Comments

Simon Marchi Aug. 1, 2018, 1:35 a.m. | #1
On 2018-06-26 02:42 AM, Kevin Buettner wrote:
> This patch does the following:

> 

> - Introduces a block range data structure which is accessed via

>   a new field in struct block.

> - Defines several macros for accessing block ranges.

> - Defines a new function, make_blockrange, which is responsible for

>   creating the new data structure.

> - Defines a predicate (function) which checks to see if an address

>   is present in a block.

> 

> It should be noted that some support for non-contiguous ranges already

> existed in GDB in the form of blockvector addrmaps.  This support

> allowed GDB to quickly find a block containing a particular address

> even when the block consists of non-contiguous addresses.  See

> find_block_in_blockvector() in block.c, dwarf2_record_block_ranges()

> in dwarf2read.c, and record_block_range() in buildsym.c.

> 

> Addrmaps do not provide a convenient way to examine address ranges

> associated with a particular block.  This data structure (and its

> interface) is set up for quickly finding the value (which in this case

> is a block) associated with a particular address.  The interface

> does not include a method for doing a reverse mapping from blocks to

> addresses.  A linear time mapping might be attempted via use of the

> addrmap's foreach method, but this is not as straightforward as it

> might first appear due to the fact that blocks corresponding to inline

> function instances and lexical blocks w/ variables end up getting

> interspersed in in the set of transitions.

> 

> Note:  If this approach is deemed to be too expensive in terms of

> space, an alternate approach might be to attempt the linear time

> mapping noted above.  find_pc_partial_function() needs to be able to

> quickly know whether there are discontiguous ranges (for determining

> cache validity), so a flag for this property would have to be added to

> struct block.  Also integral to this set of changes is the concept of

> an "entry pc" which might be different than the block's start address.

> An entry_pc field would also need to be added to struct block.  This

> does not result in any space savings in struct block though since the

> space for the flag and entry_pc use more space than the blockranges

> struct pointer that I've added.  There would, however, be some space

> savings due to the fact that the new data structures that I've added

> for this patch would not need to be allocated.  (I happen to like the

> approach I've come up with, but I wanted to mention another possibility

> just in case someone does not.)


I assume the impact won't be too big (there probably isn't a ton of ranges
per block), but some actual data would be interesting.

Just two nits below, otherwise it LGTM.

> @@ -807,3 +822,27 @@ block_find_non_opaque_type_preferred (struct symbol *sym, void *data)

>    *best = sym;

>    return 0;

>  }

> +

> +/* See block.h.  */

> +

> +struct blockranges *

> +make_blockranges (struct objfile * objfile,


Spurious space after the *.

> @@ -322,4 +401,9 @@ extern int block_find_non_opaque_type_preferred (struct symbol *sym,

>         (sym) != NULL;							\

>         (sym) = block_iter_match_next ((name), &(iter)))

>  

> +/* Given a vector of pairs, allocate and build an obstack allocated

> +   blockranges struct for a block.  */

> +struct blockranges *make_blockranges (struct objfile *objfile,

> +                                      std::vector<std::pair<CORE_ADDR, CORE_ADDR>> &);


Can you please name the second parameter, and make it const?

Thanks,

Simon
Simon Marchi Aug. 1, 2018, 1:40 a.m. | #2
On 2018-06-26 02:42 AM, Kevin Buettner wrote:
> @@ -322,4 +401,9 @@ extern int block_find_non_opaque_type_preferred (struct symbol *sym,

>         (sym) != NULL;							\

>         (sym) = block_iter_match_next ((name), &(iter)))

>  

> +/* Given a vector of pairs, allocate and build an obstack allocated

> +   blockranges struct for a block.  */

> +struct blockranges *make_blockranges (struct objfile *objfile,

> +                                      std::vector<std::pair<CORE_ADDR, CORE_ADDR>> &);


Oh, also, this vector could be a vector of struct blockrange.  It would be more
expressive than a vector of pairs.

Simon
Kevin Buettner Aug. 1, 2018, 11:57 p.m. | #3
On Tue, 31 Jul 2018 21:35:50 -0400
Simon Marchi <simon.marchi@ericsson.com> wrote:

> On 2018-06-26 02:42 AM, Kevin Buettner wrote:

> > This patch does the following:

> > 

> > - Introduces a block range data structure which is accessed via

> >   a new field in struct block.

> > - Defines several macros for accessing block ranges.

> > - Defines a new function, make_blockrange, which is responsible for

> >   creating the new data structure.

> > - Defines a predicate (function) which checks to see if an address

> >   is present in a block.

> > 

> > It should be noted that some support for non-contiguous ranges already

> > existed in GDB in the form of blockvector addrmaps.  This support

> > allowed GDB to quickly find a block containing a particular address

> > even when the block consists of non-contiguous addresses.  See

> > find_block_in_blockvector() in block.c, dwarf2_record_block_ranges()

> > in dwarf2read.c, and record_block_range() in buildsym.c.

> > 

> > Addrmaps do not provide a convenient way to examine address ranges

> > associated with a particular block.  This data structure (and its

> > interface) is set up for quickly finding the value (which in this case

> > is a block) associated with a particular address.  The interface

> > does not include a method for doing a reverse mapping from blocks to

> > addresses.  A linear time mapping might be attempted via use of the

> > addrmap's foreach method, but this is not as straightforward as it

> > might first appear due to the fact that blocks corresponding to inline

> > function instances and lexical blocks w/ variables end up getting

> > interspersed in in the set of transitions.

> > 

> > Note:  If this approach is deemed to be too expensive in terms of

> > space, an alternate approach might be to attempt the linear time

> > mapping noted above.  find_pc_partial_function() needs to be able to

> > quickly know whether there are discontiguous ranges (for determining

> > cache validity), so a flag for this property would have to be added to

> > struct block.  Also integral to this set of changes is the concept of

> > an "entry pc" which might be different than the block's start address.

> > An entry_pc field would also need to be added to struct block.  This

> > does not result in any space savings in struct block though since the

> > space for the flag and entry_pc use more space than the blockranges

> > struct pointer that I've added.  There would, however, be some space

> > savings due to the fact that the new data structures that I've added

> > for this patch would not need to be allocated.  (I happen to like the

> > approach I've come up with, but I wanted to mention another possibility

> > just in case someone does not.)  

> 

> I assume the impact won't be too big (there probably isn't a ton of ranges

> per block), but some actual data would be interesting.


In real code, I've seen at most two ranges per block.  At the moment,
I don't think that GCC is overly aggressive at splitting functions into
multiple ranges.  Most functions occupy just one range.

Jakub's test case called abort().  The function in which abort() appeared
was split into two ranges.  One range contained the call to abort; the other
had the rest of the function.

I've also seen C++ functions with try/catch broken up into more than one
range.  The less frequently used case is placed within a "cold" range.

Kevin

Patch

diff --git a/gdb/block.c b/gdb/block.c
index f26d169..43234ee 100644
--- a/gdb/block.c
+++ b/gdb/block.c
@@ -65,6 +65,21 @@  block_gdbarch (const struct block *block)
   return get_objfile_arch (block_objfile (block));
 }
 
+/* See block.h.  */
+
+bool
+block_contains_pc (const struct block *b, CORE_ADDR pc)
+{
+  if (BLOCK_CONTIGUOUS_P (b))
+    return (BLOCK_START (b) <= pc && pc < BLOCK_END (b));
+  else
+    for (int i = 0; i < BLOCK_NRANGES (b); i++)
+      if (BLOCK_RANGE_START (b, i) <= pc && pc < BLOCK_RANGE_END (b, i))
+        return true;
+
+  return false;
+}
+
 /* Return Nonzero if block a is lexically nested within block b,
    or if a and b have the same pc range.
    Return zero otherwise.  */
@@ -807,3 +822,27 @@  block_find_non_opaque_type_preferred (struct symbol *sym, void *data)
   *best = sym;
   return 0;
 }
+
+/* See block.h.  */
+
+struct blockranges *
+make_blockranges (struct objfile * objfile,
+                  std::vector<std::pair<CORE_ADDR, CORE_ADDR>> &rangevec)
+{
+  struct blockranges *blr;
+  size_t n = rangevec.size();
+
+  blr = (struct blockranges *)
+    obstack_alloc (&objfile->objfile_obstack,
+                   sizeof (struct blockranges)
+		   + (n - 1) * sizeof (struct blockrange));
+
+  blr->nranges = n;
+  for (int i = 0; i < n; i++)
+    {
+      blr->range[i].startaddr = rangevec[i].first;
+      blr->range[i].endaddr = rangevec[i].second;
+    }
+  return blr;
+}
+
diff --git a/gdb/block.h b/gdb/block.h
index ff12725..ec89b62 100644
--- a/gdb/block.h
+++ b/gdb/block.h
@@ -31,6 +31,32 @@  struct using_direct;
 struct obstack;
 struct addrmap;
 
+/* Blocks can occupy non-contiguous address ranges.  When this occurs,
+   startaddr and endaddr within struct block (still) specify the lowest
+   and highest addresses of all ranges, but each individual range is
+   specified by the addresses in struct blockrange.  */
+
+struct blockrange
+{
+
+  /* Lowest address in this range.  */
+
+  CORE_ADDR startaddr;
+
+  /* One past the highest address in the range.  */
+
+  CORE_ADDR endaddr;
+};
+
+/* Two or more non-contiguous ranges in the same order as that provided
+   via the debug info.  */
+
+struct blockranges
+{
+  int nranges;
+  struct blockrange range[1];
+};
+
 /* All of the name-scope contours of the program
    are represented by `struct block' objects.
    All of these objects are pointed to by the blockvector.
@@ -86,6 +112,12 @@  struct block
      using directives and the current namespace scope.  */
 
   struct block_namespace_info *namespace_info;
+
+  /* Address ranges for blocks with non-contiguous ranges.  If this
+     is NULL, then there is only one range which is specified by
+     startaddr and endaddr above.  */
+
+  struct blockranges *ranges;
 };
 
 /* The global block is singled out so that we can provide a back-link
@@ -109,6 +141,49 @@  struct global_block
 #define BLOCK_DICT(bl)		(bl)->dict
 #define BLOCK_NAMESPACE(bl)	(bl)->namespace_info
 
+/* Accessor for ranges field within block BL.  */
+
+#define BLOCK_RANGES(bl)	(bl)->ranges
+
+/* Number of ranges within a block.  */
+
+#define BLOCK_NRANGES(bl)	(bl)->ranges->nranges
+
+/* Access range array for block BL.  */
+
+#define BLOCK_RANGE(bl)		(bl)->ranges->range
+
+/* Are all addresses within a block contiguous?  */
+
+#define BLOCK_CONTIGUOUS_P(bl)	(BLOCK_RANGES (bl) == nullptr \
+				 || BLOCK_NRANGES (bl) <= 1)
+
+/* Obtain the start address of the Nth range for block BL.  */
+
+#define BLOCK_RANGE_START(bl,n) (BLOCK_RANGE (bl)[n].startaddr)
+
+/* Obtain the end address of the Nth range for block BL.  */
+
+#define BLOCK_RANGE_END(bl,n)	(BLOCK_RANGE (bl)[n].endaddr)
+
+/* Define the "entry pc" for a block BL to be the lowest (start) address
+   for the block when all addresses within the block are contiguous.  If
+   non-contiguous, then use the start address for the first range in the
+   block.
+
+   At the moment, this almost matches what DWARF specifies as the entry
+   pc.  (The missing bit is support for DW_AT_entry_pc which should be
+   preferred over range data and the low_pc.)
+
+   Once support for DW_AT_entry_pc is added, I expect that an entry_pc
+   field will be added to one of these data structures.  Once that's done,
+   the entry_pc field can be set from the dwarf reader (and other readers
+   too).  BLOCK_ENTRY_PC can then be redefined to be less DWARF-centric.  */
+
+#define BLOCK_ENTRY_PC(bl)	(BLOCK_CONTIGUOUS_P (bl) \
+				 ? BLOCK_START (bl) \
+				 : BLOCK_RANGE_START (bl,0))
+
 struct blockvector
 {
   /* Number of blocks in the list.  */
@@ -139,6 +214,10 @@  extern struct symbol *block_containing_function (const struct block *);
 
 extern int block_inlined_p (const struct block *block);
 
+/* Return true if PC is in block BLOCK, false otherwise.  */
+
+extern bool block_contains_pc (const struct block *block, CORE_ADDR pc);
+
 extern int contained_in (const struct block *, const struct block *);
 
 extern const struct blockvector *blockvector_for_pc (CORE_ADDR,
@@ -322,4 +401,9 @@  extern int block_find_non_opaque_type_preferred (struct symbol *sym,
        (sym) != NULL;							\
        (sym) = block_iter_match_next ((name), &(iter)))
 
+/* Given a vector of pairs, allocate and build an obstack allocated
+   blockranges struct for a block.  */
+struct blockranges *make_blockranges (struct objfile *objfile,
+                                      std::vector<std::pair<CORE_ADDR, CORE_ADDR>> &);
+
 #endif /* BLOCK_H */