Message ID | 20200215165444.32653-2-tom@tromey.com |
---|---|
State | New |
Headers | show |
Series |
|
Related | show |
On 2020-02-15 11:54 a.m., Tom Tromey wrote: > dwarf2_find_containing_comp_unit has this in its binary search: > > if (mid_cu->is_dwz > offset_in_dwz > || (mid_cu->is_dwz == offset_in_dwz > && mid_cu->sect_off + mid_cu->length >= sect_off)) > high = mid; > > The intent here is to determine whether SECT_OFF appears in or before > MID_CU. > > I believe this has an off-by-one error, and that the check should use > ">" rather than ">=". If the two side are equal, then SECT_OFF > actually appears at the start of the next CU. > > I've had this patch kicking around for ages but I forget how I found > the problem. > > gdb/ChangeLog > 2020-02-15 Tom Tromey <tom@tromey.com> > > * dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not > ">=", in binary search. > --- > gdb/ChangeLog | 5 +++++ > gdb/dwarf2/read.c | 2 +- > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c > index e74383e01dc..e373cc0af2c 100644 > --- a/gdb/dwarf2/read.c > +++ b/gdb/dwarf2/read.c > @@ -24171,7 +24171,7 @@ dwarf2_find_containing_comp_unit (sect_offset sect_off, > mid_cu = dwarf2_per_objfile->all_comp_units[mid]; > if (mid_cu->is_dwz > offset_in_dwz > || (mid_cu->is_dwz == offset_in_dwz > - && mid_cu->sect_off + mid_cu->length >= sect_off)) > + && mid_cu->sect_off + mid_cu->length > sect_off)) > high = mid; > else > low = mid + 1; I can only convince myself of these things by plugging in real numbers. So let's say mid_cu's range is [100,150[, so 150 bytes long, and we are looking for sect_off == 150. 150 is outside (just after) mid_cu's range, so the right answer will be the cu just after this one. Without your change, we would have gone in the "if", and therefore excluded the right answer. With your change, we would have gone in the "else", and therefore chosen the range with the right answer. So that looks correct to me. I'm going to ask if you thought about a way to test this. I don't think writing a typical test case for this is feasible. By refactoring the code a bit, we could maybe factor out the meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead of on a dwarf2_per_objfile. It should then be feasible to create an std::vector with dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge cases like this. Also, do you understand what's the logic with this dwz stuff? Is it that all the CUs coming from a dwz file are at the end of the list, or something like that? Simon
>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:
Simon> I'm going to ask if you thought about a way to test this. I
Simon> don't think writing a typical test case for this is feasible.
Yeah. And like I said, I don't remember how I encountered this.
Maybe only with some other dubious patch in place.
Simon> By refactoring the code a bit, we could maybe factor out the
Simon> meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead
Simon> of on a dwarf2_per_objfile. It should then be feasible to create an std::vector with
Simon> dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge
Simon> cases like this.
Another idea that occurred to me is that we could just use
std::binary_search here. Then instead of implementing a binary search,
we'd only need to implement a comparison function -- which seems
simpler.
Anyway I will try to write a unit test, that's a good idea.
Simon> Also, do you understand what's the logic with this dwz stuff? Is it that all the CUs
Simon> coming from a dwz file are at the end of the list, or something like that?
Yes, exactly. Maybe this can be cleaned up a bit after this series,
since now we have the index directly in the dwarf2_per_cu_data. I am
not sure offhand.
Tom
Simon> By refactoring the code a bit, we could maybe factor out the meat Simon> of this function into one that operates on an Simon> std::vector<dwarf2_per_cu_data> instead of on a Simon> dwarf2_per_objfile. It should then be feasible to create an Simon> std::vector with dwarf2_per_cu_data elements out of thin air to Simon> unit-test the function, including edge cases like this. Here is a new version of this patch. I propose landing it separately, because it is just a straightforward latent bug fix. Simon> Also, do you understand what's the logic with this dwz stuff? Is it that all the CUs Simon> coming from a dwz file are at the end of the list, or something like that? Tom> Yes, exactly. To expand on that, in multi-file mode, dwz may create a combined, shared file of partial symtabs. When reading the DWARF for the main file, gdb may find it needs the dwz file; and in this case the additional CUs are put into the same vector, at the end. It's done this way so that we can easily find the correct CU (since the section offsets between the main and shared dwz file will overlap). It would probably be better to try to share these CUs across objfiles. That may be somewhat involved given the current code, though. Tom
Simon> By refactoring the code a bit, we could maybe factor out the Simon> meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead Simon> of on a dwarf2_per_objfile. It should then be feasible to create an std::vector with Simon> dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge Simon> cases like this. Oops, I meant to attach the patch in that last email. Let me know what you think. Tom commit 5d02b8ad013451b03e004ef1e71c4636b43252c2 Author: Tom Tromey <tom@tromey.com> Date: Sat Feb 15 09:08:09 2020 -0700 Fix latent bug in dwarf2_find_containing_comp_unit dwarf2_find_containing_comp_unit has this in its binary search: if (mid_cu->is_dwz > offset_in_dwz || (mid_cu->is_dwz == offset_in_dwz && mid_cu->sect_off + mid_cu->length >= sect_off)) high = mid; The intent here is to determine whether SECT_OFF appears in or before MID_CU. I believe this has an off-by-one error, and that the check should use ">" rather than ">=". If the two side are equal, then SECT_OFF actually appears at the start of the next CU. I've had this patch kicking around for ages but I forget how I found the problem. gdb/ChangeLog 2020-02-19 Tom Tromey <tom@tromey.com> * dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not ">=", in binary search. (dwarf2_find_containing_comp_unit): New overload. (run_test): New self-test. (_initialize_dwarf2_read): Register new test. diff --git a/gdb/ChangeLog b/gdb/ChangeLog index b4a586c333a..b07c9156e28 100644 --- a/gdb/ChangeLog +++ b/gdb/ChangeLog @@ -1,3 +1,11 @@ +2020-02-19 Tom Tromey <tom@tromey.com> + + * dwarf2/read.c (dwarf2_find_containing_comp_unit): Use ">", not + ">=", in binary search. + (dwarf2_find_containing_comp_unit): New overload. + (run_test): New self-test. + (_initialize_dwarf2_read): Register new test. + 2020-02-19 Simon Marchi <simon.marchi@efficios.com> * dwarf2/read.c (allocate_signatured_type_table, diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index 4d767a59af7..f998fe6b8d0 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -24136,34 +24136,53 @@ dwarf2_per_cu_data::addr_type () const return addr_type; } -/* Locate the .debug_info compilation unit from CU's objfile which contains - the DIE at OFFSET. Raises an error on failure. */ +/* A helper function for dwarf2_find_containing_comp_unit that returns + the index of the result, and that searches a vector. It will + return a result even if the offset in question does not actually + occur in any CU. This is separate so that it can be unit + tested. */ -static struct dwarf2_per_cu_data * -dwarf2_find_containing_comp_unit (sect_offset sect_off, - unsigned int offset_in_dwz, - struct dwarf2_per_objfile *dwarf2_per_objfile) +static int +dwarf2_find_containing_comp_unit + (sect_offset sect_off, + unsigned int offset_in_dwz, + const std::vector<dwarf2_per_cu_data *> &all_comp_units) { - struct dwarf2_per_cu_data *this_cu; int low, high; low = 0; - high = dwarf2_per_objfile->all_comp_units.size () - 1; + high = all_comp_units.size () - 1; while (high > low) { struct dwarf2_per_cu_data *mid_cu; int mid = low + (high - low) / 2; - mid_cu = dwarf2_per_objfile->all_comp_units[mid]; + mid_cu = all_comp_units[mid]; if (mid_cu->is_dwz > offset_in_dwz || (mid_cu->is_dwz == offset_in_dwz - && mid_cu->sect_off + mid_cu->length >= sect_off)) + && mid_cu->sect_off + mid_cu->length > sect_off)) high = mid; else low = mid + 1; } gdb_assert (low == high); - this_cu = dwarf2_per_objfile->all_comp_units[low]; + return low; +} + +/* Locate the .debug_info compilation unit from CU's objfile which contains + the DIE at OFFSET. Raises an error on failure. */ + +static struct dwarf2_per_cu_data * +dwarf2_find_containing_comp_unit (sect_offset sect_off, + unsigned int offset_in_dwz, + struct dwarf2_per_objfile *dwarf2_per_objfile) +{ + int low + = dwarf2_find_containing_comp_unit (sect_off, offset_in_dwz, + dwarf2_per_objfile->all_comp_units); + struct dwarf2_per_cu_data *this_cu + = dwarf2_per_objfile->all_comp_units[low]; + if (this_cu->is_dwz != offset_in_dwz || this_cu->sect_off > sect_off) { if (low == 0 || this_cu->is_dwz != offset_in_dwz) @@ -24186,6 +24205,57 @@ dwarf2_find_containing_comp_unit (sect_offset sect_off, } } +#if GDB_SELF_TEST + +namespace selftests { +namespace find_containing_comp_unit { + +static void +run_test () +{ + struct dwarf2_per_cu_data one {}; + struct dwarf2_per_cu_data two {}; + struct dwarf2_per_cu_data three {}; + struct dwarf2_per_cu_data four {}; + + one.length = 5; + two.sect_off = sect_offset (one.length); + two.length = 7; + + three.length = 5; + three.is_dwz = 1; + four.sect_off = sect_offset (three.length); + four.length = 7; + four.is_dwz = 1; + + std::vector<dwarf2_per_cu_data *> units; + units.push_back (&one); + units.push_back (&two); + units.push_back (&three); + units.push_back (&four); + + int result; + + result = dwarf2_find_containing_comp_unit (sect_offset (0), 0, units); + SELF_CHECK (units[result] == &one); + result = dwarf2_find_containing_comp_unit (sect_offset (3), 0, units); + SELF_CHECK (units[result] == &one); + result = dwarf2_find_containing_comp_unit (sect_offset (5), 0, units); + SELF_CHECK (units[result] == &two); + + result = dwarf2_find_containing_comp_unit (sect_offset (0), 1, units); + SELF_CHECK (units[result] == &three); + result = dwarf2_find_containing_comp_unit (sect_offset (3), 1, units); + SELF_CHECK (units[result] == &three); + result = dwarf2_find_containing_comp_unit (sect_offset (5), 1, units); + SELF_CHECK (units[result] == &four); +} + +} +} + +#endif /* GDB_SELF_TEST */ + /* Initialize dwarf2_cu CU, owned by PER_CU. */ dwarf2_cu::dwarf2_cu (struct dwarf2_per_cu_data *per_cu_) @@ -24690,5 +24760,7 @@ Warning: This option must be enabled before gdb reads the file."), #if GDB_SELF_TEST selftests::register_test ("dw2_expand_symtabs_matching", selftests::dw2_expand_symtabs_matching::run_test); + selftests::register_test ("dwarf2_find_containing_comp_unit", + selftests::find_containing_comp_unit::run_test); #endif }
On 2020-02-19 7:12 p.m., Tom Tromey wrote: > Simon> By refactoring the code a bit, we could maybe factor out the > Simon> meat of this function into one that operates on an std::vector<dwarf2_per_cu_data> instead > Simon> of on a dwarf2_per_objfile. It should then be feasible to create an std::vector with > Simon> dwarf2_per_cu_data elements out of thin air to unit-test the function, including edge > Simon> cases like this. > > Oops, I meant to attach the patch in that last email. > > Let me know what you think. > > Tom Thanks, that LGTM. If you want to use std::binary_search, that would be fine as well. Simon
>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:
Simon> Thanks, that LGTM. If you want to use std::binary_search, that would be fine
Simon> as well.
I tried this but it wasn't notably simpler and it meant changes to the
error handling, so I dropped it.
Tom
Re std::binary_search, we do have GDB::binary_search which is more useful/easier to use. On Thu, Feb 20, 2020, 10:50 Tom Tromey <tom@tromey.com> wrote: > >>>>> "Simon" == Simon Marchi <simark@simark.ca> writes: > > Simon> Thanks, that LGTM. If you want to use std::binary_search, that > would be fine > Simon> as well. > > I tried this but it wasn't notably simpler and it meant changes to the > error handling, so I dropped it. > > Tom >
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index e74383e01dc..e373cc0af2c 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -24171,7 +24171,7 @@ dwarf2_find_containing_comp_unit (sect_offset sect_off, mid_cu = dwarf2_per_objfile->all_comp_units[mid]; if (mid_cu->is_dwz > offset_in_dwz || (mid_cu->is_dwz == offset_in_dwz - && mid_cu->sect_off + mid_cu->length >= sect_off)) + && mid_cu->sect_off + mid_cu->length > sect_off)) high = mid; else low = mid + 1;