[RFC,gdb/symtab] Lazy expansion of full symbol table

Message ID 20210614093908.GA22709@delia
State New
Series
  • [RFC,gdb/symtab] Lazy expansion of full symbol table

Commit Message

Tom de Vries June 14, 2021, 9:39 a.m.
Hi,

[ I'm not posting the experimental patch series as such for now.  Available
here ( https://github.com/vries/gdb/commits/lazy-full-symtab ). ]

In PR23710, the stated problem is that gdb is slow and memory hungry when
consuming debug information generated by GCC with LTO.

I.

Taking the range of final releases 8.1.1 to 10.2, as well as a recent trunk
commit (3633d4fb446), and running an experiment using cc1:
...
$ gdb -q -batch cc1 -ex "b do_rpo_vn"
...
we get:
...
+---------+----------------+
| Version | real (seconds) |
+---------+----------------+
| 8.1.1   | 9.42           |
| 8.2.1   | - (PR23712)    |
| 8.3.1   | 9.31           |
| 9.2     | 8.50           |
| 10.2    | 5.86           |
| trunk   | 6.36           |
+---------+----------------+
...
which is nice progress in the releases.  The regression on trunk since 10.2
has been filed as PR27937.

[ The 10.2 score can be further improved to 5.23 by setting dwarf
max-cache-age to 1000.  It defaults to 5; see PR25703. ]

However, the best score is still more than a factor 3 slower than lldb:
...
+-------------+----------------+
| Version     | real (seconds) |
+-------------+----------------+
| gdb 10.2    | 5.86           |
| lldb 10.0.1 | 1.74           |
+-------------+----------------+
...

II.

Breaking down the 10.2 time of 5.86, we have:
...
+-----------------+----------------+
|                 | real (seconds) |
+-----------------+----------------+
| Minimal symbols | 0.18           |
| Partial symbols | 2.34           |
| Full symbols    | 3.34           |
+-----------------+----------------+
...

So:
- the minimal symbols and partial symbols are processed for all CUs, while
  the full symbols are processed only for the necessary CUs
- still, the majority of the time is spent on the full symbols

This is due to the combination of:
- the one-CU-at-a-time strategy of gdb, and
- the code generation for LTO which combines several CUs into an
  artificial CU.
In other words, LTO increases the scope of processing from individual CUs to
larger artificial CUs, and consequently things take much longer.

III.

A way to fix this is to do processing of the full symbols in a lazy fashion.

This patch series implements a first attempt at this, for now intended not to
be functionally correct, but to assess the kind of performance improvement we
get from this.

With current trunk (commit 987610f2d68), we get 3.44, instead of the 6.44
without this patch series.

IV.

The patch series consists of:

[gdb/symtab] Cover letter -- Lazy expansion of full symbol table
This.

[gdb/symtab] Add lazy_expand_symtab_p, default to false
Add a variable to enable lazy expansion.  Keep it false for now to allow
gradual introduction of the implementation.

[gdb/symtab] Add sect_off field to partial_symbol
Keep track of section offset in partial symbol.

[gdb/symtab] Keep track of interesting_symbols in partial_symtab
When searching for a symbol in a partial symbol table:
- keep going after finding a match
- store the matching partial symbols in a vector interesting_symbols

[gdb/symtab] Add interesting_symbols to dwarf2_per_cu_data
Make the vector interesting_symbols available during full symbol expansion.

[gdb/symtab] Handle interesting_symbols in read_file_scope
Use the interesting_symbols vector to filter the DIEs that we process when
doing read_file_scope.

[gdb/symtab] Add reset_compunit_symtab
Add a new function, reset_compunit_symtab.

[gdb/symtab] Reset compunit_symtab in psymtab_to_symtab
Use the new function to reset the full symbol table before expanding.
[ Without this patch, after finding symbols for the first time, we are not able
to find any others.  With this patch, we are able to find other symbols, but
only after forgetting the first ones.  This obviously needs proper fixing. ]

[gdb/symtab] Set lazy_expand_symtab_p to true
Enable.

V.

The current state of trunk is that expanding full symbols is a two-part
process:
- a builder is created during expansion
- after expansion the builder is destroyed after delivering the
  end result: a symbol table

The problem is that we need a way to do this gradually instead:
- expand a few symbols
- get the corresponding symbol table
- expand a few more symbols
- get the updated symbol table containing all expanded symbols

I'm not sure what is the smartest way to do that.

My current idea is to try to keep the builder around rather than destroy it,
and have it generate an updated symbol table each time.
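To make the idea concrete, here is a minimal, hypothetical sketch (all names
invented; gdb's actual buildsym/compunit machinery is much richer) of a
builder that survives across expansions and can hand out an up-to-date table
each time, instead of being destroyed after delivering one result:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a partial/full symbol.
struct symbol
{
  std::string name;
  unsigned sect_off;
};

class incremental_builder
{
public:
  // Expand a few more symbols; may be called repeatedly, and each call
  // extends the table built by earlier calls instead of replacing it.
  void expand (const std::vector<symbol> &more)
  {
    for (const symbol &s : more)
      m_table.emplace (s.name, s);
  }

  // Deliver the current symbol table without invalidating the builder,
  // so symbols found earlier are not forgotten by later expansions.
  const std::unordered_map<std::string, symbol> &table () const
  { return m_table; }

private:
  std::unordered_map<std::string, symbol> m_table;
};
```

The point of the sketch is the invariant: a second expand () must not lose
the result of the first one, which is exactly the bug noted above for the
reset_compunit_symtab approach.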

Is this a good idea?

Any other comments?

Thanks,
- Tom

[gdb/symtab] Cover letter -- Lazy expansion of full symbol table

---
 COVER-LETTER | 1 +
 1 file changed, 1 insertion(+)

Comments

Tom Tromey June 14, 2021, 8:54 p.m. | #1
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:


Tom> [ I'm not posting the experimental patch series as such for now.  Available
Tom> here ( https://github.com/vries/gdb/commits/lazy-full-symtab ). ]

Tom> In PR23710, the stated problem is that gdb is slow and memory hungry when
Tom> consuming debug information generated by GCC with LTO.

Tom> | Minimal symbols | 0.18           |
Tom> | Partial symbols | 2.34           |
Tom> | Full symbols    | 3.34           |

I don't have this executable but FWIW my scanner rewrite is ~10x faster
than the current psymtab reader.

Tom> A way to fix this is to do processing of the full symbols in a lazy fashion.

Tom> This patch series implements a first attempt at this, for now intended not to
Tom> be functionally correct, but to assess the kind of performance improvement we
Tom> get from this.

Tom> With current trunk (commit 987610f2d68), we get 3.44, instead of the 6.44
Tom> without this patch series.

That's about in line with the preliminary results I saw as well.
You can see that branch at tromey/lazily-read-function-bodies, but it's
probably unusable since I last rebased it in 2017.

Tom> The problem is that we need a way to do this gradually instead:
Tom> - expand a few symbols
Tom> - get the corresponding symbol table
Tom> - expand a few more symbols
Tom> - get the updated symbol table containing all expanded symbols

Tom> I'm not sure what is the smartest way to do that.

Tom> My current idea is to try to keep the builder around rather than destroy it,
Tom> and have it generate an updated symbol table each time.

Tom> Is this a good idea?

Tom> Any other comments?

I think it's a good idea to do something.  Historically I didn't look
into speeding up CU expansion on the theory that it didn't really matter
-- most CUs are small -- but LTO upends that.

I never finished the earlier patch that I referenced, but the idea there
was to skip function bodies when reading a CU.  Then, the functions
could be fully read when needed.  I got this to work for by-name
lookups, but I never finished the part where a by-address lookup would
resize the blockvector, etc.

This approach could be extended to also not read type bodies until
needed.  This could work by inflating the type during check_typedef.
This can be a little dangerous because we do miss these calls from time
to time; but maybe nearly as good would be to only lazily read
"complicated" types, and use the accessor methods on those types to
ensure they are inflated properly.
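A tiny sketch of that accessor idea (hypothetical names; gdb's struct type
and check_typedef are far more involved): the accessor inflates the type
body on first use, so any caller that goes through accessors can never
observe an un-read type.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazily-inflated type.  The inflate callback stands in for
// reading the type's DIEs from the debug info.
class lazy_type
{
public:
  explicit lazy_type (std::function<std::vector<std::string> ()> inflate)
    : m_inflate (std::move (inflate)) {}

  // Accessor guarantees the body has been read before use, the way a
  // check_typedef-style hook would.
  const std::vector<std::string> &fields ()
  {
    if (!m_inflated)
      {
        m_fields = m_inflate ();   // read the type body now, exactly once
        m_inflated = true;
      }
    return m_fields;
  }

private:
  bool m_inflated = false;
  std::vector<std::string> m_fields;
  std::function<std::vector<std::string> ()> m_inflate;
};
```

The danger mentioned above maps to code touching m_fields without going
through fields (); restricting laziness to "complicated" types whose fields
are only reachable via accessors avoids that.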


Another deeper approach would be to abstract away symtabs (and I guess
the blockvector) the way that we did for psymtabs: hide the details
behind the "quick" methods and let each debug reader decide for itself
how to store and look up symbols.  This is probably more work, but would
allow even more laziness.  For example with the new indexer the symbols
could simply hang off the entries in the index.

Tom
Tom de Vries June 14, 2021, 11:36 p.m. | #2
On 6/14/21 10:54 PM, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

> Tom> [ I'm not posting the experimental patch series as such for now.  Available
> Tom> here ( https://github.com/vries/gdb/commits/lazy-full-symtab ). ]
>
> Tom> In PR23710, the stated problem is that gdb is slow and memory hungry when
> Tom> consuming debug information generated by GCC with LTO.
>
> Tom> | Minimal symbols | 0.18           |
> Tom> | Partial symbols | 2.34           |
> Tom> | Full symbols    | 3.34           |
>
> I don't have this executable

Uploaded to https://ftp.suse.com/pub/people/tdevries/gdb/pr23710/cc1

> but FWIW my scanner rewrite is ~10x faster
> than the current psymtab reader.

Interesting, that's
https://github.com/tromey/gdb/commits/submit/no-more-psym ?

I've tried that branch with the cc1 example, and ran into:
...
DW_FORM_strp pointing outside of .debug_str section [in module cc1]
...

> Tom> A way to fix this is to do processing of the full symbols in a lazy fashion.
>
> Tom> This patch series implements a first attempt at this, for now intended not to
> Tom> be functionally correct, but to assess the kind of performance improvement we
> Tom> get from this.
>
> Tom> With current trunk (commit 987610f2d68), we get 3.44, instead of the 6.44
> Tom> without this patch series.
>
> That's about in line with the preliminary results I saw as well.
> You can see that branch at tromey/lazily-read-function-bodies, but it's
> probably unusable since I last rebased it in 2017.

Ack, thanks for the pointer.

> Tom> The problem is that we need a way to do this gradually instead:
> Tom> - expand a few symbols
> Tom> - get the corresponding symbol table
> Tom> - expand a few more symbols
> Tom> - get the updated symbol table containing all expanded symbols
>
> Tom> I'm not sure what is the smartest way to do that.
>
> Tom> My current idea is to try to keep the builder around rather than destroy it,
> Tom> and have it generate an updated symbol table each time.
>
> Tom> Is this a good idea?
>
> Tom> Any other comments?
>
> I think it's a good idea to do something.  Historically I didn't look
> into speeding up CU expansion on the theory that it didn't really matter
> -- most CUs are small -- but LTO upends that.
>
> I never finished the earlier patch that I referenced, but the idea there
> was to skip function bodies when reading a CU.  Then, the functions
> could be fully read when needed.  I got this to work for by-name
> lookups, but I never finished the part where a by-address lookup would
> resize the blockvector, etc.
>
> This approach could be extended to also not read type bodies until
> needed.  This could work by inflating the type during check_typedef.
> This can be a little dangerous because we do miss these calls from time
> to time; but maybe nearly as good would be to only lazily read
> "complicated" types, and use the accessor methods on those types to
> ensure they are inflated properly.
>
> Another deeper approach would be to abstract away symtabs (and I guess
> the blockvector) the way that we did for psymtabs: hide the details
> behind the "quick" methods and let each debug reader decide for itself
> how to store and look up symbols.  This is probably more work, but would
> allow even more laziness.  For example with the new indexer the symbols
> could simply hang off the entries in the index.

OK, I'll try to take a look at the branch and digest your comments.

Thanks,
- Tom
Tom Tromey June 15, 2021, 1:26 p.m. | #3
>> I don't have this executable


Tom> Uploaded to https://ftp.suse.com/pub/people/tdevries/gdb/pr23710/cc1

Thanks, I grabbed it.

>> but FWIW my scanner rewrite is ~10x faster
>> than the current psymtab reader.

Tom> Interesting, that's
Tom> https://github.com/tromey/gdb/commits/submit/no-more-psym ?

Tom> I've tried that branch with the cc1 example, and ran into:
Tom> ...
Tom> DW_FORM_strp pointing outside of .debug_str section [in module cc1]

Thank you for trying it.  This was a bug in the CU importing code.  I've
fixed it and will push momentarily -- but be warned, I rebase this
branch a lot...

On my machine, with "maint time 1", /bin/gdb reports:

    (gdb) file ./cc1
    2021-06-15 07:18:29.554 - command started
    Reading symbols from ./cc1...
    2021-06-15 07:18:35.118 - command finished
    Command execution time: 5.553344 (cpu), 5.563964 (wall)

The branch says:

    (gdb) file ./cc1
    2021-06-15 07:18:50.688 - command started
    Reading symbols from ./cc1...
    2021-06-15 07:18:51.074 - command finished
    Command execution time: 1.260050 (cpu), 0.386460 (wall)

So a bit more than 10x in this case.  I wish it were a tiny bit faster,
so that it wasn't perceptible, but meh, DWARF wasn't designed to be fast
to read.

Currently I think the scanner is complete, I just need to make some
changes to the lookup code, and maybe also how the new index entries are
stored, so that completion is more efficient.  Of course there may still
be bugs, but at the moment I don't anticipate any reader slowdowns.

Tom
Tom de Vries June 16, 2021, 10:20 a.m. | #4
On 6/15/21 3:26 PM, Tom Tromey wrote:
>>> I don't have this executable
>
> Tom> Uploaded to https://ftp.suse.com/pub/people/tdevries/gdb/pr23710/cc1
>
> Thanks, I grabbed it.
>
>>> but FWIW my scanner rewrite is ~10x faster
>>> than the current psymtab reader.
>
> Tom> Interesting, that's
> Tom> https://github.com/tromey/gdb/commits/submit/no-more-psym ?
>
> Tom> I've tried that branch with the cc1 example, and ran into:
> Tom> ...
> Tom> DW_FORM_strp pointing outside of .debug_str section [in module cc1]
>
> Thank you for trying it.  This was a bug in the CU importing code.  I've
> fixed it and will push momentarily -- but be warned, I rebase this
> branch a lot...

Sure, understood.

> On my machine, with "maint time 1", /bin/gdb reports:
>
>     (gdb) file ./cc1
>     2021-06-15 07:18:29.554 - command started
>     Reading symbols from ./cc1...
>     2021-06-15 07:18:35.118 - command finished
>     Command execution time: 5.553344 (cpu), 5.563964 (wall)
>
> The branch says:
>
>     (gdb) file ./cc1
>     2021-06-15 07:18:50.688 - command started
>     Reading symbols from ./cc1...
>     2021-06-15 07:18:51.074 - command finished
>     Command execution time: 1.260050 (cpu), 0.386460 (wall)
>
> So a bit more than 10x in this case.

Wow, nice :)

FWIW, my measurement is:

1. branch base 739025e89c4
...
$ gdb -q -ex "set trace-commands on" -ex "maint time 1" -ex "file lto/cc1"
+maint time 1
+file lto/cc1
Reading symbols from lto/cc1...
2021-06-16 11:51:54.007 - command finished
Startup time: 2.949213 (cpu), 2.887778 (wall)
(gdb)
...

2. branch 5bc56d745fd
...
$ gdb -q -ex "set trace-commands on" -ex "maint time 1" -ex "file lto/cc1"
+maint time 1
+file lto/cc1
Reading symbols from lto/cc1...
2021-06-16 12:00:35.060 - command finished
Startup time: 0.947957 (cpu), 0.408178 (wall)
(gdb)
...

Which is a factor 7 speedup.

> I wish it were a tiny bit faster,
> so that it wasn't perceptible, but meh, DWARF wasn't designed to be fast
> to read.
>
> Currently I think the scanner is complete, I just need to make some
> changes to the lookup code, and maybe also how the new index entries are
> stored, so that completion is more efficient.  Of course there may still
> be bugs, but at the moment I don't anticipate any reader slowdowns.

I did an overnight build and test with the updated branch (5bc56d745fd)
and ran into some trouble.  The first internal-error I investigated
happens when parsing the libstdc++ .debug package (so, it was not
specific to the test-case).  It seems the branch has some trouble with
the dwz layout where an abbrev entry is shared between different CUs:
...
src/gdb/dwarf2/abbrev-cache.c:59: internal-error: void
abbrev_cache::add(abbrev_table_up): Assertion `*slot == nullptr' failed.
...

I also could reproduce this problem with "maint set worker-threads 1".

I set a breakpoint on abbrev_cache::add and abbrev_cache::find, both
limited to section offset 0.

I see then a lot of finds, followed by two adds (and the second add
triggers the assert).

[ If you want to reproduce, download f.i.
https://download.opensuse.org/update/leap/15.2/oss_debug/x86_64/libstdc++6-debuginfo-10.2.1+git583-lp152.4.1.x86_64.rpm
. ]

Thanks,
- Tom
Tom Tromey June 18, 2021, 2:30 a.m. | #5
Tom> I did an overnight build and test with the updated branch (5bc56d745fd)
Tom> and ran into some trouble.  The first internal-error I investigated
Tom> happens when parsing the libstdc++ .debug package (so, it was not
Tom> specific to the test-case).  It seems the branch has some trouble with
Tom> the dwz layout where an abbrev entry is shared between different CUs:

Thank you for trying this, it uncovered several bugs.
As you can see I haven't gotten to the dwz testing yet... one of the
issues with DWARF, btw, is that there are just so many modes.
I.e., I haven't tried DWO or .debug_types yet either.

I pushed some patches to fix the crashes but the result is so fast that
I suspect it is incorrect:

    (gdb) file libstdc++.so.6.0.28-10.2.1+git583-lp152.4.1.x86_64.debug 
    2021-06-17 20:25:34.361 - command started
    Reading symbols from libstdc++.so.6.0.28-10.2.1+git583-lp152.4.1.x86_64.debug...
    2021-06-17 20:25:34.406 - command finished
    Command execution time: 0.075291 (cpu), 0.045521 (wall)

(Though /bin/gdb is also pretty fast here, maybe I'm doing something
else wrong.)

So, at least it doesn't crash, but more investigation is needed.
I'll probably add some code to make it easy to dump the index so it's
easier to see what the scanner recorded.

Tom
Tom de Vries June 19, 2021, 7:36 p.m. | #6
On 6/18/21 4:30 AM, Tom Tromey wrote:
> Tom> I did an overnight build and test with the updated branch (5bc56d745fd)
> Tom> and ran into some trouble.  The first internal-error I investigated
> Tom> happens when parsing the libstdc++ .debug package (so, it was not
> Tom> specific to the test-case).  It seems the branch has some trouble with
> Tom> the dwz layout where an abbrev entry is shared between different CUs:
>
> Thank you for trying this, it uncovered several bugs.
> As you can see I haven't gotten to the dwz testing yet... one of the
> issues with DWARF, btw, is that there are just so many modes.
> I.e., I haven't tried DWO or .debug_types yet either.

Yeah, very true.

> I pushed some patches to fix the crashes but the result is so fast that
> I suspect it is incorrect:
>
>     (gdb) file libstdc++.so.6.0.28-10.2.1+git583-lp152.4.1.x86_64.debug
>     2021-06-17 20:25:34.361 - command started
>     Reading symbols from libstdc++.so.6.0.28-10.2.1+git583-lp152.4.1.x86_64.debug...
>     2021-06-17 20:25:34.406 - command finished
>     Command execution time: 0.075291 (cpu), 0.045521 (wall)
>
> (Though /bin/gdb is also pretty fast here, maybe I'm doing something
> else wrong.)
>
> So, at least it doesn't crash, but more investigation is needed.
> I'll probably add some code to make it easy to dump the index so it's
> easier to see what the scanner recorded.

Tried the updated branch and ran into a race condition, fixed in
attached patch.

Thanks,
- Tom
Fix race condition in dwarf2_section_info::read

It can occur in dwarf2_section_info::read that:
- thread A sets readin to true, and buffer to nullptr, and then
- thread B reads readin as true, returns and finds that buffer is nullptr
  and concludes that there no such such section.  Which is not true, it's
  just that thread A hasn't finished reading it yet, which would make
  buffer != nullptr.

---
 gdb/dwarf2/section.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gdb/dwarf2/section.c b/gdb/dwarf2/section.c
index 8e1c0197c85..e6940103d77 100644
--- a/gdb/dwarf2/section.c
+++ b/gdb/dwarf2/section.c
@@ -30,6 +30,8 @@
 #include "objfiles.h"
 #include "complaints.h"
 
+#include <mutex>
+
 void
 dwarf2_section_info::overflow_complaint () const
 {
@@ -118,10 +120,12 @@ dwarf2_section_info::empty () const
 void
 dwarf2_section_info::read (struct objfile *objfile)
 {
+  static std::mutex mutex;
   asection *sectp;
   bfd *abfd;
   gdb_byte *buf, *retbuf;
 
+  std::lock_guard<std::mutex> guard (mutex);
   if (readin)
     return;
   buffer = NULL;
Tom de Vries June 20, 2021, 9:41 a.m. | #7
On 6/19/21 9:36 PM, Tom de Vries wrote:
> On 6/18/21 4:30 AM, Tom Tromey wrote:
>> Tom> I did an overnight build and test with the updated branch (5bc56d745fd)
>> Tom> and ran into some trouble.  The first internal-error I investigated
>> Tom> happens when parsing the libstdc++ .debug package (so, it was not
>> Tom> specific to the test-case).  It seems the branch has some trouble with
>> Tom> the dwz layout where an abbrev entry is shared between different CUs:
>>
>> Thank you for trying this, it uncovered several bugs.
>> As you can see I haven't gotten to the dwz testing yet... one of the
>> issues with DWARF, btw, is that there are just so many modes.
>> I.e., I haven't tried DWO or .debug_types yet either.
>
> Yeah, very true.
>
>> I pushed some patches to fix the crashes but the result is so fast that
>> I suspect it is incorrect:
>>
>>     (gdb) file libstdc++.so.6.0.28-10.2.1+git583-lp152.4.1.x86_64.debug
>>     2021-06-17 20:25:34.361 - command started
>>     Reading symbols from libstdc++.so.6.0.28-10.2.1+git583-lp152.4.1.x86_64.debug...
>>     2021-06-17 20:25:34.406 - command finished
>>     Command execution time: 0.075291 (cpu), 0.045521 (wall)
>>
>> (Though /bin/gdb is also pretty fast here, maybe I'm doing something
>> else wrong.)
>>
>> So, at least it doesn't crash, but more investigation is needed.
>> I'll probably add some code to make it easy to dump the index so it's
>> easier to see what the scanner recorded.
>
> Tried the updated branch and ran into a race condition, fixed in
> attached patch.

Another thing I ran into is a not 100% reproducible segfault.

It triggered in gdb.base/advance-until-multiple-locations.exp, when
trying to find "test" in the libc debug package.

The segfault happens in cooked_index_entry::matches due to the entry
parameter being invalid, which is set in this loop:
...
  for (const cooked_index_entry *entry
         : per_objfile->per_bfd->cooked_index_table->find (name_vec.back ()))
    {
      if (!entry->matches (search_flags)
          || !entry->matches (domain)
          || !entry->matches (kind))
        continue;
...

I could reproduce the segfault with maint set worker-threads 1.

Using this debugging code:
...
diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
index 7358352fb0b..c75531ac548 100644
--- a/gdb/dwarf2/cooked-index.c
+++ b/gdb/dwarf2/cooked-index.c
@@ -133,6 +133,43 @@ cooked_index_vector::find (gdb::string_view name)
 {
   range result;

+#if 1
+  auto it = m_entries.begin ();
+  const char *prev_c = nullptr;
+  bool prev = true;
+  for (; it != m_entries.end (); ++it)
+    {
+      auto val = *it;
+      bool res = strncasecmp (val->canonical, name.data (), name.length ()) < 0;
+      if (res && !prev)
+       {
+         fprintf (stderr, "PREV_IT: %s\n", prev_c);
+         fprintf (stderr, "IT: %s\n", (*it)->canonical);
+         fprintf (stderr, "PREV_IT: %d\n",
+                  strncasecmp (prev_c, name.data (), name.length ()));
+         fprintf (stderr, "IT: %d\n",
+                  strncasecmp ((*it)->canonical, name.data (), name.length ()));
+         gdb_assert_not_reached ("");
+       }
+      prev = res;
+      prev_c = val->canonical;
+    }
+#endif
...
I found out that the precondition for using std::lower_bound -- that the
vector is sorted in a certain way -- does not hold:
...
PREV_IT: uint32_t
IT: tcbhead_t
PREV_IT: 1
IT: -2
<gdb_assert>
...
so my hypothesis is that this causes the segfault somehow.

The test passes reliably when sorting at the entry of
cooked_index_vector::find (which is of course inefficient).

Thanks,
- Tom
Tom Tromey June 20, 2021, 6:17 p.m. | #8
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:


Tom> I found out that the precondition for using std::lower_bound of the
Tom> vector being sorted in a certain way is not valid:
[...]

While debugging a different failure today, I found that the 'find'
method wasn't waiting for the future to resolve, so it was allowing use
of the vector before it was ready.  This is fixed on the branch now.

I have 125 regressions to go.  At the same time I think part of the
quick function code still has to be rewritten; I haven't tested DWO yet;
and I am not sure all the modes that Ada supports are handled by the
lookup code.  So there's still a reasonable amount to do.  Then of
course there's the final rebasing to shape it up, the branch history is
particularly ugly at this moment.

Tom
Tom Tromey June 20, 2021, 11:44 p.m. | #9
Tom> Tried the updated branch and ran into a race condition, fixed in
Tom> attached patch.

How did you observe it?

Tom> +  static std::mutex mutex;
Tom>    asection *sectp;
Tom>    bfd *abfd;
Tom>    gdb_byte *buf, *retbuf;
 
Tom> +  std::lock_guard<std::mutex> guard (mutex);
Tom>    if (readin)
Tom>      return;
Tom>    buffer = NULL;

I don't think we want to do this, because BFD isn't thread-safe, and
I've been avoiding touching it.  Instead I think we'll have to track
down the problem spots and arrange for the required sections to be read
in before any threaded work is started.

Making BFD "thread-safe-enough" might be worthwhile, I'm not sure.

Tom
Tom de Vries June 21, 2021, 9:18 a.m. | #10
On 6/21/21 1:44 AM, Tom Tromey wrote:
> Tom> Tried the updated branch and ran into a race condition, fixed in
> Tom> attached patch.
>
> How did you observe it?

Download and extract
https://download.opensuse.org/tumbleweed/repo/debug/x86_64/glibc-debuginfo-2.33-8.1.x86_64.rpm
.

Then:
...
$ gdb -q -batch ~/reproduce/usr/lib/debug/usr/lib64/ld-2.33.so-2.33-8.1.x86_64.debug
Dwarf Error: DW_FORM_line_strp used without .debug_line_str section [in module /home/vries/reproduce/usr/lib/debug/usr/lib64/ld-2.33.so-2.33-8.1.x86_64.debug]
...

> Tom> +  static std::mutex mutex;
> Tom>    asection *sectp;
> Tom>    bfd *abfd;
> Tom>    gdb_byte *buf, *retbuf;
>
> Tom> +  std::lock_guard<std::mutex> guard (mutex);
> Tom>    if (readin)
> Tom>      return;
> Tom>    buffer = NULL;
>
> I don't think we want to do this, because BFD isn't thread-safe, and
> I've been avoiding touching it.  Instead I think we'll have to track
> down the problem spots and arrange for the required sections to be read
> in before any threaded work is started.

Ack, I see.

---

I rebuilt the updated branch, ran the tests, and investigated this
dwarf error:
...
(gdb) file /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.arch/amd64-disp-step-avx/amd64-disp-step-avx
Reading symbols from /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.arch/amd64-disp-step-avx/amd64-disp-step-avx...
Dwarf Error: Could not find abbrev number 2391 in CU at offset 0x29 [in module /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.arch/amd64-disp-step-avx/amd64-disp-step-avx]
...
when using gcc-11.

I managed to fix that using this patch:
...
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 265ebee1de8..f98acfa33ed 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -7204,7 +7204,8 @@ skip_one_die (const struct die_reader_specs *reader, const gdb_byte *info_ptr,
       /* We only handle DW_FORM_ref4 here.  */
       const gdb_byte *sibling_data = info_ptr + abbrev->sibling_offset;
       unsigned int offset = read_4_bytes (abfd, sibling_data);
-      const gdb_byte *sibling_ptr = buffer + offset;
+      const gdb_byte *sibling_ptr
+       = buffer + to_underlying (cu->header.sect_off) + offset;
       if (sibling_ptr >= info_ptr && sibling_ptr < reader->buffer_end)
        return sibling_ptr;
       /* Fall through to the slow way.  */
...

The DW_FORM_ref4 offset is relative to the start of the CU, but buffer
points to the start of .debug_info.

Thanks,
- Tom
Tom de Vries June 22, 2021, 9:16 a.m. | #11
On 6/20/21 8:17 PM, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

> Tom> I found out that the precondition for using std::lower_bound of the
> Tom> vector being sorted in a certain way is not valid:
> [...]
>
> While debugging a different failure today, I found that the 'find'
> method wasn't waiting for the future to resolve, so it was allowing use
> of the vector before it was ready.  This is fixed on the branch now.
>
> I have 125 regressions to go.

I did a build and reg-test with the current state of branch, and the two
patches I've mentioned.

I get:
...
$ grep -c ^FAIL: gdb.sum
218
$ grep -c ^ERROR: gdb.sum
193
...

Possibly there are more fails because of using
--with-separate-debug-dir=/usr/lib/debug?  Just to give you another data
point; not sure if this is terribly relevant.

I also did a speed test with my running example of branch tip vs merge
base, and got:
...
$ DEBUGINFOD_URLS= DBG=~/dwz/measure/time.sh ./gdb.sh -q -batch lto/cc1
maxmem: 348472
real: 0.50
user: 1.09
system: 0.09
$ DEBUGINFOD_URLS= DBG=~/dwz/measure/time.sh ./gdb.sh -q -batch lto/cc1
maxmem: 509212
real: 2.58
user: 2.53
system: 0.09
...
which is a nice factor 5 speedup, or 2 seconds in absolute terms.

However, when I do also set the breakpoint:
...
$ DEBUGINFOD_URLS= DBG=~/dwz/measure/time.sh ./gdb.sh -q -batch lto/cc1 -ex "b do_rpo_vn"
Breakpoint 1 at 0xd40e30: do_rpo_vn. (2 locations)
maxmem: 1011140
real: 5.09
user: 5.54
system: 0.24
$ DEBUGINFOD_URLS= DBG=~/dwz/measure/time.sh ./gdb.sh -q -batch lto/cc1 -ex "b do_rpo_vn"
Breakpoint 1 at 0xd40e30: do_rpo_vn. (2 locations)
maxmem: 873952
real: 5.55
user: 5.37
system: 0.24
...
I'm left with just 0.5 seconds speedup.

I investigated whether that was caused by more CUs being expanded, but
that doesn't seem to be the case.

So I'm tentatively drawing the conclusion that it makes sense for me to
continue on the lazy expansion effort.

Thanks,
- Tom

> At the same time I think part of the
> quick function code still has to be rewritten; I haven't tested DWO yet;
> and I am not sure all the modes that Ada supports are handled by the
> lookup code.  So there's still a reasonable amount to do.  Then of
> course there's the final rebasing to shape it up, the branch history is
> particularly ugly at this moment.
>
> Tom
Tom Tromey June 27, 2021, 10:24 p.m. | #12
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:


Tom> I managed to fix that using this patch:
Tom> ...
Tom> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
Tom> index 265ebee1de8..f98acfa33ed 100644
Tom> --- a/gdb/dwarf2/read.c
Tom> +++ b/gdb/dwarf2/read.c
Tom> @@ -7204,7 +7204,8 @@ skip_one_die (const struct die_reader_specs *reader, const gdb_byte *info_ptr,
Tom>        /* We only handle DW_FORM_ref4 here.  */
Tom>        const gdb_byte *sibling_data = info_ptr + abbrev->sibling_offset;
Tom>        unsigned int offset = read_4_bytes (abfd, sibling_data);
Tom> -      const gdb_byte *sibling_ptr = buffer + offset;
Tom> +      const gdb_byte *sibling_ptr
Tom> +       = buffer + to_underlying (cu->header.sect_off) + offset;
Tom>        if (sibling_ptr >= info_ptr && sibling_ptr < reader->buffer_end)
Tom>         return sibling_ptr;
Tom>        /* Fall through to the slow way.  */
Tom> ...

Tom> The DW_FORM_ref4 offset is relative to the start of the CU, but buffer
Tom> points to the start of .debug_info.

Thanks.  I think I copied this without realizing or remembering that
read_attribute_value adds in the section offset automatically.

Tom
Tom Tromey June 28, 2021, 12:48 a.m. | #13
Tom> Instead I think we'll have to track
Tom> down the problem spots and arrange for the required sections to be read
Tom> in before any threaded work is started.

I pushed a patch to the branch that fixes this.
I didn't look too deeply, it might read too many sections.

Tom

Patch

diff --git a/COVER-LETTER b/COVER-LETTER
new file mode 100644
index 00000000000..d273939703b
--- /dev/null
+++ b/COVER-LETTER
@@ -0,0 +1 @@ 
+Lazy expansion of full symbol table