[2/4] gdb: erase items from the source_cache::m_offset_cache

Message ID dd23d8f76af92abe21bf53cfb847a009eb50146b.1641565040.git.aburgess@redhat.com
State New
Headers show
Series
  • Source highlight non utf-8 characters using Python
Related show

Commit Message

Simon Marchi via Gdb-patches Jan. 7, 2022, 2:23 p.m.
The source_cache class has two member variables m_source_map, which
stores the file contents, and m_offset_cache, which stores offsets
into the file contents.

As source files are read the contents of the file, as well as the
offset data, are stored in the cache using these two member variables.

Whenever GDB needs either the files contents, or the offset data,
source_cache::ensure is called.  This function looks for the file in
m_source_map, and if it's found then this implies the file is also in
m_offset_cache, and we're done.

If the file is not in m_source_map then GDB calls
source_cache::get_plain_source_lines to open the file and read its
contents.  ::get_plain_source_lines also calculates the offset data,
which is then inserted into m_offset_cache.

Back in ::ensure, the file contents are added into m_source_map.  And
finally, if m_source_map contains more than MAX_ENTRIES, an entry is
removed from m_source_map.

The problem is entries are not removed from m_offset_cache at the same
time.

This means that if a program contains enough source files, GDB will
hold at most MAX_ENTRIES cached source file contents, but can contain
offsets data for every source file.

Now, the offsets data is going to be smaller than the cached file
contents, so maybe there's no harm here.  But, when we reload the file
contents we always recalculate the offsets data.  And, when we
::get_line_charpos asking for offset data we still call ::ensure which
will ends up loading and caching the file contents.

So, given the current code does the work of reloading the offset data
anyway, we may as well save memory by capping m_offset_cache to
MAX_ENTRIES just like we do m_source_map.

That's what this commit does.

There should be no user visible changes after this commit, except for
ever so slightly lower memory usage in some cases.
---
 gdb/source-cache.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.25.4

Comments

Tom Tromey Jan. 10, 2022, 3:24 p.m. | #1
>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:


Andrew> So, given the current code does the work of reloading the offset data
Andrew> anyway, we may as well save memory by capping m_offset_cache to
Andrew> MAX_ENTRIES just like we do m_source_map.

Andrew> That's what this commit does.

Andrew> There should be no user visible changes after this commit, except for
Andrew> ever so slightly lower memory usage in some cases.

I think it would be good to update the last part of this comment in
source_cache::ensure:

	  /* This should always hold, because we create the file
	     offsets when reading the file, and never free them
	     without also clearing the contents cache.  */

... but otherwise this looks good.

Tom
Simon Marchi via Gdb-patches Jan. 11, 2022, 12:17 p.m. | #2
* Tom Tromey <tom@tromey.com> [2022-01-10 08:24:48 -0700]:

> >>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:

> 

> Andrew> So, given the current code does the work of reloading the offset data

> Andrew> anyway, we may as well save memory by capping m_offset_cache to

> Andrew> MAX_ENTRIES just like we do m_source_map.

> 

> Andrew> That's what this commit does.

> 

> Andrew> There should be no user visible changes after this commit, except for

> Andrew> ever so slightly lower memory usage in some cases.

> 

> I think it would be good to update the last part of this comment in

> source_cache::ensure:

> 

> 	  /* This should always hold, because we create the file

> 	     offsets when reading the file, and never free them

> 	     without also clearing the contents cache.  */


What should it say now?  I think this comment was wrong previously, it
should have said:

        /* This should always hold, because we create the file
 	   offsets when reading the file, and never free them.  */

After my change the comment now seems to be correct.  But, maybe I'm
missing something...

Thanks,
Andrew
Tom Tromey Jan. 11, 2022, 2:54 p.m. | #3
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:


>> /* This should always hold, because we create the file

>> offsets when reading the file, and never free them

>> without also clearing the contents cache.  */


Andrew> What should it say now?  I think this comment was wrong previously, it
Andrew> should have said:

Andrew>         /* This should always hold, because we create the file
Andrew>  	   offsets when reading the file, and never free them.  */

I think they are freed sometimes by source_cache::clear.

Anyway how about

  /* This should always hold, because we create the file
     offsets when reading the file.  */

and just remove the mention of the freeing.

Tom
Simon Marchi via Gdb-patches Jan. 12, 2022, 11:38 a.m. | #4
* Tom Tromey <tom@tromey.com> [2022-01-11 07:54:00 -0700]:

> >>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

> 

> >> /* This should always hold, because we create the file

> >> offsets when reading the file, and never free them

> >> without also clearing the contents cache.  */

> 

> Andrew> What should it say now?  I think this comment was wrong previously, it

> Andrew> should have said:

> 

> Andrew>         /* This should always hold, because we create the file

> Andrew>  	   offsets when reading the file, and never free them.  */

> 

> I think they are freed sometimes by source_cache::clear.

> 

> Anyway how about

> 

>   /* This should always hold, because we create the file

>      offsets when reading the file.  */

> 

> and just remove the mention of the freeing.


Thanks, I made this change and pushed the patch below.

Andrew

---

commit 0e42221ac2cea234fe7a35797475f55f3d304b92
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Fri Nov 26 14:34:27 2021 +0000

    gdb: erase items from the source_cache::m_offset_cache
    
    The source_cache class has two member variables m_source_map, which
    stores the file contents, and m_offset_cache, which stores offsets
    into the file contents.
    
    As source files are read the contents of the file, as well as the
    offset data, are stored in the cache using these two member variables.
    
    Whenever GDB needs either the files contents, or the offset data,
    source_cache::ensure is called.  This function looks for the file in
    m_source_map, and if it's found then this implies the file is also in
    m_offset_cache, and we're done.
    
    If the file is not in m_source_map then GDB calls
    source_cache::get_plain_source_lines to open the file and read its
    contents.  ::get_plain_source_lines also calculates the offset data,
    which is then inserted into m_offset_cache.
    
    Back in ::ensure, the file contents are added into m_source_map.  And
    finally, if m_source_map contains more than MAX_ENTRIES, an entry is
    removed from m_source_map.
    
    The problem is entries are not removed from m_offset_cache at the same
    time.
    
    This means that if a program contains enough source files, GDB will
    hold at most MAX_ENTRIES cached source file contents, but can contain
    offsets data for every source file.
    
    Now, the offsets data is going to be smaller than the cached file
    contents, so maybe there's no harm here.  But, when we reload the file
    contents we always recalculate the offsets data.  And, when we
    ::get_line_charpos asking for offset data we still call ::ensure which
    will ends up loading and caching the file contents.
    
    So, given the current code does the work of reloading the offset data
    anyway, we may as well save memory by capping m_offset_cache to
    MAX_ENTRIES just like we do m_source_map.
    
    That's what this commit does.
    
    There should be no user visible changes after this commit, except for
    ever so slightly lower memory usage in some cases.

diff --git a/gdb/source-cache.c b/gdb/source-cache.c
index 0650768cc0e..7016476730a 100644
--- a/gdb/source-cache.c
+++ b/gdb/source-cache.c
@@ -162,9 +162,8 @@ source_cache::ensure (struct symtab *s)
     {
       if (m_source_map[i].fullname == fullname)
 	{
-	  /* This should always hold, because we create the file
-	     offsets when reading the file, and never free them
-	     without also clearing the contents cache.  */
+	  /* This should always hold, because we create the file offsets
+	     when reading the file.  */
 	  gdb_assert (m_offset_cache.find (fullname)
 		      != m_offset_cache.end ());
 	  /* Not strictly LRU, but at least ensure that the most
@@ -240,7 +239,11 @@ source_cache::ensure (struct symtab *s)
   m_source_map.push_back (std::move (result));
 
   if (m_source_map.size () > MAX_ENTRIES)
-    m_source_map.erase (m_source_map.begin ());
+    {
+      auto iter = m_source_map.begin ();
+      m_offset_cache.erase (iter->fullname);
+      m_source_map.erase (iter);
+    }
 
   return true;
 }

Patch

diff --git a/gdb/source-cache.c b/gdb/source-cache.c
index 733d1d272cd..9a5ac40497a 100644
--- a/gdb/source-cache.c
+++ b/gdb/source-cache.c
@@ -240,7 +240,11 @@  source_cache::ensure (struct symtab *s)
   m_source_map.push_back (std::move (result));
 
   if (m_source_map.size () > MAX_ENTRIES)
-    m_source_map.erase (m_source_map.begin ());
+    {
+      auto iter = m_source_map.begin ();
+      m_offset_cache.erase (iter->fullname);
+      m_source_map.erase (iter);
+    }
 
   return true;
 }