[REVIEW,10/10] bfd, ld, libctf: skip zero-refcount strings in CTF string reporting

Message ID 20210227132954.7766-11-nick.alcock@oracle.com
State New
Headers show
Series
  • libctf: cleanups, speedups, and bugfixes (one needing review)
Related show

Commit Message

Alan Modra via Binutils Feb. 27, 2021, 1:29 p.m.
This is a tricky one.  BFD, on the linker's behalf, reports symbols to
libctf via the ctf_new_symbol and ctf_new_dynsym callbacks, which
ultimately call ctf_link_add_linker_symbol.  But while this happens
after strtab offsets are finalized, it happens before the .dynstr is
actually laid out, so we can't iterate over it at this stage and
it is not clear what the reported symbols are actually called.  So
a second callback, examine_strtab, is called after the .dynstr is
finalized, which calls ctf_link_add_strtab and ultimately leads
to ldelf_ctf_strtab_iter_cb being called back repeatedly until the
offsets of every string in the .dynstr is passed to libctf.

libctf can then use this to get symbol names out of the input (which
usually stores symbol types in the form of a name -> type mapping at
this stage) and extract the types of those symbols, feeding them back
into their final form as a 1:1 association with the real symtab's
STT_OBJ and STT_FUNC symbols (with a few skipped, see
ctf_symtab_skippable).

This representation is compact, but has one problem: if libctf somehow
gets confused about the st_type of a symbol, it'll stick an entry into
the function symtypetab when it should put it into the object
symtypetab, or vice versa, and *every symbol from that one on* will have
the wrong CTF type because it's actually looking up the type for a
different symbol.

And we have just such a bug.  ctf_link_add_strtab was not taking the
refcounts of strings into consideration, so even strings that had been
eliminated from the strtab by virtue of being in objects eliminated via
--as-needed etc were being reported.  This is harmful because it can
lead to multiple strings with the same apparent offset, and if the last
duplicate to be reported relates to an eliminated symbol, we look up the
wrong symbol from the input and gets its type wrong: if it's unlucky and
the eliminated symbol is also of the wrong st_type, we will end up with
a corrupted symtypetab.

Thankfully the wrong-st_type case is already diagnosed by a
this-can-never-happen paranoid warning:

  CTF warning: Symbol 61a added to CTF as a function but is of type 1

or the converse

 * CTF warning: Symbol a3 added to CTF as a data object but is of type 2

so at least we can tell when the corruption has spread to more than one
symbol's type.

Skipping zero-refcounted strings is easy: teach _bfd_elf_strtab_str to
skip them, and ldelf_ctf_strtab_iter_cb to loop over skipped strings
until it falls off the end or finds one that isn't skipped.

bfd/ChangeLog
2021-02-26  Nick Alcock  <nick.alcock@oracle.com>

	* elf-strtab.c (_bfd_elf_strtab_str): Skip strings with zero refcount.

ld/ChangeLog
2021-02-26  Nick Alcock  <nick.alcock@oracle.com>

	* ldelfgen.c (ldelf_ctf_strtab_iter_cb): Skip zero-refcount strings.

libctf/ChangeLog
2021-02-26  Nick Alcock  <nick.alcock@oracle.com>

	* ctf-create.c (symtypetab_density): Report the symbol name as
	well as index in the name != object error; note the likely
	consequences.
	* ctf-link.c (ctf_luink_shuffle_syms): Report the symbol index
	as well as name.
---
 bfd/ChangeLog       |  4 ++++
 bfd/elf-strtab.c    |  2 ++
 ld/ChangeLog        |  4 ++++
 ld/ldelfgen.c       | 16 +++++++++++-----
 libctf/ChangeLog    |  8 ++++++++
 libctf/ctf-create.c | 16 ++++++++++------
 libctf/ctf-link.c   |  3 ++-
 7 files changed, 41 insertions(+), 12 deletions(-)

-- 
2.30.0.252.gc27e85e57d

Comments

Alan Modra via Binutils Feb. 27, 2021, 6:28 p.m. | #1
On 27 Feb 2021, Nick Alcock via Binutils outgrape:
> 	* ctf-link.c (ctf_luink_shuffle_syms): Report the symbol index

> 	as well as name.


I'll fix this spelling error: this is not a luinker.
Alan Modra via Binutils March 1, 2021, 3:17 p.m. | #2
Hi Nick,

> bfd/ChangeLog

> 2021-02-26  Nick Alcock  <nick.alcock@oracle.com>

> 

> 	* elf-strtab.c (_bfd_elf_strtab_str): Skip strings with zero refcount.

> 

> ld/ChangeLog

> 2021-02-26  Nick Alcock  <nick.alcock@oracle.com>

> 

> 	* ldelfgen.c (ldelf_ctf_strtab_iter_cb): Skip zero-refcount strings.


This patch is approved, although I have a few minor formatting nits that you
might like to correct:


> diff --git a/bfd/ChangeLog b/bfd/ChangeLog

> index 2fef817d734..f1b4e8a2a46 100644

> --- a/bfd/ChangeLog

> +++ b/bfd/ChangeLog

> @@ -1,3 +1,7 @@

> +2021-02-26  Nick Alcock  <nick.alcock@oracle.com>

> +

> +	* elf-strtab.c (_bfd_elf_strtab_str): Skip strings with zero refcount.

> +


Please can changelog entries not be submitted as context diffs.
They almost never apply cleanly...

> @@ -302,6 +302,8 @@ _bfd_elf_strtab_str (struct elf_strtab_hash *tab, size_t idx,

>       return 0;

>     BFD_ASSERT (idx < tab->size);

>     BFD_ASSERT (tab->sec_size);

> +  if (tab->array[idx]->refcount == 0)

> +    return 0;


This should really be "return NULL;".  (I know that there is a 'return 0;' a few
lines above, but that one is wrong too).


> @@ -375,13 +375,19 @@ ldelf_ctf_strtab_iter_cb (uint32_t *offset, void *arg_)

>     if (arg->next_i == 0)

>       arg->next_i = 1;

>   

> -  if (arg->next_i >= _bfd_elf_strtab_len (arg->strtab))

> +  /* Hunt through strings until we fall off the end or find one with

> +     a nonzero refcount.  */

> +  do

>       {

> -      arg->next_i = 0;

> -      return NULL;

> -    }

> +      if (arg->next_i >= _bfd_elf_strtab_len (arg->strtab))

> +	{

> +	  arg->next_i = 0;

> +	  return NULL;

> +	}

> +

> +      ret = _bfd_elf_strtab_str (arg->strtab, arg->next_i++, &off);

> +    } while (ret == 0);


The while statement should be on its own line.  IE:

     }
   while (ret == 0);

Cheers
   Nick
Alan Modra via Binutils March 2, 2021, 2:26 p.m. | #3
On 1 Mar 2021, Nick Clifton via Binutils spake thusly:
>> diff --git a/bfd/ChangeLog b/bfd/ChangeLog

>> index 2fef817d734..f1b4e8a2a46 100644

>> --- a/bfd/ChangeLog

>> +++ b/bfd/ChangeLog

>> @@ -1,3 +1,7 @@

>> +2021-02-26  Nick Alcock  <nick.alcock@oracle.com>

>> +

>> +	* elf-strtab.c (_bfd_elf_strtab_str): Skip strings with zero refcount.

>> +

>

> Please can changelog entries not be submitted as context diffs.

> They almost never apply cleanly...


Sorry! This is due to my absentmindedly using git format-patch rather
than the alias I normally use to chop them out when sending out patch
mails. (FWIW, the arcane git syntax needed to automatically elide
ChangeLogs is this line noise:

git format-patch "$@" -- ':!:./**/ChangeLog'

which explains why I normally use an alias.)

>> @@ -302,6 +302,8 @@ _bfd_elf_strtab_str (struct elf_strtab_hash *tab, size_t idx,

>>       return 0;

>>     BFD_ASSERT (idx < tab->size);

>>     BFD_ASSERT (tab->sec_size);

>> +  if (tab->array[idx]->refcount == 0)

>> +    return 0;

>

> This should really be "return NULL;".  (I know that there is a 'return 0;' a few

> lines above, but that one is wrong too).


I really should have spotted that. Fixed.

>> +      ret = _bfd_elf_strtab_str (arg->strtab, arg->next_i++, &off);

>> +    } while (ret == 0);

>

> The while statement should be on its own line.  IE:

>

>     }

>   while (ret == 0);


... whoops, I had no idea GNU syntax decreed that (though it is
perfectly clear in standards.texi). Will fix. I'll fix up libctf to
follow this too (in a separate commit, coming soon, along with fixing up
some tabdamage etc) since it violates this all over the place.

Will push this batch later today.
Alan Modra via Binutils March 2, 2021, 3:16 p.m. | #4
On 1 Mar 2021, Nick Clifton via Binutils verbalised:

>> +      ret = _bfd_elf_strtab_str (arg->strtab, arg->next_i++, &off);

>> +    } while (ret == 0);

>

> The while statement should be on its own line.  IE:

>

>     }

>   while (ret == 0);


It's also checking the return value of the same function you just
commented on, so it too should be comparing with NULL. (Adjusted,
pushed, thanks for spotting this mindless cut-and-pasto before it spread
too far!)

Patch

diff --git a/bfd/ChangeLog b/bfd/ChangeLog
index 2fef817d734..f1b4e8a2a46 100644
--- a/bfd/ChangeLog
+++ b/bfd/ChangeLog
@@ -1,3 +1,7 @@ 
+2021-02-26  Nick Alcock  <nick.alcock@oracle.com>
+
+	* elf-strtab.c (_bfd_elf_strtab_str): Skip strings with zero refcount.
+
 2021-02-25  Alan Modra  <amodra@gmail.com>
 
 	PR 27441
diff --git a/bfd/elf-strtab.c b/bfd/elf-strtab.c
index 3e5e1622109..5a9aec55d60 100644
--- a/bfd/elf-strtab.c
+++ b/bfd/elf-strtab.c
@@ -302,6 +302,8 @@  _bfd_elf_strtab_str (struct elf_strtab_hash *tab, size_t idx,
     return 0;
   BFD_ASSERT (idx < tab->size);
   BFD_ASSERT (tab->sec_size);
+  if (tab->array[idx]->refcount == 0)
+    return 0;
   if (offset)
     *offset = tab->array[idx]->u.index;
   return tab->array[idx]->root.string;
diff --git a/ld/ChangeLog b/ld/ChangeLog
index 6540407237c..5f5242b2a22 100644
--- a/ld/ChangeLog
+++ b/ld/ChangeLog
@@ -1,3 +1,7 @@ 
+2021-02-26  Nick Alcock  <nick.alcock@oracle.com>
+
+	* ldelfgen.c (ldelf_ctf_strtab_iter_cb): Skip zero-refcount strings.
+
 2021-02-26  Alan Modra  <amodra@gmail.com>
 
 	PR 27441
diff --git a/ld/ldelfgen.c b/ld/ldelfgen.c
index df3dae0abe8..682cb23d469 100644
--- a/ld/ldelfgen.c
+++ b/ld/ldelfgen.c
@@ -375,13 +375,19 @@  ldelf_ctf_strtab_iter_cb (uint32_t *offset, void *arg_)
   if (arg->next_i == 0)
     arg->next_i = 1;
 
-  if (arg->next_i >= _bfd_elf_strtab_len (arg->strtab))
+  /* Hunt through strings until we fall off the end or find one with
+     a nonzero refcount.  */
+  do
     {
-      arg->next_i = 0;
-      return NULL;
-    }
+      if (arg->next_i >= _bfd_elf_strtab_len (arg->strtab))
+	{
+	  arg->next_i = 0;
+	  return NULL;
+	}
+
+      ret = _bfd_elf_strtab_str (arg->strtab, arg->next_i++, &off);
+    } while (ret == 0);
 
-  ret = _bfd_elf_strtab_str (arg->strtab, arg->next_i++, &off);
   *offset = off;
 
   /* If we've overflowed, we cannot share any further strings: the CTF
diff --git a/libctf/ChangeLog b/libctf/ChangeLog
index f30f79ec6ee..a6cdf9f64fb 100644
--- a/libctf/ChangeLog
+++ b/libctf/ChangeLog
@@ -1,3 +1,11 @@ 
+2021-02-26  Nick Alcock  <nick.alcock@oracle.com>
+
+	* ctf-create.c (symtypetab_density): Report the symbol name as
+	well as index in the name != object error; note the likely
+	consequences.
+	* ctf-link.c (ctf_luink_shuffle_syms): Report the symbol index
+	as well as name.
+
 2021-02-25  Nick Alcock  <nick.alcock@oracle.com>
 
 	* ctf-link.c (ctf_link_shuffle_syms): Free ctf_dynsyms properly.
diff --git a/libctf/ctf-create.c b/libctf/ctf-create.c
index d417922e7fd..3f2c5dacb03 100644
--- a/libctf/ctf-create.c
+++ b/libctf/ctf-create.c
@@ -318,18 +318,22 @@  symtypetab_density (ctf_dict_t *fp, ctf_dict_t *symfp, ctf_dynhash_t *symhash,
 	  if ((flags & CTF_SYMTYPETAB_EMIT_FUNCTION)
 	      && sym->st_type != STT_FUNC)
 	    {
-	      ctf_err_warn (fp, 1, 0, _("Symbol %x added to CTF as a function "
-					"but is of type %x\n"),
-			    sym->st_symidx, sym->st_type);
+	      ctf_err_warn (fp, 1, 0, _("symbol %s (%x) added to CTF as a "
+					"function but is of type %x.  "
+					"The symbol type lookup tables "
+					"are probably corrupted"),
+			    sym->st_name, sym->st_symidx, sym->st_type);
 	      ctf_dynhash_remove (symhash, name);
 	      continue;
 	    }
 	  else if (!(flags & CTF_SYMTYPETAB_EMIT_FUNCTION)
 		   && sym->st_type != STT_OBJECT)
 	    {
-	      ctf_err_warn (fp, 1, 0, _("Symbol %x added to CTF as a data "
-					"object but is of type %x\n"),
-			    sym->st_symidx, sym->st_type);
+	      ctf_err_warn (fp, 1, 0, _("symbol %s (%x) added to CTF as a "
+					"data object but is of type %x.  "
+					"The symbol type lookup tables "
+					"are probably corrupted"),
+			    sym->st_name, sym->st_symidx, sym->st_type);
 	      ctf_dynhash_remove (symhash, name);
 	      continue;
 	    }
diff --git a/libctf/ctf-link.c b/libctf/ctf-link.c
index 882d4297e4f..5471fccd0f7 100644
--- a/libctf/ctf-link.c
+++ b/libctf/ctf-link.c
@@ -1552,7 +1552,8 @@  ctf_link_shuffle_syms (ctf_dict_t *fp)
 	 for skippability here.  */
       if (!ctf_symtab_skippable (&did->cid_sym))
 	{
-	  ctf_dprintf ("symbol name from linker: %s\n", did->cid_sym.st_name);
+	  ctf_dprintf ("symbol from linker: %s (%x)\n", did->cid_sym.st_name,
+		       did->cid_sym.st_symidx);
 
 	  if ((new_sym = malloc (sizeof (ctf_link_sym_t))) == NULL)
 	    goto local_oom;