[Gold] Recognize clang-style crtbegin and crtend files

Message ID CAGSvup8i2X19GxpOG2=nBm+JZeFvJjAne8tfWZ0q4nGtXsfqiw@mail.gmail.com
State New
Headers show
Series
  • [Gold] Recognize clang-style crtbegin and crtend files
Related show

Commit Message

augustine.sterling@gmail.com Nov. 13, 2019, 10:07 p.m.
For installations of its native implementations of crtbegin and
crtend, Clang uses custom names of the form:

clang_rt.crtbegin[-target-name].o

This patch modifies Layout::match_file_name so that gold treats these
files the same as any other crtbegin and crtend.

2019-11-13  Sterling Augustine <augustine.sterling@gmail.com>
       * layout.cc (Layout::match_file_name): Handle clang-style
       crtbegin and crtend filenames.

OK for trunk?

Comments

augustine.sterling@gmail.com Nov. 15, 2019, 11:24 p.m. | #1
On Wed, Nov 13, 2019 at 2:07 PM augustine.sterling@gmail.com
<augustine.sterling@gmail.com> wrote:
> 2019-11-13  Sterling Augustine <augustine.sterling@gmail.com>

>        * layout.cc (Layout::match_file_name): Handle clang-style

>        crtbegin and crtend filenames.


Ping?
Fangrui Song Nov. 16, 2019, 5:59 a.m. | #2
On 2019-11-15, augustine.sterling@gmail.com wrote:
>On Wed, Nov 13, 2019@2:07 PM augustine.sterling@gmail.com

><augustine.sterling@gmail.com> wrote:

>> 2019-11-13  Sterling Augustine <augustine.sterling@gmail.com>

>>        * layout.cc (Layout::match_file_name): Handle clang-style

>>        crtbegin and crtend filenames.

>

>Ping?


https://sourceware.org/ml/binutils/2019-09/msg00003.html


BTW, the patch looks good to me.
Cary Coutant Dec. 28, 2019, 12:50 a.m. | #3
> This patch modifies Layout::match_file_name so that gold treats these

> files the same as any other crtbegin and crtend.

>

> 2019-11-13  Sterling Augustine <augustine.sterling@gmail.com>

>        * layout.cc (Layout::match_file_name): Handle clang-style

>        crtbegin and crtend filenames.


-  if (base_len != match_len + 2 && base_len != match_len + 3)
-    return false;

By removing this, you're allowing the "-<target-name>" part on
gcc-style filenames as well as clang-style. Unless you need to match
that part even without the "clang_rt." prefix, I'd prefer to keep the
stricter matching for gcc-style filenames. And it might be a good idea
to at least check for the "-" before the target name. We just want to
minimize the odds of accidentally matching a user-supplied object
file.

(I really wish there were a better way to handle these sections. I
hope you need this only to support section sorting, and not for the
special treatment of .ctors and .dtors sections, which is a legacy you
shouldn't be saddled with in clang.)

-cary
Fangrui Song Dec. 28, 2019, 5:57 a.m. | #4
On 2019-12-27, Cary Coutant wrote:
>> This patch modifies Layout::match_file_name so that gold treats these

>> files the same as any other crtbegin and crtend.

>>

>> 2019-11-13  Sterling Augustine <augustine.sterling@gmail.com>

>>        * layout.cc (Layout::match_file_name): Handle clang-style

>>        crtbegin and crtend filenames.

>

>-  if (base_len != match_len + 2 && base_len != match_len + 3)

>-    return false;

>

>By removing this, you're allowing the "-<target-name>" part on

>gcc-style filenames as well as clang-style. Unless you need to match

>that part even without the "clang_rt." prefix, I'd prefer to keep the

>stricter matching for gcc-style filenames. And it might be a good idea

>to@least check for the "-" before the target name. We just want to

>minimize the odds of accidentally matching a user-supplied object

>file.

>

>(I really wish there were a better way to handle these sections. I

>hope you need this only to support section sorting, and not for the

>special treatment of .ctors and .dtors sections, which is a legacy you

>shouldn't be saddled with in clang.)

>

>-cary


We can use a looser rule which matches GNU ld more closely.

% ld.bfd --verbose | grep -A12 ' .ctors '
   .ctors          :
   {
     /* gcc uses crtbegin.o to find the start of
        the constructors, so we make sure it is
        first.  Because this is a wildcard, it
        doesn't matter if the user does not
        actually link against crtbegin.o; the
        linker won't look for a file to match a
        wildcard.  The wildcard also means that it
        doesn't matter which directory crtbegin.o
        is in.  */
     KEEP (*crtbegin.o(.ctors))
     KEEP (*crtbegin?.o(.ctors))

If divergence from other compiler-rt files (e.g.
libclang_rt.asan-i386.a) is not a concern, clang_rt.i386.crtbegin.o
should just work with GNU ld.


(On the clang side, I've done https://reviews.llvm.org/D71393 . Since
clang 10, all Linux builds will default to .init_array .
-fno-use-init-array (not recognized by GCC) is required to produce
.ctors)
augustine.sterling@gmail.com Dec. 30, 2019, 5:33 p.m. | #5
On Fri, Dec 27, 2019 at 4:50 PM Cary Coutant <ccoutant@gmail.com> wrote:
> By removing this, you're allowing the "-<target-name>" part on

> gcc-style filenames as well as clang-style. Unless you need to match

> that part even without the "clang_rt." prefix, I'd prefer to keep the

> stricter matching for gcc-style filenames. And it might be a good idea

> to at least check for the "-" before the target name. We just want to

> minimize the odds of accidentally matching a user-supplied object

> file.


Thanks for the review.

Enclosed is a patch that handles clang-style names completely
separately, and also
checks for the "-" before the target name.

> (I really wish there were a better way to handle these sections.

+1

> I

> hope you need this only to support section sorting, and not for the

> special treatment of .ctors and .dtors sections, which is a legacy you

> shouldn't be saddled with in clang.)


Unfortunately, we are saddled with this for certain legacy situations,
even with clang. I
wish it weren't so as well.
diff --git a/gold/ChangeLog b/gold/ChangeLog
index cc5da7d964..63d6dbeb50 100644
--- a/gold/ChangeLog
+++ b/gold/ChangeLog
@@ -1,3 +1,7 @@
+2019-11-13  Sterling Augustine <augustine.sterling@gmail.com>
+	* layout.cc (Layout::match_file_name): Handle clang-style
+	crtbegin and crtend filenames.
+
 2019-11-11  Miguel Saldivar  <saldivarcher@gmail.com>
 
 	PR 24996
diff --git a/gold/layout.cc b/gold/layout.cc
index 194d088c2a..801e48aa40 100644
--- a/gold/layout.cc
+++ b/gold/layout.cc
@@ -5581,17 +5581,31 @@ Layout::output_section_name(const Relobj* relobj, const char* name,
 // to match crtbegin.o as well as crtbeginS.o without getting confused
 // by other possibilities.  Overall matching the file name this way is
 // a dreadful hack, but the GNU linker does it in order to better
-// support gcc, and we need to be compatible.
+// support gcc, and we need to be compatible. Also handle llvm style
+// clang_rt.crtbegin.o and clang_rt.crtbegin-<target-name>.o.
 
 bool
 Layout::match_file_name(const Relobj* relobj, const char* match)
 {
+  size_t match_len = strlen(match);
   const std::string& file_name(relobj->name());
   const char* base_name = lbasename(file_name.c_str());
-  size_t match_len = strlen(match);
+  size_t base_len = strlen(base_name);
+
+  const char* clang_prefix = "clang_rt.";
+  size_t clang_len = strlen(clang_prefix);
+  if (strncmp(base_name, clang_prefix, clang_len) == 0) {
+    base_name += clang_len;
+    base_len -= clang_len;
+    if (strncmp(base_name, match, match_len) != 0)
+      return false;
+    if (*(base_name + match_len) != '.' &&
+        *(base_name + match_len) != '-')
+      return false;
+    return memcmp(base_name + base_len - 2, ".o", 2) == 0;
+  }
   if (strncmp(base_name, match, match_len) != 0)
     return false;
-  size_t base_len = strlen(base_name);
   if (base_len != match_len + 2 && base_len != match_len + 3)
     return false;
   return memcmp(base_name + base_len - 2, ".o", 2) == 0;
Cary Coutant Jan. 8, 2020, 1:25 a.m. | #6
> Enclosed is a patch that handles clang-style names completely

> separately, and also

> checks for the "-" before the target name.


+    if (*(base_name + match_len) != '.' &&
+        *(base_name + match_len) != '-')
+      return false;

Please write these as base_name[match_len].

But Fangrui Song's most recent message in this thread raised a red flag:

> We can use a looser rule which matches GNU ld more closely.

>

> % ld.bfd --verbose | grep -A12 ' .ctors '

>    .ctors          :

>    {

>      /* gcc uses crtbegin.o to find the start of

>         the constructors, so we make sure it is

>         first.  Because this is a wildcard, it

>         doesn't matter if the user does not

>         actually link against crtbegin.o; the

>         linker won't look for a file to match a

>         wildcard.  The wildcard also means that it

>         doesn't matter which directory crtbegin.o

>         is in.  */

>      KEEP (*crtbegin.o(.ctors))

>      KEEP (*crtbegin?.o(.ctors))

>

> If divergence from other compiler-rt files (e.g.

> libclang_rt.asan-i386.a) is not a concern, clang_rt.i386.crtbegin.o

> should just work with GNU ld.


This seems to imply that the filename convention is not
clang_rt.crtbegin-<target>.o, but instead
clang_rt.<target>.crtbegin.o. I looked for a corresponding Gnu ld
patch but couldn't find one, so it does seem likely that you're using
the latter convention, which just happens to work with the existing
Gnu ld default script.

Can you clarify which of these two naming conventions you intend to
support? If it's the former, this patch is OK with the above change.
If the latter, I think you'll need to adjust this patch.

-cary
augustine.sterling@gmail.com Jan. 8, 2020, 4:58 p.m. | #7
On Tue, Jan 7, 2020 at 5:26 PM Cary Coutant <ccoutant@gmail.com> wrote:
> +    if (*(base_name + match_len) != '.' &&

> +        *(base_name + match_len) != '-')

> +      return false;

>

> Please write these as base_name[match_len].


Updated.

> This seems to imply that the filename convention is not

> clang_rt.crtbegin-<target>.o, but instead

> clang_rt.<target>.crtbegin.o.


To the best of my knowledge, clang never uses this form. I haven't
seen it in the wild.

> Can you clarify which of these two naming conventions you intend to

> support? If it's the former, this patch is OK with the above change.

> If the latter, I think you'll need to adjust this patch.


Clang uses two forms of its own (plus all of the gcc ones), which can
be found in clang/lib/Driver/ToolChain.cpp:416:

$(target_directory)/clang_rt.crtbegin.o
clang_rt.crtbegin-$target.o

This updated patch supports both of those forms.
diff --git a/gold/ChangeLog b/gold/ChangeLog
index cc5da7d964..63d6dbeb50 100644
--- a/gold/ChangeLog
+++ b/gold/ChangeLog
@@ -1,3 +1,7 @@
+2019-11-13  Sterling Augustine <augustine.sterling@gmail.com>
+	* layout.cc (Layout::match_file_name): Handle clang-style
+	crtbegin and crtend filenames.
+
 2019-11-11  Miguel Saldivar  <saldivarcher@gmail.com>
 
 	PR 24996
diff --git a/gold/layout.cc b/gold/layout.cc
index 194d088c2a..ebd82df146 100644
--- a/gold/layout.cc
+++ b/gold/layout.cc
@@ -5581,17 +5581,30 @@ Layout::output_section_name(const Relobj* relobj, const char* name,
 // to match crtbegin.o as well as crtbeginS.o without getting confused
 // by other possibilities.  Overall matching the file name this way is
 // a dreadful hack, but the GNU linker does it in order to better
-// support gcc, and we need to be compatible.
+// support gcc, and we need to be compatible. Also handle llvm style
+// clang_rt.crtbegin.o and clang_rt.crtbegin-<target-name>.o.
 
 bool
 Layout::match_file_name(const Relobj* relobj, const char* match)
 {
+  size_t match_len = strlen(match);
   const std::string& file_name(relobj->name());
   const char* base_name = lbasename(file_name.c_str());
-  size_t match_len = strlen(match);
+  size_t base_len = strlen(base_name);
+
+  const char* clang_prefix = "clang_rt.";
+  size_t clang_len = strlen(clang_prefix);
+  if (strncmp(base_name, clang_prefix, clang_len) == 0) {
+    base_name += clang_len;
+    base_len -= clang_len;
+    if (strncmp(base_name, match, match_len) != 0)
+      return false;
+    if (base_name[match_len] != '.' && base_name[match_len] != '-')
+      return false;
+    return memcmp(base_name + base_len - 2, ".o", 2) == 0;
+  }
   if (strncmp(base_name, match, match_len) != 0)
     return false;
-  size_t base_len = strlen(base_name);
   if (base_len != match_len + 2 && base_len != match_len + 3)
     return false;
   return memcmp(base_name + base_len - 2, ".o", 2) == 0;
Cary Coutant Jan. 8, 2020, 8:37 p.m. | #8
> Clang uses two forms of its own (plus all of the gcc ones), which can

> be found in clang/lib/Driver/ToolChain.cpp:416:

>

> $(target_directory)/clang_rt.crtbegin.o

> clang_rt.crtbegin-$target.o

>

> This updated patch supports both of those forms.


Thanks. This is OK.

-cary

Patch

diff --git a/gold/layout.cc b/gold/layout.cc
index 194d088c2a..dc841b77c8 100644
--- a/gold/layout.cc
+++ b/gold/layout.cc
@@ -5581,19 +5581,21 @@  Layout::output_section_name(const Relobj* relobj, const char* name,
 // to match crtbegin.o as well as crtbeginS.o without getting confused
 // by other possibilities.  Overall matching the file name this way is
 // a dreadful hack, but the GNU linker does it in order to better
-// support gcc, and we need to be compatible.
+// support gcc, and we need to be compatible. Also handle llvm style
+// clang_rt.crtbegin[X]-<target-name>.o.
 
 bool
 Layout::match_file_name(const Relobj* relobj, const char* match)
 {
   const std::string& file_name(relobj->name());
   const char* base_name = lbasename(file_name.c_str());
+  const char* clang_prefix = "clang_rt.";
+  if (strncmp(base_name, clang_prefix, strlen(clang_prefix)) == 0)
+    base_name += strlen(clang_prefix);
   size_t match_len = strlen(match);
   if (strncmp(base_name, match, match_len) != 0)
     return false;
   size_t base_len = strlen(base_name);
-  if (base_len != match_len + 2 && base_len != match_len + 3)
-    return false;
   return memcmp(base_name + base_len - 2, ".o", 2) == 0;
 }