dl: Use "adr" assembler command to get proper load address

Message ID 20210907131616.23472-1-lukma@denx.de
State New
Headers show
Series
  • dl: Use "adr" assembler command to get proper load address
Related show

Commit Message

Lukasz Majewski Sept. 7, 2021, 1:16 p.m.
This change is a partial revert of commit
bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of
__ehdr_start linker variable to get the address of loaded program.

The elf_machine_load_address() function is declared in the
sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry
point for the program. It shall return the load address of the dynamic
linker program.
With this revert the 'adr' assembler instruction is used instead of a
place holder:

arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr
00000000 l       .note.gnu.build-id     00000000      __ehdr_start

which shall be pre-set by binutils.

This is crucial in the QEMU ARM environment for which (when /sbin/init
is executed) values set in __ehdr_start symbol are wrong. This causes
the program to crash very early - when the /lib/ld-linux-armhf.so.3 is
executed as a prerequisite to /sbin/init execution.
The kernel's fs/binfmt_elf.c is though responsible for setting
up execution environment, not binutils.

It looks like the only robust way to obtain the _dl_start offset is to
use assembler instruction - not rely on values provided by binutils.

HW:
Hardware name: ARM-Versatile Express (Run with QEMU)
Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1

When the /sbin/init is setup for run from Linux kernel's very small
environment with LD_DEBUG=all the __ehdr_start is not shown at all.

Fixes: BZ #28293
---
 sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

-- 
2.20.1

Comments

Szabolcs Nagy via Libc-alpha Sept. 7, 2021, 4:49 p.m. | #1
On 2021-09-07, Lukasz Majewski wrote:
>This change is a partial revert of commit

>bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of

>__ehdr_start linker variable to get the address of loaded program.

>

>The elf_machine_load_address() function is declared in the

>sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry

>point for the program. It shall return the load address of the dynamic

>linker program.


Yes.

>With this revert the 'adr' assembler instruction is used instead of a

>place holder:

>

>arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr

>00000000 l       .note.gnu.build-id     00000000      __ehdr_start

>

>which shall be pre-set by binutils.


I don't understand this. The sh_addr field of the binutils defined
__ehdr_start is 0.

Declararing __ehdr_start as hidden and accessing it with PC-relative
addressing generates computes the runtime address of __ehdr_start, which
equals the load base.

What's wrong with it? The objdump -t line doesn't tell why this change
is good. Can you compare objdump -dr output and show the incorrect code
sequences with C assessing __ehdr_start?

>This is crucial in the QEMU ARM environment for which (when /sbin/init

>is executed) values set in __ehdr_start symbol are wrong. This causes

>the program to crash very early - when the /lib/ld-linux-armhf.so.3 is

>executed as a prerequisite to /sbin/init execution.


>The kernel's fs/binfmt_elf.c is though responsible for setting

>up execution environment, not binutils.


Whether the symbol entry __ehdr_start exists doesn't matter.
The hidden definition does not have any associated relocation.

>It looks like the only robust way to obtain the _dl_start offset is to

>use assembler instruction - not rely on values provided by binutils.

>

>HW:

>Hardware name: ARM-Versatile Express (Run with QEMU)

>Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1

>

>When the /sbin/init is setup for run from Linux kernel's very small

>environment with LD_DEBUG=all the __ehdr_start is not shown at all.

>

>Fixes: BZ #28293

>---

> sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---

> 1 file changed, 25 insertions(+), 3 deletions(-)

>

>diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h

>index eb13cb8b57..58ebef6ecd 100644

>--- a/sysdeps/arm/dl-machine.h

>+++ b/sysdeps/arm/dl-machine.h

>@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr *ehdr)

> }

>

> /* Return the run-time load address of the shared object.  */

>-static inline ElfW(Addr) __attribute__ ((unused))

>+static inline Elf32_Addr __attribute__ ((unused))

> elf_machine_load_address (void)

> {

>-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;

>-  return (ElfW(Addr)) &__ehdr_start;

>+  Elf32_Addr pcrel_addr;

>+#ifdef SHARED

>+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");

>+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;

>+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));

>+#else

>+  extern Elf32_Addr __dl_relocate_static_pie (void *)

>+    asm ("_dl_relocate_static_pie") attribute_hidden;

>+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;

>+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));

>+#endif

>+#ifdef __thumb__

>+  /* Clear the low bit of the function address.

>+

>+     NOTE: got_addr is from GOT table whose lsb is always set by linker if it's

>+     Thumb function address.  PCREL_ADDR comes from PC-relative calculation

>+     which will finish during assembling.  GAS assembler before the fix for

>+     PR gas/21458 was not setting the lsb but does after that.  Always do the

>+     strip for both, so the code works with various combinations of glibc and

>+     Binutils.  */

>+  got_addr &= ~(Elf32_Addr) 1;

>+  pcrel_addr &= ~(Elf32_Addr) 1;

>+#endif

>+  return pcrel_addr - got_addr;

> }

>

> /* Return the link-time address of _DYNAMIC.  */

>-- 

>2.20.1

>
Lukasz Majewski Sept. 7, 2021, 5:32 p.m. | #2
Hi Fangrui,

> On 2021-09-07, Lukasz Majewski wrote:

> >This change is a partial revert of commit

> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of

> >__ehdr_start linker variable to get the address of loaded program.

> >

> >The elf_machine_load_address() function is declared in the

> >sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry

> >point for the program. It shall return the load address of the

> >dynamic linker program.  

> 

> Yes.

> 

> >With this revert the 'adr' assembler instruction is used instead of a

> >place holder:

> >

> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr

> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start

> >

> >which shall be pre-set by binutils.  

> 

> I don't understand this. The sh_addr field of the binutils defined

> __ehdr_start is 0.

> 

> Declararing __ehdr_start as hidden and accessing it with PC-relative

> addressing generates computes the runtime address of __ehdr_start,

> which equals the load base.


Maybe I don't get it - but shouldn't this be filled in by binutils
(linker ?) when the program is assembled before run?

Or is the __ehdr_start used just as a relative offset to get the proper
position of ld-linux-armh.so when called?

> 

> What's wrong with it? The objdump -t line doesn't tell why this change

> is good.


I'm also puzzled why this causes the QEMU versalite board to break.
(/sbin/init dies with SIGSEGFAULT = 0xb [*])

It looks like the &__ehdr_start is not equal in the result to 'adr'
assembler code in QEMU.

I also thought that the issue is caused by fs/binfmt_elf.c changes in
the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and
5.14. No change.

The QEMU is pretty recent - it is 6.0.0 version for ARM.

> Can you compare objdump -dr output and show the incorrect

> code sequences with C assessing __ehdr_start?


Ok. I will explicitly compare _dl_start function with and without this
patch.

> 

> >This is crucial in the QEMU ARM environment for which (when

> >/sbin/init is executed) values set in __ehdr_start symbol are wrong.

> >This causes the program to crash very early - when the

> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to /sbin/init

> >execution.  

> 

> >The kernel's fs/binfmt_elf.c is though responsible for setting

> >up execution environment, not binutils.  

> 

> Whether the symbol entry __ehdr_start exists doesn't matter.

> The hidden definition does not have any associated relocation.


I'm also wondering if similar issue is visible with "normal" - i.e. not
QEMU ARM run init.

Maybe the "rough" environment in which /sbin/init is run is the culprit?

> 

> >It looks like the only robust way to obtain the _dl_start offset is

> >to use assembler instruction - not rely on values provided by

> >binutils.

> >

> >HW:

> >Hardware name: ARM-Versatile Express (Run with QEMU)

> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1

> >

> >When the /sbin/init is setup for run from Linux kernel's very small

> >environment with LD_DEBUG=all the __ehdr_start is not shown at all.

> >

> >Fixes: BZ #28293

> >---

> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---

> > 1 file changed, 25 insertions(+), 3 deletions(-)

> >

> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h

> >index eb13cb8b57..58ebef6ecd 100644

> >--- a/sysdeps/arm/dl-machine.h

> >+++ b/sysdeps/arm/dl-machine.h

> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr *ehdr)

> > }

> >

> > /* Return the run-time load address of the shared object.  */

> >-static inline ElfW(Addr) __attribute__ ((unused))

> >+static inline Elf32_Addr __attribute__ ((unused))

> > elf_machine_load_address (void)

> > {

> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;

> >-  return (ElfW(Addr)) &__ehdr_start;

> >+  Elf32_Addr pcrel_addr;

> >+#ifdef SHARED

> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");

> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;

> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));

> >+#else

> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)

> >+    asm ("_dl_relocate_static_pie") attribute_hidden;

> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;

> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));

> >+#endif

> >+#ifdef __thumb__

> >+  /* Clear the low bit of the function address.

> >+

> >+     NOTE: got_addr is from GOT table whose lsb is always set by

> >linker if it's

> >+     Thumb function address.  PCREL_ADDR comes from PC-relative

> >calculation

> >+     which will finish during assembling.  GAS assembler before the

> >fix for

> >+     PR gas/21458 was not setting the lsb but does after that.

> >Always do the

> >+     strip for both, so the code works with various combinations of

> >glibc and

> >+     Binutils.  */

> >+  got_addr &= ~(Elf32_Addr) 1;

> >+  pcrel_addr &= ~(Elf32_Addr) 1;

> >+#endif

> >+  return pcrel_addr - got_addr;

> > }

> >

> > /* Return the link-time address of _DYNAMIC.  */

> >-- 

> >2.20.1

> >  


Note:

[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu has
good support for kernel debugging (-s -S switches). Then for user space
program one could use the "on-board" gdb or gdbserver.

However, /sbin/init needs to be debugged from linux and scheduler
context switches needs to be taken into account to review how the code
is executed. And such debugging scenario has issue with QEMU-arm.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Szabolcs Nagy via Libc-alpha Sept. 7, 2021, 5:44 p.m. | #3
On 2021-09-07, Lukasz Majewski wrote:
>Hi Fangrui,

>

>> On 2021-09-07, Lukasz Majewski wrote:

>> >This change is a partial revert of commit

>> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of

>> >__ehdr_start linker variable to get the address of loaded program.

>> >

>> >The elf_machine_load_address() function is declared in the

>> >sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry

>> >point for the program. It shall return the load address of the

>> >dynamic linker program.

>>

>> Yes.

>>

>> >With this revert the 'adr' assembler instruction is used instead of a

>> >place holder:

>> >

>> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr

>> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start

>> >

>> >which shall be pre-set by binutils.

>>

>> I don't understand this. The sh_addr field of the binutils defined

>> __ehdr_start is 0.

>>

>> Declararing __ehdr_start as hidden and accessing it with PC-relative

>> addressing generates computes the runtime address of __ehdr_start,

>> which equals the load base.

>

>Maybe I don't get it - but shouldn't this be filled in by binutils

>(linker ?) when the program is assembled before run?


The linker just sets its link-time address (sh_addr): 0.
The address is computed at runtime.

>Or is the __ehdr_start used just as a relative offset to get the proper

>position of ld-linux-armh.so when called?


It's a relative offset.

% cat a.c
__attribute__((visibility("hidden")))
extern char __ehdr_start[];

char *load_address() { return __ehdr_start; }

% arm-linux-gnueabi-gcc -O1 -fpic a.c -nostdlib
% arm-linux-gnueabi-objdump -dr a.out

a.out:     file format elf32-littlearm


Disassembly of section .text:

00000198 <load_address>:
  198:   e59f0004        ldr     r0, [pc, #4]    ; 1a4 <load_address+0xc>
  19c:   e08f0000        add     r0, pc, r0
  1a0:   e12fff1e        bx      lr
  1a4:   fffffe5c        .word   0xfffffe5c

At runtime, `load_address` will compute the load address.
Note that there is no runtime relocation.


>>

>> What's wrong with it? The objdump -t line doesn't tell why this change

>> is good.

>

>I'm also puzzled why this causes the QEMU versalite board to break.

>(/sbin/init dies with SIGSEGFAULT = 0xb [*])

>

>It looks like the &__ehdr_start is not equal in the result to 'adr'

>assembler code in QEMU.

>

>I also thought that the issue is caused by fs/binfmt_elf.c changes in

>the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and

>5.14. No change.


The kernel change must be unrelated.
The kernel doesn't even know __ehdr_start as a symbol exists.

>The QEMU is pretty recent - it is 6.0.0 version for ARM.

>

>> Can you compare objdump -dr output and show the incorrect

>> code sequences with C assessing __ehdr_start?

>

>Ok. I will explicitly compare _dl_start function with and without this

>patch.


In case it helps, perhaps replace always_inline with noinline
and check why the function is incorrect under your qemu setup.

>>

>> >This is crucial in the QEMU ARM environment for which (when

>> >/sbin/init is executed) values set in __ehdr_start symbol are wrong.

>> >This causes the program to crash very early - when the

>> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to /sbin/init

>> >execution.

>>

>> >The kernel's fs/binfmt_elf.c is though responsible for setting

>> >up execution environment, not binutils.

>>

>> Whether the symbol entry __ehdr_start exists doesn't matter.

>> The hidden definition does not have any associated relocation.

>

>I'm also wondering if similar issue is visible with "normal" - i.e. not

>QEMU ARM run init.

>

>Maybe the "rough" environment in which /sbin/init is run is the culprit?

>


I had tested running ld.so and elf/ldconfig which covered the usage.

I don't know. It needs some debugging from you:)

>>

>> >It looks like the only robust way to obtain the _dl_start offset is

>> >to use assembler instruction - not rely on values provided by

>> >binutils.

>> >

>> >HW:

>> >Hardware name: ARM-Versatile Express (Run with QEMU)

>> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1

>> >

>> >When the /sbin/init is setup for run from Linux kernel's very small

>> >environment with LD_DEBUG=all the __ehdr_start is not shown at all.

>> >

>> >Fixes: BZ #28293

>> >---

>> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---

>> > 1 file changed, 25 insertions(+), 3 deletions(-)

>> >

>> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h

>> >index eb13cb8b57..58ebef6ecd 100644

>> >--- a/sysdeps/arm/dl-machine.h

>> >+++ b/sysdeps/arm/dl-machine.h

>> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr *ehdr)

>> > }

>> >

>> > /* Return the run-time load address of the shared object.  */

>> >-static inline ElfW(Addr) __attribute__ ((unused))

>> >+static inline Elf32_Addr __attribute__ ((unused))

>> > elf_machine_load_address (void)

>> > {

>> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;

>> >-  return (ElfW(Addr)) &__ehdr_start;

>> >+  Elf32_Addr pcrel_addr;

>> >+#ifdef SHARED

>> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");

>> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;

>> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));

>> >+#else

>> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)

>> >+    asm ("_dl_relocate_static_pie") attribute_hidden;

>> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;

>> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));

>> >+#endif

>> >+#ifdef __thumb__

>> >+  /* Clear the low bit of the function address.

>> >+

>> >+     NOTE: got_addr is from GOT table whose lsb is always set by

>> >linker if it's

>> >+     Thumb function address.  PCREL_ADDR comes from PC-relative

>> >calculation

>> >+     which will finish during assembling.  GAS assembler before the

>> >fix for

>> >+     PR gas/21458 was not setting the lsb but does after that.

>> >Always do the

>> >+     strip for both, so the code works with various combinations of

>> >glibc and

>> >+     Binutils.  */

>> >+  got_addr &= ~(Elf32_Addr) 1;

>> >+  pcrel_addr &= ~(Elf32_Addr) 1;

>> >+#endif

>> >+  return pcrel_addr - got_addr;

>> > }

>> >

>> > /* Return the link-time address of _DYNAMIC.  */

>> >--

>> >2.20.1

>> >

>

>Note:

>

>[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu has

>good support for kernel debugging (-s -S switches). Then for user space

>program one could use the "on-board" gdb or gdbserver.

>

>However, /sbin/init needs to be debugged from linux and scheduler

>context switches needs to be taken into account to review how the code

>is executed. And such debugging scenario has issue with QEMU-arm.

>

>

>Best regards,

>

>Lukasz Majewski

>

>--

>

>DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

>HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

>Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Sept. 8, 2021, 3:05 p.m. | #4
Hi Fangrui,

> On 2021-09-07, Lukasz Majewski wrote:

> >Hi Fangrui,

> >  

> >> On 2021-09-07, Lukasz Majewski wrote:  

> >> >This change is a partial revert of commit

> >> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of

> >> >__ehdr_start linker variable to get the address of loaded program.

> >> >

> >> >The elf_machine_load_address() function is declared in the

> >> >sysdeps/arm/dl-machine.h header. It is called from _dl_start()

> >> >entry point for the program. It shall return the load address of

> >> >the dynamic linker program.  

> >>

> >> Yes.

> >>  

> >> >With this revert the 'adr' assembler instruction is used instead

> >> >of a place holder:

> >> >

> >> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr

> >> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start

> >> >

> >> >which shall be pre-set by binutils.  

> >>

> >> I don't understand this. The sh_addr field of the binutils defined

> >> __ehdr_start is 0.

> >>

> >> Declararing __ehdr_start as hidden and accessing it with

> >> PC-relative addressing generates computes the runtime address of

> >> __ehdr_start, which equals the load base.  

> >

> >Maybe I don't get it - but shouldn't this be filled in by binutils

> >(linker ?) when the program is assembled before run?  

> 

> The linker just sets its link-time address (sh_addr): 0.

> The address is computed at runtime.

> 

> >Or is the __ehdr_start used just as a relative offset to get the

> >proper position of ld-linux-armh.so when called?  

> 

> It's a relative offset.

> 

> % cat a.c

> __attribute__((visibility("hidden")))

> extern char __ehdr_start[];

> 

> char *load_address() { return __ehdr_start; }

> 

> % arm-linux-gnueabi-gcc -O1 -fpic a.c -nostdlib

> % arm-linux-gnueabi-objdump -dr a.out

> 

> a.out:     file format elf32-littlearm

> 

> 

> Disassembly of section .text:

> 

> 00000198 <load_address>:

>   198:   e59f0004        ldr     r0, [pc, #4]    ; 1a4

> <load_address+0xc> 19c:   e08f0000        add     r0, pc, r0

>   1a0:   e12fff1e        bx      lr

>   1a4:   fffffe5c        .word   0xfffffe5c

> 

> At runtime, `load_address` will compute the load address.

> Note that there is no runtime relocation.


I think that the problem may be with having the negative value
calculated.

The relevant snipet:

    116c:       bf00            nop
    116e:       bf00            nop
    1170:       bf00            nop
    1172:       f8df 3508       ldr.w   r3, [pc, #1288] ; 167c <_dl_start+0x520>

    1176:       f8df 1508       ldr.w   r1, [pc, #1288] ; 1680 <_dl_start+0x524>

    117a:       447b            add     r3, pc
    117c:       4479            add     r1, pc
    117e:       f8c3 1598       str.w   r1, [r3, #1432] ; 0x598
    1182:       bf00            nop
    1184:       bf00            nop
    1186:       bf00            nop
    1188:       bf00            nop
    118a:       bf00            nop
    118c:       bf00            nop
    118e:       f8df 24f4       ldr.w   r2, [pc, #1268] ; 1684 <_dl_start+0x528>
    1192:       f8d3 5598       ldr.w   r5, [r3, #1432] ; 0x598
    1196:       447a            add     r2, pc
    1198:       442a            add     r2, r5
    119a:       1a52            subs    r2, r2, r1
    119c:       f8c3 25a0       str.w   r2, [r3, #1440] ; 0x5a0
    11a0:       6813            ldr     r3, [r2, #0]


    167c:       0002be92        .word   0x0002be92
    1680:       ffffee80        .word   0xffffee80

The r1 gets the 0xffffee80 (negative offset) value. It is then added to pc
and used to calculate r2.

For working code (with this patch applied) - there are NO such large
values (like aforementioned 0xffffee80). The arithmetic is done on 

   1690:       00000020        .word   0x00000020
   1694:       0002be7e        .word   0x0002be7e

which seems to work.

This shouldn't be a problem as with U2 the arithmetic shall work.
However, I've noticed (with passing LD_DEBUG=all) that the
ld-linux-armhf.so.3 (and init) are relocated twice before execution.

Why do we need to relocate it?

Another question is why on this particular case the large (i.e.
negative) offset matters?

> 

> 

> >>

> >> What's wrong with it? The objdump -t line doesn't tell why this

> >> change is good.  

> >

> >I'm also puzzled why this causes the QEMU versalite board to break.

> >(/sbin/init dies with SIGSEGFAULT = 0xb [*])

> >

> >It looks like the &__ehdr_start is not equal in the result to 'adr'

> >assembler code in QEMU.

> >

> >I also thought that the issue is caused by fs/binfmt_elf.c changes in

> >the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and

> >5.14. No change.  

> 

> The kernel change must be unrelated.

> The kernel doesn't even know __ehdr_start as a symbol exists.

> 

> >The QEMU is pretty recent - it is 6.0.0 version for ARM.

> >  

> >> Can you compare objdump -dr output and show the incorrect

> >> code sequences with C assessing __ehdr_start?  

> >

> >Ok. I will explicitly compare _dl_start function with and without

> >this patch.  

> 

> In case it helps, perhaps replace always_inline with noinline

> and check why the function is incorrect under your qemu setup.

> 

> >>  

> >> >This is crucial in the QEMU ARM environment for which (when

> >> >/sbin/init is executed) values set in __ehdr_start symbol are

> >> >wrong. This causes the program to crash very early - when the

> >> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to

> >> >/sbin/init execution.  

> >>  

> >> >The kernel's fs/binfmt_elf.c is though responsible for setting

> >> >up execution environment, not binutils.  

> >>

> >> Whether the symbol entry __ehdr_start exists doesn't matter.

> >> The hidden definition does not have any associated relocation.  

> >

> >I'm also wondering if similar issue is visible with "normal" - i.e.

> >not QEMU ARM run init.

> >

> >Maybe the "rough" environment in which /sbin/init is run is the

> >culprit?

> >  

> 

> I had tested running ld.so and elf/ldconfig which covered the usage.

> 

> I don't know. It needs some debugging from you:)

> 

> >>  

> >> >It looks like the only robust way to obtain the _dl_start offset

> >> >is to use assembler instruction - not rely on values provided by

> >> >binutils.

> >> >

> >> >HW:

> >> >Hardware name: ARM-Versatile Express (Run with QEMU)

> >> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1

> >> >

> >> >When the /sbin/init is setup for run from Linux kernel's very

> >> >small environment with LD_DEBUG=all the __ehdr_start is not shown

> >> >at all.

> >> >

> >> >Fixes: BZ #28293

> >> >---

> >> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---

> >> > 1 file changed, 25 insertions(+), 3 deletions(-)

> >> >

> >> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h

> >> >index eb13cb8b57..58ebef6ecd 100644

> >> >--- a/sysdeps/arm/dl-machine.h

> >> >+++ b/sysdeps/arm/dl-machine.h

> >> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr

> >> >*ehdr)

> >> > }

> >> >

> >> > /* Return the run-time load address of the shared object.  */

> >> >-static inline ElfW(Addr) __attribute__ ((unused))

> >> >+static inline Elf32_Addr __attribute__ ((unused))

> >> > elf_machine_load_address (void)

> >> > {

> >> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;

> >> >-  return (ElfW(Addr)) &__ehdr_start;

> >> >+  Elf32_Addr pcrel_addr;

> >> >+#ifdef SHARED

> >> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");

> >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;

> >> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));

> >> >+#else

> >> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)

> >> >+    asm ("_dl_relocate_static_pie") attribute_hidden;

> >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;

> >> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));

> >> >+#endif

> >> >+#ifdef __thumb__

> >> >+  /* Clear the low bit of the function address.

> >> >+

> >> >+     NOTE: got_addr is from GOT table whose lsb is always set by

> >> >linker if it's

> >> >+     Thumb function address.  PCREL_ADDR comes from PC-relative

> >> >calculation

> >> >+     which will finish during assembling.  GAS assembler before

> >> >the fix for

> >> >+     PR gas/21458 was not setting the lsb but does after that.

> >> >Always do the

> >> >+     strip for both, so the code works with various combinations

> >> >of glibc and

> >> >+     Binutils.  */

> >> >+  got_addr &= ~(Elf32_Addr) 1;

> >> >+  pcrel_addr &= ~(Elf32_Addr) 1;

> >> >+#endif

> >> >+  return pcrel_addr - got_addr;

> >> > }

> >> >

> >> > /* Return the link-time address of _DYNAMIC.  */

> >> >--

> >> >2.20.1

> >> >  

> >

> >Note:

> >

> >[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu

> >has good support for kernel debugging (-s -S switches). Then for

> >user space program one could use the "on-board" gdb or gdbserver.

> >

> >However, /sbin/init needs to be debugged from linux and scheduler

> >context switches needs to be taken into account to review how the

> >code is executed. And such debugging scenario has issue with

> >QEMU-arm.

> >

> >

> >Best regards,

> >

> >Lukasz Majewski

> >

> >--

> >

> >DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> >HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> >Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> >lukma@denx.de  

> 

> 





Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Szabolcs Nagy via Libc-alpha Sept. 8, 2021, 5:41 p.m. | #5
On Wed, Sep 8, 2021 at 8:05 AM Lukasz Majewski <lukma@denx.de> wrote:
>

> Hi Fangrui,

>

> > On 2021-09-07, Lukasz Majewski wrote:

> > >Hi Fangrui,

> > >

> > >> On 2021-09-07, Lukasz Majewski wrote:

> > >> >This change is a partial revert of commit

> > >> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of

> > >> >__ehdr_start linker variable to get the address of loaded program.

> > >> >

> > >> >The elf_machine_load_address() function is declared in the

> > >> >sysdeps/arm/dl-machine.h header. It is called from _dl_start()

> > >> >entry point for the program. It shall return the load address of

> > >> >the dynamic linker program.

> > >>

> > >> Yes.

> > >>

> > >> >With this revert the 'adr' assembler instruction is used instead

> > >> >of a place holder:

> > >> >

> > >> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr

> > >> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start

> > >> >

> > >> >which shall be pre-set by binutils.

> > >>

> > >> I don't understand this. The sh_addr field of the binutils defined

> > >> __ehdr_start is 0.

> > >>

> > >> Declararing __ehdr_start as hidden and accessing it with

> > >> PC-relative addressing generates computes the runtime address of

> > >> __ehdr_start, which equals the load base.

> > >

> > >Maybe I don't get it - but shouldn't this be filled in by binutils

> > >(linker ?) when the program is assembled before run?

> >

> > The linker just sets its link-time address (sh_addr): 0.

> > The address is computed at runtime.

> >

> > >Or is the __ehdr_start used just as a relative offset to get the

> > >proper position of ld-linux-armh.so when called?

> >

> > It's a relative offset.

> >

> > % cat a.c

> > __attribute__((visibility("hidden")))

> > extern char __ehdr_start[];

> >

> > char *load_address() { return __ehdr_start; }

> >

> > % arm-linux-gnueabi-gcc -O1 -fpic a.c -nostdlib

> > % arm-linux-gnueabi-objdump -dr a.out

> >

> > a.out:     file format elf32-littlearm

> >

> >

> > Disassembly of section .text:

> >

> > 00000198 <load_address>:

> >   198:   e59f0004        ldr     r0, [pc, #4]    ; 1a4

> > <load_address+0xc> 19c:   e08f0000        add     r0, pc, r0

> >   1a0:   e12fff1e        bx      lr

> >   1a4:   fffffe5c        .word   0xfffffe5c

> >

> > At runtime, `load_address` will compute the load address.

> > Note that there is no runtime relocation.

>

> I think that the problem may be with having the negative value

> calculated.


Unfamiliar with arm, but I don't see why a negative value can be a problem.
The address computation just wraps around as expected.

> The relevant snipet:

>

>     116c:       bf00            nop

>     116e:       bf00            nop

>     1170:       bf00            nop

>     1172:       f8df 3508       ldr.w   r3, [pc, #1288] ; 167c <_dl_start+0x520>

>

>     1176:       f8df 1508       ldr.w   r1, [pc, #1288] ; 1680 <_dl_start+0x524>

>

>     117a:       447b            add     r3, pc

>     117c:       4479            add     r1, pc

>     117e:       f8c3 1598       str.w   r1, [r3, #1432] ; 0x598

>     1182:       bf00            nop

>     1184:       bf00            nop

>     1186:       bf00            nop

>     1188:       bf00            nop

>     118a:       bf00            nop

>     118c:       bf00            nop

>     118e:       f8df 24f4       ldr.w   r2, [pc, #1268] ; 1684 <_dl_start+0x528>

>     1192:       f8d3 5598       ldr.w   r5, [r3, #1432] ; 0x598

>     1196:       447a            add     r2, pc

>     1198:       442a            add     r2, r5

>     119a:       1a52            subs    r2, r2, r1

>     119c:       f8c3 25a0       str.w   r2, [r3, #1440] ; 0x5a0

>     11a0:       6813            ldr     r3, [r2, #0]

>

>

>     167c:       0002be92        .word   0x0002be92

>     1680:       ffffee80        .word   0xffffee80

>

> The r1 gets the 0xffffee80 (negative offset) value. It is then added to pc

> and used to calculate r2.

>

> For working code (with this patch applied) - there are NO such large

> values (like aforementioned 0xffffee80). The arithmetic is done on

>

>    1690:       00000020        .word   0x00000020

>    1694:       0002be7e        .word   0x0002be7e

>

> which seems to work.

>

> This shouldn't be a problem as with U2 the arithmetic shall work.

> However, I've noticed (with passing LD_DEBUG=all) that the

> ld-linux-armhf.so.3 (and init) are relocated twice before execution.

>

> Why do we need to relocate it?


No idea...

> Another question is why on this particular case the large (i.e.

> negative) offset matters?


No idea...

> >

> >

> > >>

> > >> What's wrong with it? The objdump -t line doesn't tell why this

> > >> change is good.

> > >

> > >I'm also puzzled why this causes the QEMU versalite board to break.

> > >(/sbin/init dies with SIGSEGFAULT = 0xb [*])

> > >

> > >It looks like the &__ehdr_start is not equal in the result to 'adr'

> > >assembler code in QEMU.

> > >

> > >I also thought that the issue is caused by fs/binfmt_elf.c changes in

> > >the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and

> > >5.14. No change.

> >

> > The kernel change must be unrelated.

> > The kernel doesn't even know __ehdr_start as a symbol exists.

> >

> > >The QEMU is pretty recent - it is 6.0.0 version for ARM.

> > >

> > >> Can you compare objdump -dr output and show the incorrect

> > >> code sequences with C assessing __ehdr_start?

> > >

> > >Ok. I will explicitly compare _dl_start function with and without

> > >this patch.

> >

> > In case it helps, perhaps replace always_inline with noinline

> > and check why the function is incorrect under your qemu setup.

> >

> > >>

> > >> >This is crucial in the QEMU ARM environment for which (when

> > >> >/sbin/init is executed) values set in __ehdr_start symbol are

> > >> >wrong. This causes the program to crash very early - when the

> > >> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to

> > >> >/sbin/init execution.

> > >>

> > >> >The kernel's fs/binfmt_elf.c is though responsible for setting

> > >> >up execution environment, not binutils.

> > >>

> > >> Whether the symbol entry __ehdr_start exists doesn't matter.

> > >> The hidden definition does not have any associated relocation.

> > >

> > >I'm also wondering if similar issue is visible with "normal" - i.e.

> > >not QEMU ARM run init.

> > >

> > >Maybe the "rough" environment in which /sbin/init is run is the

> > >culprit?

> > >

> >

> > I had tested running ld.so and elf/ldconfig which covered the usage.

> >

> > I don't know. It needs some debugging from you:)

> >

> > >>

> > >> >It looks like the only robust way to obtain the _dl_start offset

> > >> >is to use assembler instruction - not rely on values provided by

> > >> >binutils.

> > >> >

> > >> >HW:

> > >> >Hardware name: ARM-Versatile Express (Run with QEMU)

> > >> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1

> > >> >

> > >> >When the /sbin/init is setup for run from Linux kernel's very

> > >> >small environment with LD_DEBUG=all the __ehdr_start is not shown

> > >> >at all.

> > >> >

> > >> >Fixes: BZ #28293

> > >> >---

> > >> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---

> > >> > 1 file changed, 25 insertions(+), 3 deletions(-)

> > >> >

> > >> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h

> > >> >index eb13cb8b57..58ebef6ecd 100644

> > >> >--- a/sysdeps/arm/dl-machine.h

> > >> >+++ b/sysdeps/arm/dl-machine.h

> > >> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr

> > >> >*ehdr)

> > >> > }

> > >> >

> > >> > /* Return the run-time load address of the shared object.  */

> > >> >-static inline ElfW(Addr) __attribute__ ((unused))

> > >> >+static inline Elf32_Addr __attribute__ ((unused))

> > >> > elf_machine_load_address (void)

> > >> > {

> > >> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;

> > >> >-  return (ElfW(Addr)) &__ehdr_start;

> > >> >+  Elf32_Addr pcrel_addr;

> > >> >+#ifdef SHARED

> > >> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");

> > >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;

> > >> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));

> > >> >+#else

> > >> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)

> > >> >+    asm ("_dl_relocate_static_pie") attribute_hidden;

> > >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;

> > >> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));

> > >> >+#endif

> > >> >+#ifdef __thumb__

> > >> >+  /* Clear the low bit of the function address.

> > >> >+

> > >> >+     NOTE: got_addr is from GOT table whose lsb is always set by

> > >> >linker if it's

> > >> >+     Thumb function address.  PCREL_ADDR comes from PC-relative

> > >> >calculation

> > >> >+     which will finish during assembling.  GAS assembler before

> > >> >the fix for

> > >> >+     PR gas/21458 was not setting the lsb but does after that.

> > >> >Always do the

> > >> >+     strip for both, so the code works with various combinations

> > >> >of glibc and

> > >> >+     Binutils.  */

> > >> >+  got_addr &= ~(Elf32_Addr) 1;

> > >> >+  pcrel_addr &= ~(Elf32_Addr) 1;

> > >> >+#endif

> > >> >+  return pcrel_addr - got_addr;

> > >> > }

> > >> >

> > >> > /* Return the link-time address of _DYNAMIC.  */

> > >> >--

> > >> >2.20.1

> > >> >

> > >

> > >Note:

> > >

> > >[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu

> > >has good support for kernel debugging (-s -S switches). Then for

> > >user space program one could use the "on-board" gdb or gdbserver.

> > >

> > >However, /sbin/init needs to be debugged from linux and scheduler

> > >context switches needs to be taken into account to review how the

> > >code is executed. And such debugging scenario has issue with

> > >QEMU-arm.

> > >

> > >

> > >Best regards,

> > >

> > >Lukasz Majewski

> > >

> > >--

> > >

> > >DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> > >HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> > >Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> > >lukma@denx.de

> >

> >

>

>

>

>

> Best regards,

>

> Lukasz Majewski

>

> --

>

> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de




-- 
宋方睿
Szabolcs Nagy via Libc-alpha Sept. 8, 2021, 7:19 p.m. | #6
On 08/09/2021 12:05, Lukasz Majewski wrote:
> 

> The r1 gets the 0xffffee80 (negative offset) value. It is then added to pc

> and used to calculate r2.

> 

> For working code (with this patch applied) - there are NO such large

> values (like aforementioned 0xffffee80). The arithmetic is done on 

> 

>    1690:       00000020        .word   0x00000020

>    1694:       0002be7e        .word   0x0002be7e

> 

> which seems to work.

> 

> This shouldn't be a problem as with U2 the arithmetic shall work.

> However, I've noticed (with passing LD_DEBUG=all) that the

> ld-linux-armhf.so.3 (and init) are relocated twice before execution.

> 

> Why do we need to relocate it?

> 

> Another question is why on this particular case the large (i.e.

> negative) offset matters?


I think it is highly unlikely the negative offset plays any role here.
Do you have a working example to trigger this issue?  I am currently
testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and
with different configurations (armv5, armv6, armv7, with and without
thumb) and I haven't see any issue so far.

It might binutils related, which version are you using?
Lukasz Majewski Sept. 8, 2021, 8:34 p.m. | #7
Hi Adhemerval,

> On 08/09/2021 12:05, Lukasz Majewski wrote:

> > 

> > The r1 gets the 0xffffee80 (negative offset) value. It is then

> > added to pc and used to calculate r2.

> > 

> > For working code (with this patch applied) - there are NO such large

> > values (like aforementioned 0xffffee80). The arithmetic is done on 

> > 

> >    1690:       00000020        .word   0x00000020

> >    1694:       0002be7e        .word   0x0002be7e

> > 

> > which seems to work.

> > 

> > This shouldn't be a problem as with U2 the arithmetic shall work.

> > However, I've noticed (with passing LD_DEBUG=all) that the

> > ld-linux-armhf.so.3 (and init) are relocated twice before execution.

> > 

> > Why do we need to relocate it?

> > 

> > Another question is why on this particular case the large (i.e.

> > negative) offset matters?  

> 

> I think it is highly unlikely the negative offset plays any role here.

> Do you have a working example to trigger this issue? 


I'm just using QEMU (runqemu -d y2038-image-devel nographic) from
meta-y2038 build.

Recent versions: 
poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1
qemu_6.0.0
binutils_2.37
gcc_11.2

The QEMU board uses Cortex-A9 core.

> I am currently

> testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and

> with different configurations (armv5, armv6, armv7, with and without

> thumb) and I haven't see any issue so far.

> 

> It might binutils related, which version are you using?





Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Sept. 9, 2021, 7:18 a.m. | #8
On Wed, 8 Sep 2021 22:34:21 +0200
Lukasz Majewski <lukma@denx.de> wrote:

> Hi Adhemerval,

> 

> > On 08/09/2021 12:05, Lukasz Majewski wrote:  

> > > 

> > > The r1 gets the 0xffffee80 (negative offset) value. It is then

> > > added to pc and used to calculate r2.

> > > 

> > > For working code (with this patch applied) - there are NO such

> > > large values (like aforementioned 0xffffee80). The arithmetic is

> > > done on 

> > > 

> > >    1690:       00000020        .word   0x00000020

> > >    1694:       0002be7e        .word   0x0002be7e

> > > 

> > > which seems to work.

> > > 

> > > This shouldn't be a problem as with U2 the arithmetic shall work.

> > > However, I've noticed (with passing LD_DEBUG=all) that the

> > > ld-linux-armhf.so.3 (and init) are relocated twice before

> > > execution.

> > > 

> > > Why do we need to relocate it?

> > > 

> > > Another question is why on this particular case the large (i.e.

> > > negative) offset matters?    

> > 

> > I think it is highly unlikely the negative offset plays any role

> > here. Do you have a working example to trigger this issue?   

> 

> I'm just using QEMU (runqemu -d y2038-image-devel nographic) from

> meta-y2038 build.

> 

> Recent versions: 

> poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1

> qemu_6.0.0

> binutils_2.37

> gcc_11.2

> 

> The QEMU board uses Cortex-A9 core.


I've updated the poky to be the newest -master:
master:50293eec9a7d0602b748220ab100b070b8d3432a

No change.

I will try the same setup with Cortex-A5 core - maybe there is some
kind of subtle handling of such emulation in qemu?

> 

> > I am currently

> > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and

> > with different configurations (armv5, armv6, armv7, with and without

> > thumb) and I haven't see any issue so far.

> > 

> > It might binutils related, which version are you using?  

> 

> 

> 

> 

> Best regards,

> 

> Lukasz Majewski

> 

> --

> 

> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> lukma@denx.de





Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Sept. 9, 2021, 9:49 a.m. | #9
On Thu, 9 Sep 2021 09:18:06 +0200
Lukasz Majewski <lukma@denx.de> wrote:

> On Wed, 8 Sep 2021 22:34:21 +0200

> Lukasz Majewski <lukma@denx.de> wrote:

> 

> > Hi Adhemerval,

> >   

> > > On 08/09/2021 12:05, Lukasz Majewski wrote:    

> > > > 

> > > > The r1 gets the 0xffffee80 (negative offset) value. It is then

> > > > added to pc and used to calculate r2.

> > > > 

> > > > For working code (with this patch applied) - there are NO such

> > > > large values (like aforementioned 0xffffee80). The arithmetic is

> > > > done on 

> > > > 

> > > >    1690:       00000020        .word   0x00000020

> > > >    1694:       0002be7e        .word   0x0002be7e

> > > > 

> > > > which seems to work.

> > > > 

> > > > This shouldn't be a problem as with U2 the arithmetic shall

> > > > work. However, I've noticed (with passing LD_DEBUG=all) that the

> > > > ld-linux-armhf.so.3 (and init) are relocated twice before

> > > > execution.

> > > > 

> > > > Why do we need to relocate it?

> > > > 

> > > > Another question is why on this particular case the large (i.e.

> > > > negative) offset matters?      

> > > 

> > > I think it is highly unlikely the negative offset plays any role

> > > here. Do you have a working example to trigger this issue?     

> > 

> > I'm just using QEMU (runqemu -d y2038-image-devel nographic) from

> > meta-y2038 build.

> > 

> > Recent versions: 

> > poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1

> > qemu_6.0.0

> > binutils_2.37

> > gcc_11.2

> > 

> > The QEMU board uses Cortex-A9 core.  

> 

> I've updated the poky to be the newest -master:

> master:50293eec9a7d0602b748220ab100b070b8d3432a

> 

> No change.

> 

> I will try the same setup with Cortex-A5 core - maybe there is some

> kind of subtle handling of such emulation in qemu?


No change. On the qemuarm yocto/OE MACHINE the same situation is
observed.

After applying this patch the board works without issues.

> 

> >   

> > > I am currently

> > > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and

> > > with different configurations (armv5, armv6, armv7, with and

> > > without thumb) and I haven't see any issue so far.

> > > 

> > > It might binutils related, which version are you using?    

> > 

> > 

> > 

> > 

> > Best regards,

> > 

> > Lukasz Majewski

> > 

> > --

> > 

> > DENX Software Engineering GmbH,      Managing Director: Wolfgang

> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,

> > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> > lukma@denx.de  

> 

> 

> 

> 

> Best regards,

> 

> Lukasz Majewski

> 

> --

> 

> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> lukma@denx.de





Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Sept. 10, 2021, 10:10 a.m. | #10
On Thu, 9 Sep 2021 11:49:36 +0200
Lukasz Majewski <lukma@denx.de> wrote:

> On Thu, 9 Sep 2021 09:18:06 +0200

> Lukasz Majewski <lukma@denx.de> wrote:

> 

> > On Wed, 8 Sep 2021 22:34:21 +0200

> > Lukasz Majewski <lukma@denx.de> wrote:

> >   

> > > Hi Adhemerval,

> > >     

> > > > On 08/09/2021 12:05, Lukasz Majewski wrote:      

> > > > > 

> > > > > The r1 gets the 0xffffee80 (negative offset) value. It is then

> > > > > added to pc and used to calculate r2.

> > > > > 

> > > > > For working code (with this patch applied) - there are NO such

> > > > > large values (like aforementioned 0xffffee80). The arithmetic

> > > > > is done on 

> > > > > 

> > > > >    1690:       00000020        .word   0x00000020

> > > > >    1694:       0002be7e        .word   0x0002be7e

> > > > > 

> > > > > which seems to work.

> > > > > 

> > > > > This shouldn't be a problem as with U2 the arithmetic shall

> > > > > work. However, I've noticed (with passing LD_DEBUG=all) that

> > > > > the ld-linux-armhf.so.3 (and init) are relocated twice before

> > > > > execution.

> > > > > 

> > > > > Why do we need to relocate it?

> > > > > 

> > > > > Another question is why on this particular case the large

> > > > > (i.e. negative) offset matters?        

> > > > 

> > > > I think it is highly unlikely the negative offset plays any role

> > > > here. Do you have a working example to trigger this issue?

> > > >  

> > > 

> > > I'm just using QEMU (runqemu -d y2038-image-devel nographic) from

> > > meta-y2038 build.

> > > 

> > > Recent versions: 

> > > poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1

> > > qemu_6.0.0

> > > binutils_2.37

> > > gcc_11.2

> > > 

> > > The QEMU board uses Cortex-A9 core.    

> > 

> > I've updated the poky to be the newest -master:

> > master:50293eec9a7d0602b748220ab100b070b8d3432a

> > 

> > No change.

> > 

> > I will try the same setup with Cortex-A5 core - maybe there is some

> > kind of subtle handling of such emulation in qemu?  

> 

> No change. On the qemuarm yocto/OE MACHINE the same situation is

> observed.

> 


Do you have any more ideas where to look for solution of this problem?

> After applying this patch the board works without issues.

> 

> >   

> > >     

> > > > I am currently

> > > > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11)

> > > > and with different configurations (armv5, armv6, armv7, with and

> > > > without thumb) and I haven't see any issue so far.

> > > > 

> > > > It might binutils related, which version are you using?      

> > > 

> > > 

> > > 

> > > 

> > > Best regards,

> > > 

> > > Lukasz Majewski

> > > 

> > > --

> > > 

> > > DENX Software Engineering GmbH,      Managing Director: Wolfgang

> > > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,

> > > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> > > lukma@denx.de    

> > 

> > 

> > 

> > 

> > Best regards,

> > 

> > Lukasz Majewski

> > 

> > --

> > 

> > DENX Software Engineering GmbH,      Managing Director: Wolfgang

> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,

> > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> > lukma@denx.de  

> 

> 

> 

> 

> Best regards,

> 

> Lukasz Majewski

> 

> --

> 

> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> lukma@denx.de





Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Sept. 17, 2021, 8:29 a.m. | #11
Dear Community,

> On Thu, 9 Sep 2021 11:49:36 +0200

> Lukasz Majewski <lukma@denx.de> wrote:

> 

> > On Thu, 9 Sep 2021 09:18:06 +0200

> > Lukasz Majewski <lukma@denx.de> wrote:

> >   

> > > On Wed, 8 Sep 2021 22:34:21 +0200

> > > Lukasz Majewski <lukma@denx.de> wrote:

> > >     

> > > > Hi Adhemerval,

> > > >       

> > > > > On 08/09/2021 12:05, Lukasz Majewski wrote:        

> > > > > > 

> > > > > > The r1 gets the 0xffffee80 (negative offset) value. It is

> > > > > > then added to pc and used to calculate r2.

> > > > > > 

> > > > > > For working code (with this patch applied) - there are NO

> > > > > > such large values (like aforementioned 0xffffee80). The

> > > > > > arithmetic is done on 

> > > > > > 

> > > > > >    1690:       00000020        .word   0x00000020

> > > > > >    1694:       0002be7e        .word   0x0002be7e

> > > > > > 

> > > > > > which seems to work.

> > > > > > 

> > > > > > This shouldn't be a problem as with U2 the arithmetic shall

> > > > > > work. However, I've noticed (with passing LD_DEBUG=all) that

> > > > > > the ld-linux-armhf.so.3 (and init) are relocated twice

> > > > > > before execution.

> > > > > > 

> > > > > > Why do we need to relocate it?

> > > > > > 

> > > > > > Another question is why on this particular case the large

> > > > > > (i.e. negative) offset matters?          

> > > > > 

> > > > > I think it is highly unlikely the negative offset plays any

> > > > > role here. Do you have a working example to trigger this

> > > > > issue? 

> > > > 

> > > > I'm just using QEMU (runqemu -d y2038-image-devel nographic)

> > > > from meta-y2038 build.

> > > > 

> > > > Recent versions: 

> > > > poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1

> > > > qemu_6.0.0

> > > > binutils_2.37

> > > > gcc_11.2

> > > > 

> > > > The QEMU board uses Cortex-A9 core.      

> > > 

> > > I've updated the poky to be the newest -master:

> > > master:50293eec9a7d0602b748220ab100b070b8d3432a

> > > 

> > > No change.

> > > 

> > > I will try the same setup with Cortex-A5 core - maybe there is

> > > some kind of subtle handling of such emulation in qemu?    

> > 

> > No change. On the qemuarm yocto/OE MACHINE the same situation is

> > observed.

> >   

> 

> Do you have any more ideas where to look for solution of this problem?


Can we make a decision regarding this fix?

The only finding, which I get, is the large negative offset added to the
pc in the glibc broken (current master) code, which causes QEMU system
to fail. 

I've shared this problem with the QEMU community [1], but no reply
received till today.


Shall we:

1. Apply this patch or

2. Revert the conversion (whole) patch [2] for ARM.

3. Wait till QEMU response for the problem?



Links:

[1] -
https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg03332.html

[2] - SHA1: bca0f5cbc9257c13322b99e55235c4f21ba0bd82

> 

> > After applying this patch the board works without issues.

> >   

> > >     

> > > >       

> > > > > I am currently

> > > > > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11)

> > > > > and with different configurations (armv5, armv6, armv7, with

> > > > > and without thumb) and I haven't see any issue so far.

> > > > > 

> > > > > It might binutils related, which version are you using?

> > > > >  

> > > > 

> > > > 

> > > > 

> > > > 

> > > > Best regards,

> > > > 

> > > > Lukasz Majewski

> > > > 

> > > > --

> > > > 

> > > > DENX Software Engineering GmbH,      Managing Director: Wolfgang

> > > > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194

> > > > Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax:

> > > > (+49)-8142-66989-80 Email: lukma@denx.de      

> > > 

> > > 

> > > 

> > > 

> > > Best regards,

> > > 

> > > Lukasz Majewski

> > > 

> > > --

> > > 

> > > DENX Software Engineering GmbH,      Managing Director: Wolfgang

> > > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,

> > > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> > > lukma@denx.de    

> > 

> > 

> > 

> > 

> > Best regards,

> > 

> > Lukasz Majewski

> > 

> > --

> > 

> > DENX Software Engineering GmbH,      Managing Director: Wolfgang

> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,

> > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> > lukma@denx.de  

> 

> 

> 

> 

> Best regards,

> 

> Lukasz Majewski

> 

> --

> 

> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:

> lukma@denx.de





Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Joseph Myers Sept. 17, 2021, 1:27 p.m. | #12
On Fri, 17 Sep 2021, Lukasz Majewski wrote:

> Can we make a decision regarding this fix?


I don't think we've yet seen a thorough analysis of the issue.

Involvement of QEMU is probably not relevant.  Either the dynamic linker 
binary generated is correct, according to the instruction semantics in the 
Arm ARM and the relocation semantics in AAELF, or it isn't.  If it's 
incorrect, either the (static) linker inputs are correct and there's a 
linker bug, or the linker inputs are incorrect ("correct" here might be a 
bit fuzzier, involving things that are not fully specified).

So start from working out whether the generated binary is correct or not, 
with an appropriate description of the relevant instruction sequences 
executed and why they do or do not work in each case.

QEMU only becomes relevant if you have a binary that works when executed 
on hardware but not on QEMU (or vice versa).

(a) Do you have such a binary working on hardware but not on QEMU?

(b) Have you tested using the same binaries with both QEMU and hardware?

(c) Have you tested with the glibc testsuite, using a prebuilt system 
image with known good glibc rather than building init with the new QEMU?  
That's a better starting point than building a whole system with the new 
glibc, though if the result is "all tests fail execution" you may not be 
much further forward (but at least you can run the dynamic linker, good 
and bad versions, under gdbserver, and step individual instructions to see 
exactly what happens when the problem code is executed).

(d) Likewise, but with --enable-hardcoded-path-in-tests?  The point of 
this question is that one thing that's different by default between a 
glibc testsuite run and running binaries outside the glibc testsuite is 
whether new binaries are run directly or via ld.so --library-path.  It's 
possible some dynamic linker bugs might show up in one configuration but 
not the other.  So if the glibc testsuite passes in a default 
configuration (which I assume it did, if the submitter of the original 
patch tested it properly), but init fails in a newly built system image, 
one possible explanation would be such a bug - and running the glibc 
testsuite with --enable-hardcoded-path-in-tests is one way of checking for 
that kind of issue.

-- 
Joseph S. Myers
joseph@codesourcery.com
Andreas Schwab Sept. 17, 2021, 4:17 p.m. | #13
On Sep 17 2021, Joseph Myers wrote:

> QEMU only becomes relevant if you have a binary that works when executed 

> on hardware but not on QEMU (or vice versa).


Given that glibc is working fine on hardware, it is unlikely a bug in
glibc.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Patch

diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
index eb13cb8b57..58ebef6ecd 100644
--- a/sysdeps/arm/dl-machine.h
+++ b/sysdeps/arm/dl-machine.h
@@ -38,11 +38,33 @@  elf_machine_matches_host (const Elf32_Ehdr *ehdr)
 }
 
 /* Return the run-time load address of the shared object.  */
-static inline ElfW(Addr) __attribute__ ((unused))
+static inline Elf32_Addr __attribute__ ((unused))
 elf_machine_load_address (void)
 {
-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
-  return (ElfW(Addr)) &__ehdr_start;
+  Elf32_Addr pcrel_addr;
+#ifdef SHARED
+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");
+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
+#else
+  extern Elf32_Addr __dl_relocate_static_pie (void *)
+    asm ("_dl_relocate_static_pie") attribute_hidden;
+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;
+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));
+#endif
+#ifdef __thumb__
+  /* Clear the low bit of the function address.
+
+     NOTE: got_addr is from GOT table whose lsb is always set by linker if it's
+     Thumb function address.  PCREL_ADDR comes from PC-relative calculation
+     which will finish during assembling.  GAS assembler before the fix for
+     PR gas/21458 was not setting the lsb but does after that.  Always do the
+     strip for both, so the code works with various combinations of glibc and
+     Binutils.  */
+  got_addr &= ~(Elf32_Addr) 1;
+  pcrel_addr &= ~(Elf32_Addr) 1;
+#endif
+  return pcrel_addr - got_addr;
 }
 
 /* Return the link-time address of _DYNAMIC.  */