Optimize epilogue in thumb __aeabi_{memmove,memset} implementations

Message ID 20190930145510.2796395-1-christos.gentsos@cern.ch
State New
Headers show
Series
  • Optimize epilogue in thumb __aeabi_{memmove,memset} implementations
Related show

Commit Message

Christos Gentsos Sept. 30, 2019, 2:55 p.m.
The same pop instruction that is used to restore registers can be used
to return from the function (as it is already done in other function
implementations).
---
 newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
 newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

-- 
2.23.0

Comments

Richard Earnshaw (lists) Sept. 30, 2019, 3:31 p.m. | #1
On 30/09/2019 15:55, Christos Gentsos wrote:
> The same pop instruction that is used to restore registers can be used

> to return from the function (as it is already done in other function

> implementations).

> ---

>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---

>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---

>   2 files changed, 2 insertions(+), 6 deletions(-)

> 

> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

> index 61a72581..a0aad852 100644

> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S

> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

> @@ -49,9 +49,7 @@ __aeabi_memmove:

>   	subs	r3, r3, #1

>   	bcs	1b

>   2:

> -	pop	{r4}

> -	pop	{r1}

> -	bx	r1

> +	pop	{r4, pc}

>   3:

>   	movs	r3, #0

>   	cmp	r2, #0

> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S

> index aa8f2719..5bb80b20 100644

> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S

> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S

> @@ -110,9 +110,7 @@ __aeabi_memset:

>   	cmp	r4, r3

>   	bne	8b

>   9:

> -	pop	{r4, r5, r6}

> -	pop	{r1}

> -	bx	r1

> +	pop	{r4, r5, r6, pc}

>   10:

>   	movs	r3, r0

>   	movs	r4, r1

> 


No.  That isn't interworking clean on armv4t, which we still need to 
support.

Sorry.

R.
Richard Earnshaw (lists) Sept. 30, 2019, 3:37 p.m. | #2
On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:
> On 30/09/2019 15:55, Christos Gentsos wrote:

>> The same pop instruction that is used to restore registers can be used

>> to return from the function (as it is already done in other function

>> implementations).

>> ---

>>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---

>>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---

>>   2 files changed, 2 insertions(+), 6 deletions(-)

>>

>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S 

>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>> index 61a72581..a0aad852 100644

>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>> @@ -49,9 +49,7 @@ __aeabi_memmove:

>>       subs    r3, r3, #1

>>       bcs    1b

>>   2:

>> -    pop    {r4}

>> -    pop    {r1}

>> -    bx    r1

>> +    pop    {r4, pc}

>>   3:

>>       movs    r3, #0

>>       cmp    r2, #0

>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S 

>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S

>> index aa8f2719..5bb80b20 100644

>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S

>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S

>> @@ -110,9 +110,7 @@ __aeabi_memset:

>>       cmp    r4, r3

>>       bne    8b

>>   9:

>> -    pop    {r4, r5, r6}

>> -    pop    {r1}

>> -    bx    r1

>> +    pop    {r4, r5, r6, pc}

>>   10:

>>       movs    r3, r0

>>       movs    r4, r1

>>

> 

> No.  That isn't interworking clean on armv4t, which we still need to 

> support.

> 

> Sorry.

> 

> R.


However, a patch that tests __ARM_ARCH >=5 and uses your improved 
sequence only in that case (preserving the old code otherwise) would 
probably be OK :-)

R.
Christos Gentsos Sept. 30, 2019, 5:16 p.m. | #3
On Mon, Sep 30 2019 at 16:37:26 +0100, Richard Earnshaw (lists) wrote:
> On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:

>> On 30/09/2019 15:55, Christos Gentsos wrote:

>>> The same pop instruction that is used to restore registers can be used

>>> to return from the function (as it is already done in other function

>>> implementations).

>>> ---

>>>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---

>>>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---

>>>   2 files changed, 2 insertions(+), 6 deletions(-)

>>>

>>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S 

>>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>>> index 61a72581..a0aad852 100644

>>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>>> @@ -49,9 +49,7 @@ __aeabi_memmove:

>>>       subs    r3, r3, #1

>>>       bcs    1b

>>>   2:

>>> -    pop    {r4}

>>> -    pop    {r1}

>>> -    bx    r1

>>> +    pop    {r4, pc}

>>>   3:

>>>       movs    r3, #0

>>>       cmp    r2, #0

>>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S 

>>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S

>>> index aa8f2719..5bb80b20 100644

>>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S

>>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S

>>> @@ -110,9 +110,7 @@ __aeabi_memset:

>>>       cmp    r4, r3

>>>       bne    8b

>>>   9:

>>> -    pop    {r4, r5, r6}

>>> -    pop    {r1}

>>> -    bx    r1

>>> +    pop    {r4, r5, r6, pc}

>>>   10:

>>>       movs    r3, r0

>>>       movs    r4, r1

>>>

>> 

>> No.  That isn't interworking clean on armv4t, which we still need to 

>> support.

>> 

>> Sorry.

>> 

>> R.

>

> However, a patch that tests __ARM_ARCH >=5 and uses your improved 

> sequence only in that case (preserving the old code otherwise) would 

> probably be OK :-)

>

> R.


Oh sorry then, I wasn't aware of that, thanks for the correction. I
re-made the patch such that it now checks for __ARM_ARCH, as per your
suggestion. Does it look better?

Thanks,
Christos
---
 newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 ++++
 newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
index 61a72581..465a5a19 100644
--- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
@@ -49,9 +49,13 @@ __aeabi_memmove:
 	subs	r3, r3, #1
 	bcs	1b
 2:
+#if __ARM_ARCH >= 5
+	pop	{r4, pc}
+#else
 	pop	{r4}
 	pop	{r1}
 	bx	r1
+#endif
 3:
 	movs	r3, #0
 	cmp	r2, #0
diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
index aa8f2719..52094a7b 100644
--- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
@@ -110,9 +110,13 @@ __aeabi_memset:
 	cmp	r4, r3
 	bne	8b
 9:
+#if __ARM_ARCH >= 5
+	pop	{r4, r5, r6, pc}
+#else
 	pop	{r4, r5, r6}
 	pop	{r1}
 	bx	r1
+#endif
 10:
 	movs	r3, r0
 	movs	r4, r1
-- 
2.23.0
Richard Earnshaw (lists) Oct. 7, 2019, 2:30 p.m. | #4
Thanks.  I tweaked this slightly to include the compatibility header 
acle-compat.h and put this in.

R.

On 30/09/2019 18:16, Christos Gentsos wrote:
> On Mon, Sep 30 2019 at 16:37:26 +0100, Richard Earnshaw (lists) wrote:

>> On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:

>>> On 30/09/2019 15:55, Christos Gentsos wrote:

>>>> The same pop instruction that is used to restore registers can be used

>>>> to return from the function (as it is already done in other function

>>>> implementations).

>>>> ---

>>>>    newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---

>>>>    newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---

>>>>    2 files changed, 2 insertions(+), 6 deletions(-)

>>>>

>>>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>>>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>>>> index 61a72581..a0aad852 100644

>>>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>>>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

>>>> @@ -49,9 +49,7 @@ __aeabi_memmove:

>>>>        subs    r3, r3, #1

>>>>        bcs    1b

>>>>    2:

>>>> -    pop    {r4}

>>>> -    pop    {r1}

>>>> -    bx    r1

>>>> +    pop    {r4, pc}

>>>>    3:

>>>>        movs    r3, #0

>>>>        cmp    r2, #0

>>>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S

>>>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S

>>>> index aa8f2719..5bb80b20 100644

>>>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S

>>>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S

>>>> @@ -110,9 +110,7 @@ __aeabi_memset:

>>>>        cmp    r4, r3

>>>>        bne    8b

>>>>    9:

>>>> -    pop    {r4, r5, r6}

>>>> -    pop    {r1}

>>>> -    bx    r1

>>>> +    pop    {r4, r5, r6, pc}

>>>>    10:

>>>>        movs    r3, r0

>>>>        movs    r4, r1

>>>>

>>>

>>> No.  That isn't interworking clean on armv4t, which we still need to

>>> support.

>>>

>>> Sorry.

>>>

>>> R.

>>

>> However, a patch that tests __ARM_ARCH >=5 and uses your improved

>> sequence only in that case (preserving the old code otherwise) would

>> probably be OK :-)

>>

>> R.

> 

> Oh sorry then, I wasn't aware of that, thanks for the correction. I

> re-made the patch such that it now checks for __ARM_ARCH, as per your

> suggestion. Does it look better?

> 

> Thanks,

> Christos

> ---

>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 ++++

>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 ++++

>   2 files changed, 8 insertions(+)

> 

> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

> index 61a72581..465a5a19 100644

> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S

> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S

> @@ -49,9 +49,13 @@ __aeabi_memmove:

>   	subs	r3, r3, #1

>   	bcs	1b

>   2:

> +#if __ARM_ARCH >= 5

> +	pop	{r4, pc}

> +#else

>   	pop	{r4}

>   	pop	{r1}

>   	bx	r1

> +#endif

>   3:

>   	movs	r3, #0

>   	cmp	r2, #0

> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S

> index aa8f2719..52094a7b 100644

> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S

> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S

> @@ -110,9 +110,13 @@ __aeabi_memset:

>   	cmp	r4, r3

>   	bne	8b

>   9:

> +#if __ARM_ARCH >= 5

> +	pop	{r4, r5, r6, pc}

> +#else

>   	pop	{r4, r5, r6}

>   	pop	{r1}

>   	bx	r1

> +#endif

>   10:

>   	movs	r3, r0

>   	movs	r4, r1

>

Patch

diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
index 61a72581..a0aad852 100644
--- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
@@ -49,9 +49,7 @@  __aeabi_memmove:
 	subs	r3, r3, #1
 	bcs	1b
 2:
-	pop	{r4}
-	pop	{r1}
-	bx	r1
+	pop	{r4, pc}
 3:
 	movs	r3, #0
 	cmp	r2, #0
diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
index aa8f2719..5bb80b20 100644
--- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
@@ -110,9 +110,7 @@  __aeabi_memset:
 	cmp	r4, r3
 	bne	8b
 9:
-	pop	{r4, r5, r6}
-	pop	{r1}
-	bx	r1
+	pop	{r4, r5, r6, pc}
 10:
 	movs	r3, r0
 	movs	r4, r1