Patches for targeting AArch64 Darwin with clang

Message ID 983159DB-FF02-4264-A7F2-AC963A4C68F7@siguza.net
State New
Series
  • Patches for targeting AArch64 Darwin with clang

Commit Message

Siguza via Newlib Jan. 11, 2021, 11:15 p.m.
Hi

We at the checkra1n team are using Newlib as the standard library of a pre-boot bare metal execution environment on jailbroken iPhones (i.e. aarch64).
As our target is using the Darwin ABI and we're building with clang, we had to apply some patches. We'd like to upstream those.

The first two patches should be uncontroversial. They merely consist of:
1. an additional header include (the missing include only causes a warning for Linux/ELF targets, but seems to be fatal when targeting Darwin).
2. a change that makes all AArch64 "p2align" directives default to 2 rather than 0 (which I'm assuming is done implicitly anyway for non-Darwin targets?).
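
To make the second point concrete, here is a minimal sketch in C of what a ".p2align N" request amounts to: the location is rounded up to a multiple of 2^N bytes. The helper names are hypothetical and are not part of Newlib or of the patches. A64 instructions are 4 bytes wide and must start on a 4-byte boundary, which is why 2 is the smallest safe default for a function entry point.

```c
#include <stdint.h>

/* Hypothetical helper: round addr up to a 2^p2 byte boundary,
   which is what ".p2align p2" asks the assembler to do. */
uint64_t align_up_p2(uint64_t addr, unsigned p2)
{
    uint64_t mask = ((uint64_t)1 << p2) - 1;
    return (addr + mask) & ~mask;
}

/* Hypothetical check: A64 instructions are 4 bytes wide, so a
   function entry point has to be a multiple of 4. */
int is_valid_a64_entry(uint64_t addr)
{
    return (addr & 3) == 0;
}
```

With p2 = 0 an odd address is left untouched, so the entry point may end up misaligned; with p2 = 2 it is rounded up to the next multiple of 4, and with p2 = 6 to the next 64-byte boundary.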

The third patch changes SIMD/Neon register arguments in instructions that move between general-purpose and vector registers.
This is required when building with clang, even for non-Darwin targets. As far as I can tell, the "2" in "reg.2d[0]" does not appear in the ARMv8 Reference Manual and is a gcc-specific thing. I'm assuming it has no actual meaning and gcc just silently ignores it, but I didn't find any actual documentation on that.
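
As a rough illustration of what these moves do, the following C sketch models a 128-bit vector register as two 64-bit lanes and copies one lane into a general-purpose value. The type and function names are hypothetical; the point is only that the element reference names a lane size ("d", 64 bits) and an index, and does not need a lane count.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit lanes. */
typedef struct {
    uint64_t d[2];
} vreg128;

/* Models "mov xN, vM.d[lane]": copy one 64-bit lane into a
   general-purpose register. Only lanes 0 and 1 exist. */
uint64_t mov_from_d_lane(const vreg128 *v, unsigned lane)
{
    return v->d[lane & 1];
}
```

In the string routines the patch touches, lane 0 holds the 64-bit syndrome produced by the preceding addp reductions.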

The fourth patch makes all the AArch64 assembly files compatible with the Darwin ABI. In particular:
- The .type and .size directives are illegal for Darwin targets, so they are wrapped in "#ifndef __APPLE__" blocks.
- Macro invocations must separate arguments by commas, otherwise they are concatenated and treated as one argument. This should work on all targets and not require any ifdefs.
- Darwin prefixes C symbols with an underscore, so the assembly for e.g. memcpy has to use _memcpy as label. I figured the least invasive patch for this was to just #define these symbols when targeting Darwin.
- In one case there was a "b.hs memcpy". Darwin seems to not allow jumping to external labels in conditional branches, so I replaced that with a conditional jump to a local label, followed by an unconditional jump to the external one.
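
The last point can be sketched in C as a control-flow equivalence. The enum and function names below are hypothetical, and the carry flag is passed in as a plain integer: "b.hs" branches when the carry flag is set, "b.lo" when it is clear, so inverting the condition and skipping over an unconditional branch reaches the same destination.

```c
/* Hypothetical labels for the two possible destinations. */
enum target { TARGET_FALLTHROUGH, TARGET_MEMCPY };

/* Original form: "b.hs memcpy" is taken when the carry flag is set. */
enum target original_branch(int carry_set)
{
    if (carry_set)
        return TARGET_MEMCPY;      /* b.hs memcpy */
    return TARGET_FALLTHROUGH;
}

/* Patched form: "b.lo 0f; b memcpy; 0:". The inverted condition
   skips over an unconditional branch to the external symbol. */
enum target patched_branch(int carry_set)
{
    if (!carry_set)
        goto local;                /* b.lo 0f */
    return TARGET_MEMCPY;          /* b memcpy */
local:
    return TARGET_FALLTHROUGH;     /* 0: */
}
```

Both functions return the same destination for either flag value. The patched form costs one extra instruction on the path that falls through to memmove's own code.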

Please find the patches attached below.

- Siguza



From 461d0a53041b94d23c3dd76b785b60b675ebdaa5 Mon Sep 17 00:00:00 2001
From: Siguza <siguza@siguza.net>
Date: Mon, 11 Jan 2021 22:47:57 +0100
Subject: [PATCH 1/4] Fix include of _memalign_r in aligned_alloc.c

---
 newlib/libc/stdlib/aligned_alloc.c | 1 +
 1 file changed, 1 insertion(+)

-- 
2.24.3 (Apple Git-128)

Comments

R. Diez via Newlib Jan. 14, 2021, 7 p.m. | #1
On 11/01/2021 23:15, Siguza via Newlib wrote:
> Hi
>
> We at the checkra1n team are using Newlib as the standard library of a pre-boot bare metal execution environment on jailbroken iPhones (i.e. aarch64).
> As our target is using the Darwin ABI and we're building with clang, we had to apply some patches. We'd like to upstream those.
>
> The first two patches should be uncontroversial. They merely consist of:
> 1. an additional header include (the missing include only causes a warning for Linux/ELF targets, but seems to be fatal when targeting Darwin).
> 2. a change that makes all AArch64 "p2align" directives default to 2 rather than 0 (which I'm assuming is done implicitly anyway for non-Darwin targets?).
>
> The third patch changes SIMD/Neon register arguments in instructions that move between general-purpose and vector registers.
> This is required when building with clang, even for non-Darwin targets. As far as I can tell, the "2" in "reg.2d[0]" does not appear in the ARMv8 Reference Manual and is a gcc-specific thing. I'm assuming it has no actual meaning and gcc just silently ignores it, but I didn't find any actual documentation on that.
>
> The fourth patch makes all the AArch64 assembly files compatible with the Darwin ABI. In particular:
> - The .type and .size directives are illegal for Darwin targets, so they are wrapped in "#ifndef __APPLE__" blocks.
> - Macro invocations must separate arguments by commas, otherwise they are concatenated and treated as one argument. This should work on all targets and not require any ifdefs.
> - Darwin prefixes C symbols with an underscore, so the assembly for e.g. memcpy has to use _memcpy as label. I figured the least invasive patch for this was to just #define these symbols when targeting Darwin.
> - In one case there was a "b.hs memcpy". Darwin seems to not allow jumping to external labels in conditional branches, so I replaced that with a conditional jump to a local label, followed by an unconditional jump to the external one.
>
> Please find the patches attached below.
>
> - Siguza
>

Please separate this out into individual patches for each issue.  It's
difficult to review it when it's all mixed together and for longer term
maintenance we'll probably want to commit the changes separately as well.

The best way to handle label prefixes for public functions is to define
a macro similar to the one the Arm port uses:

#define CONCAT(a, b) CONCAT2(a, b)
#define CONCAT2(a, b) a ## b

#ifdef __USER_LABEL_PREFIX__
#define FUNCTION( name ) CONCAT (__USER_LABEL_PREFIX__, name)
#else
#error __USER_LABEL_PREFIX__ is not defined
#endif

I'd expect your C compiler to define __USER_LABEL_PREFIX__ appropriately
for your platform (GCC does this anyway).

Now you can just wrap all public names with FUNCTION() and the
preprocessor will handle this automatically.   This is significantly
preferable to littering the code with platform-specific ifdefs.
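
For illustration, the following C sketch shows what the suggested macro expands to. It reuses the CONCAT and FUNCTION definitions from above; the STR and STR2 helpers and the name example_fn are hypothetical additions that only serve to make the result visible as a string.

```c
#define CONCAT(a, b) CONCAT2(a, b)
#define CONCAT2(a, b) a ## b

#ifdef __USER_LABEL_PREFIX__
#define FUNCTION(name) CONCAT(__USER_LABEL_PREFIX__, name)
#else
#error __USER_LABEL_PREFIX__ is not defined
#endif

/* Hypothetical helpers: turn the expanded symbol name into a string. */
#define STR2(x) #x
#define STR(x) STR(x ## _UNUSED)
#undef STR
#define STR(x) STR2(x)

/* example_fn is a placeholder name, not a Newlib function. */
const char *example_symbol_name(void)
{
    return STR(FUNCTION(example_fn));
}
```

On a target whose prefix is empty this should yield "example_fn", and on a target whose prefix is an underscore it should yield "_example_fn". In an assembly file the same FUNCTION() wrapper would be applied to the label and to the .global directive, provided the file is run through the C preprocessor (as the .S files here are).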

R.

>
>
> From 461d0a53041b94d23c3dd76b785b60b675ebdaa5 Mon Sep 17 00:00:00 2001
> From: Siguza <siguza@siguza.net>
> Date: Mon, 11 Jan 2021 22:47:57 +0100
> Subject: [PATCH 1/4] Fix include of _memalign_r in aligned_alloc.c
>
> ---
>  newlib/libc/stdlib/aligned_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/newlib/libc/stdlib/aligned_alloc.c b/newlib/libc/stdlib/aligned_alloc.c
> index feb22c24b..ad8887bd0 100644
> --- a/newlib/libc/stdlib/aligned_alloc.c
> +++ b/newlib/libc/stdlib/aligned_alloc.c
> @@ -26,6 +26,7 @@
>     NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
>     SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */
>  
> +#include <malloc.h>
>  #include <reent.h>
>  #include <stdlib.h>
>  
>

Patch

diff --git a/newlib/libc/stdlib/aligned_alloc.c b/newlib/libc/stdlib/aligned_alloc.c
index feb22c24b..ad8887bd0 100644
--- a/newlib/libc/stdlib/aligned_alloc.c
+++ b/newlib/libc/stdlib/aligned_alloc.c
@@ -26,6 +26,7 @@ 
    NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
    SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */
 
+#include <malloc.h>
 #include <reent.h>
 #include <stdlib.h>
 
-- 
2.24.3 (Apple Git-128)



From f9342c71fbcf968c26395ce0f1532266602b07af Mon Sep 17 00:00:00 2001
From: Siguza <siguza@siguza.net>
Date: Mon, 11 Jan 2021 22:52:11 +0100
Subject: [PATCH 2/4] Make aarch64 p2align default to 2

---
 newlib/libc/machine/aarch64/memchr.S    | 2 +-
 newlib/libc/machine/aarch64/memcmp.S    | 2 +-
 newlib/libc/machine/aarch64/memcpy.S    | 2 +-
 newlib/libc/machine/aarch64/memmove.S   | 2 +-
 newlib/libc/machine/aarch64/memset.S    | 2 +-
 newlib/libc/machine/aarch64/rawmemchr.S | 3 +--
 newlib/libc/machine/aarch64/setjmp.S    | 2 ++
 newlib/libc/machine/aarch64/strchr.S    | 2 +-
 newlib/libc/machine/aarch64/strchrnul.S | 2 +-
 newlib/libc/machine/aarch64/strcmp.S    | 2 +-
 newlib/libc/machine/aarch64/strcpy.S    | 2 +-
 newlib/libc/machine/aarch64/strlen.S    | 2 +-
 newlib/libc/machine/aarch64/strncmp.S   | 2 +-
 newlib/libc/machine/aarch64/strnlen.S   | 2 +-
 newlib/libc/machine/aarch64/strrchr.S   | 2 +-
 15 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/newlib/libc/machine/aarch64/memchr.S b/newlib/libc/machine/aarch64/memchr.S
index 53f5d6bc0..91c2af22d 100644
--- a/newlib/libc/machine/aarch64/memchr.S
+++ b/newlib/libc/machine/aarch64/memchr.S
@@ -70,7 +70,7 @@ 
  * identify exactly which byte has matched.
  */
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/memcmp.S b/newlib/libc/machine/aarch64/memcmp.S
index 605d99365..981baab3c 100644
--- a/newlib/libc/machine/aarch64/memcmp.S
+++ b/newlib/libc/machine/aarch64/memcmp.S
@@ -81,7 +81,7 @@ 
 #define tmp1		x7
 #define tmp2		x8
 
-        .macro def_fn f p2align=0
+        .macro def_fn f p2align=2
         .text
         .p2align \p2align
         .global \f
diff --git a/newlib/libc/machine/aarch64/memcpy.S b/newlib/libc/machine/aarch64/memcpy.S
index 463bad0a1..d2de7415d 100644
--- a/newlib/libc/machine/aarch64/memcpy.S
+++ b/newlib/libc/machine/aarch64/memcpy.S
@@ -87,7 +87,7 @@ 
 
 #define L(l) .L ## l
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/memmove.S b/newlib/libc/machine/aarch64/memmove.S
index 597a8c8e9..6da548f10 100644
--- a/newlib/libc/machine/aarch64/memmove.S
+++ b/newlib/libc/machine/aarch64/memmove.S
@@ -61,7 +61,7 @@ 
 /* See memmove-stub.c  */
 #else
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/memset.S b/newlib/libc/machine/aarch64/memset.S
index 103e3f8bb..cad9117b7 100644
--- a/newlib/libc/machine/aarch64/memset.S
+++ b/newlib/libc/machine/aarch64/memset.S
@@ -77,7 +77,7 @@ 
 
 #define L(l) .L ## l
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/rawmemchr.S b/newlib/libc/machine/aarch64/rawmemchr.S
index 26da81005..484971b3f 100644
--- a/newlib/libc/machine/aarch64/rawmemchr.S
+++ b/newlib/libc/machine/aarch64/rawmemchr.S
@@ -36,7 +36,7 @@ 
 
 #define L(l) .L ## l
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
@@ -65,4 +65,3 @@  L(do_strlen):
 
 	.size   rawmemchr, . - rawmemchr
 #endif
-
diff --git a/newlib/libc/machine/aarch64/setjmp.S b/newlib/libc/machine/aarch64/setjmp.S
index 0856145bf..fde0e45a7 100644
--- a/newlib/libc/machine/aarch64/setjmp.S
+++ b/newlib/libc/machine/aarch64/setjmp.S
@@ -43,6 +43,7 @@ 
 
 // int setjmp (jmp_buf)
 	.global	setjmp
+	.p2align	2
 	.type	setjmp, %function
 setjmp:
 	mov	x16, sp
@@ -58,6 +59,7 @@  setjmp:
 
 // void longjmp (jmp_buf, int) __attribute__ ((noreturn))
 	.global	longjmp
+	.p2align	2
 	.type	longjmp, %function
 longjmp:
 #define REG_PAIR(REG1, REG2, OFFS)	ldp REG1, REG2, [x0, OFFS]
diff --git a/newlib/libc/machine/aarch64/strchr.S b/newlib/libc/machine/aarch64/strchr.S
index 2448dbc7d..5fc0fd06e 100644
--- a/newlib/libc/machine/aarch64/strchr.S
+++ b/newlib/libc/machine/aarch64/strchr.S
@@ -74,7 +74,7 @@ 
 
 /* Locals and temporaries.  */
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/strchrnul.S b/newlib/libc/machine/aarch64/strchrnul.S
index a0ac13b7f..99fba3128 100644
--- a/newlib/libc/machine/aarch64/strchrnul.S
+++ b/newlib/libc/machine/aarch64/strchrnul.S
@@ -70,7 +70,7 @@ 
 
 /* Locals and temporaries.  */
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/strcmp.S b/newlib/libc/machine/aarch64/strcmp.S
index e2bef2d49..cabcf4faa 100644
--- a/newlib/libc/machine/aarch64/strcmp.S
+++ b/newlib/libc/machine/aarch64/strcmp.S
@@ -33,7 +33,7 @@ 
 /* See strcmp-stub.c  */
 #else
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/strcpy.S b/newlib/libc/machine/aarch64/strcpy.S
index e5405f253..95533de60 100644
--- a/newlib/libc/machine/aarch64/strcpy.S
+++ b/newlib/libc/machine/aarch64/strcpy.S
@@ -72,7 +72,7 @@ 
 #define STRCPY strcpy
 #endif
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/strlen.S b/newlib/libc/machine/aarch64/strlen.S
index 872d136ef..7e6ced01d 100644
--- a/newlib/libc/machine/aarch64/strlen.S
+++ b/newlib/libc/machine/aarch64/strlen.S
@@ -55,7 +55,7 @@ 
 
 #define L(l) .L ## l
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/strncmp.S b/newlib/libc/machine/aarch64/strncmp.S
index ffdabc260..b218e95a7 100644
--- a/newlib/libc/machine/aarch64/strncmp.S
+++ b/newlib/libc/machine/aarch64/strncmp.S
@@ -33,7 +33,7 @@ 
  * ARMv8-a, AArch64
  */
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/strnlen.S b/newlib/libc/machine/aarch64/strnlen.S
index c255c3f7c..0eb742412 100644
--- a/newlib/libc/machine/aarch64/strnlen.S
+++ b/newlib/libc/machine/aarch64/strnlen.S
@@ -55,7 +55,7 @@ 
 #define pos		x13
 #define limit_wd	x14
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
diff --git a/newlib/libc/machine/aarch64/strrchr.S b/newlib/libc/machine/aarch64/strrchr.S
index d64fc09b1..8cf8d302d 100644
--- a/newlib/libc/machine/aarch64/strrchr.S
+++ b/newlib/libc/machine/aarch64/strrchr.S
@@ -80,7 +80,7 @@ 
 
 /* Locals and temporaries.  */
 
-	.macro def_fn f p2align=0
+	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
-- 
2.24.3 (Apple Git-128)



From 779f336fc4bfae8933b141460bff1c53f29effad Mon Sep 17 00:00:00 2001
From: Siguza <siguza@siguza.net>
Date: Mon, 11 Jan 2021 22:54:12 +0100
Subject: [PATCH 3/4] Make aarch64 assembly clang-compatible

---
 newlib/libc/machine/aarch64/memchr.S    |  6 +++---
 newlib/libc/machine/aarch64/strchr.S    |  6 +++---
 newlib/libc/machine/aarch64/strchrnul.S |  6 +++---
 newlib/libc/machine/aarch64/strrchr.S   | 10 +++++-----
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/newlib/libc/machine/aarch64/memchr.S b/newlib/libc/machine/aarch64/memchr.S
index 91c2af22d..8389c8a50 100644
--- a/newlib/libc/machine/aarch64/memchr.S
+++ b/newlib/libc/machine/aarch64/memchr.S
@@ -110,7 +110,7 @@  def_fn memchr
 	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
 	addp	vend.16b, vhas_chr1.16b, vhas_chr2.16b		/* 256->128 */
 	addp	vend.16b, vend.16b, vend.16b			/* 128->64 */
-	mov	synd, vend.2d[0]
+	mov	synd, vend.d[0]
 	/* Clear the soff*2 lower bits */
 	lsl	tmp, soff, #1
 	lsr	synd, synd, tmp
@@ -130,7 +130,7 @@  def_fn memchr
 	/* Use a fast check for the termination condition */
 	orr	vend.16b, vhas_chr1.16b, vhas_chr2.16b
 	addp	vend.2d, vend.2d, vend.2d
-	mov	synd, vend.2d[0]
+	mov	synd, vend.d[0]
 	/* We're not out of data, loop if we haven't found the character */
 	cbz	synd, .Lloop
 
@@ -140,7 +140,7 @@  def_fn memchr
 	and	vhas_chr2.16b, vhas_chr2.16b, vrepmask.16b
 	addp	vend.16b, vhas_chr1.16b, vhas_chr2.16b		/* 256->128 */
 	addp	vend.16b, vend.16b, vend.16b			/* 128->64 */
-	mov	synd, vend.2d[0]
+	mov	synd, vend.d[0]
 	/* Only do the clear for the last possible block */
 	b.hi	.Ltail
 
diff --git a/newlib/libc/machine/aarch64/strchr.S b/newlib/libc/machine/aarch64/strchr.S
index 5fc0fd06e..8ed6ef673 100644
--- a/newlib/libc/machine/aarch64/strchr.S
+++ b/newlib/libc/machine/aarch64/strchr.S
@@ -117,7 +117,7 @@  def_fn strchr
 	addp	vend1.16b, vend1.16b, vend2.16b		// 128->64
 	lsr	tmp1, tmp3, tmp1
 
-	mov	tmp3, vend1.2d[0]
+	mov	tmp3, vend1.d[0]
 	bic	tmp1, tmp3, tmp1	// Mask padding bits.
 	cbnz	tmp1, .Ltail
 
@@ -132,7 +132,7 @@  def_fn strchr
 	orr	vend2.16b, vhas_nul2.16b, vhas_chr2.16b
 	orr	vend1.16b, vend1.16b, vend2.16b
 	addp	vend1.2d, vend1.2d, vend1.2d
-	mov	tmp1, vend1.2d[0]
+	mov	tmp1, vend1.d[0]
 	cbz	tmp1, .Lloop
 
 	/* Termination condition found.  Now need to establish exactly why
@@ -146,7 +146,7 @@  def_fn strchr
 	addp	vend1.16b, vend1.16b, vend2.16b		// 256->128
 	addp	vend1.16b, vend1.16b, vend2.16b		// 128->64
 
-	mov	tmp1, vend1.2d[0]
+	mov	tmp1, vend1.d[0]
 .Ltail:
 	/* Count the trailing zeros, by bit reversing...  */
 	rbit	tmp1, tmp1
diff --git a/newlib/libc/machine/aarch64/strchrnul.S b/newlib/libc/machine/aarch64/strchrnul.S
index 99fba3128..0e257fa06 100644
--- a/newlib/libc/machine/aarch64/strchrnul.S
+++ b/newlib/libc/machine/aarch64/strchrnul.S
@@ -109,7 +109,7 @@  def_fn strchrnul
 	addp	vend1.16b, vend1.16b, vend1.16b		// 128->64
 	lsr	tmp1, tmp3, tmp1
 
-	mov	tmp3, vend1.2d[0]
+	mov	tmp3, vend1.d[0]
 	bic	tmp1, tmp3, tmp1	// Mask padding bits.
 	cbnz	tmp1, .Ltail
 
@@ -124,7 +124,7 @@  def_fn strchrnul
 	orr	vhas_chr2.16b, vhas_nul2.16b, vhas_chr2.16b
 	orr	vend1.16b, vhas_chr1.16b, vhas_chr2.16b
 	addp	vend1.2d, vend1.2d, vend1.2d
-	mov	tmp1, vend1.2d[0]
+	mov	tmp1, vend1.d[0]
 	cbz	tmp1, .Lloop
 
 	/* Termination condition found.  Now need to establish exactly why
@@ -134,7 +134,7 @@  def_fn strchrnul
 	addp	vend1.16b, vhas_chr1.16b, vhas_chr2.16b		// 256->128
 	addp	vend1.16b, vend1.16b, vend1.16b		// 128->64
 
-	mov	tmp1, vend1.2d[0]
+	mov	tmp1, vend1.d[0]
 .Ltail:
 	/* Count the trailing zeros, by bit reversing...  */
 	rbit	tmp1, tmp1
diff --git a/newlib/libc/machine/aarch64/strrchr.S b/newlib/libc/machine/aarch64/strrchr.S
index 8cf8d302d..ee425c42b 100644
--- a/newlib/libc/machine/aarch64/strrchr.S
+++ b/newlib/libc/machine/aarch64/strrchr.S
@@ -120,10 +120,10 @@  def_fn strrchr
 	addp	vhas_chr1.16b, vhas_chr1.16b, vhas_chr2.16b	// 256->128
 	addp	vhas_nul1.16b, vhas_nul1.16b, vhas_nul1.16b	// 128->64
 	addp	vhas_chr1.16b, vhas_chr1.16b, vhas_chr1.16b	// 128->64
-	mov	nul_match, vhas_nul1.2d[0]
+	mov	nul_match, vhas_nul1.d[0]
 	lsl	tmp1, tmp1, #1
 	mov	const_m1, #~0
-	mov	chr_match, vhas_chr1.2d[0]
+	mov	chr_match, vhas_chr1.d[0]
 	lsr	tmp3, const_m1, tmp1
 
 	bic	nul_match, nul_match, tmp3	// Mask padding bits.
@@ -146,15 +146,15 @@  def_fn strrchr
 	addp	vhas_chr1.16b, vhas_chr1.16b, vhas_chr2.16b	// 256->128
 	addp	vend1.16b, vend1.16b, vend1.16b	// 128->64
 	addp	vhas_chr1.16b, vhas_chr1.16b, vhas_chr1.16b	// 128->64
-	mov	nul_match, vend1.2d[0]
-	mov	chr_match, vhas_chr1.2d[0]
+	mov	nul_match, vend1.d[0]
+	mov	chr_match, vhas_chr1.d[0]
 	cbz	nul_match, .Lloop
 
 	and	vhas_nul1.16b, vhas_nul1.16b, vrepmask_0.16b
 	and	vhas_nul2.16b, vhas_nul2.16b, vrepmask_0.16b
 	addp	vhas_nul1.16b, vhas_nul1.16b, vhas_nul2.16b
 	addp	vhas_nul1.16b, vhas_nul1.16b, vhas_nul1.16b
-	mov	nul_match, vhas_nul1.2d[0]
+	mov	nul_match, vhas_nul1.d[0]
 
 .Ltail:
 	/* Work out exactly where the string ends.  */
-- 
2.24.3 (Apple Git-128)



From d80083fccf21ab7664732d88978d982c1bc99080 Mon Sep 17 00:00:00 2001
From: Siguza <siguza@siguza.net>
Date: Mon, 11 Jan 2021 23:01:35 +0100
Subject: [PATCH 4/4] Make aarch64 support the Darwin ABI

---
 newlib/libc/machine/aarch64/memchr.S    |  8 ++++++++
 newlib/libc/machine/aarch64/memcmp.S    | 10 +++++++++-
 newlib/libc/machine/aarch64/memcpy.S    | 10 +++++++++-
 newlib/libc/machine/aarch64/memmove.S   | 14 +++++++++++++-
 newlib/libc/machine/aarch64/memset.S    | 10 +++++++++-
 newlib/libc/machine/aarch64/rawmemchr.S | 12 +++++++++++-
 newlib/libc/machine/aarch64/setjmp.S    | 14 ++++++++++++++
 newlib/libc/machine/aarch64/strchr.S    |  8 ++++++++
 newlib/libc/machine/aarch64/strchrnul.S |  8 ++++++++
 newlib/libc/machine/aarch64/strcmp.S    | 12 ++++++++++--
 newlib/libc/machine/aarch64/strcpy.S    | 14 +++++++++++++-
 newlib/libc/machine/aarch64/strlen.S    | 10 +++++++++-
 newlib/libc/machine/aarch64/strncmp.S   |  9 +++++++++
 newlib/libc/machine/aarch64/strnlen.S   | 10 +++++++++-
 newlib/libc/machine/aarch64/strrchr.S   |  8 ++++++++
 15 files changed, 147 insertions(+), 10 deletions(-)

diff --git a/newlib/libc/machine/aarch64/memchr.S b/newlib/libc/machine/aarch64/memchr.S
index 8389c8a50..7025919a0 100644
--- a/newlib/libc/machine/aarch64/memchr.S
+++ b/newlib/libc/machine/aarch64/memchr.S
@@ -70,11 +70,17 @@ 
  * identify exactly which byte has matched.
  */
 
+#ifdef __APPLE__
+#   define memchr _memchr
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -172,5 +178,7 @@  def_fn memchr
 	mov	result, #0
 	ret
 
+#ifndef __APPLE__
 	.size	memchr, . - memchr
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/memcmp.S b/newlib/libc/machine/aarch64/memcmp.S
index 981baab3c..95a7d2a8c 100644
--- a/newlib/libc/machine/aarch64/memcmp.S
+++ b/newlib/libc/machine/aarch64/memcmp.S
@@ -81,15 +81,21 @@ 
 #define tmp1		x7
 #define tmp2		x8
 
+#ifdef __APPLE__
+#   define memcmp _memcmp
+#endif
+
         .macro def_fn f p2align=2
         .text
         .p2align \p2align
         .global \f
+#ifndef __APPLE__
         .type \f, %function
+#endif
 \f:
         .endm
 
-def_fn memcmp p2align=6
+def_fn memcmp, p2align=6
 	subs	limit, limit, 8
 	b.lo	L(less8)
 
@@ -192,5 +198,7 @@  L(byte_loop):
 	sub	result, data1w, data2w
 	ret
 
+#ifndef __APPLE__
 	.size	memcmp, . - memcmp
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/memcpy.S b/newlib/libc/machine/aarch64/memcpy.S
index d2de7415d..d9d3ef20f 100644
--- a/newlib/libc/machine/aarch64/memcpy.S
+++ b/newlib/libc/machine/aarch64/memcpy.S
@@ -87,11 +87,17 @@ 
 
 #define L(l) .L ## l
 
+#ifdef __APPLE__
+#   define memcpy _memcpy
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -104,7 +110,7 @@ 
    well as non-overlapping copies.
 */
 
-def_fn memcpy p2align=6
+def_fn memcpy, p2align=6
 	prfm	PLDL1KEEP, [src]
 	add	srcend, src, count
 	add	dstend, dstin, count
@@ -226,5 +232,7 @@  L(copy_long):
 	stp	C_l, C_h, [dstend, -16]
 	ret
 
+#ifndef __APPLE__
 	.size	memcpy, . - memcpy
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/memmove.S b/newlib/libc/machine/aarch64/memmove.S
index 6da548f10..395482061 100644
--- a/newlib/libc/machine/aarch64/memmove.S
+++ b/newlib/libc/machine/aarch64/memmove.S
@@ -61,11 +61,18 @@ 
 /* See memmove-stub.c  */
 #else
 
+#ifdef __APPLE__
+#   define memcpy _memcpy
+#   define memmove _memmove
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -97,8 +104,11 @@  def_fn memmove, 6
 	sub	tmp1, dstin, src
 	cmp	count, 96
 	ccmp	tmp1, count, 2, hi
-	b.hs	memcpy
+	/* Darwin can't use b.hs to jump to external labels. */
+	b.lo	0f
+	b	memcpy
 
+0:
 	cbz	tmp1, 3f
 	add	dstend, dstin, count
 	add	srcend, src, count
@@ -151,5 +161,7 @@  def_fn memmove, 6
 	stp	C_l, C_h, [dstin]
 3:	ret
 
+#ifndef __APPLE__
 	.size	memmove, . - memmove
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/memset.S b/newlib/libc/machine/aarch64/memset.S
index cad9117b7..7bf190943 100644
--- a/newlib/libc/machine/aarch64/memset.S
+++ b/newlib/libc/machine/aarch64/memset.S
@@ -77,15 +77,21 @@ 
 
 #define L(l) .L ## l
 
+#ifdef __APPLE__
+#   define memset _memset
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
-def_fn memset p2align=6
+def_fn memset, p2align=6
 
 	dup	v0.16B, valw
 	add	dstend, dstin, count
@@ -236,5 +242,7 @@  L(zva_other):
 	sub	dst, dst, 32		/* Bias dst for tail loop.  */
 	b	L(tail64)
 
+#ifndef __APPLE__
 	.size	memset, . - memset
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/rawmemchr.S b/newlib/libc/machine/aarch64/rawmemchr.S
index 484971b3f..9f37a4d83 100644
--- a/newlib/libc/machine/aarch64/rawmemchr.S
+++ b/newlib/libc/machine/aarch64/rawmemchr.S
@@ -36,11 +36,19 @@ 
 
 #define L(l) .L ## l
 
+#ifdef __APPLE__
+#   define memchr _memchr
+#   define rawmemchr _rawmemchr
+#   define strlen _strlen
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -48,7 +56,7 @@ 
    Call strlen without setting up a full frame - it preserves x14/x15.
 */
 
-def_fn rawmemchr p2align=5
+def_fn rawmemchr, p2align=5
 	.cfi_startproc
 	cbz	w1, L(do_strlen)
 	mov	x2, -1
@@ -63,5 +71,7 @@  L(do_strlen):
 	ret	x15
 	.cfi_endproc
 
+#ifndef __APPLE__
 	.size   rawmemchr, . - rawmemchr
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/setjmp.S b/newlib/libc/machine/aarch64/setjmp.S
index fde0e45a7..0335b6729 100644
--- a/newlib/libc/machine/aarch64/setjmp.S
+++ b/newlib/libc/machine/aarch64/setjmp.S
@@ -41,10 +41,17 @@ 
 	REG_PAIR (d12, d13, 144);	\
 	REG_PAIR (d14, d15, 160);
 
+#ifdef __APPLE__
+#   define setjmp _setjmp
+#   define longjmp _longjmp
+#endif
+
 // int setjmp (jmp_buf)
 	.global	setjmp
 	.p2align	2
+#ifndef __APPLE__
 	.type	setjmp, %function
+#endif
 setjmp:
 	mov	x16, sp
 #define REG_PAIR(REG1, REG2, OFFS)	stp REG1, REG2, [x0, OFFS]
@@ -55,12 +62,16 @@  setjmp:
 #undef REG_ONE
 	mov	w0, #0
 	ret
+#ifndef __APPLE__
 	.size	setjmp, .-setjmp
+#endif
 
 // void longjmp (jmp_buf, int) __attribute__ ((noreturn))
 	.global	longjmp
 	.p2align	2
+#ifndef __APPLE__
 	.type	longjmp, %function
+#endif
 longjmp:
 #define REG_PAIR(REG1, REG2, OFFS)	ldp REG1, REG2, [x0, OFFS]
 #define REG_ONE(REG1, OFFS)		ldr REG1, [x0, OFFS]
@@ -73,4 +84,7 @@  longjmp:
 	cinc	w0, w1, eq
 	// use br not ret, as ret is guaranteed to mispredict
 	br	x30
+
+#ifndef __APPLE__
 	.size	longjmp, .-longjmp
+#endif
diff --git a/newlib/libc/machine/aarch64/strchr.S b/newlib/libc/machine/aarch64/strchr.S
index 8ed6ef673..c7e159b0a 100644
--- a/newlib/libc/machine/aarch64/strchr.S
+++ b/newlib/libc/machine/aarch64/strchr.S
@@ -74,11 +74,17 @@ 
 
 /* Locals and temporaries.  */
 
+#ifdef __APPLE__
+#   define strchr _strchr
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -160,5 +166,7 @@  def_fn strchr
 	csel	result, result, xzr, eq
 	ret
 
+#ifndef __APPLE__
 	.size	strchr, . - strchr
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/strchrnul.S b/newlib/libc/machine/aarch64/strchrnul.S
index 0e257fa06..9f5551f59 100644
--- a/newlib/libc/machine/aarch64/strchrnul.S
+++ b/newlib/libc/machine/aarch64/strchrnul.S
@@ -70,11 +70,17 @@ 
 
 /* Locals and temporaries.  */
 
+#ifdef __APPLE__
+#   define strchrnul _strchrnul
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -145,5 +151,7 @@  def_fn strchrnul
 	add	result, src, tmp1, lsr #1
 	ret
 
+#ifndef __APPLE__
 	.size	strchrnul, . - strchrnul
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/strcmp.S b/newlib/libc/machine/aarch64/strcmp.S
index cabcf4faa..ce6c2f5ad 100644
--- a/newlib/libc/machine/aarch64/strcmp.S
+++ b/newlib/libc/machine/aarch64/strcmp.S
@@ -33,11 +33,17 @@ 
 /* See strcmp-stub.c  */
 #else
 
+#ifdef __APPLE__
+#   define strcmp _strcmp
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -67,7 +73,7 @@ 
 #define pos		x11
 
 	/* Start of performance-critical section  -- one 64B cache line.  */
-def_fn strcmp p2align=6
+def_fn strcmp, p2align=6
 	eor	tmp1, src1, src2
 	mov	zeroones, #REP8_01
 	tst	tmp1, #7
@@ -197,6 +203,8 @@  L(loop_misaligned):
 L(done):
 	sub	result, data1, data2
 	ret
-	.size	strcmp, .-strcmp
 
+#ifndef __APPLE__
+	.size	strcmp, .-strcmp
+#endif
 #endif
diff --git a/newlib/libc/machine/aarch64/strcpy.S b/newlib/libc/machine/aarch64/strcpy.S
index 95533de60..f9b293423 100644
--- a/newlib/libc/machine/aarch64/strcpy.S
+++ b/newlib/libc/machine/aarch64/strcpy.S
@@ -66,17 +66,27 @@ 
 #define len		x16
 #define to_align	x17
 
+#ifdef __APPLE__
+#ifdef BUILD_STPCPY
+#define STRCPY _stpcpy
+#else
+#define STRCPY _strcpy
+#endif
+#else
 #ifdef BUILD_STPCPY
 #define STRCPY stpcpy
 #else
 #define STRCPY strcpy
+#endif
 #endif
 
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -111,7 +121,7 @@ 
 
 #define MIN_PAGE_SIZE (1 << MIN_PAGE_P2)
 
-def_fn STRCPY p2align=6
+def_fn STRCPY, p2align=6
 	/* For moderately short strings, the fastest way to do the copy is to
 	   calculate the length of the string in the same way as strlen, then
 	   essentially do a memcpy of the result.  This avoids the need for
@@ -337,5 +347,7 @@  def_fn STRCPY p2align=6
 	bic	has_nul2, tmp3, tmp4
 	b	.Lfp_gt8
 
+#ifndef __APPLE__
 	.size	STRCPY, . - STRCPY
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/strlen.S b/newlib/libc/machine/aarch64/strlen.S
index 7e6ced01d..c1ef145ea 100644
--- a/newlib/libc/machine/aarch64/strlen.S
+++ b/newlib/libc/machine/aarch64/strlen.S
@@ -55,11 +55,17 @@ 
 
 #define L(l) .L ## l
 
+#ifdef __APPLE__
+#   define strlen _strlen
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -104,7 +110,7 @@ 
 	   whether the first fetch, which may be misaligned, crosses a page
 	   boundary.  */
 
-def_fn strlen p2align=6
+def_fn strlen, p2align=6
 	and	tmp1, srcin, MIN_PAGE_SIZE - 1
 	mov	zeroones, REP8_01
 	cmp	tmp1, MIN_PAGE_SIZE - 16
@@ -234,5 +240,7 @@  L(page_cross):
 	csel	data2, data2, tmp2, eq
 	b	L(page_cross_entry)
 
+#ifndef __APPLE__
 	.size	strlen, . - strlen
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/strncmp.S b/newlib/libc/machine/aarch64/strncmp.S
index b218e95a7..bbae2a083 100644
--- a/newlib/libc/machine/aarch64/strncmp.S
+++ b/newlib/libc/machine/aarch64/strncmp.S
@@ -33,11 +33,17 @@ 
  * ARMv8-a, AArch64
  */
 
+#ifdef __APPLE__
+#   define strncmp _strncmp
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -286,5 +292,8 @@  def_fn strncmp
 .Lret0:
 	mov	result, #0
 	ret
+
+#ifndef __APPLE__
 	.size strncmp, . - strncmp
 #endif
+#endif
diff --git a/newlib/libc/machine/aarch64/strnlen.S b/newlib/libc/machine/aarch64/strnlen.S
index 0eb742412..f6f501fec 100644
--- a/newlib/libc/machine/aarch64/strnlen.S
+++ b/newlib/libc/machine/aarch64/strnlen.S
@@ -55,11 +55,17 @@ 
 #define pos		x13
 #define limit_wd	x14
 
+#ifdef __APPLE__
+#   define strnlen _strnlen
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -182,6 +188,8 @@  def_fn strnlen
 	csinv	data1, data1, xzr, le
 	csel	data2, data2, data2a, le
 	b	.Lrealigned
-	.size	strnlen, . - .Lstart	/* Include pre-padding in size.  */
 
+#ifndef __APPLE__
+	.size	strnlen, . - .Lstart	/* Include pre-padding in size.  */
+#endif
 #endif
diff --git a/newlib/libc/machine/aarch64/strrchr.S b/newlib/libc/machine/aarch64/strrchr.S
index ee425c42b..b65833fe0 100644
--- a/newlib/libc/machine/aarch64/strrchr.S
+++ b/newlib/libc/machine/aarch64/strrchr.S
@@ -80,11 +80,17 @@ 
 
 /* Locals and temporaries.  */
 
+#ifdef __APPLE__
+#   define strrchr _strrchr
+#endif
+
 	.macro def_fn f p2align=2
 	.text
 	.p2align \p2align
 	.global \f
+#ifndef __APPLE__
 	.type \f, %function
+#endif
 \f:
 	.endm
 
@@ -178,5 +184,7 @@  def_fn strrchr
 
 	ret
 
+#ifndef __APPLE__
 	.size	strrchr, . - strrchr
 #endif
+#endif