Add power10 BRD/BRW/BRH support.

Message ID 1593623042-3202-1-git-send-email-meissner@linux.ibm.com
State New
Headers show
Series
  • Add power10 BRD/BRW/BRH support.
Related show

Commit Message

Carl Love via Gcc-patches July 1, 2020, 5:04 p.m.
This patch adds support for the Power10 BRD, BRW, and BRH instructions that do
byte swapping on values in GPRs.

This patch is a revision of the byte swap patch proposed in:
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546990.html

I changed the DI swap function bswapdi2_brd instead of bswapdi2_hw that I used
in the previous patch as requested.

I have changed all of the places that used 'future' to be 'power10'.

I have split the tests into 3 tests, one that tests BRH, one that tests BRW,
and one that tests BRD.  The BRD test must have a 64-bit ABI, but the other two
do not.

I used '*' for the length attribute where it generates a single instruction.

I did not change the tests in the splitter as I don't see any easier way to do
the test.  For reference, there are 3 cases:

   1) Power10 instruction swapping a GPR which does not get split
   2) Non-power10 instruction splitting the insn into component parts; (and)
   3) Power9 instruction swapping a vector register which does not get split.

I have done bootstrap compilers and run make check with and without this patch
on a little endian power9 system running Linux.  I have also done the boostrap
on a big endian power8 system with both 32/64-bit support, and the tests pass.
Can I check this patch into the master brannch?


gcc/
2020-06-30  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.md (bswaphi2_reg): Generate the BRH
	instruction on ISA 3.1.
	(bswapsi2_reg): Generate the BRW instruction on ISA 3.1.
	(bswapdi2): Rename bswapdi2_xxbrd to bswapdi2_brd.
	(bswapdi2_brd): Rename from bswapdi2_xxbrd.  Generate the BRD
	instruction on ISA 3.1.

gcc/testsuite/
2020-06-30  Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/bswap-brd.c: New test.
	* gcc.target/powerpc/bswap-brw.c: New test.
	* gcc.target/powerpc/bswap-brh.c: New test.
---
 gcc/config/rs6000/rs6000.md                  | 44 +++++++++++++++-------------
 gcc/testsuite/gcc.target/powerpc/bswap-brd.c | 23 +++++++++++++++
 gcc/testsuite/gcc.target/powerpc/bswap-brh.c | 12 ++++++++
 gcc/testsuite/gcc.target/powerpc/bswap-brw.c | 23 +++++++++++++++
 4 files changed, 82 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brh.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brw.c

-- 
1.8.3.1

Comments

Segher Boessenkool July 6, 2020, 6:41 p.m. | #1
Hi!

On Wed, Jul 01, 2020 at 01:04:02PM -0400, Michael Meissner wrote:
> This patch adds support for the Power10 BRD, BRW, and BRH instructions that do

> byte swapping on values in GPRs.


> I used '*' for the length attribute where it generates a single instruction.


More exactly: those that can use the default length calculation.

> I did not change the tests in the splitter as I don't see any easier way to do

> the test.  For reference, there are 3 cases:

> 

>    1) Power10 instruction swapping a GPR which does not get split

>    2) Non-power10 instruction splitting the insn into component parts; (and)

>    3) Power9 instruction swapping a vector register which does not get split.


>  (define_insn_and_split "bswaphi2_reg"


> +   brh %0,%1

>     #

>     xxbrh %x0,%x1"


> +   (set_attr "type" "shift,*,vecperm")

> +   (set_attr "isa" "p10,*,p9v")])


I don't think "shift" will be a good type for this, but we'll see.  (The
insn types are mostly for scheduling...  It is often hard to come up with
something good while there is only one implementation.  Power9 was quite
different from most older chips here.  Hopefully it can improve now we
also have Power10.

>  ;; We are always BITS_BIG_ENDIAN, so the bit positions below in

>  ;; zero_extract insns do not change for -mlittle.

>  (define_insn_and_split "bswapsi2_reg"


(Not new in this patch of course...  There are no zero_extract insns
below, heh).

> @@ -2641,9 +2643,9 @@ (define_insn_and_split "bswapsi2_reg"

>  		(and:SI (match_dup 0)

>  			(const_int -256))))]

>    ""

> -  [(set_attr "length" "12,4")

> -   (set_attr "type" "*,vecperm")

> -   (set_attr "isa" "*,p9v")])

> +  [(set_attr "length" "4,12,4")

> +   (set_attr "type" "shift,*,vecperm")

> +   (set_attr "isa" "p10,*,p9v")])


"length" should use "*" here, as well.

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/powerpc/bswap-brd.c

> @@ -0,0 +1,23 @@

> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */


Everything in gcc.target/powerpc is powerpc*-*-* already.

Why does this neede lp64?  (Put that in a comment.)

> +/* This tests whether GCC generates the ISA 3.1 BRW byte swap instruction for

> +   GPR data, but generates XXBRW for data in a vector register.  */


This comment is wrong.  This tests for "brd" (and "xxbrd"), instead.

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/powerpc/bswap-brh.c

> @@ -0,0 +1,12 @@

> +/* { dg-do compile { target { powerpc*-*-* } } } */


powerpc*-*-* is default.  Don't use it here.

> +/* This tests whether GCC generates the ISA 3.1 16-bit byte swap

> +   instruction BRH.  */


"whether GCC generates "brh" to implement 16-bit byte swap" or such...
(the instruction swaps every pair of bytes).

Okay for trunk with those things fixed.  Thanks!


Segher

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7baaa61..86c8c02 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2586,15 +2586,16 @@  (define_insn "bswap<mode>2_store"
   [(set_attr "type" "store")])
 
 (define_insn_and_split "bswaphi2_reg"
-  [(set (match_operand:HI 0 "gpc_reg_operand" "=&r,wa")
+  [(set (match_operand:HI 0 "gpc_reg_operand" "=r,&r,wa")
 	(bswap:HI
-	 (match_operand:HI 1 "gpc_reg_operand" "r,wa")))
-   (clobber (match_scratch:SI 2 "=&r,X"))]
+	 (match_operand:HI 1 "gpc_reg_operand" "r,r,wa")))
+   (clobber (match_scratch:SI 2 "=X,&r,X"))]
   ""
   "@
+   brh %0,%1
    #
    xxbrh %x0,%x1"
-  "reload_completed && int_reg_operand (operands[0], HImode)"
+  "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], HImode)"
   [(set (match_dup 3)
 	(and:SI (lshiftrt:SI (match_dup 4)
 			     (const_int 8))
@@ -2610,21 +2611,22 @@  (define_insn_and_split "bswaphi2_reg"
   operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0);
   operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
 }
-  [(set_attr "length" "12,4")
-   (set_attr "type" "*,vecperm")
-   (set_attr "isa" "*,p9v")])
+  [(set_attr "length" "*,12,*")
+   (set_attr "type" "shift,*,vecperm")
+   (set_attr "isa" "p10,*,p9v")])
 
 ;; We are always BITS_BIG_ENDIAN, so the bit positions below in
 ;; zero_extract insns do not change for -mlittle.
 (define_insn_and_split "bswapsi2_reg"
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,wa")
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,&r,wa")
 	(bswap:SI
-	 (match_operand:SI 1 "gpc_reg_operand" "r,wa")))]
+	 (match_operand:SI 1 "gpc_reg_operand" "r,r,wa")))]
   ""
   "@
+   brw %0,%1
    #
    xxbrw %x0,%x1"
-  "reload_completed && int_reg_operand (operands[0], SImode)"
+  "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], SImode)"
   [(set (match_dup 0)					; DABC
 	(rotate:SI (match_dup 1)
 		   (const_int 24)))
@@ -2641,9 +2643,9 @@  (define_insn_and_split "bswapsi2_reg"
 		(and:SI (match_dup 0)
 			(const_int -256))))]
   ""
-  [(set_attr "length" "12,4")
-   (set_attr "type" "*,vecperm")
-   (set_attr "isa" "*,p9v")])
+  [(set_attr "length" "4,12,4")
+   (set_attr "type" "shift,*,vecperm")
+   (set_attr "isa" "p10,*,p9v")])
 
 ;; On systems with LDBRX/STDBRX generate the loads/stores directly, just like
 ;; we do for L{H,W}BRX and ST{H,W}BRX above.  If not, we have to generate more
@@ -2676,7 +2678,7 @@  (define_expand "bswapdi2"
 	  emit_insn (gen_bswapdi2_store (dest, src));
         }
       else if (TARGET_P9_VECTOR)
-	emit_insn (gen_bswapdi2_xxbrd (dest, src));
+	emit_insn (gen_bswapdi2_brd (dest, src));
       else
 	emit_insn (gen_bswapdi2_reg (dest, src));
       DONE;
@@ -2707,13 +2709,15 @@  (define_insn "bswapdi2_store"
   "stdbrx %1,%y0"
   [(set_attr "type" "store")])
 
-(define_insn "bswapdi2_xxbrd"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=wa")
-	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wa")))]
+(define_insn "bswapdi2_brd"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,wa")
+	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "r,wa")))]
   "TARGET_P9_VECTOR"
-  "xxbrd %x0,%x1"
-  [(set_attr "type" "vecperm")
-   (set_attr "isa" "p9v")])
+  "@
+   brd %0,%1
+   xxbrd %x0,%x1"
+  [(set_attr "type" "shift,vecperm")
+   (set_attr "isa" "p10,p9v")])
 
 (define_insn "bswapdi2_reg"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=&r")
diff --git a/gcc/testsuite/gcc.target/powerpc/bswap-brd.c b/gcc/testsuite/gcc.target/powerpc/bswap-brd.c
new file mode 100644
index 0000000..483d1b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/bswap-brd.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* This tests whether GCC generates the ISA 3.1 BRW byte swap instruction for
+   GPR data, but generates XXBRW for data in a vector register.  */
+
+unsigned long long
+bswap_ll (unsigned long long a)
+{
+  return __builtin_bswap64 (a); /* { dg-final { scan-assembler {\mbrd\M} } } */
+}
+
+double
+bswap_ll_dbl (unsigned long long a)
+{
+  unsigned int b = a;
+  /* Force the value to be loaded into a vector register.  */
+  __asm__ (" # %x0" : "+wa" (b));
+
+  /* { dg-final { scan-assembler {\mxxbrd\M} } } */
+  return (double) __builtin_bswap64 (b);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/bswap-brh.c b/gcc/testsuite/gcc.target/powerpc/bswap-brh.c
new file mode 100644
index 0000000..c230bda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/bswap-brh.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* This tests whether GCC generates the ISA 3.1 16-bit byte swap
+   instruction BRH.  */
+
+unsigned short
+bswap_short (unsigned short a)
+{
+  return __builtin_bswap16 (a); /* { dg-final { scan-assembler {\mbrh\M} } } */
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/bswap-brw.c b/gcc/testsuite/gcc.target/powerpc/bswap-brw.c
new file mode 100644
index 0000000..4857c5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/bswap-brw.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* This tests whether GCC generates the ISA 3.1 BRW byte swap instruction for
+   GPR data, but generates XXBRW for data in a vector register.  */
+
+unsigned int
+bswap_int (unsigned int a)
+{
+  return __builtin_bswap32 (a); /* { dg-final { scan-assembler {\mbrw\M} } } */
+}
+
+double
+bswap_int_dbl (unsigned int a)
+{
+  unsigned int b = a;
+  /* Force the value to be loaded into a vector register.  */
+  __asm__ (" # %x0" : "+wa" (b));
+
+  /* { dg-final { scan-assembler {\mxxbrw\M} } } */
+  return (double) __builtin_bswap32 (b);
+}