nvptx: Add support for popcount and widening multiply instructions

Message ID 000b01d6515e$44913130$cdb39390$@nextmovesoftware.com
State New

Commit Message

Roger Sayle July 3, 2020, 5:20 p.m.
The following patch adds support for three-input addition instructions to the nvptx backend.
The PTX ISA's "vadd.u32.u32.u32.add d, a, b, c" instruction effectively implements 32-bit d = a+b+c,
and the "vsub.u32.u32.u32.add d, a, b, c" instruction provides 32-bit d = (a-b)+c.  The hope is that
these mnemonics help ptxas generate the low-level hardware's IADD3 instruction.
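
As a rough illustration of the arithmetic described above, the following C sketch models the two operations as plain 32-bit wrapping arithmetic. It only mirrors the "d = a+b+c" and "d = (a-b)+c" description in this message, not the full PTX definition of the video instructions; the function names vadd_add_model, vsub_add_model and check_models are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "vadd.u32.u32.u32.add d, a, b, c" as described
   in the message: d = a + b + c, reduced modulo 2^32.  */
static uint32_t vadd_add_model(uint32_t a, uint32_t b, uint32_t c)
{
  return (uint32_t)(a + b + c);
}

/* Hypothetical model of "vsub.u32.u32.u32.add d, a, b, c" as described
   in the message: d = (a - b) + c, reduced modulo 2^32.  */
static uint32_t vsub_add_model(uint32_t a, uint32_t b, uint32_t c)
{
  return (uint32_t)((a - b) + c);
}

/* Small self-check of the models, including wrap-around cases.  */
static void check_models(void)
{
  assert(vadd_add_model(1u, 2u, 3u) == 6u);
  /* UINT32_MAX + 1 wraps to 0, so the sum is just the third operand.  */
  assert(vadd_add_model(UINT32_MAX, 1u, 5u) == 5u);
  assert(vsub_add_model(10u, 3u, 4u) == 11u);
  /* 0 - 1 wraps to UINT32_MAX, and adding 1 wraps back to 0.  */
  assert(vsub_add_model(0u, 1u, 1u) == 0u);
  /* x + (y - z) is the same value as (y - z) + x in wrapping arithmetic.  */
  uint32_t x = 7u, y = 2u, z = 9u;
  assert((uint32_t)(x + (y - z)) == vsub_add_model(y, z, x));
}
```

Because the models use unsigned arithmetic, the signed test functions in the patch compute the same bit patterns whenever the signed computation does not overflow.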

Tested by "make" and "make -k check" on --build=nvptx-none hosted on x86_64-pc-linux-gnu
with no new regressions.

[PATCH] nvptx: Add support for vadd.add and vsub.add instructions

2020-07-03  Roger Sayle  <roger@nextmovesoftware.com>

	gcc/ChangeLog:
	* config/nvptx/nvptx.md (vadd_addsi4): New instruction.
	(vsub_addsi4): New instruction.

	gcc/testsuite/ChangeLog:
	* gcc.target/nvptx/vadd_add.c: New test.
	* gcc.target/nvptx/vsub_add.c: New test.


Hopefully, I've got the patch/diff file format correct this time.
Ok for mainline?

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

-----Original Message-----
From: Tom de Vries <tdevries@suse.de>
Sent: 02 July 2020 14:29
To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] nvptx: : Add support for popcount and widening multiply instructions

On 7/1/20 3:06 PM, Roger Sayle wrote:
>
> The following patch adds support for the popc and mul.wide instructions to the nvptx backend.
> I've a follow-up patch for supporting mul.hi instructions, but those
> changes require some minor tweaks to GCC's middle-end, so I'll submit those pieces separately.
>
> Tested by "make" and "make -k check" on --build=nvptx-none hosted on
> x86_64-pc-linux-gnu with no new regressions.
>
> 2020-07-01  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog:
>         * config/nvptx/nvptx.md (popcount<mode>2): New instructions.
>         (mulhishi3, mulsidi3, umulhisi3, umulsidi3): New instructions.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/nvptx/popc-1.c: New test.
>         * gcc.target/nvptx/popc-2.c: New test.
>         * gcc.target/nvptx/popc-3.c: New test.
>         * gcc.target/nvptx/mul-wide.c: New test.
>         * gcc.target/nvptx/umul-wide.c: New test.
>
>
> Ok for mainline?
>

Hi Roger,

LGTM, please apply.

[ Btw, next time can you add the new files to the patch?  That's somewhat more convenient to apply. ]

Thanks
- Tom

Comments

Tom de Vries July 5, 2020, 11:09 a.m. | #1
[ fixed $subject ]

On 7/3/20 7:20 PM, Roger Sayle wrote:
>
> The following patch adds support for three-input addition instructions to the nvptx backend.
> The PTX ISA's "vadd.u32.u32.u32.add d, a, b, c" instruction effectively implements 32-bit d = a+b+c,
> and the "vsub.u32.u32.u32.add d, a, b, c" instruction provides 32-bit d = (a-b)+c.  The hope is that
> these mnemonics help ptxas generate the low-level hardware's IADD3 instruction.
>
> Tested by "make" and "make -k check" on --build=nvptx-none hosted on x86_64-pc-linux-gnu
> with no new regressions.
>
> [PATCH] nvptx: Add support for vadd.add and vsub.add instructions
>
> 2020-07-03  Roger Sayle  <roger@nextmovesoftware.com>
>
> 	gcc/ChangeLog:
> 	* config/nvptx/nvptx.md (vadd_addsi4): New instruction.
> 	(vsub_addsi4): New instruction.
>
> 	gcc/testsuite/ChangeLog:
> 	* gcc.target/nvptx/vadd_add.c: New test.
> 	* gcc.target/nvptx/vsub_add.c: New test.
>
>
> Hopefully, I've got the patch/diff file format correct this time.
> Ok for mainline?
>

Hi Roger,

the patch looks fine, please apply.

I wonder though, AFAIU the define_insn names are not standard names, so
could they be defined with the '*' prefix?  If so, you could add that as
well.

Thanks,
- Tom
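
For illustration, the '*' prefix suggested above might look like the fragment below. In GCC machine descriptions, a define_insn whose name begins with '*' is treated as a nameless pattern: no gen_ function is generated for it, and the name serves only to identify the pattern in debugging output. This is a sketch that assumes nothing else in the backend refers to the pattern by name; it reuses the pattern body from the patch unchanged and is not the committed form.

```
;; Sketch only: same pattern as in the patch, with a '*' prefix so that
;; the name is used purely for debugging dumps.
(define_insn "*vadd_addsi4"
  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
        (plus:SI (plus:SI (match_operand:SI 1 "nvptx_register_operand" "R")
			  (match_operand:SI 2 "nvptx_register_operand" "R"))
		 (match_operand:SI 3 "nvptx_register_operand" "R")))]
  ""
  "%.\\tvadd%t0%t1%t2.add\\t%0, %1, %2, %3;")
```

The same renaming would presumably apply to vsub_addsi4, and the ChangeLog entries would then name the patterns with the prefix.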

Patch

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 5ceeac7..11d1d35 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -373,6 +373,22 @@ 
   ""
   "%.\\tadd%t0\\t%0, %1, %2;")
 
+(define_insn "vadd_addsi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (plus:SI (plus:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+			  (match_operand:SI 2 "nvptx_register_operand" "R"))
+		 (match_operand:SI 3 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tvadd%t0%t1%t2.add\\t%0, %1, %2, %3;")
+
+(define_insn "vsub_addsi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (plus:SI (minus:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+			   (match_operand:SI 2 "nvptx_register_operand" "R"))
+		 (match_operand:SI 3 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tvsub%t0%t1%t2.add\\t%0, %1, %2, %3;")
+
 (define_insn "sub<mode>3"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
 	(minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
diff --git a/gcc/testsuite/gcc.target/nvptx/vadd_add.c b/gcc/testsuite/gcc.target/nvptx/vadd_add.c
new file mode 100644
index 0000000..dcb2394
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/vadd_add.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x, int y, int z)
+{
+  return x + y + z;
+}
+
+unsigned int bar(unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + y + z;
+}
+
+/* { dg-final { scan-assembler-times "vadd.u32.u32.u32.add" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/nvptx/vsub_add.c b/gcc/testsuite/gcc.target/nvptx/vsub_add.c
new file mode 100644
index 0000000..3f632c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/vsub_add.c
@@ -0,0 +1,25 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x, int y, int z)
+{
+  return (x - y) + z;
+}
+
+int bar(int x, int y, int z)
+{
+  return x + (y - z);
+}
+
+unsigned int ufoo(unsigned int x, unsigned int y, unsigned int z)
+{
+  return (x - y) + z;
+}
+
+unsigned int ubar(unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + (y - z);
+}
+
+/* { dg-final { scan-assembler-times "vsub.u32.u32.u32.add" 4 } } */
+