[1/3] Add IEEE 128-bit min/max support on PowerPC.

Message ID 20210609002125.GA18854@ibm-toto.the-meissners.org
State New
Headers show
Series
  • Add Power10 IEEE 128-bit min, max, conditional move
Related show

Commit Message

Martin Sebor via Gcc-patches June 9, 2021, 12:21 a.m.
[PATCH 1/3] Add IEEE 128-bit min/max support on PowerPC.

This patch adds the support for the IEEE 128-bit floating point C minimum and
maximum instructions.  The next patch will add the support for using the
compare and set mask instruction to implement conditional moves.

This patch does not try to re-use the code used for SF/DF min/max
support.  It defines a separate insn for the IEEE 128-bit support.  It
uses the code iterator <minmax> to simplify adding both operations.

GCC will not convert ternary operations into using min/max instructions
provided in this patch unless the user uses -Ofast or similar switches due to
issues with NaNs.  The next patch that adds conditional move instructions will
enable the ternary conversion in many cases.

Note the code for fixing float128-minmax.c has been moved to a separate
patch.

I tested it on 3 platforms:

    *	Power9 little endian, --with-code=power9;
    *	Power8 big endian, --with-code=power8, both 32/64-bit tests done;
    *	Power10 little endian, --with-code=power10.

All systems bootstrapped and there were no new regressions.  I believe I have
addressed the issues with the last patch.

Can I check this into the master branch, and after a soak-in period, back port
it to the GCC 11 branch?

gcc/
2021-06-08  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA
	3.1 IEEE 128-bit floating point xsmaxcqp and xsmincqp
	instructions.
	* config/rs6000/rs6000.md (s<minmax><mode>3, IEEE128 iterator):
	New insns.

gcc/testsuite/
2021-06-08  Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/float128-minmax-2.c: New test.
---
 gcc/config/rs6000/rs6000.c                        |  3 ++-
 gcc/config/rs6000/rs6000.md                       | 11 +++++++++++
 .../gcc.target/powerpc/float128-minmax-2.c        | 15 +++++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c

-- 
2.31.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

Comments

Martin Sebor via Gcc-patches June 17, 2021, 12:32 a.m. | #1
Ping patch.  In particular, we would like to get this into the GCC 11.2
backport as it is power10 enablement.

| Date: Tue, 8 Jun 2021 20:21:25 -0400
| Subject: [PATCH 1/3] Add IEEE 128-bit min/max support on PowerPC.
| Message-ID: <20210609002125.GA18854@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797
Segher Boessenkool June 17, 2021, 5:39 p.m. | #2
On Tue, Jun 08, 2021 at 08:21:25PM -0400, Michael Meissner wrote:
> GCC will not convert ternary operations into using min/max instructions

> provided in this patch unless the user uses -Ofast or similar switches due to

> issues with NaNs.


It will not do it because it is *incorrect* to do :-)

(The RTL operators smin/smax are undefined for any NaN inputs (unless
both are the same bit pattern), and for different sign zeroes; it isn't
only NaNs even).

> gcc/

> 2021-06-08  Michael Meissner  <meissner@linux.ibm.com>

> 

> 	* config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA

> 	3.1 IEEE 128-bit floating point xsmaxcqp and xsmincqp

> 	instructions.


Please don't randomly break lines.

> --- a/gcc/config/rs6000/rs6000.c

> +++ b/gcc/config/rs6000/rs6000.c

> @@ -16103,7 +16103,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx op0, rtx op1)

>    /* VSX/altivec have direct min/max insns.  */

>    if ((code == SMAX || code == SMIN)

>        && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)

> -	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))))

> +	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))

> +	  || (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))))


The actual insns only check TARGET_POWER10 (so no TARGET_FLOAT128_HW).
Which is right, this or that?

> +/* { dg-require-effective-target ppc_float128_hw } */

> +/* { dg-require-effective-target power10_ok } */

> +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */


In testcases we can assume that float128_hw is set whenever we have a
p10; we don't manually disable it to make live hard for ourselves ;-)

We could disallow disabling QP float separately from all other float.
We will still need to test if float is enabled at all so it won't help
all that much immediately, alas.

With that TARGET_POWER10 condition fixed: okay for trunk, and for 11
once it is tested for trunk on all systems.  Thanks!


Segher
Martin Sebor via Gcc-patches June 17, 2021, 7:18 p.m. | #3
On Thu, Jun 17, 2021 at 12:39:04PM -0500, Segher Boessenkool wrote:
> On Tue, Jun 08, 2021 at 08:21:25PM -0400, Michael Meissner wrote:

> > GCC will not convert ternary operations into using min/max instructions

> > provided in this patch unless the user uses -Ofast or similar switches due to

> > issues with NaNs.

> 

> It will not do it because it is *incorrect* to do :-)

> 

> (The RTL operators smin/smax are undefined for any NaN inputs (unless

> both are the same bit pattern), and for different sign zeroes; it isn't

> only NaNs even).

> 

> > gcc/

> > 2021-06-08  Michael Meissner  <meissner@linux.ibm.com>

> > 

> > 	* config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA

> > 	3.1 IEEE 128-bit floating point xsmaxcqp and xsmincqp

> > 	instructions.

> 

> Please don't randomly break lines.

> 

> > --- a/gcc/config/rs6000/rs6000.c

> > +++ b/gcc/config/rs6000/rs6000.c

> > @@ -16103,7 +16103,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx op0, rtx op1)

> >    /* VSX/altivec have direct min/max insns.  */

> >    if ((code == SMAX || code == SMIN)

> >        && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)

> > -	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))))

> > +	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))

> > +	  || (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))))

> 

> The actual insns only check TARGET_POWER10 (so no TARGET_FLOAT128_HW).

> Which is right, this or that?


It should include TARGET_FLOAT128_HW.  The problem area is a power10 running in
big endian mode and running 32-bit code.  Because we don't have TImode, we
can't enable the IEEE 128-bit hardware instructions.  And because we don't have
GLIBC support, we don't enable the FLOAT128 stuff by default on big endian.

> > +/* { dg-require-effective-target ppc_float128_hw } */

> > +/* { dg-require-effective-target power10_ok } */

> > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */

> 

> In testcases we can assume that float128_hw is set whenever we have a

> p10; we don't manually disable it to make live hard for ourselves ;-)


Again, I put it in case somebody builds a BE power10 compiler.

> We could disallow disabling QP float separately from all other float.

> We will still need to test if float is enabled at all so it won't help

> all that much immediately, alas.

> 

> With that TARGET_POWER10 condition fixed: okay for trunk, and for 11

> once it is tested for trunk on all systems.  Thanks!


Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797
Segher Boessenkool June 23, 2021, 11:56 p.m. | #4
Hi!

On Thu, Jun 17, 2021 at 03:18:48PM -0400, Michael Meissner wrote:
> > The actual insns only check TARGET_POWER10 (so no TARGET_FLOAT128_HW).

> > Which is right, this or that?

> 

> It should include TARGET_FLOAT128_HW.


Okay, so fix that :-)

> The problem area is a power10 running in

> big endian mode and running 32-bit code.  Because we don't have TImode, we

> can't enable the IEEE 128-bit hardware instructions.


I don't see why not?

> > > +/* { dg-require-effective-target ppc_float128_hw } */

> > > +/* { dg-require-effective-target power10_ok } */

> > > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */

> > 

> > In testcases we can assume that float128_hw is set whenever we have a

> > p10; we don't manually disable it to make live hard for ourselves ;-)

> 

> Again, I put it in case somebody builds a BE power10 compiler.


This should still be fixed.  And yes, people do test BE p10, of course.
And BE p10 *should* enable the QP float insns.  Does it not currently?


Segher
Martin Sebor via Gcc-patches June 28, 2021, 7 p.m. | #5
On Wed, Jun 23, 2021 at 06:56:37PM -0500, Segher Boessenkool wrote:
> Hi!

> 

> On Thu, Jun 17, 2021 at 03:18:48PM -0400, Michael Meissner wrote:

> > > The actual insns only check TARGET_POWER10 (so no TARGET_FLOAT128_HW).

> > > Which is right, this or that?

> > 

> > It should include TARGET_FLOAT128_HW.

> 

> Okay, so fix that :-)



> > The problem area is a power10 running in

> > big endian mode and running 32-bit code.  Because we don't have TImode, we

> > can't enable the IEEE 128-bit hardware instructions.

> 

> I don't see why not?

> 

> > > > +/* { dg-require-effective-target ppc_float128_hw } */

> > > > +/* { dg-require-effective-target power10_ok } */

> > > > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */

> > > 

> > > In testcases we can assume that float128_hw is set whenever we have a

> > > p10; we don't manually disable it to make live hard for ourselves ;-)

> > 

> > Again, I put it in case somebody builds a BE power10 compiler.

> 

> This should still be fixed.  And yes, people do test BE p10, of course.

> And BE p10 *should* enable the QP float insns.  Does it not currently?


GCC does not enable __float128 by default on BE.  The reason is there are no
plans to enable all of the float128 support in glibc in BE.  Without a library,
it is kind of useless to enable __float128.

If the compiler enabled __float128, It breaks things that check if __float128
is avaiable.  They think __float128 is available, and then they fail when when
they can't anything besides basic arithmetic.

Because the compiler is configured not to enable __float128 in a BE context, we
don't build the __float128 emulator in libgcc.

In addition, BE GCC runs on things that does not have GLIBC (like AIX).  If we
enabled it by default, it would break those environments.

A further complication is BE by default is still power4 or power5.  You need
VSX support to even pass __float128 arguments.  While it is possible to pass
__float128 in GPRs, you run into compatibility issues if one module is compiled
with VSX and another is compiled without setting a base cpu, because one module
will expect things in GPRs and the other in Altivec registers.

And as I've said, the issue with 32-bit move is we don't have TImode support.
Some of the machine indepenent passes want to use an appropriate integer type
to move basic types.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797
Segher Boessenkool June 30, 2021, 5:35 p.m. | #6
On Mon, Jun 28, 2021 at 03:00:02PM -0400, Michael Meissner wrote:
> On Wed, Jun 23, 2021 at 06:56:37PM -0500, Segher Boessenkool wrote:

> > > The problem area is a power10 running in

> > > big endian mode and running 32-bit code.  Because we don't have TImode, we

> > > can't enable the IEEE 128-bit hardware instructions.

> > 

> > I don't see why not?


And this is still true, and the core of the problem here.  Please show
any argument for this?

> > > > > +/* { dg-require-effective-target ppc_float128_hw } */

> > > > > +/* { dg-require-effective-target power10_ok } */

> > > > > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */

> > > > 

> > > > In testcases we can assume that float128_hw is set whenever we have a

> > > > p10; we don't manually disable it to make live hard for ourselves ;-)

> > > 

> > > Again, I put it in case somebody builds a BE power10 compiler.

> > 

> > This should still be fixed.  And yes, people do test BE p10, of course.

> > And BE p10 *should* enable the QP float insns.  Does it not currently?

> 

> GCC does not enable __float128 by default on BE.


But it does enable _Float128, as it should, and it does work.

> The reason is there are no

> plans to enable all of the float128 support in glibc in BE.  Without a library,

> it is kind of useless to enable __float128.


This is fundamentally wrong.  GCC is a compiler.  It is used without
libraries often (some applications do not want the standard libraries
for a reason, some implement them themselves, some *are* the standard
libraries).  And you can have a lot of useful code without using libm.

> If the compiler enabled __float128, It breaks things that check if __float128

> is avaiable.  They think __float128 is available, and then they fail when when

> they can't anything besides basic arithmetic.


So?  That would be the *correct* behaviour.

> Because the compiler is configured not to enable __float128 in a BE context, we

> don't build the __float128 emulator in libgcc.


Yes, another imperfection.

> In addition, BE GCC runs on things that does not have GLIBC (like AIX).  If we

> enabled it by default, it would break those environments.


How so?  Not anymore than you do now, you cannot use *any* QP float
with the status quo.

> A further complication is BE by default is still power4 or power5.  You need

> VSX support to even pass __float128 arguments.  While it is possible to pass

> __float128 in GPRs, you run into compatibility issues if one module is compiled

> with VSX and another is compiled without setting a base cpu, because one module

> will expect things in GPRs and the other in Altivec registers.


Yes, allowing QP float before p8 would solve a lot more problems, as I
have told you very often; but it is independent of this.  You need p9
to have the machine insns, and you can compile code that needs the
libgcc soft-float emulation functions for QP on power7 already (it needs
basic VSX only, and should be okay with only VMX even).

> And as I've said, the issue with 32-bit move is we don't have TImode support.

> Some of the machine indepenent passes want to use an appropriate integer type

> to move basic types.


So why does it work fine with double-double?

Please show an example.


Segher

Patch

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b01bb5c8191..1651788df6a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16103,7 +16103,8 @@  rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx op0, rtx op1)
   /* VSX/altivec have direct min/max insns.  */
   if ((code == SMAX || code == SMIN)
       && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
-	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))))
+	  || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))
+	  || (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))))
     {
       emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1)));
       return;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3f59b544f6a..064c3a2d9d6 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5214,6 +5214,17 @@  (define_insn "*s<minmax><mode>3_vsx"
 }
   [(set_attr "type" "fp")])
 
+;; Min/max for ISA 3.1 IEEE 128-bit floating point
+(define_insn "s<minmax><mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(fp_minmax:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "xs<minmax>cqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 ;; The conditional move instructions allow us to perform max and min operations
 ;; even when we don't have the appropriate max/min instruction using the FSEL
 ;; instruction.
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
new file mode 100644
index 00000000000..c71ba08c9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
@@ -0,0 +1,15 @@ 
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a
+   call.  */
+TYPE f128_min (TYPE a, TYPE b) { return __builtin_fminf128 (a, b); }
+TYPE f128_max (TYPE a, TYPE b) { return __builtin_fmaxf128 (a, b); }
+
+/* { dg-final { scan-assembler {\mxsmaxcqp\M} } } */
+/* { dg-final { scan-assembler {\mxsmincqp\M} } } */