x86: Correct EVEX vector load/store optimization

Message ID CAMe9rOqe1z6ESMcF_kFkaJLN4j9HBvJf2POVCXHsQxic5dzeTQ@mail.gmail.com
State New

Commit Message

H.J. Lu March 19, 2019, 6:20 a.m.
On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -4075,6 +4075,56 @@ optimize_encoding (void)
> >           i.types[j].bitfield.ymmword = 0;
> >         }
> >      }
> > +  else if ((cpu_arch_flags.bitfield.cpuavx
> > +         || cpu_arch_isa_flags.bitfield.cpuavx)
>
> Once again a questionable condition, as per earlier replies to
> other patches of yours.

Fixed.

> > +        && i.vec_encoding != vex_encoding_evex
> > +        && !i.types[0].bitfield.zmmword
> > +        && !i.mask
> > +        && is_evex_encoding (&i.tm)
> > +        && (i.tm.base_opcode == 0x666f
> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
> > +            || i.tm.base_opcode == 0xf36f
> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
> > +            || i.tm.base_opcode == 0xf26f
> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>
> All three of these can be expressed with just a single comparison,
> using & or | instead of ^ and (if necessary) adjusting the literal
> value compared against.

Fixed.

> > +        && i.tm.extension_opcode == None)
> > +    {
> > +      /* Optimize: -O1:
> > +        VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16,
> > +        vmovdqu32 and vmovdqu64:
> > +          EVEX VOP %xmmM, %xmmN
> > +            -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16)
> > +          EVEX VOP %ymmM, %ymmN
> > +            -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16)
> > +          EVEX VOP %xmmM, mem
> > +            -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16)
> > +          EVEX VOP %ymmM, mem
> > +            -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16)
> > +          EVEX VOP mem, %xmmN
> > +            -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16)
>
> There's some confusion on this line.
>
> > +          EVEX VOP mem, %ymmN
> > +            -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
> > +       */
>
> For the variants with a memory operand I doubt the conversion
> is always a win, and it may be against the user request in case of
> -Os. This is because of the Disp8 scaling the EVEX encoding permits.

Fixed.

> > +      if (i.tm.base_opcode == 0xf26f)
> > +     i.tm.base_opcode = 0xf36f;
> > +      else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
> > +     i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
>
> This again can be expressed without "else if()" afaict.
>

Fixed.

Here is the patch.

Thanks.

-- 
H.J.
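Jan's point that each `==`/`^` pair in the opcode check collapses into one masked comparison can be checked mechanically. The following C sketch assumes `Opcode_SIMD_IntD` is 0x10, the bit that separates the 0x6f load and 0x7f store opcodes visible in the test disassembly; `SIMD_INTD`, `match_xor`, `match_mask`, `predicates_agree` and `spot_check` are hypothetical names for illustration, not part of gas.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: Opcode_SIMD_IntD is 0x10, the bit that separates the
   0x..6f load form from the 0x..7f store form.  */
#define SIMD_INTD 0x10u

/* Original condition: two comparisons per prefix, six in total.  */
bool match_xor (unsigned int op)
{
  return op == 0x666f || (op ^ SIMD_INTD) == 0x666f
	 || op == 0xf36f || (op ^ SIMD_INTD) == 0xf36f
	 || op == 0xf26f || (op ^ SIMD_INTD) == 0xf26f;
}

/* Suggested form: clear the direction bit once, then a single
   comparison per prefix.  */
bool match_mask (unsigned int op)
{
  unsigned int base = op & ~SIMD_INTD;

  return base == 0x666f || base == 0xf36f || base == 0xf26f;
}

/* True if both predicates agree on every 16-bit opcode value.  */
bool predicates_agree (void)
{
  for (unsigned int op = 0; op <= 0xffff; op++)
    if (match_xor (op) != match_mask (op))
      return false;
  return true;
}

/* Usage: load and store forms of each prefix group both match.  */
void spot_check (void)
{
  assert (match_mask (0x666f) && match_mask (0x667f)); /* 66: vmovdqa32/64 */
  assert (match_mask (0xf36f) && match_mask (0xf37f)); /* F3: vmovdqu32/64 */
  assert (match_mask (0xf26f) && match_mask (0xf27f)); /* F2: vmovdqu8/16 */
  assert (!match_mask (0x6610));                        /* outside the group */
}
```

Because each target value has bit 0x10 clear, `op & ~0x10` equals it exactly when `op` is the target or the target with the direction bit set, which is what each `==`/`^` pair tested.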

Comments

Jan Beulich March 19, 2019, 8:30 a.m. | #1
>>> On 19.03.19 at 07:20, <hjl.tools@gmail.com> wrote:

> Here is the patch.


Thanks.

>--- a/gas/config/tc-i386.c
>+++ b/gas/config/tc-i386.c
>@@ -4068,18 +4068,14 @@ optimize_encoding (void)
> 	    i.types[j].bitfield.ymmword = 0;
> 	  }
>     }
>-  else if ((cpu_arch_flags.bitfield.cpuavx
>-	    || cpu_arch_isa_flags.bitfield.cpuavx)
>-	   && i.vec_encoding != vex_encoding_evex
>+  else if (i.vec_encoding != vex_encoding_evex
> 	   && !i.types[0].bitfield.zmmword

Ah, here the remaining cpuavx goes away as well.

>+      if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
>+	{
>+	  i.tm.base_opcode &= Opcode_SIMD_IntD;
>+	  i.tm.base_opcode |= 0xf36f;
>+	}


How about the even simpler

      if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
	i.tm.base_opcode ^= 0xf36f ^ 0xf26f;

?

Jan
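The `&=`/`|=` form from the posted patch and Jan's XOR form can be compared directly. A minimal C sketch, again assuming `Opcode_SIMD_IntD` is 0x10; `remap_and_or`, `remap_xor` and `spot_check_remap` are hypothetical names. Both turn the F2-prefixed EVEX vmovdqu8/vmovdqu16 opcode into the F3-prefixed vmovdqu opcode that VEX provides, keeping the load/store direction bit.

```c
#include <assert.h>

/* Assumption: Opcode_SIMD_IntD is 0x10 (load 0x..6f vs store 0x..7f).  */
#define SIMD_INTD 0x10u

/* Form from the posted patch: keep only the direction bit, then OR in
   the F3 (vmovdqu) opcode.  */
unsigned int remap_and_or (unsigned int op)
{
  if ((op & ~SIMD_INTD) == 0xf26f)
    {
      op &= SIMD_INTD;
      op |= 0xf36f;
    }
  return op;
}

/* Suggested form: 0xf36f ^ 0xf26f is 0x0100, so the XOR flips only the
   low bit of the prefix byte (F2 -> F3) and leaves the direction bit
   as it was.  */
unsigned int remap_xor (unsigned int op)
{
  if ((op & ~SIMD_INTD) == 0xf26f)
    op ^= 0xf36f ^ 0xf26f;
  return op;
}

/* Usage: EVEX vmovdqu8/16 load and store opcodes become VEX vmovdqu.  */
void spot_check_remap (void)
{
  assert ((0xf36fu ^ 0xf26fu) == 0x0100u);
  assert (remap_xor (0xf26f) == 0xf36f);   /* load */
  assert (remap_xor (0xf27f) == 0xf37f);   /* store */
  assert (remap_xor (0x666f) == 0x666f);   /* other groups untouched */
}
```

The XOR is safe because the guard already guarantees the high byte is 0xf2, so flipping bit 8 can only yield 0xf3.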
H.J. Lu March 19, 2019, 8:48 a.m. | #2
On Tue, Mar 19, 2019 at 4:30 PM Jan Beulich <JBeulich@suse.com> wrote:
> How about the even simpler
>
>       if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
>         i.tm.base_opcode ^= 0xf36f ^ 0xf26f;
>

It works.

I am going to check in this patch together with the other two.

Thanks.

-- 
H.J.
From 177fca87fa53139e3a409876c0d9333e6b33780c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 19 Mar 2019 10:56:39 +0800
Subject: [PATCH] x86: Correct EVEX vector load/store optimization

Update EVEX vector load/store optimization:

1. There is no need to check for AVX, since AVX512F requires AVX2 (and
hence AVX).
2. Check both operands for a ZMM register, since with AT&T syntax
zmmword may not be set on the first operand.
3. Use masking with ~Opcode_SIMD_IntD for the opcode check and XOR for
the opcode update.
4. Prefer EVEX Disp8 over VEX Disp32: the VEX prefix is 2 or 3 bytes
and the EVEX prefix is 4 bytes, so an EVEX encoding with a 1-byte
(scaled) Disp8 is shorter than a VEX encoding that needs a 4-byte
Disp32.
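The size argument in item 4 can be sketched in C. This is an illustration written for this note, not gas code: `fits_disp8`, `insn_len`, `keep_evex` and `spot_check_disp8` are hypothetical helpers, `fits_disp8` only follows the idea behind gas's fits_in_disp8 (), and the length model assumes a plain base-register memory operand with no SIB byte. EVEX Disp8*N scales the byte displacement by the access size (16, 32 or 64 bytes for full xmm/ymm/zmm moves, i.e. shift 4, 5 or 6); VEX has no scaling (shift 0).

```c
#include <assert.h>
#include <stdbool.h>

/* Does DISP fit an 8-bit displacement scaled by 2^SHIFT?  Shift 0
   models VEX; shift 4/5/6 models EVEX Disp8*N for xmm/ymm/zmm.  */
bool fits_disp8 (long disp, unsigned int shift)
{
  long n = 1L << shift;

  if (disp % n != 0)		/* Must be a multiple of N.  */
    return false;
  disp /= n;
  return disp >= -128 && disp <= 127;
}

/* Bytes for prefix + opcode + ModRM + displacement, assuming a base
   register other than %esp/%ebp (no SIB byte; a zero displacement
   needs no displacement byte).  */
unsigned int insn_len (unsigned int prefix_len, long disp, unsigned int shift)
{
  unsigned int disp_len = disp == 0 ? 0 : fits_disp8 (disp, shift) ? 1 : 4;

  return prefix_len + 1 + 1 + disp_len;
}

/* Keep EVEX only when its scaled Disp8 fits but VEX would need Disp32:
   4 + 1 beats 2 (or 3) + 4.  */
bool keep_evex (long disp, unsigned int shift)
{
  return fits_disp8 (disp, shift) && !fits_disp8 (disp, 0);
}

/* Usage, matching byte counts in the updated test expectations.  */
void spot_check_disp8 (void)
{
  assert (keep_evex (0x80, 4));          /* %xmm1,0x80(%eax): disp8 0x08 */
  assert (insn_len (4, 0x80, 4) == 7);   /* 62 f1 7d 08 7f 48 08 */
  assert (insn_len (2, 0x80, 0) == 8);   /* c5 f9 7f 88 80 00 00 00 */
  assert (!keep_evex (0x7f, 4));         /* 0x7f is not a multiple of 16 */
  assert (insn_len (2, 0x7f, 0) == 5);   /* c5 fa 6f 50 7f */
}
```

With 0x80 and an xmm store, EVEX encodes the displacement as 0x08 (0x80 / 16) in 7 bytes, while VEX needs the full 0x80 as Disp32 in 8 bytes; for 0x7f the scaled form is impossible, so VEX with a plain Disp8 remains the shorter choice.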

	* config/tc-i386.c (optimize_encoding): Don't check AVX for
	EVEX vector load/store optimization.  Check both operands for
	ZMM register.  Update EVEX vector load/store opcode check.
	Choose EVEX Disp8 over VEX Disp32.
	* testsuite/gas/i386/optimize-1.d: Updated.
	* testsuite/gas/i386/optimize-1a.d: Likewise.
	* testsuite/gas/i386/optimize-2.d: Likewise.
	* testsuite/gas/i386/optimize-4.d: Likewise.
	* testsuite/gas/i386/optimize-5.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-2a.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-2b.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-3.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-5.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-6.d: Likewise.
	* testsuite/gas/i386/optimize-1.s: Add ZMM register load
	test.
	* testsuite/gas/i386/x86-64-optimize-2.s: Likewise.
---
 gas/config/tc-i386.c                        | 43 ++++++++++++++-------
 gas/testsuite/gas/i386/optimize-1.d         | 25 ++++++------
 gas/testsuite/gas/i386/optimize-1.s         |  2 +
 gas/testsuite/gas/i386/optimize-1a.d        | 25 ++++++------
 gas/testsuite/gas/i386/optimize-2.d         | 24 ++++++------
 gas/testsuite/gas/i386/optimize-4.d         | 25 ++++++------
 gas/testsuite/gas/i386/optimize-5.d         | 25 ++++++------
 gas/testsuite/gas/i386/x86-64-optimize-2.d  | 25 ++++++------
 gas/testsuite/gas/i386/x86-64-optimize-2.s  |  2 +
 gas/testsuite/gas/i386/x86-64-optimize-2a.d | 25 ++++++------
 gas/testsuite/gas/i386/x86-64-optimize-2b.d | 25 ++++++------
 gas/testsuite/gas/i386/x86-64-optimize-3.d  | 24 ++++++------
 gas/testsuite/gas/i386/x86-64-optimize-5.d  | 25 ++++++------
 gas/testsuite/gas/i386/x86-64-optimize-6.d  | 25 ++++++------
 14 files changed, 175 insertions(+), 145 deletions(-)

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 3885728de7..690fd23ff0 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4068,18 +4068,14 @@ optimize_encoding (void)
 	    i.types[j].bitfield.ymmword = 0;
 	  }
     }
-  else if ((cpu_arch_flags.bitfield.cpuavx
-	    || cpu_arch_isa_flags.bitfield.cpuavx)
-	   && i.vec_encoding != vex_encoding_evex
+  else if (i.vec_encoding != vex_encoding_evex
 	   && !i.types[0].bitfield.zmmword
+	   && !i.types[1].bitfield.zmmword
 	   && !i.mask
 	   && is_evex_encoding (&i.tm)
-	   && (i.tm.base_opcode == 0x666f
-	       || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
-	       || i.tm.base_opcode == 0xf36f
-	       || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
-	       || i.tm.base_opcode == 0xf26f
-	       || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
+	   && ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0x666f
+	       || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf36f
+	       || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
 	   && i.tm.extension_opcode == None)
     {
       /* Optimize: -O1:
@@ -4098,10 +4094,31 @@ optimize_encoding (void)
 	     EVEX VOP mem, %ymmN
 	       -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
        */
-      if (i.tm.base_opcode == 0xf26f)
-	i.tm.base_opcode = 0xf36f;
-      else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
-	i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
+      for (j = 0; j < 2; j++)
+	if (operand_type_check (i.types[j], disp)
+	    && i.op[j].disps->X_op == O_constant)
+	  {
+	    /* Since the VEX prefix has 2 or 3 bytes, the EVEX prefix
+	       has 4 bytes, EVEX Disp8 has 1 byte and VEX Disp32 has 4
+	       bytes, we choose EVEX Disp8 over VEX Disp32.  */
+	    int evex_disp8, vex_disp8;
+	    unsigned int memshift = i.memshift;
+	    offsetT n = i.op[j].disps->X_add_number;
+
+	    evex_disp8 = fits_in_disp8 (n);
+	    i.memshift = 0;
+	    vex_disp8 = fits_in_disp8 (n);
+	    if (evex_disp8 != vex_disp8)
+	      {
+		i.memshift = memshift;
+		return;
+	      }
+
+	    i.types[j].bitfield.disp8 = vex_disp8;
+	    break;
+	  }
+      if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
+	i.tm.base_opcode ^= 0xf36f ^ 0xf26f;
       i.tm.opcode_modifier.vex
 	= i.types[0].bitfield.ymmword ? VEX256 : VEX128;
       i.tm.opcode_modifier.vexw = VEXW0;
diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d
index 70c802c002..2f40c72a4e 100644
--- a/gas/testsuite/gas/i386/optimize-1.d
+++ b/gas/testsuite/gas/i386/optimize-1.d
@@ -74,12 +74,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -92,10 +92,11 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/optimize-1.s b/gas/testsuite/gas/i386/optimize-1.s
index 6dcfbc2799..4c15d16c2a 100644
--- a/gas/testsuite/gas/i386/optimize-1.s
+++ b/gas/testsuite/gas/i386/optimize-1.s
@@ -114,3 +114,5 @@ _start:
 	vmovdqu16	%ymm1, 128(%eax)
 	vmovdqu32	%ymm1, 128(%eax)
 	vmovdqu64	%ymm1, 128(%eax)
+
+	vmovdqa32	(%eax), %zmm2
diff --git a/gas/testsuite/gas/i386/optimize-1a.d b/gas/testsuite/gas/i386/optimize-1a.d
index cee2383d84..d7c253a6fa 100644
--- a/gas/testsuite/gas/i386/optimize-1a.d
+++ b/gas/testsuite/gas/i386/optimize-1a.d
@@ -75,12 +75,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -93,10 +93,11 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/optimize-2.d b/gas/testsuite/gas/i386/optimize-2.d
index 19467f5c01..ed61dec6fa 100644
--- a/gas/testsuite/gas/i386/optimize-2.d
+++ b/gas/testsuite/gas/i386/optimize-2.d
@@ -29,12 +29,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -47,12 +47,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
  +[a-f0-9]+:	62 f1 7d 48 6f d1    	vmovdqa32 %zmm1,%zmm2
  +[a-f0-9]+:	62 f1 fd 48 6f d1    	vmovdqa64 %zmm1,%zmm2
  +[a-f0-9]+:	62 f1 7f 48 6f d1    	vmovdqu8 %zmm1,%zmm2
diff --git a/gas/testsuite/gas/i386/optimize-4.d b/gas/testsuite/gas/i386/optimize-4.d
index 2df84654d6..f062ad7717 100644
--- a/gas/testsuite/gas/i386/optimize-4.d
+++ b/gas/testsuite/gas/i386/optimize-4.d
@@ -74,12 +74,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -92,12 +92,13 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
 #pass
diff --git a/gas/testsuite/gas/i386/optimize-5.d b/gas/testsuite/gas/i386/optimize-5.d
index ecc1ab139a..fdf5561af8 100644
--- a/gas/testsuite/gas/i386/optimize-5.d
+++ b/gas/testsuite/gas/i386/optimize-5.d
@@ -74,12 +74,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -92,12 +92,13 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 7d 28 6f d1    	vmovdqa32 %ymm1,%ymm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d
index 067df076f7..45b98ae694 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,10 +148,11 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.s b/gas/testsuite/gas/i386/x86-64-optimize-2.s
index 1275610e55..e5d298225a 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2.s
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2.s
@@ -170,3 +170,5 @@ _start:
 	vmovdqu16	%ymm1, 128(%rax)
 	vmovdqu32	%ymm1, 128(%rax)
 	vmovdqu64	%ymm1, 128(%rax)
+
+	vmovdqa32	(%rax), %zmm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2a.d b/gas/testsuite/gas/i386/x86-64-optimize-2a.d
index 532a1458bc..39385b96ec 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2a.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2a.d
@@ -125,12 +125,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -149,10 +149,11 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2b.d b/gas/testsuite/gas/i386/x86-64-optimize-2b.d
index 09474a1016..3eb3a59eac 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2b.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2b.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,10 +148,11 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.d b/gas/testsuite/gas/i386/x86-64-optimize-3.d
index 74336a4fe2..5e2832df4c 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-3.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d
@@ -43,12 +43,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -67,12 +67,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
  +[a-f0-9]+:	62 b1 7d 08 6f d5    	vmovdqa32 %xmm21,%xmm2
  +[a-f0-9]+:	62 b1 fd 08 6f d5    	vmovdqa64 %xmm21,%xmm2
  +[a-f0-9]+:	62 b1 7f 08 6f d5    	vmovdqu8 %xmm21,%xmm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.d b/gas/testsuite/gas/i386/x86-64-optimize-5.d
index 012237df57..5065d650d4 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-5.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-5.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,12 +148,13 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 7d 28 6f d1    	vmovdqa32 %ymm1,%ymm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.d b/gas/testsuite/gas/i386/x86-64-optimize-6.d
index aca119e4f9..8ebd9b2475 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-6.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-6.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,12 +148,13 @@ Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 7d 28 6f d1    	vmovdqa32 %ymm1,%ymm2
Jan Beulich March 19, 2019, 8:52 a.m. | #3
>>> On 19.03.19 at 09:48, <hjl.tools@gmail.com> wrote:
> On Tue, Mar 19, 2019 at 4:30 PM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 19.03.19 at 07:20, <hjl.tools@gmail.com> wrote:
>> > On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote:
>> >>
>> >> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote:
>> >> > --- a/gas/config/tc-i386.c
>> >> > +++ b/gas/config/tc-i386.c
>> >> > @@ -4075,6 +4075,56 @@ optimize_encoding (void)
>> >> >           i.types[j].bitfield.ymmword = 0;
>> >> >         }
>> >> >      }
>> >> > +  else if ((cpu_arch_flags.bitfield.cpuavx
>> >> > +         || cpu_arch_isa_flags.bitfield.cpuavx)
>> >>
>> >> Once again a questionable condition, as per earlier replies to
>> >> other patches of yours.
>> >
>> > Fixed.
>> >
>> >> > +        && i.vec_encoding != vex_encoding_evex
>> >> > +        && !i.types[0].bitfield.zmmword
>> >> > +        && !i.mask
>> >> > +        && is_evex_encoding (&i.tm)
>> >> > +        && (i.tm.base_opcode == 0x666f
>> >> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
>> >> > +            || i.tm.base_opcode == 0xf36f
>> >> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
>> >> > +            || i.tm.base_opcode == 0xf26f
>> >> > +            || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>> >>
>> >> All three of these can be expressed with just a single comparison,
>> >> using & or | instead of ^ and (if necessary) adjusting the literal
>> >> value compared against.
>> >
>> > Fixed.
>> >
>> >> > +        && i.tm.extension_opcode == None)
>> >> > +    {
>> >> > +      /* Optimize: -O1:
>> >> > +        VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16,
>> >> > +        vmovdqu32 and vmovdqu64:
>> >> > +          EVEX VOP %xmmM, %xmmN
>> >> > +            -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16)
>> >> > +          EVEX VOP %ymmM, %ymmN
>> >> > +            -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16)
>> >> > +          EVEX VOP %xmmM, mem
>> >> > +            -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16)
>> >> > +          EVEX VOP %ymmM, mem
>> >> > +            -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16)
>> >> > +          EVEX VOP mem, %xmmN
>> >> > +            -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16)
>> >>
>> >> There's some confusion on this line.
>> >>
>> >> > +          EVEX VOP mem, %ymmN
>> >> > +            -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
>> >> > +       */
>> >>
>> >> For the variants with a memory operand I doubt the conversion
>> >> is always a win, and it may be against the user request in case of
>> >> -Os. This is because of the Disp8 scaling the EVEX encoding permits.
>> >
>> > Fixed.
>> >
>> >> > +      if (i.tm.base_opcode == 0xf26f)
>> >> > +     i.tm.base_opcode = 0xf36f;
>> >> > +      else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>> >> > +     i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
>> >>
>> >> This again can be expressed without "else if()" afaict.
>> >>
>> >
>> > Fixed.
>> >
>> > Here is the patch.
>>
>> Thanks.
>>
>> >--- a/gas/config/tc-i386.c
>> >+++ b/gas/config/tc-i386.c
>> >@@ -4068,18 +4068,14 @@ optimize_encoding (void)
>> >           i.types[j].bitfield.ymmword = 0;
>> >         }
>> >     }
>> >-  else if ((cpu_arch_flags.bitfield.cpuavx
>> >-          || cpu_arch_isa_flags.bitfield.cpuavx)
>> >-         && i.vec_encoding != vex_encoding_evex
>> >+  else if (i.vec_encoding != vex_encoding_evex
>> >          && !i.types[0].bitfield.zmmword
>>
>> Ah, here the remaining cpuavx goes away as well.
>>
>> >+      if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
>> >+      {
>> >+        i.tm.base_opcode &= Opcode_SIMD_IntD;
>> >+        i.tm.base_opcode |= 0xf36f;
>> >+      }
>>
>> How about the even simpler
>>
>>       if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
>>         i.tm.base_opcode ^= 0xf36f ^ 0xf26f;
>>
>
> It works.
>
> I am going to check in this patch together with other 2.
>
> Thanks.

Thank you as well.

Jan

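Jan's single-XOR form and the `&=`/`|=` pair from the posted patch can be compared with a standalone C sketch. It assumes `Opcode_SIMD_IntD` is 0x10, the bit that separates the load opcode 0x6f from the store opcode 0x7f seen in the test expectations; the helper names are hypothetical. Since 0xf36f ^ 0xf26f is 0x100, the XOR flips only the bit that turns the F2 prefix (vmovdqu8/16) into F3 (vmovdqu) and leaves the direction bit alone.

```c
#include <stdbool.h>

/* Assumption: the direction bit separating the 0x6f (load) and
   0x7f (store) forms of the vmovdq* opcodes.  */
#define Opcode_SIMD_IntD 0x10

/* One comparison matches both the load and the store form.  */
static bool is_movdq (unsigned int opcode, unsigned int base)
{
  return (opcode & ~Opcode_SIMD_IntD) == base;
}

/* Suggested form: 0xf36f ^ 0xf26f == 0x100, so only the F2->F3
   prefix bit changes and the direction bit is preserved.  */
static unsigned int f2_to_f3_xor (unsigned int opcode)
{
  if (is_movdq (opcode, 0xf26f))
    opcode ^= 0xf36f ^ 0xf26f;
  return opcode;
}

/* Form in the posted patch: keep only the direction bit, then OR in
   the F3 load opcode.  */
static unsigned int f2_to_f3_andor (unsigned int opcode)
{
  if (is_movdq (opcode, 0xf26f))
    {
      opcode &= Opcode_SIMD_IntD;
      opcode |= 0xf36f;
    }
  return opcode;
}
```

Both helpers map 0xf26f to 0xf36f and 0xf27f to 0xf37f, and leave other opcodes such as 0x666f untouched; the XOR version simply needs one statement instead of two.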
Patch

From 84ecabf0624411c1ab95bfadbd864aa4b226b2e8 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 19 Mar 2019 10:56:39 +0800
Subject: [PATCH] x86: Correct EVEX vector load/store optimization

Update EVEX vector load/store optimization:

1. Don't check AVX, since AVX512F requires AVX2 (and hence AVX).
2. Check both operands for a ZMM register, since with AT&T operand
order the ZMM register may be the second operand, so zmmword may not
be set on the first operand.
3. Simplify the Opcode_SIMD_IntD check and update.
4. Prefer EVEX Disp8 over VEX Disp32: the VEX prefix is 2 or 3 bytes
and the EVEX prefix is 4 bytes, but EVEX Disp8 is 1 byte while VEX
Disp32 is 4 bytes, so the EVEX form is 1 or 2 bytes shorter.
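The size argument in point 4 can be checked with a small length model. This sketch is illustrative only and not gas code: the function names are hypothetical, it assumes a plain base register (no SIB byte) and always counts a displacement, and it models EVEX compressed displacement (Disp8*N) by requiring the displacement to be a multiple of N = 1 << memshift. With N = 16 for XMM it reproduces the byte counts in the updated expectations: `vmovdqa32 %xmm1,0x80(%eax)` is 7 bytes as EVEX (`62 f1 7d 08 7f 48 08`) versus 8 bytes as VEX (`c5 f9 7f 88 80 00 00 00`).

```c
#include <stdbool.h>

/* Disp8 is usable if disp is a multiple of N = 1 << memshift and
   disp / N fits in a signed byte.  memshift 0 models VEX, which
   does not scale the 8-bit displacement.  */
static bool disp8_ok (long disp, unsigned int memshift)
{
  long n = 1L << memshift;

  if (disp % n != 0)
    return false;
  return disp / n >= -128 && disp / n <= 127;
}

/* prefix + opcode + ModRM + Disp8 or Disp32 (no SIB byte).  */
static int insn_len (int prefix_len, bool disp8)
{
  return prefix_len + 1 + 1 + (disp8 ? 1 : 4);
}

/* EVEX prefix is 4 bytes; memshift is log2 of the vector size.  */
static int evex_len (long disp, unsigned int memshift)
{
  return insn_len (4, disp8_ok (disp, memshift));
}

/* vex_prefix_len is 2 (C5 form) or 3 (C4 form).  */
static int vex_len (long disp, int vex_prefix_len)
{
  return insn_len (vex_prefix_len, disp8_ok (disp, 0));
}

/* Convert to VEX only when that does not make the insn longer.  */
static bool prefer_vex (long disp, unsigned int memshift,
                        int vex_prefix_len)
{
  return vex_len (disp, vex_prefix_len) <= evex_len (disp, memshift);
}
```

For example, `prefer_vex (0x80, 4, 2)` is false (EVEX 7 bytes, VEX 8), while `prefer_vex (0x7f, 4, 2)` is true (EVEX 10 bytes, VEX 5), which matches the unchanged VEX `vmovdqu 0x7f(%eax),%xmm2` lines. In the tc-i386.c hunk of this patch the equivalent decision is made by comparing `fits_in_disp8` with and without `i.memshift`.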

	* config/tc-i386.c (optimize_encoding): Don't check AVX for
	EVEX vector load/store optimization.  Check both operands for
	ZMM register.  Update EVEX vector load/store opcode check.
	Choose EVEX Disp8 over VEX Disp32.
	* testsuite/gas/i386/optimize-1.d: Updated.
	* testsuite/gas/i386/optimize-1a.d: Likewise.
	* testsuite/gas/i386/optimize-2.d: Likewise.
	* testsuite/gas/i386/optimize-4.d: Likewise.
	* testsuite/gas/i386/optimize-5.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-2a.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-2b.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-3.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-5.d: Likewise.
	* testsuite/gas/i386/x86-64-optimize-6.d: Likewise.
	* testsuite/gas/i386/optimize-1.s: Add ZMM register load
	test.
	* testsuite/gas/i386/x86-64-optimize-2.s: Likewise.
---
 gas/config/tc-i386.c                        | 46 +++++++++++++++------
 gas/testsuite/gas/i386/optimize-1.d         | 25 +++++------
 gas/testsuite/gas/i386/optimize-1.s         |  2 +
 gas/testsuite/gas/i386/optimize-1a.d        | 25 +++++------
 gas/testsuite/gas/i386/optimize-2.d         | 24 +++++------
 gas/testsuite/gas/i386/optimize-4.d         | 25 +++++------
 gas/testsuite/gas/i386/optimize-5.d         | 25 +++++------
 gas/testsuite/gas/i386/x86-64-optimize-2.d  | 25 +++++------
 gas/testsuite/gas/i386/x86-64-optimize-2.s  |  2 +
 gas/testsuite/gas/i386/x86-64-optimize-2a.d | 25 +++++------
 gas/testsuite/gas/i386/x86-64-optimize-2b.d | 25 +++++------
 gas/testsuite/gas/i386/x86-64-optimize-3.d  | 24 +++++------
 gas/testsuite/gas/i386/x86-64-optimize-5.d  | 25 +++++------
 gas/testsuite/gas/i386/x86-64-optimize-6.d  | 25 +++++------
 14 files changed, 178 insertions(+), 145 deletions(-)

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 3885728de7..3447fe0fa3 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4068,18 +4068,14 @@  optimize_encoding (void)
 	    i.types[j].bitfield.ymmword = 0;
 	  }
     }
-  else if ((cpu_arch_flags.bitfield.cpuavx
-	    || cpu_arch_isa_flags.bitfield.cpuavx)
-	   && i.vec_encoding != vex_encoding_evex
+  else if (i.vec_encoding != vex_encoding_evex
 	   && !i.types[0].bitfield.zmmword
+	   && !i.types[1].bitfield.zmmword
 	   && !i.mask
 	   && is_evex_encoding (&i.tm)
-	   && (i.tm.base_opcode == 0x666f
-	       || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
-	       || i.tm.base_opcode == 0xf36f
-	       || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
-	       || i.tm.base_opcode == 0xf26f
-	       || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
+	   && ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0x666f
+	       || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf36f
+	       || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
 	   && i.tm.extension_opcode == None)
     {
       /* Optimize: -O1:
@@ -4098,10 +4094,34 @@  optimize_encoding (void)
 	     EVEX VOP mem, %ymmN
 	       -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
        */
-      if (i.tm.base_opcode == 0xf26f)
-	i.tm.base_opcode = 0xf36f;
-      else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
-	i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
+      for (j = 0; j < 2; j++)
+	if (operand_type_check (i.types[j], disp)
+	    && i.op[j].disps->X_op == O_constant)
+	  {
+	    /* Since the VEX prefix has 2 or 3 bytes, the EVEX prefix
+	       has 4 bytes, EVEX Disp8 has 1 byte and VEX Disp32 has 4
+	       bytes, we choose EVEX Disp8 over VEX Disp32.  */
+	    int evex_disp8, vex_disp8;
+	    unsigned int memshift = i.memshift;
+	    offsetT n = i.op[j].disps->X_add_number;
+
+	    evex_disp8 = fits_in_disp8 (n);
+	    i.memshift = 0;
+	    vex_disp8 = fits_in_disp8 (n);
+	    if (evex_disp8 != vex_disp8)
+	      {
+		i.memshift = memshift;
+		return;
+	      }
+
+	    i.types[j].bitfield.disp8 = vex_disp8;
+	    break;
+	  }
+      if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
+	{
+	  i.tm.base_opcode &= Opcode_SIMD_IntD;
+	  i.tm.base_opcode |= 0xf36f;
+	}
       i.tm.opcode_modifier.vex
 	= i.types[0].bitfield.ymmword ? VEX256 : VEX128;
       i.tm.opcode_modifier.vexw = VEXW0;
diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d
index 70c802c002..2f40c72a4e 100644
--- a/gas/testsuite/gas/i386/optimize-1.d
+++ b/gas/testsuite/gas/i386/optimize-1.d
@@ -74,12 +74,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -92,10 +92,11 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/optimize-1.s b/gas/testsuite/gas/i386/optimize-1.s
index 6dcfbc2799..4c15d16c2a 100644
--- a/gas/testsuite/gas/i386/optimize-1.s
+++ b/gas/testsuite/gas/i386/optimize-1.s
@@ -114,3 +114,5 @@  _start:
 	vmovdqu16	%ymm1, 128(%eax)
 	vmovdqu32	%ymm1, 128(%eax)
 	vmovdqu64	%ymm1, 128(%eax)
+
+	vmovdqa32	(%eax), %zmm2
diff --git a/gas/testsuite/gas/i386/optimize-1a.d b/gas/testsuite/gas/i386/optimize-1a.d
index cee2383d84..d7c253a6fa 100644
--- a/gas/testsuite/gas/i386/optimize-1a.d
+++ b/gas/testsuite/gas/i386/optimize-1a.d
@@ -75,12 +75,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -93,10 +93,11 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/optimize-2.d b/gas/testsuite/gas/i386/optimize-2.d
index 19467f5c01..ed61dec6fa 100644
--- a/gas/testsuite/gas/i386/optimize-2.d
+++ b/gas/testsuite/gas/i386/optimize-2.d
@@ -29,12 +29,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -47,12 +47,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
  +[a-f0-9]+:	62 f1 7d 48 6f d1    	vmovdqa32 %zmm1,%zmm2
  +[a-f0-9]+:	62 f1 fd 48 6f d1    	vmovdqa64 %zmm1,%zmm2
  +[a-f0-9]+:	62 f1 7f 48 6f d1    	vmovdqu8 %zmm1,%zmm2
diff --git a/gas/testsuite/gas/i386/optimize-4.d b/gas/testsuite/gas/i386/optimize-4.d
index 2df84654d6..f062ad7717 100644
--- a/gas/testsuite/gas/i386/optimize-4.d
+++ b/gas/testsuite/gas/i386/optimize-4.d
@@ -74,12 +74,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -92,12 +92,13 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
 #pass
diff --git a/gas/testsuite/gas/i386/optimize-5.d b/gas/testsuite/gas/i386/optimize-5.d
index ecc1ab139a..fdf5561af8 100644
--- a/gas/testsuite/gas/i386/optimize-5.d
+++ b/gas/testsuite/gas/i386/optimize-5.d
@@ -74,12 +74,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%eax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -92,12 +92,13 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%eax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 7d 28 6f d1    	vmovdqa32 %ymm1,%ymm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d
index 067df076f7..45b98ae694 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d
@@ -124,12 +124,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,10 +148,11 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.s b/gas/testsuite/gas/i386/x86-64-optimize-2.s
index 1275610e55..e5d298225a 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2.s
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2.s
@@ -170,3 +170,5 @@  _start:
 	vmovdqu16	%ymm1, 128(%rax)
 	vmovdqu32	%ymm1, 128(%rax)
 	vmovdqu64	%ymm1, 128(%rax)
+
+	vmovdqa32	(%rax), %zmm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2a.d b/gas/testsuite/gas/i386/x86-64-optimize-2a.d
index 532a1458bc..39385b96ec 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2a.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2a.d
@@ -125,12 +125,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -149,10 +149,11 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2b.d b/gas/testsuite/gas/i386/x86-64-optimize-2b.d
index 09474a1016..3eb3a59eac 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2b.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2b.d
@@ -124,12 +124,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,10 +148,11 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.d b/gas/testsuite/gas/i386/x86-64-optimize-3.d
index 74336a4fe2..5e2832df4c 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-3.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d
@@ -43,12 +43,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -67,12 +67,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
  +[a-f0-9]+:	62 b1 7d 08 6f d5    	vmovdqa32 %xmm21,%xmm2
  +[a-f0-9]+:	62 b1 fd 08 6f d5    	vmovdqa64 %xmm21,%xmm2
  +[a-f0-9]+:	62 b1 7f 08 6f d5    	vmovdqu8 %xmm21,%xmm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.d b/gas/testsuite/gas/i386/x86-64-optimize-5.d
index 012237df57..5065d650d4 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-5.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-5.d
@@ -124,12 +124,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,12 +148,13 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 7d 28 6f d1    	vmovdqa32 %ymm1,%ymm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.d b/gas/testsuite/gas/i386/x86-64-optimize-6.d
index aca119e4f9..8ebd9b2475 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-6.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-6.d
@@ -124,12 +124,12 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
  +[a-f0-9]+:	c5 fa 6f 50 7f       	vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 f9 7f 88 80 00 00 00 	vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fa 7f 88 80 00 00 00 	vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 08 7f 48 08 	vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 08 7f 48 08 	vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 08 7f 48 08 	vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 08 7f 48 08 	vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 08 7f 48 08 	vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 08 7f 48 08 	vmovdqu64 %xmm1,0x80\(%rax\)
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fd 6f d1          	vmovdqa %ymm1,%ymm2
  +[a-f0-9]+:	c5 fe 6f d1          	vmovdqu %ymm1,%ymm2
@@ -148,12 +148,13 @@  Disassembly of section .text:
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
  +[a-f0-9]+:	c5 fe 6f 50 7f       	vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fd 7f 88 80 00 00 00 	vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+:	c5 fe 7f 88 80 00 00 00 	vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 28 7f 48 04 	vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fd 28 7f 48 04 	vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7f 28 7f 48 04 	vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 ff 28 7f 48 04 	vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7e 28 7f 48 04 	vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 fe 28 7f 48 04 	vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+:	62 f1 7d 48 6f 10    	vmovdqa32 \(%rax\),%zmm2
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 f5 08 55 e9    	vandnpd %xmm1,%xmm1,%xmm5
  +[a-f0-9]+:	62 f1 7d 28 6f d1    	vmovdqa32 %ymm1,%ymm2
-- 
2.20.1