[1/2,1/2] Enable Intel AVX512_FP16 instructions

Message ID 20210701074736.9534-2-lili.cui@intel.com
State New
Series
  • Enable Intel AVX512_FP16 instructions and add tests for it

Commit Message

H.J. Lu via Binutils July 1, 2021, 7:47 a.m.
Intel AVX512 FP16 instructions use maps 3, 5 and 6. Maps 5 and 6 use 3 bits
in the EVEX.mmm field (0b101, 0b110). Map 5 is for instructions that were FP32
in map 1 (0Fxx). Map 6 is for instructions that were FP32 in map 2 (0F38xx).
There are some exceptions to this rule. Some instructions in map 1 (0Fxx) with
imm8 operands predated our current conventions; their FP16 forms moved to map 3.
FP32 instructions in map 3 (0F3Axx) found new opcodes in map 3 for FP16, because map 3
is very sparsely populated. Most of the FP16 instructions share opcodes and
prefix (EVEX.pp) bits with the related FP32 operations.
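To make the map numbering above concrete, here is a minimal C sketch that reads the 3-bit EVEX.mmm field from P0 (the payload byte following the 0x62 escape) and returns the opcode map it selects. The function name is hypothetical, and treating a set P0 bit 3 or an unlisted mmm value as invalid is an assumption of this sketch, not a description of what i386-dis.c does.

```c
/* Hypothetical helper: return the opcode map selected by the EVEX.mmm
   field (bits 2:0 of P0, the byte following the 0x62 escape), or -1
   for values this sketch treats as reserved.
     1 -> 0F, 2 -> 0F38, 3 -> 0F3A,
     5 -> MAP5 (FP16 forms of map-1 FP32 insns),
     6 -> MAP6 (FP16 forms of map-2 FP32 insns).  */
static int
evex_opcode_map (unsigned char p0)
{
  unsigned int mmm = p0 & 0x07;

  /* Assumption: bit 3 of P0 is reserved and must be zero.  */
  if (p0 & 0x08)
    return -1;

  switch (mmm)
    {
    case 1: case 2: case 3:   /* legacy 0F, 0F38 and 0F3A maps */
    case 5: case 6:           /* AVX512_FP16 maps, 0b101 and 0b110 */
      return (int) mmm;
    default:                  /* 0, 4, 7: treated as reserved here */
      return -1;
    }
}
```

For instance, a P0 byte of 0xF5 (R, X, B and R' all 1, i.e. no register extension, since these bits are stored inverted) selects map 5.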

Intel AVX512 FP16 instructions have new displacement scaling rules; please refer
to the public Software Developer's Manual for detailed information.
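Displacement scaling here is the EVEX compressed-displacement (disp8*N) scheme: the encoded 8-bit displacement is multiplied by a factor N that depends on how much memory the operand covers. As a hedged sketch, assuming the usual categories carry over to 2-byte FP16 elements (full vector: N is the vector length in bytes; half or quarter vector for widening conversions; scalar or broadcast: N is the 2-byte element size), the effective displacement could be computed as below. The enum and function names are hypothetical; the SDM tables remain authoritative.

```c
/* Hypothetical classification of an FP16 memory operand.  */
enum fp16_mem_form
{
  FP16_MEM_FULL,     /* whole vector: N = VL bytes                      */
  FP16_MEM_HALF,     /* half vector (e.g. ph -> ps widening): N = VL/2  */
  FP16_MEM_QUARTER,  /* quarter vector (e.g. ph -> pd): N = VL/4        */
  FP16_MEM_ELEMENT   /* scalar sh or {1toN} broadcast: N = 2            */
};

/* Return the disp8 scale factor N for a vector length of VL_BYTES
   (16, 32 or 64).  */
static int
fp16_disp8_scale (enum fp16_mem_form form, int vl_bytes)
{
  switch (form)
    {
    case FP16_MEM_FULL:    return vl_bytes;
    case FP16_MEM_HALF:    return vl_bytes / 2;
    case FP16_MEM_QUARTER: return vl_bytes / 4;
    case FP16_MEM_ELEMENT: return 2;   /* size of one half-precision value */
    }
  return 1;
}

/* The encoded 8-bit displacement is scaled by N when decoded.  */
static long
fp16_effective_disp (signed char disp8, enum fp16_mem_form form, int vl_bytes)
{
  return (long) disp8 * fp16_disp8_scale (form, vl_bytes);
}
```

Conversely, an assembler can only use the short disp8 form when the byte displacement is a multiple of N and the quotient fits in a signed byte; otherwise it has to fall back to a 32-bit displacement.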

gas/

2021-07-01  Igor Tsimbalist  <igor.v.tsimbalist@intel.com>
            H.J. Lu  <hongjiu.lu@intel.com>
            Wei Xiao <wei3.xiao@intel.com>
            Lili Cui  <lili.cui@intel.com>

	* config/tc-i386.c (struct Broadcast_Operation): Adjust comment.
	(cpu_arch): Add .avx512_fp16.
	(cpu_noarch): Add noavx512_fp16.
	(pte): Add evexmap5 and evexmap6.
	(build_evex_prefix): Handle EVEXMAP5 and EVEXMAP6.
	(check_VecOperations): Handle {1to32}.
	* doc/c-i386.texi: Document avx512_fp16, noavx512_fp16.

opcodes/

2021-07-01  Igor Tsimbalist  <igor.v.tsimbalist@intel.com>
            H.J. Lu  <hongjiu.lu@intel.com>
            Wei Xiao <wei3.xiao@intel.com>
            Lili Cui  <lili.cui@intel.com>

	* opcodes/i386-dis.c (EXwScalarS): New.
	(EXxh): Ditto.
	(EXxhc): Ditto.
	(EXxmmqh): Ditto.
	(EXxmmqdh): Ditto.
	(EXEvexXwb): Ditto.
	(enum): Add evex_x_wb_mode, xh_mode, xhc_mode, xmmqh_mode,
	xmmqdh_mode and w_scalar_swap_mode.
	(enum): Add MOD_EVEX_MAP510_P_1_W_0 and MOD_EVEX_MAP511_P_1_W_0.
	(enum): Add PREFIX_EVEX_0F3A08_W_0, PREFIX_EVEX_0F3A0A_W_0,
	PREFIX_EVEX_0F3AC2_W_0, PREFIX_EVEX_MAP510, PREFIX_EVEX_MAP511,
	PREFIX_EVEX_MAP51D_W_0, PREFIX_EVEX_MAP52A, PREFIX_EVEX_MAP52C,
	PREFIX_EVEX_MAP52D, PREFIX_EVEX_MAP52E, PREFIX_EVEX_MAP52F,
	PREFIX_EVEX_MAP551_W_0, PREFIX_EVEX_MAP558_W_0, PREFIX_EVEX_MAP559_W_0,
	PREFIX_EVEX_MAP55A_W_0, PREFIX_EVEX_MAP55A_W_1, PREFIX_EVEX_MAP55B_W_0,
	PREFIX_EVEX_MAP55B_W_1, PREFIX_EVEX_MAP55C_W_0, PREFIX_EVEX_MAP55D_W_0,
	PREFIX_EVEX_MAP55E_W_0, PREFIX_EVEX_MAP55F_W_0, PREFIX_EVEX_MAP56E,
	PREFIX_EVEX_MAP578_W_0, PREFIX_EVEX_MAP578_W_1, PREFIX_EVEX_MAP579_W_0,
	PREFIX_EVEX_MAP579_W_1, PREFIX_EVEX_MAP57A_W_0, PREFIX_EVEX_MAP57A_W_1,
	PREFIX_EVEX_MAP57B_W_0, PREFIX_EVEX_MAP57B_W_1, PREFIX_EVEX_MAP57C_W_0,
	PREFIX_EVEX_MAP57D_W_0, PREFIX_EVEX_MAP57E, PREFIX_EVEX_MAP613_w_0,
	PREFIX_EVEX_MAP62C, PREFIX_EVEX_MAP62D, PREFIX_EVEX_MAP642,
	PREFIX_EVEX_MAP643, PREFIX_EVEX_MAP64C, PREFIX_EVEX_MAP64D,
	PREFIX_EVEX_MAP64E, PREFIX_EVEX_MAP64F, PREFIX_EVEX_MAP656_W_0,
	PREFIX_EVEX_MAP657_W_0, PREFIX_EVEX_MAP696, PREFIX_EVEX_MAP697,
	PREFIX_EVEX_MAP698, PREFIX_EVEX_MAP699, PREFIX_EVEX_MAP69A,
	PREFIX_EVEX_MAP69B, PREFIX_EVEX_MAP69C, PREFIX_EVEX_MAP69D,
	PREFIX_EVEX_MAP69E, PREFIX_EVEX_MAP69F, PREFIX_EVEX_MAP6A6,
	PREFIX_EVEX_MAP6A7, PREFIX_EVEX_MAP6A8, PREFIX_EVEX_MAP6A9,
	PREFIX_EVEX_MAP6AA, PREFIX_EVEX_MAP6AB, PREFIX_EVEX_MAP6AC,
	PREFIX_EVEX_MAP6AD, PREFIX_EVEX_MAP6AE, PREFIX_EVEX_MAP6AF,
	PREFIX_EVEX_MAP6B6, PREFIX_EVEX_MAP6B7, PREFIX_EVEX_MAP6B8,
	PREFIX_EVEX_MAP6B9, PREFIX_EVEX_MAP6BA, PREFIX_EVEX_MAP6BB,
	PREFIX_EVEX_MAP6BC, PREFIX_EVEX_MAP6BD, PREFIX_EVEX_MAP6BE,
	PREFIX_EVEX_MAP6BF, PREFIX_EVEX_MAP6D6_W_0 and PREFIX_EVEX_MAP6D7_W_0.
	(enum): Add EVEX_MAP5 and EVEX_MAP6.
	(enum): Add EVEX_W_0F3A08, EVEX_W_0F3A0A, EVEX_W_0F3A26_P_0,
	EVEX_W_0F3A27_P_0, EVEX_W_0F3A56_P_0, EVEX_W_0F3A57_P_0,
	EVEX_W_0F3A66_P_0, EVEX_W_0F3A67_P_0, EVEX_W_0F3AC2,
	EVEX_W_MAP510_P_1, EVEX_W_MAP511_P_1, EVEX_W_MAP51D,
	EVEX_W_MAP52A_P_1, EVEX_W_MAP52C_P_1, EVEX_W_MAP52D_P_1,
	EVEX_W_MAP52E_P_0, EVEX_W_MAP52F_P_0, EVEX_W_MAP551_P_0,
	EVEX_W_MAP551, EVEX_W_MAP558, EVEX_W_MAP559,
	EVEX_W_MAP55A, EVEX_W_MAP55B, EVEX_W_MAP55C,
	EVEX_W_MAP55D, EVEX_W_MAP55E, EVEX_W_MAP55F,
	EVEX_W_MAP578, EVEX_W_MAP579, EVEX_W_MAP57A,
	EVEX_W_MAP57B, EVEX_W_MAP57C, EVEX_W_MAP57D,
	EVEX_W_MAP613, EVEX_W_MAP62C_P_2, EVEX_W_MAP62D_P_2,
	EVEX_W_MAP642_P_2, EVEX_W_MAP643_P_2, EVEX_W_MAP64C_P_2,
	EVEX_W_MAP64D_P_2, EVEX_W_MAP64E_P_2, EVEX_W_MAP64F_P_2,
	EVEX_W_MAP656, EVEX_W_MAP6D7, EVEX_W_MAP696_P_2,
	EVEX_W_MAP697_P_2, EVEX_W_MAP698_P_2, EVEX_W_MAP699_P_2,
	EVEX_W_MAP69A_P_2, EVEX_W_MAP69B_P_2, EVEX_W_MAP69C_P_2,
	EVEX_W_MAP69D_P_2, EVEX_W_MAP69E_P_2, EVEX_W_MAP69F_P_2,
	EVEX_W_MAP6A6_P_2, EVEX_W_MAP6A7_P_2, EVEX_W_MAP6A8_P_2,
	EVEX_W_MAP6A9_P_2, EVEX_W_MAP6AA_P_2, EVEX_W_MAP6AB_P_2,
	EVEX_W_MAP6AC_P_2, EVEX_W_MAP6AD_P_2, EVEX_W_MAP6AE_P_2,
	EVEX_W_MAP6AF_P_2, EVEX_W_MAP6B6_P_2, EVEX_W_MAP6B7_P_2,
	EVEX_W_MAP6B8_P_2, EVEX_W_MAP6B9_P_2, EVEX_W_MAP6BA_P_2,
	EVEX_W_MAP6BB_P_2, EVEX_W_MAP6BC_P_2, EVEX_W_MAP6BD_P_2,
	EVEX_W_MAP6BE_P_2, EVEX_W_MAP6BF_P_2, EVEX_W_MAP6D6_P_1,
	EVEX_W_MAP6D6_P_3, EVEX_W_MAP6D7_P_1 and EVEX_W_MAP6D7_P_3.
	(get_valid_dis386): Properly handle new instructions.
	(intel_operand_size): Handle new modes.
	(OP_E_memory): Ditto.
	(OP_EX): Ditto.
	* i386-dis-evex.h: Updated for AVX512_FP16.
	* i386-dis-evex-mod.h: Updated for AVX512_FP16.
	* i386-dis-evex-prefix.h: Updated for AVX512_FP16.
	* i386-dis-evex-reg.h: Updated for AVX512_FP16.
	* i386-dis-evex-w.h: Updated for AVX512_FP16.
	* i386-gen.c (cpu_flag_init): Add CPU_AVX512_FP16_FLAGS,
	CPU_ANY_AVX512_FP16_FLAGS. Update CPU_ANY_AVX512F_FLAGS.
	(cpu_flags): Add CpuAVX512_FP16.
	* i386-opc.h (enum): (AVX512_FP16): New.
	(i386_cpu_flags): Add cpuavx512_fp16.
	(EVEXMAP5): Defined as a macro.
	(EVEXMAP6): Ditto.
	* i386-opc.tbl: Add Intel AVX512_FP16 instructions.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Ditto.
---
 gas/config/tc-i386.c           |  15 +-
 gas/doc/c-i386.texi            |   4 +-
 opcodes/i386-dis-evex-mod.h    |  10 +
 opcodes/i386-dis-evex-prefix.h | 212 ++++++++++++
 opcodes/i386-dis-evex-w.h      | 442 +++++++++++++++++++++++-
 opcodes/i386-dis-evex.h        | 600 ++++++++++++++++++++++++++++++++-
 opcodes/i386-dis.c             | 263 ++++++++++++++-
 opcodes/i386-gen.c             |   7 +-
 opcodes/i386-opc.h             |   7 +
 opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
 10 files changed, 1904 insertions(+), 32 deletions(-)
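The `{1to32}` broadcast handled in check_VecOperations (see the gas/ ChangeLog above) follows directly from the element size: a 512-bit vector holds 64 / 2 = 32 half-precision elements. A minimal sketch of that arithmetic, with a hypothetical function name:

```c
/* Number of elements a {1toN} broadcast replicates: the vector length
   in bytes divided by the element size in bytes.  */
static unsigned int
broadcast_count (unsigned int vl_bits, unsigned int elem_bytes)
{
  return (vl_bits / 8) / elem_bytes;
}
```

With 4-byte FP32 elements the largest broadcast is {1to16}; the 2-byte FP16 elements are what first require {1to32}.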

Comments

H.J. Lu via Binutils July 2, 2021, 1:42 p.m. | #1
On 01.07.2021 09:47, Cui,Lili wrote:
> ---
>  gas/config/tc-i386.c           |  15 +-
>  gas/doc/c-i386.texi            |   4 +-
>  opcodes/i386-dis-evex-mod.h    |  10 +
>  opcodes/i386-dis-evex-prefix.h | 212 ++++++++++++
>  opcodes/i386-dis-evex-w.h      | 442 +++++++++++++++++++++++-
>  opcodes/i386-dis-evex.h        | 600 ++++++++++++++++++++++++++++++++-
>  opcodes/i386-dis.c             | 263 ++++++++++++++-
>  opcodes/i386-gen.c             |   7 +-

The expansion of CPU_AVX512_FP16_FLAGS wants to use
CPU_AVX512BW_FLAGS instead of CPU_AVX512F_FLAGS. Similarly
CPU_ANY_AVX512BW_FLAGS wants to be altered to include
CPU_ANY_AVX512_FP16_FLAGS.

In the CPU flags enum (and related data) I wonder whether we wouldn't
better keep all AVX512* together. H.J. - do you have an opinion there
either way?
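The dependency being asked for can be illustrated with plain bitmasks. This is a hedged sketch, not i386-gen.c's actual flag machinery, and every name in it is hypothetical: the "enable" set for AVX512_FP16 is built on the AVX512BW set, while the "disable" set for AVX512BW (the CPU_ANY_* direction) includes AVX512_FP16.

```c
/* Hypothetical bit assignments standing in for the real CPU flags.  */
#define F_AVX512F      (1u << 0)
#define F_AVX512BW     (1u << 1)
#define F_AVX512_FP16  (1u << 2)

/* "Enable" sets: what .arch .avx512xx turns on (cf. CPU_*_FLAGS).  */
#define EN_AVX512F     (F_AVX512F)
#define EN_AVX512BW    (EN_AVX512F | F_AVX512BW)
#define EN_AVX512_FP16 (EN_AVX512BW | F_AVX512_FP16)   /* built on BW */

/* "Disable" sets: what .arch .noavx512xx turns off (cf. CPU_ANY_*).  */
#define DIS_AVX512_FP16 (F_AVX512_FP16)
#define DIS_AVX512BW    (F_AVX512BW | DIS_AVX512_FP16)  /* includes FP16 */
#define DIS_AVX512F     (F_AVX512F | DIS_AVX512BW)

static unsigned int
arch_enable (unsigned int cur, unsigned int set)
{
  return cur | set;
}

static unsigned int
arch_disable (unsigned int cur, unsigned int set)
{
  return cur & ~set;
}
```

With this layout, turning on AVX512_FP16 also turns on AVX512BW and AVX512F, and turning off AVX512BW also turns off AVX512_FP16 while leaving AVX512F alone.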

>  opcodes/i386-opc.h             |   7 +
>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++

May I suggest SpaceEVexMap{5,6} to become just EVexMap{5,6}?
The table entries are hard enough to read, being partly far over 200
columns. Any unambiguous shortening of names is a win imo.

You appear to be adding IgnoreSize to most (all?) SAE templates. May
I ask why that is? I've taken quite a bit of time over the last
couple of years to remove stray uses, and I'd really prefer if no
unneeded new ones got introduced.

Some SAE templates have Disp8MemShift settings, despite these not
allowing memory operands in the first place.

You seem to be using a mixture of new-style shorthands (e.g. VexW0)
and old-style unreadable forms (e.g. VexW=2). Please use only
new-style variants (where ones exist already).

While purely cosmetic, I'm puzzled here (and I've been puzzled by
many other similar pre-existing inconsistencies) by the mixture of
operand attribute ordering. I'd really appreciate if you could
standardize on one form in all of the additions here, e.g.

RegXMM|Dword|Unspecified|BaseIndex

(that is: register specifier(s) [of course also always in the same
order when there are multiple], memory specifiers, and BaseIndex in
this order). Getting this uniform (eventually) helps when searching
for patterns or when comparing templates.

There looks to be a redundant vcmpph template.

vcvtqq2ph* and vcvtuqq2ph* would imo better move up next to the
other v*2ph group (initially I was suspecting they might be missing
altogether). This then makes it easier to see their similarity with
vcvtpd2ph*.

vcvtph2pd may need to have their AVX512VL templates split: Having
e.g. Word|RegXMM|Dword means Word is the broadcast form and both
XmmWord and Dword can be used with non-broadcast memory operands.
But that's not true - in Intel syntax only "dword ptr" ought to be
valid there, afaict. Same for vcvtph2qq (where the group also has
a stray blank line in the middle), vcvtph2uqq, vcvttph2qq, and
vcvttph2uqq. (The similarity of such seems to call for
templatization, but you may not be up to going this far.)
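Whatever the template split ends up being, the memory operand sizes at stake follow from simple arithmetic: vcvtph2pd widens 2-byte halves to 8-byte doubles, so its non-broadcast memory source is a quarter of the destination vector. A minimal sketch, with a hypothetical function name:

```c
/* Bytes read from a non-broadcast memory source by a widening FP16
   conversion whose destination is DEST_VL_BYTES wide and whose result
   elements are DEST_ELEM_BYTES each (8 for pd/qq, 4 for ps/dq).  */
static unsigned int
widening_src_mem_bytes (unsigned int dest_vl_bytes,
                        unsigned int dest_elem_bytes)
{
  unsigned int elems = dest_vl_bytes / dest_elem_bytes;

  return elems * 2;   /* each source element is one 2-byte half */
}
```

For the 128-bit vcvtph2pd form this gives 4 bytes (`dword ptr`), while the broadcast operand is a single 2-byte `word`; a 16-byte `xmmword` source only corresponds to the 512-bit form.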

Like vcvtph2ps I think you can have a single template for the
512-bit form of vcvtph2psx handling both register and memory
operands. Assuming you started there, it's not clear to me why
you thought you'd need to split the template. (Also please don't
have double blank lines anywhere; right here they should all go
together anyway imo.) vcvtph2dq et al then should match the
resulting set here; again templatization may help avoid subtle
differences for sufficiently similar insns.

There looks to be a stray register-only (but not SAE) vcvtph2uw
template, when the prior one already handles register sources
alongside memory ones.

vcvtsi2sh and vcvtusi2sh specify Reg32|Reg64 and Word|Dword.
The former already implies the latter, so please drop the
redundant specifiers. Also their IntelSyntax SAE forms have
their operands in wrong order - see vcvtsi2ss/vcvtusi2ss.

Can vmulph and vmulsh please live next to each other?

So far for the assembler side of things; I'll see to get to the
disassembler part as well.

Jan
H.J. Lu via Binutils July 2, 2021, 3:08 p.m. | #2
On 01.07.2021 09:47, Cui,Lili wrote:
>  opcodes/i386-dis-evex-mod.h    |  10 +
>  opcodes/i386-dis-evex-prefix.h | 212 ++++++++++++
>  opcodes/i386-dis-evex-w.h      | 442 +++++++++++++++++++++++-

Some of the vcvt* entries have two identical table entries. This
suggests that splitting decode from the W bit is not needed (and
this really is a general pattern: If all table entries at a level
end up identical, then this decode step can be omitted unless
there's a side effect of that step); see the pre-existing entries,
where this gets handled (iirc) in the processing of Gdq. Or are
you suggesting that there's a bug in the handling of pre-existing
encodings?
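The general rule stated here (a decode level whose entries are all identical can be dropped unless decoding it has a side effect) amounts to a simple check. The sketch below uses a hypothetical, much simplified entry type rather than i386-dis.c's real tables:

```c
#include <string.h>

/* Hypothetical, simplified disassembler table entry.  */
struct dis_entry
{
  const char *mnemonic;
  const char *operands;
};

/* Return 1 if all N entries at one decode level (e.g. the W=0 and W=1
   halves of an EVEX.W split) are identical, meaning the split adds
   nothing unless decoding that level has a side effect.  */
static int
decode_level_redundant (const struct dis_entry *e, size_t n)
{
  for (size_t k = 1; k < n; k++)
    if (strcmp (e[k].mnemonic, e[0].mnemonic) != 0
        || strcmp (e[k].operands, e[0].operands) != 0)
      return 0;
  return 1;
}
```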

>  opcodes/i386-dis-evex.h        | 600 ++++++++++++++++++++++++++++++++-
>  opcodes/i386-dis.c             | 263 ++++++++++++++-

xmmqh_mode seems to rather parallel evex_half_bcst_xmmq_mode,
not xmmq_mode. Perhaps it would then also better be named
similarly? Even if there might not be a pre-existing similar
entry there, an analogous concern then applies to xmmqdh_mode
wrt its name.

Is w_scalar_swap_mode really (as the comment says) like
b_mode, not w_mode?

While in pre-existing enumerators like PREFIX_EVEX_0F38AB I'm
fine with the sequence of numbers (they express a sort-of
sequence of opcode bytes, after all), I think names like
PREFIX_EVEX_MAP510 really want an underscore inserted:
PREFIX_EVEX_MAP5_10.

Jan
H.J. Lu via Binutils July 2, 2021, 3:46 p.m. | #3
On 02.07.2021 15:42, Jan Beulich via Binutils wrote:
> vcvtph2pd may need to have their AVX512VL templates split: Having
> e.g. Word|RegXMM|Dword means Word is the broadcast form and both
> XmmWord and Dword can be used with non-broadcast memory operands.
> But that's not true - in Intel syntax only "dword ptr" ought to be
> valid there, afaict. Same for vcvtph2qq (where the group also has
> a stray blank line in the middle), vcvtph2uqq, vcvttph2qq, and
> vcvttph2uqq.

Hmm, looking at your testcase addition to xmmword.s I'm guessing
now that I was wrong with the above, albeit I couldn't point at
the code in tc-i386.c that's making this work.

Jan
H.J. Lu via Binutils July 5, 2021, 6:30 a.m. | #4
On 01.07.2021 09:47, Cui,Lili wrote:
>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++

A few more observations:

While personally I think what you have is the best way of encoding
it (allowing 64-bit register use to control EVEX.W in 64-bit
mode), VMOVW is neither consistent with VPEXTRW (using EvexWIG) nor
with VMOVD (only permitting Reg32). H.J., what are your thoughts
here?

The VMOVW template needs splitting afaict: By OR-ing Word with
Reg32 and/or Reg64, you effectively also permit Reg16, and I guess
in Intel syntax VMOVW memory operands using "dword ptr" or
"qword ptr". You'll notice that e.g. {,V}PEXTRW and {,V}PINSRW
have separate register and memory operand templates, which is for
this reason, iirc. (You may recall my other remark regarding
combining e.g. Reg32 and Dword - there it is merely redundant, but
having such is liable to suggest to people that combinations like
Reg32 and Word are also okay. I intend to have i386-gen warn about
such down the road, but obviously only once all present redundancies
have been eliminated.)

VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
But really it's unclear why each of them has three templates when
the corresponding pre-existing SD and SS insns get away with two. I
would have expected new templates to have been cloned from similar
existing ones, rather than introducing new ones (with new
inconsistencies). Of course there's (again) the possibility that
you've spotted a bug with pre-existing templates, but then - if you
don't want to fix those right away - I'd expect you to at least
point out why you deviate from what we've got.

I suppose VCMP{P,S}H should have a large set of pseudos just like
VCMP{P,S}{S,D} do, even if (for now) the spec doesn't spell those
out.
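For reference, the existing VCMP{P,S}{S,D} pseudos spell the 5-bit comparison predicate into the mnemonic (e.g. `vcmpltps` stands for predicate 1). The sketch below shows how a VCMPPH/VCMPSH pseudo could be mapped back to its imm8 predicate using the same 32 predicate names; the function is hypothetical and is not how tc-i386.c implements its pseudo-ops.

```c
#include <string.h>

/* The 32 AVX comparison-predicate names, indexed by imm8 value.  */
static const char *const cmp_pred[32] =
{
  "eq", "lt", "le", "unord", "neq", "nlt", "nle", "ord",
  "eq_uq", "nge", "ngt", "false", "neq_oq", "ge", "gt", "true",
  "eq_os", "lt_oq", "le_oq", "unord_s", "neq_us", "nlt_uq", "nle_uq", "ord_s",
  "eq_us", "nge_uq", "ngt_uq", "false_os", "neq_os", "ge_oq", "gt_oq", "true_us"
};

/* Hypothetical parser: given a pseudo mnemonic such as "vcmpltph" or
   "vcmpneq_ussh", return the imm8 predicate it stands for, or -1.  */
static int
vcmp_fp16_pseudo_imm (const char *mnem)
{
  size_t len = strlen (mnem);

  /* Need "vcmp" + at least one predicate character + "ph"/"sh".  */
  if (len <= 6 || strncmp (mnem, "vcmp", 4) != 0)
    return -1;
  if (strcmp (mnem + len - 2, "ph") != 0
      && strcmp (mnem + len - 2, "sh") != 0)
    return -1;

  /* The predicate is whatever sits between "vcmp" and the suffix.  */
  for (int k = 0; k < 32; k++)
    if (strlen (cmp_pred[k]) == len - 6
        && strncmp (mnem + 4, cmp_pred[k], len - 6) == 0)
      return k;
  return -1;
}
```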

Jan
H.J. Lu via Binutils July 5, 2021, 12:38 p.m. | #5
On Sun, Jul 4, 2021 at 11:30 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 01.07.2021 09:47, Cui,Lili wrote:
> >  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
>
> A few more observations:
>
> While personally I think what you have is the best way of encoding
> it (allowing 64-bit register use to control EVEX.W in 64-bit bit
> mode), VMOVW is neither consistent with VPEXTRW (using EvexWIG) nor
> with VMOVD (only permitting Reg32). H.J., what are your thoughts
> here?

VMOVD is different.  PEXTRW takes Reg64 for historical reasons.
Lili, please remove Reg64 and add VexWIG on VMOVW.

> The VMOVW template needs splitting afaict: By OR-ing Word with
> Reg32 and/or Reg64, you effectively also permit Reg16, and I guess

Reg16 shouldn't be allowed.

> in Intel syntax VMOVW memory operands using "dword ptr" or
> "qword ptr". You'll notice that e.g. {,V}PEXTRW and {,V}PINSRW
> have separate register and memory operand templates, which is for
> this reason, iirc. (You may recall my other remark regarding
> combining e.g. Reg32 and Dword - there it is merely redundant, but
> having such is liable to suggest to people that combinations like
> Reg32 and Word are also okay. I intend to have i386-gen warn about
> such down the road, but obviously only once all present redundancies
> have been eliminated.)
>
> VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
> But really it's unclear why each of them has three templates when
> the corresponding pre-existing SD and SS insns get away with two. I
> would have expected new templates to have been cloned from similar
> existing ones, rather than introducing new ones (with new
> inconsistencies). Of course there's (again) the possibility that
> you've spotted a bug with pre-existing templates, but then - if you
> don't want to fix those right away - I'd expect you to at least
> point out why you deviate from what we've got.
>
> I suppose VCMP{P,S}H should have a large set of pseudos just like
> VCMP{P,S}{S,D} do, even if (for now) the spec doesn't spell those
> out.
>
> Jan
>

-- 
H.J.
H.J. Lu via Binutils July 6, 2021, 12:42 p.m. | #6
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, July 2, 2021 11:47 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: binutils@sourceware.org; hjl.tools@gmail.com
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 02.07.2021 15:42, Jan Beulich via Binutils wrote:
> > vcvtph2pd may need to have their AVX512VL templates split: Having e.g.
> > Word|RegXMM|Dword means Word is the broadcast form and both XmmWord
> > and Dword can be used with non-broadcast memory operands.
> > But that's not true - in Intel syntax only "dword ptr" ought to be
> > valid there, afaict. Same for vcvtph2qq (where the group also has a
> > stray blank line in the middle), vcvtph2uqq, vcvttph2qq, and
> > vcvttph2uqq.
>
> Hmm, looking at your testcase addition to xmmword.s I'm guessing now that
> I was wrong with the above, albeit I couldn't point at the code in tc-i386.c
> that's making this work.
>

Hi Jan,
I really appreciate your many good suggestions; I will review and address them one by one.

I found a similar instruction, "vcvtph2ps", in AVX512F, and it also works correctly because there is special-case handling in the function "match_mem_size", though that may not be a good way to resolve this case.

vcvtph2ps, 0x6613, None, CpuAVX512F|CpuAVX512VL, Modrm|EVex=2|Masking=3|Space0F38|VexW0|Disp8MemShift=3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }

In the match_mem_size function:

               /* For scalar opcode templates to allow register and memory
                  operands at the same time, some special casing is needed
                  here.  Also for v{,p}broadcast*, {,v}pmov{s,z}*, and
                  down-conversion vpmov*.  */
               || ((t->operand_types[wanted].bitfield.class == RegSIMD
                    && t->operand_types[wanted].bitfield.byte
                       + t->operand_types[wanted].bitfield.word
                       + t->operand_types[wanted].bitfield.dword
                       + t->operand_types[wanted].bitfield.qword
                       > !!t->opcode_modifier.broadcast)

                   ? (i.types[given].bitfield.xmmword
                      || i.types[given].bitfield.ymmword
                      || i.types[given].bitfield.zmmword)
                   : !match_simd_size(t, wanted, given))));
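
Assuming, as the leading `||` suggests, that this expression is one of the alternatives that make match_mem_size report a mismatch, it yields true when the given memory operand must be rejected. The sketch below is a simplified stand-alone model of just this sub-condition, with hypothetical types and with match_simd_size replaced by a boolean input. It shows why a broadcast-capable Word|RegXMM|Dword template (vcvtph2pd) or a RegXMM|Qword template (vcvtph2ps) refuses an xmmword memory operand.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the size bits involved.  */
struct size_bits
{
  bool byte, word, dword, qword;
  bool xmmword, ymmword, zmmword;
};

/* Models the quoted sub-condition: returns true when the given memory
   operand must be rejected for this template operand.  SIMD_SIZE_OK
   stands in for the result of match_simd_size ().  */
static bool
scalar_mem_case_rejects (bool tmpl_is_regsimd, const struct size_bits *tmpl,
                         bool tmpl_has_bcst, const struct size_bits *given,
                         bool simd_size_ok)
{
  /* More scalar sizes than a broadcast alone would account for: the
     template pairs a vector register with a narrower memory form.  */
  int scalar = tmpl->byte + tmpl->word + tmpl->dword + tmpl->qword;

  if (tmpl_is_regsimd && scalar > (tmpl_has_bcst ? 1 : 0))
    return given->xmmword || given->ymmword || given->zmmword;

  return !simd_size_ok;
}
```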
H.J. Lu via Binutils July 6, 2021, 12:48 p.m. | #7
> -----Original Message-----
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: Monday, July 5, 2021 8:39 PM
> To: Jan Beulich <jbeulich@suse.com>
> Cc: Cui, Lili <lili.cui@intel.com>; Binutils <binutils@sourceware.org>
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On Sun, Jul 4, 2021 at 11:30 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 01.07.2021 09:47, Cui,Lili wrote:
> > >  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
> >
> > A few more observations:
> >
> > While personally I think what you have is the best way of encoding it
> > (allowing 64-bit register use to control EVEX.W in 64-bit bit mode),
> > VMOVW is neither consistent with VPEXTRW (using EvexWIG) nor with
> > VMOVD (only permitting Reg32). H.J., what are your thoughts here?
>
> VMOVD is different.  PEXTRW takes Reg64 for historical reasons.
> Lili, please remove Reg64 and add VexWIG on VMOVW.
>

OK

> > The VMOVW template needs splitting afaict: By OR-ing Word with
> > Reg32 and/or Reg64, you effectively also permit Reg16, and I guess
>
> Reg16 shouldn't be allowed.
>
> > in Intel syntax VMOVW memory operands using "dword ptr" or "qword
> > ptr". You'll notice that e.g. {,V}PEXTRW and {,V}PINSRW have separate
> > register and memory operand templates, which is for this reason, iirc.
> > (You may recall my other remark regarding combining e.g. Reg32 and
> > Dword - there it is merely redundant, but having such is liable to
> > suggest to people that combinations like Reg32 and Word are also okay.
> > I intend to have i386-gen warn about such down the road, but obviously
> > only once all present redundancies have been eliminated.)
> >
> > VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
> > But really it's unclear why each of them has three templates when the
> > corresponding pre-existing SD and SS insns get away with two. I would
> > have expected new templates to have been cloned from similar existing
> > ones, rather than introducing new ones (with new inconsistencies). Of
> > course there's (again) the possibility that you've spotted a bug with
> > pre-existing templates, but then - if you don't want to fix those
> > right away - I'd expect you to at least point out why you deviate from
> > what we've got.
> >
> > I suppose VCMP{P,S}H should have a large set of pseudos just like
> > VCMP{P,S}{S,D} do, even if (for now) the spec doesn't spell those out.
> >
> > Jan
> >
>
> --
> H.J.
H.J. Lu via Binutils July 9, 2021, 11:47 a.m. | #8
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, July 5, 2021 2:30 PM
> To: Cui, Lili <lili.cui@intel.com>; hjl.tools@gmail.com
> Cc: binutils@sourceware.org
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 01.07.2021 09:47, Cui,Lili wrote:
> >  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
>
> A few more observations:
>
> While personally I think what you have is the best way of encoding it
> (allowing 64-bit register use to control EVEX.W in 64-bit bit mode), VMOVW
> is neither consistent with VPEXTRW (using EvexWIG) nor with VMOVD (only
> permitting Reg32). H.J., what are your thoughts here?
>

Removed Reg64 and added VexWIG on VMOVW.

> The VMOVW template needs splitting afaict: By OR-ing Word with
> Reg32 and/or Reg64, you effectively also permit Reg16, and I guess
> in Intel syntax VMOVW memory operands using "dword ptr" or
> "qword ptr". You'll notice that e.g. {,V}PEXTRW and {,V}PINSRW
> have separate register and memory operand templates, which is for
> this reason, iirc. (You may recall my other remark regarding
> combining e.g. Reg32 and Dword - there it is merely redundant, but
> having such is liable to suggest to people that combinations like
> Reg32 and Word are also okay. I intend to have i386-gen warn about
> such down the road, but obviously only once all present redundancies
> have been eliminated.)
>

Done.

> VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
> But really it's unclear why each of them has three templates when the
> corresponding pre-existing SD and SS insns get away with two. I would have
> expected new templates to have been cloned from similar existing ones,
> rather than introducing new ones (with new inconsistencies). Of course
> there's (again) the possibility that you've spotted a bug with pre-existing
> templates, but then - if you don't want to fix those right away - I'd expect you
> to at least point out why you deviate from what we've got.
>

vcvtss2si has two templates because there is special-case handling in the check_long_reg function, which lets the instruction be encoded with EVEX.W = 1 without an explicit VexW1.

if (intel_syntax
    && i.tm.opcode_modifier.toqword
    && i.types[0].bitfield.class != RegSIMD)
  {
    /* Convert to QWORD.  We want REX byte. */
    i.suffix = QWORD_MNEM_SUFFIX;
  }

I added a similar special case to the check_word_reg function, so VCVT{,T}SH2{,U}SI can also make do with two templates. This is a trade-off: it reduces the number of templates but makes the code less readable. I changed it anyway. Jan, what are your thoughts here?

    else if (i.types[op].bitfield.qword
             && (i.tm.operand_types[op].bitfield.class == Reg
                 || i.tm.operand_types[op].bitfield.instance == Accum)
             && i.tm.operand_types[op].bitfield.qword)
      {
        if (intel_syntax
            && i.tm.opcode_modifier.toqword
            && i.types[0].bitfield.class != RegSIMD)
          {
            /* Convert to QWORD.  We want REX byte. */
            i.suffix = QWORD_MNEM_SUFFIX;
          }
      }
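
Both approaches aim at the same encoding outcome for VCVT{,T}SH2{,U}SI: a 64-bit destination register selects EVEX.W1 (meaningful in 64-bit mode only), while for a 32-bit destination EVEX.W is ignored (EvexWIG). A minimal sketch of that decision with a hypothetical helper; emitting 0 for the ignored bit assumes gas's default `-mevexwig=0` behaviour.

```c
/* Hypothetical helper: EVEX.W value to emit for VCVT{,T}SH2{,U}SI given
   the destination GPR width.  Returns -1 for invalid combinations.  */
static int
vcvtsh2si_evex_w (unsigned int gpr_bits, int in_64bit_mode)
{
  if (gpr_bits == 64)
    return in_64bit_mode ? 1 : -1;  /* Reg64 needs W1, 64-bit mode only */
  if (gpr_bits == 32)
    return 0;                       /* W ignored; assume default of 0 */
  return -1;                        /* e.g. Reg16 is not a valid dest */
}
```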


> I suppose VCMP{P,S}H should have a large set of pseudos just like
> VCMP{P,S}{S,D} do, even if (for now) the spec doesn't spell those out.
>
> Jan
H.J. Lu via Binutils July 9, 2021, 11:50 a.m. | #9
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, July 2, 2021 11:09 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 01.07.2021 09:47, Cui,Lili wrote:
> >  opcodes/i386-dis-evex-mod.h    |  10 +
> >  opcodes/i386-dis-evex-prefix.h | 212 ++++++++++++
> >  opcodes/i386-dis-evex-w.h      | 442 +++++++++++++++++++++++-
>
> Some of the vcvt* entries have two identical table entries. This suggests that
> splitting decode from the W bit is not needed (and this really is a general
> pattern: If all table entries at a level end up identical, then this decode step
> can be omitted unless there's a side effect of that step); see the pre-existing
> entries, where this gets handled (iirc) in the processing of Gdq. Or are you
> suggesting that there's a bug in the handling of pre-existing encodings?
>
> >  opcodes/i386-dis-evex.h        | 600 ++++++++++++++++++++++++++++++++-
> >  opcodes/i386-dis.c             | 263 ++++++++++++++-
>

VCVT{,T}SH2{,U}SI don't need the W-bit decode step, so I removed it for them.

> xmmqh_mode seems to rather parallel evex_half_bcst_xmmq_mode, not
> xmmq_mode. Perhaps it would then also better be named similarly? Even if
> there might not be a pre-existing similar entry there, an analogous concern
> then applies to xmmqdh_mode wrt its name.
>

Replaced xmmqh_mode with evex_half_bcst_xmmqh_mode.
Replaced xmmqdh_mode with evex_half_bcst_xmmqdh_mode.

> Is w_scalar_swap_mode really (as the comment says) like b_mode, not
> w_mode?

It should be like w_mode; I fixed the comment.

> While in pre-existing enumerators like PREFIX_EVEX_0F38AB I'm fine with the
> sequence of numbers (they express a sort-of sequence of opcode bytes,
> after all), I think names like PREFIX_EVEX_MAP510 really want an
> underscore inserted: PREFIX_EVEX_MAP5_10.
>

Agreed; I renamed the MAP5xx/MAP6xx enumerators to MAP5_xx/MAP6_xx (e.g. PREFIX_EVEX_MAP510 is now PREFIX_EVEX_MAP5_10).

> Jan
H.J. Lu via Binutils July 9, 2021, 11:52 a.m. | #10
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, July 2, 2021 9:42 PM
> To: Cui, Lili <lili.cui@intel.com>; hjl.tools@gmail.com
> Cc: binutils@sourceware.org
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 01.07.2021 09:47, Cui,Lili wrote:
> > ---
> >  gas/config/tc-i386.c           |  15 +-
> >  gas/doc/c-i386.texi            |   4 +-
> >  opcodes/i386-dis-evex-mod.h    |  10 +
> >  opcodes/i386-dis-evex-prefix.h | 212 ++++++++++++
> >  opcodes/i386-dis-evex-w.h      | 442 +++++++++++++++++++++++-
> >  opcodes/i386-dis-evex.h        | 600 ++++++++++++++++++++++++++++++++-
> >  opcodes/i386-dis.c             | 263 ++++++++++++++-
> >  opcodes/i386-gen.c             |   7 +-
>
> The expansion of CPU_AVX512_FP16_FLAGS wants to use
> CPU_AVX512BW_FLAGS instead of CPU_AVX512F_FLAGS. Similarly
> CPU_ANY_AVX512BW_FLAGS wants to be altered to include
> CPU_ANY_AVX512_FP16_FLAGS.
>
> In the CPU flags enum (and related data) I wonder whether we wouldn't
> better keep all AVX512* together. H.J. - do you have an opinion there either
> way?
>
> >  opcodes/i386-opc.h             |   7 +
> >  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
>
> May I suggest SpaceEVexMap{5,6} to become just EVexMap{5,6}?
> The table entries are hard enough to read, being partly far over 200 columns.
> Any unambiguous shortening of names is a win imo.
>

I changed it to EVexMap{5,6} and placed it after the other VEX.m-mmmm (opcode space) definitions. Or do you think SpaceEVexMap{5,6} is better, since it keeps the same style as the others?

#define Space0F    OpcodeSpace=SPACE_0F
#define Space0F38  OpcodeSpace=SPACE_0F38
#define Space0F3A  OpcodeSpace=SPACE_0F3A
#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A

#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6

> You appear to be adding IgnoreSize to most (all?) SAE templates. May I ask
> why that is? I've taken quite a bit of time over the last couple of years to
> remove stray uses, and I'd really prefer if no unneeded new ones got
> introduced.
>

Sorry about that. I deleted nearly all IgnoreSize uses for FP16, except for vcvtph2psx, vcvtsi2sh and vcvtusi2sh. This patch was written a few years ago, and its use of IgnoreSize had not been updated to follow the upstream changes made since then.

> Some SAE templates have Disp8MemShift settings, despite these not allowing
> memory operands in the first place.
>

Removed Disp8MemShift from the SAE templates.

> You seem to be using a mixture of new-style shorthands (e.g. VexW0) and
> old-style unreadable forms (e.g. VexW=2). Please use only new-style variants
> (where ones exist already).
>

Done.

> While purely cosmetic, I'm puzzled here (and I've been puzzled by many
> other similar pre-existing inconsistencies) by the mixture of operand
> attribute ordering. I'd really appreciate if you could standardize on one form
> in all of the additions here, e.g.
>
> RegXMM|Dword|Unspecified|BaseIndex
>

Done.

> (that is: register specifier(s) [of course also always in the same order when
> there are multiple], memory specifiers, and BaseIndex in this order). Getting
> this uniform (eventually) helps when searching for patterns or when
> comparing templates.
>
> There looks to be a redundant vcmpph template.
>

Yes, there were two identical vcmpph templates; I deleted one.

> vcvtqq2ph* and vcvtuqq2ph* would imo better move up next to the other
> v*2ph group (initially I was suspecting they might be missing altogether). This
> then makes it easier to see their similarity with vcvtpd2ph*.
>

I reordered all vcvt* instructions.

> vcvtph2pd may need to have their AVX512VL templates split: Having e.g.
> Word|RegXMM|Dword means Word is the broadcast form and both
> XmmWord and Dword can be used with non-broadcast memory operands.
> But that's not true - in Intel syntax only "dword ptr" ought to be valid there,
> afaict. Same for vcvtph2qq (where the group also has a stray blank line in the
> middle), vcvtph2uqq, vcvttph2qq, and vcvttph2uqq. (The similarity of such
> seems to call for templatization, but you may not be up to going this far.)
>

Unnecessary blank lines removed.

> Like vcvtph2ps I think you can have a single template for the 512-bit form of
> vcvtph2psx handling both register and memory operands. Assuming you
> started there, it's not clear to me why you thought you'd need to split the
> template. (Also please don't have double blank lines anywhere; right here
> they should all go together anyway imo.) vcvtph2dq et al then should match
> the resulting set here; again templatization may help avoid subtle differences
> for sufficiently similar insns.
>

Done.

> There looks to be a stray register-only (but not SAE) vcvtph2uw template,
> when the prior one already handles register sources alongside memory ones.
>

This is a redundant template and I deleted it.


> vcvtsi2sh and vcvtusi2sh specify Reg32|Reg64 and Word|Dword.

> The former already implies the latter, so please drop the redundant

> specifiers. Also their IntelSyntax SAE forms have their operands in wrong

> order - see vcvtsi2ss/vcvtusi2ss.

> 

Done.

> Can vmulph and vmulsh please live next to each other?

> 

Done.
> So far for the assembler side of things; I'll see to get to the disassembler part

> as well.

> 

> Jan
H.J. Lu via Binutils July 9, 2021, 12:16 p.m. | #11
On 09.07.2021 13:47, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, July 5, 2021 2:30 PM
>> To: Cui, Lili <lili.cui@intel.com>; hjl.tools@gmail.com
>> Cc: binutils@sourceware.org
>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>>
>> On 01.07.2021 09:47, Cui,Lili wrote:
>>>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
>> VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
>> But really it's unclear why each of them has three templates when the
>> corresponding pre-existing SD and SS insns get away with two. I would have
>> expected new templates to have been cloned from similar existing ones,
>> rather than introducing new ones (with new inconsistencies). Of course
>> there's (again) the possibility that you've spotted a bug with pre-existing
>> templates, but then - if you don't want to fix those right away - I'd expect you
>> to at least point out why you deviate from what we've got.
>>
> vcvtss2si has two templates because there is a special judgment in check_long_reg function, then the instruction can encode as EVEX.W = 1 without explicit VexW1.
>
> if (intel_syntax
>     && i.tm.opcode_modifier.toqword
>     && i.types[0].bitfield.class != RegSIMD)
>           {
>             /* Convert to QWORD.  We want REX byte. */
>             i.suffix = QWORD_MNEM_SUFFIX;
>           }
>
> I add a special judgment in check_word_reg function, then VCVT{,T}SH2{,U}SI can also have two templates. I think that in order to reduce the number of templates and make the code less readable, this is a trade-off. I changed it anyway. Jan, what are your thoughts here?
>
>     else if (i.types[op].bitfield.qword
>              && (i.tm.operand_types[op].bitfield.class == Reg
>                  || i.tm.operand_types[op].bitfield.instance == Accum)
>              && i.tm.operand_types[op].bitfield.qword)
>       {
>         if (intel_syntax
>             && i.tm.opcode_modifier.toqword
>             && i.types[0].bitfield.class != RegSIMD)
>           {
>             /* Convert to QWORD.  We want REX byte. */
>             i.suffix = QWORD_MNEM_SUFFIX;
>           }
>       }


I think this is the right thing to do, to keep things as symmetric / 
consistent as possible. The one part I don't understand though is
the check against Accum. I'm also not sure you really need to check
both i.types[] and i.tm.operand_types[] for qword: Doesn't this
function run after template matching, in which case you only need
to check the actual register type, not the one(s) the template
permits? And finally - but without seeing the context I may be
wrong here - as presented I'd suggest the two nested if()-s to be
folded.

Jan
H.J. Lu via Binutils July 13, 2021, 6:58 a.m. | #12
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, July 9, 2021 8:17 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 09.07.2021 13:47, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, July 5, 2021 2:30 PM
> >> To: Cui, Lili <lili.cui@intel.com>; hjl.tools@gmail.com
> >> Cc: binutils@sourceware.org
> >> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
> >>
> >> On 01.07.2021 09:47, Cui,Lili wrote:
> >>>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
> >> VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
> >> But really it's unclear why each of them has three templates when the
> >> corresponding pre-existing SD and SS insns get away with two. I would
> >> have expected new templates to have been cloned from similar existing
> >> ones, rather than introducing new ones (with new inconsistencies). Of
> >> course there's (again) the possibility that you've spotted a bug with
> >> pre-existing templates, but then - if you don't want to fix those
> >> right away - I'd expect you to at least point out why you deviate from
> >> what we've got.
> >>
> > vcvtss2si has two templates because there is a special judgment in
> > check_long_reg function, then the instruction can encode as EVEX.W = 1
> > without explicit VexW1.
> >
> > if (intel_syntax
> >     && i.tm.opcode_modifier.toqword
> >     && i.types[0].bitfield.class != RegSIMD)
> >           {
> >             /* Convert to QWORD.  We want REX byte. */
> >             i.suffix = QWORD_MNEM_SUFFIX;
> >           }
> >
> > I add a special judgment in check_word_reg function, then VCVT{,T}SH2{,U}SI
> > can also have two templates. I think that in order to reduce the number of
> > templates and make the code less readable, this is a trade-off. I changed it
> > anyway. Jan, what are your thoughts here?
> >
> >     else if (i.types[op].bitfield.qword
> >              && (i.tm.operand_types[op].bitfield.class == Reg
> >                  || i.tm.operand_types[op].bitfield.instance == Accum)
> >              && i.tm.operand_types[op].bitfield.qword)
> >       {
> >         if (intel_syntax
> >             && i.tm.opcode_modifier.toqword
> >             && i.types[0].bitfield.class != RegSIMD)
> >           {
> >             /* Convert to QWORD.  We want REX byte. */
> >             i.suffix = QWORD_MNEM_SUFFIX;
> >           }
> >       }
>
> I think this is the right thing to do, to keep things as symmetric / consistent as
> possible. The one part I don't understand though is the check against Accum.
> I'm also not sure you really need to check both i.types[] and
> i.tm.operand_types[] for qword: Doesn't this function run after template
> matching, in which case you only need to check the actual register type, not the
> one(s) the template permits? And finally - but without seeing the context I may
> be wrong here - as presented I'd suggest the two nested if()-s to be folded.
>

I merged the two nested if()s and removed the unnecessary checks.

    /* For some instructions need encode as EVEX.W=1 without explicit VexW1. */
    else if (i.types[op].bitfield.qword
             && i.tm.operand_types[op].bitfield.class == Reg
             && intel_syntax
             && i.tm.opcode_modifier.toqword
             && i.types[0].bitfield.class != RegSIMD)
      {
          /* Convert to QWORD.  We want REX byte. */
          i.suffix = QWORD_MNEM_SUFFIX;
      }
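With the nested if()s folded, the whole check is a single conjunction of five conditions. The sketch below restates it as a standalone predicate so each condition can be exercised in isolation; the function and parameter names are hypothetical stand-ins for the i.types[], i.tm and intel_syntax state that the real check_word_reg() reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the folded condition from the revised
   check_word_reg(); each parameter stands in for one piece of gas state:
     op_is_qword     i.types[op].bitfield.qword
     tmpl_op_is_reg  i.tm.operand_types[op].bitfield.class == Reg
     op0_is_simd     i.types[0].bitfield.class == RegSIMD
     intel           intel_syntax
     toqword         i.tm.opcode_modifier.toqword
   A true result means the QWORD suffix is forced, so EVEX.W=1 gets
   encoded without an explicit VexW1 in the template.  */
static bool
force_qword_suffix (bool op_is_qword, bool tmpl_op_is_reg, bool op0_is_simd,
                    bool intel, bool toqword)
{
  return op_is_qword && tmpl_op_is_reg && intel && toqword && !op0_is_simd;
}
```

Clearing any single condition, e.g. passing intel as false or op0_is_simd as true, makes the predicate false.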

Thanks,
Lili.
H.J. Lu via Binutils July 13, 2021, 7:25 a.m. | #13
On 09.07.2021 13:52, Cui, Lili wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, July 2, 2021 9:42 PM
>>
>> On 01.07.2021 09:47, Cui,Lili wrote:
>>>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
>>
>> May I suggest SpaceEVexMap{5,6} to become just EVexMap{5,6}?
>> The table entries are hard enough to read, being partly far over 200 columns.
>> Any unambiguous shortening of names is a win imo.
>>
> I changed it to EVexMap{5,6} and put it behind with other VEX.m-mmmm interpretation. Do you think SpaceEVexMap{5,6} is better? because it maintains the same style as the others.
>
> #define Space0F    OpcodeSpace=SPACE_0F
> #define Space0F38  OpcodeSpace=SPACE_0F38
> #define Space0F3A  OpcodeSpace=SPACE_0F3A
> #define SpaceXOP08 OpcodeSpace=SPACE_XOP08
> #define SpaceXOP09 OpcodeSpace=SPACE_XOP09
> #define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
>
> #define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
> #define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6

The shorter form you have seems better to me.

>> You appear to be adding IgnoreSize to most (all?) SAE templates. May I ask
>> why that is? I've taken quite a bit of time over the last couple of years to
>> remove stray uses, and I'd really prefer if no unneeded new ones got
>> introduced.
>>
> Sorry for that, I nearly deleted all IgnoreSize For FP16, except vcvtph2psx, vcvtsi2sh and vcvtusi2sh. This patch was written a few years ago,


A few _years_ ago? Oh.

Jan

> the part of IgnoreSize was not updated with community.
H.J. Lu via Binutils July 13, 2021, 7:35 a.m. | #14
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, July 13, 2021 3:26 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 09.07.2021 13:52, Cui, Lili wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Friday, July 2, 2021 9:42 PM
> >>
> >> On 01.07.2021 09:47, Cui,Lili wrote:
> >>>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
> >>
> >> May I suggest SpaceEVexMap{5,6} to become just EVexMap{5,6}?
> >> The table entries are hard enough to read, being partly far over 200 columns.
> >> Any unambiguous shortening of names is a win imo.
> >>
> > I changed it to EVexMap{5,6} and put it behind with other VEX.m-mmmm
> > interpretation. Do you think SpaceEVexMap{5,6} is better? because it
> > maintains the same style as the others.
> >
> > #define Space0F    OpcodeSpace=SPACE_0F
> > #define Space0F38  OpcodeSpace=SPACE_0F38
> > #define Space0F3A  OpcodeSpace=SPACE_0F3A
> > #define SpaceXOP08 OpcodeSpace=SPACE_XOP08
> > #define SpaceXOP09 OpcodeSpace=SPACE_XOP09
> > #define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
> >
> > #define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
> > #define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
>
> The shorter form you have seems better to me.
>

OK, thanks.

> >> You appear to be adding IgnoreSize to most (all?) SAE templates. May
> >> I ask why that is? I've taken quite a bit of time over the last
> >> couple of years to remove stray uses, and I'd really prefer if no
> >> unneeded new ones got introduced.
> >>
> > Sorry for that, I nearly deleted all IgnoreSize For FP16, except
> > vcvtph2psx, vcvtsi2sh and vcvtusi2sh. This patch was written a few
> > years ago,
>
> A few _years_ ago? Oh.
>

Yes, more than two years ago.

> Jan
>
> > the part of IgnoreSize was not updated with community.
H.J. Lu via Binutils July 13, 2021, 7:54 a.m. | #15
On 13.07.2021 08:58, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, July 9, 2021 8:17 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>>
>> On 09.07.2021 13:47, Cui, Lili wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Monday, July 5, 2021 2:30 PM
>>>> To: Cui, Lili <lili.cui@intel.com>; hjl.tools@gmail.com
>>>> Cc: binutils@sourceware.org
>>>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>>>>
>>>> On 01.07.2021 09:47, Cui,Lili wrote:
>>>>>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
>>>> VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
>>>> But really it's unclear why each of them has three templates when the
>>>> corresponding pre-existing SD and SS insns get away with two. I would
>>>> have expected new templates to have been cloned from similar existing
>>>> ones, rather than introducing new ones (with new inconsistencies). Of
>>>> course there's (again) the possibility that you've spotted a bug with
>>>> pre-existing templates, but then - if you don't want to fix those
>>>> right away - I'd expect you to at least point out why you deviate from
>>>> what we've got.
>>>>
>>> vcvtss2si has two templates because there is a special judgment in
>>> check_long_reg function, then the instruction can encode as EVEX.W = 1
>>> without explicit VexW1.
>>>
>>> if (intel_syntax
>>>     && i.tm.opcode_modifier.toqword
>>>     && i.types[0].bitfield.class != RegSIMD)
>>>           {
>>>             /* Convert to QWORD.  We want REX byte. */
>>>             i.suffix = QWORD_MNEM_SUFFIX;
>>>           }
>>>
>>> I add a special judgment in check_word_reg function, then VCVT{,T}SH2{,U}SI
>>> can also have two templates. I think that in order to reduce the number of
>>> templates and make the code less readable, this is a trade-off. I changed it
>>> anyway. Jan, what are your thoughts here?
>>>
>>>     else if (i.types[op].bitfield.qword
>>>              && (i.tm.operand_types[op].bitfield.class == Reg
>>>                  || i.tm.operand_types[op].bitfield.instance == Accum)
>>>              && i.tm.operand_types[op].bitfield.qword)
>>>       {
>>>         if (intel_syntax
>>>             && i.tm.opcode_modifier.toqword
>>>             && i.types[0].bitfield.class != RegSIMD)
>>>           {
>>>             /* Convert to QWORD.  We want REX byte. */
>>>             i.suffix = QWORD_MNEM_SUFFIX;
>>>           }
>>>       }
>>
>> I think this is the right thing to do, to keep things as symmetric / consistent as
>> possible. The one part I don't understand though is the check against Accum.
>> I'm also not sure you really need to check both i.types[] and
>> i.tm.operand_types[] for qword: Doesn't this function run after template
>> matching, in which case you only need to check the actual register type, not the
>> one(s) the template permits? And finally - but without seeing the context I may
>> be wrong here - as presented I'd suggest the two nested if()-s to be folded.
>>
> I merged two nested if(), and removed unnecessary checks.
>
>     /* For some instructions need encode as EVEX.W=1 without explicit VexW1. */
>     else if (i.types[op].bitfield.qword
>              && i.tm.operand_types[op].bitfield.class == Reg
>              && intel_syntax
>              && i.tm.opcode_modifier.toqword
>              && i.types[0].bitfield.class != RegSIMD)
>       {
>           /* Convert to QWORD.  We want REX byte. */
>           i.suffix = QWORD_MNEM_SUFFIX;
>       }


Before I go look at this in detail - is this expected to be the final
form of the rework following the v1 comments (i.e. is it v2), or are
there yet further changes to be expected? If it is v2, it would be
nice if you could submit it as such, as that would clarify matters.

Thanks, Jan
H.J. Lu via Binutils July 13, 2021, 8:03 a.m. | #16
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, July 13, 2021 3:55 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 13.07.2021 08:58, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Friday, July 9, 2021 8:17 PM
> >> To: Cui, Lili <lili.cui@intel.com>
> >> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> >> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
> >>
> >> On 09.07.2021 13:47, Cui, Lili wrote:
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: Monday, July 5, 2021 2:30 PM
> >>>> To: Cui, Lili <lili.cui@intel.com>; hjl.tools@gmail.com
> >>>> Cc: binutils@sourceware.org
> >>>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
> >>>>
> >>>> On 01.07.2021 09:47, Cui,Lili wrote:
> >>>>>  opcodes/i386-opc.tbl           | 376 +++++++++++++++++++++
> >>>> VCVT{,T}SH2{,U}SI should have EvexWIG for their non-64bit encodings.
> >>>> But really it's unclear why each of them has three templates when
> >>>> the corresponding pre-existing SD and SS insns get away with two. I
> >>>> would have expected new templates to have been cloned from similar
> >>>> existing ones, rather than introducing new ones (with new
> >>>> inconsistencies). Of course there's (again) the possibility that
> >>>> you've spotted a bug with pre-existing templates, but then - if you
> >>>> don't want to fix those right away - I'd expect you to at least
> >>>> point out why you deviate from what we've got.
> >>>>
> >>> vcvtss2si has two templates because there is a special judgment in
> >>> check_long_reg function, then the instruction can encode as
> >>> EVEX.W = 1 without explicit VexW1.
> >>>
> >>> if (intel_syntax
> >>>     && i.tm.opcode_modifier.toqword
> >>>     && i.types[0].bitfield.class != RegSIMD)
> >>>           {
> >>>             /* Convert to QWORD.  We want REX byte. */
> >>>             i.suffix = QWORD_MNEM_SUFFIX;
> >>>           }
> >>>
> >>> I add a special judgment in check_word_reg function, then
> >>> VCVT{,T}SH2{,U}SI can also have two templates. I think that in order
> >>> to reduce the number of templates and make the code less readable,
> >>> this is a trade-off. I changed it anyway. Jan, what are your thoughts
> >>> here?
> >>>
> >>>     else if (i.types[op].bitfield.qword
> >>>              && (i.tm.operand_types[op].bitfield.class == Reg
> >>>                  || i.tm.operand_types[op].bitfield.instance == Accum)
> >>>              && i.tm.operand_types[op].bitfield.qword)
> >>>       {
> >>>         if (intel_syntax
> >>>             && i.tm.opcode_modifier.toqword
> >>>             && i.types[0].bitfield.class != RegSIMD)
> >>>           {
> >>>             /* Convert to QWORD.  We want REX byte. */
> >>>             i.suffix = QWORD_MNEM_SUFFIX;
> >>>           }
> >>>       }
> >>
> >> I think this is the right thing to do, to keep things as symmetric /
> >> consistent as possible. The one part I don't understand though is the
> >> check against Accum.
> >> I'm also not sure you really need to check both i.types[] and
> >> i.tm.operand_types[] for qword: Doesn't this function run after
> >> template matching, in which case you only need to check the actual
> >> register type, not the one(s) the template permits? And finally - but
> >> without seeing the context I may be wrong here - as presented I'd
> >> suggest the two nested if()-s to be folded.
> >>
> > I merged two nested if(), and removed unnecessary checks.
> >
> >     /* For some instructions need encode as EVEX.W=1 without explicit VexW1. */
> >     else if (i.types[op].bitfield.qword
> >              && i.tm.operand_types[op].bitfield.class == Reg
> >              && intel_syntax
> >              && i.tm.opcode_modifier.toqword
> >              && i.types[0].bitfield.class != RegSIMD)
> >       {
> >           /* Convert to QWORD.  We want REX byte. */
> >           i.suffix = QWORD_MNEM_SUFFIX;
> >       }
>
> Before I go look at this in detail - is this expected to be the final form of the
> rework following the v1 comments (i.e. is it v2), or are there yet further
> changes to be expected? If it is v2, it would be nice if you could submit it as
> such, as that would clarify matters.


It is v2, and it is the final form.
H.J. Lu via Binutils July 13, 2021, 4:25 p.m. | #17
On 13.07.2021 08:58, Cui, Lili wrote:

Assembler:

In check_word_reg() the comment will want to say "EVEX.W" instead of
"REX byte".

I don't think you've applied my comments regarding the cpu_flag_init[]
additions.

I can't seem to find any pseudos of VCMP{P,S}H, despite the prior
comment.

There are two VMOVW templates now, but the first still allows for both
Reg32 and Word, effectively being Reg32|Reg16.

There's an IgnoreSize on one of the VCVTPH2PSX, but none on the operand-
size-wise similar VCVTPH2{,U}DQ. I can't right away tell which one is
right, but this set of templates needs to be consistent in this regard.

Jan
H.J. Lu via Binutils July 14, 2021, 3:21 p.m. | #18
On 13.07.2021 08:58, Cui, Lili wrote:

Disassembler:

d_scalar_mode looks to be unused.

This

  /* EVEX_W_MAP5_2A_P_1 */
  {
    { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Ed }, 0 },
    { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Eq }, 0 },
  },

can imo be expressed without decoding EVEX.W, by using Edq instead
of (separately) Ed and Eq. There's at least one similar case
elsewhere. Interestingly in the 2si/2usi conversions you do use
Gdq already, which I think handles the EVEX.W=1 case correctly
outside of 64-bit mode (unlike Eq, which will unconditionally
produce 64-bit register names afaict).
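The Edq/Eq distinction above comes down to when a 64-bit register name may be printed. A minimal sketch, assuming the behaviour described here (Edq/Gdq pick the 64-bit name only for EVEX.W=1 in 64-bit mode, Eq always does); the helper names and the eight-entry name tables are hypothetical simplifications, not i386-dis.c code, and registers r8-r15 are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name tables; only the eight legacy registers are modelled,
   so r8-r15 (REX.B/EVEX extensions) are left out of this sketch.  */
static const char *const names32[8] = { "eax", "ecx", "edx", "ebx",
                                        "esp", "ebp", "esi", "edi" };
static const char *const names64[8] = { "rax", "rcx", "rdx", "rbx",
                                        "rsp", "rbp", "rsi", "rdi" };

/* Edq/Gdq-style operand as described in the review: the 64-bit name is
   chosen only when EVEX.W is set and the code is decoded in 64-bit mode.  */
static const char *
edq_reg_name (unsigned int reg, bool evex_w, bool mode_64bit)
{
  assert (reg < 8);
  return (evex_w && mode_64bit) ? names64[reg] : names32[reg];
}

/* Eq-style operand: always the 64-bit name, whatever the mode.  */
static const char *
eq_reg_name (unsigned int reg)
{
  assert (reg < 8);
  return names64[reg];
}
```

Decoding an EVEX.W=1 encoding outside 64-bit mode, edq_reg_name (1, true, false) yields "ecx", whereas eq_reg_name (1) yields "rcx", which is the mismatch pointed out above.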

As to a broader question on decoding EVEX.W: Did you consider
introducing e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a
valid encoding), to avoid this decode step for perhaps almost all
entries? And if that's not an option, decoding EVEX.W first for
all the opcodes which previously had no meaning at all would, in
some cases, reduce the overall number of table entries (and in all
other cases this would then merely be for consistency, as it also
wouldn't increase the number of table entries). To give an example:

    { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },

=>

  /* PREFIX_EVEX_0F3AC2 */
  {
    { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },
    { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },
  },

=>

  /* EVEX_W_0F3AC2_P_0 */
  {
    { "vcmpph",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
  },
  /* EVEX_W_0F3AC2_P_1 */
  {
    { "vcmpsh",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
  },

i.e. a total of 1 + 4 + 2 * 2 entries. Whereas decoding W first
would yield 1 (evex) + 2 (evex_w) + 4 (prefix) entries. The
delta is even larger for something like MAP5_7D: 1 + 4 + 4 * 2
vs. 1 + 2 + 4. This also results in more related entries ending
up closer to one another.
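The entry counts in the example above can be checked mechanically. The sketch below, with hypothetical helper names, reproduces the two layouts: decoding the prefix first costs one EVEX entry, a four-slot prefix table, and a two-slot W table for each prefix in use, whereas decoding EVEX.W first costs one EVEX entry, a two-slot W table and a single four-slot prefix table (only the W0 slot needs one, since EVEX.W=1 is not a valid encoding here).

```c
#include <assert.h>

/* Hypothetical helper: table entries when the prefix is decoded first.
   1 EVEX entry + 4-slot prefix table + 2-slot W table per prefix used.  */
static unsigned int
entries_prefix_first (unsigned int prefixes_used)
{
  assert (prefixes_used <= 4);
  return 1 + 4 + prefixes_used * 2;
}

/* Hypothetical helper: table entries when EVEX.W is decoded first.
   1 EVEX entry + 2-slot W table + one 4-slot prefix table (W0 only).  */
static unsigned int
entries_w_first (void)
{
  return 1 + 2 + 4;
}
```

entries_prefix_first (2) gives the 9 entries of the VCMP{P,S}H example and entries_prefix_first (4) the 13 of MAP5_7D, against 7 for entries_w_first () in both cases.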

As to formatting, it looks as if the first hunk changing
intel_operand_size() mis-indents the return statement that was
there already before.

Jan
H.J. Lu via Binutils July 20, 2021, 7:08 a.m. | #19
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, July 14, 2021 11:21 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>
> On 13.07.2021 08:58, Cui, Lili wrote:
>
> Disassembler:
>
> d_scalar_mode looks to be unused.
>
> This
>
>   /* EVEX_W_MAP5_2A_P_1 */
>   {
>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Ed }, 0 },
>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Eq }, 0 },
>   },
>
> can imo be expressed without decoding EVEX.W, by using Edq instead of
> (separately) Ed and Eq. There's at least one similar case elsewhere.
> Interestingly in the 2si/2usi conversions you do use Gdq already, which I
> think handles the EVEX.W=1 case correctly outside of 64-bit mode (unlike Eq,
> which will unconditionally produce 64-bit register names afaict).
>
> As to a broader question on decoding EVEX.W: Did you consider introducing
> e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a valid encoding), to
> avoid this decode step for perhaps almost all entries? And if that's not an
> option, decoding EVEX.W first for all the opcodes which previously had no
> meaning at all would, in some cases, reduce the overall number of table
> entries (and in all other cases this would then merely be for consistency, as it
> also wouldn't increase the number of table entries). To give an example:
>
>     { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
>
> =>
>
>   /* PREFIX_EVEX_0F3AC2 */
>   {
>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },
>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },
>   },
>
> =>
>
>   /* EVEX_W_0F3AC2_P_0 */
>   {
>     { "vcmpph",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>   },
>   /* EVEX_W_0F3AC2_P_1 */
>   {
>     { "vcmpsh",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>   },
>
> i.e. a total of 1 + 4 + 2 * 2 entries. Whereas decoding W first would yield 1
> (evex) + 2 (evex_w) + 4 (prefix) entries.


Hi Jan,

Do you want me to change it like this?
     { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
 
 =>
 
   /* PREFIX_EVEX_0F3AC2 */
   {
     { "vcmp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
     { "vcmp%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
   },

"XH" => print 'ph' or 'sh' depending on the EVEX.LL bits; if EVEX.W==W1, report bad code.
if (EVEX.LL == EVEX.LLIG)
      print 'sh'
else
      print 'ph'

I do not quite understand this: if we change the disassembler like this, then when we add a new instruction with the same opcode and EVEX.W==W1, we will need to change the old entry.


> The delta is even larger for something
> like MAP5_7D: 1 + 4 + 4 * 2 vs. 1 + 2 + 4. This also results in more related
> entries ending up closer to one another.
>

I don't quite understand here: should I let all FP16 disassembler entries go through W_TABLE first, or just add something like %XH instead of going through W_TABLE? Thanks.

> As to formatting, it looks as if the first hunk changing
> intel_operand_size() mis-indents the return statement that was there
> already before.
>
> Jan
H.J. Lu via Binutils July 20, 2021, 8:46 a.m. | #20
On 20.07.2021 09:08, Cui, Lili wrote:
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, July 14, 2021 11:21 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
>>
>> On 13.07.2021 08:58, Cui, Lili wrote:
>>
>> Disassembler:
>>
>> d_scalar_mode looks to be unused.
>>
>> This
>>
>>   /* EVEX_W_MAP5_2A_P_1 */
>>   {
>>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Ed }, 0 },
>>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Eq }, 0 },
>>   },
>>
>> can imo be expressed without decoding EVEX.W, by using Edq instead of
>> (separately) Ed and Eq. There's at least one similar case elsewhere.
>> Interestingly in the 2si/2usi conversions you do use Gdq already, which I
>> think handles the EVEX.W=1 case correctly outside of 64-bit mode (unlike Eq,
>> which will unconditionally produce 64-bit register names afaict).
>>
>> As to a broader question on decoding EVEX.W: Did you consider introducing
>> e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a valid encoding), to
>> avoid this decode step for perhaps almost all entries? And if that's not an
>> option, decoding EVEX.W first for all the opcodes which previously had no
>> meaning at all would, in some cases, reduce the overall number of table
>> entries (and in all other cases this would then merely be for consistency, as it
>> also wouldn't increase the number of table entries). To give an example:
>>
>>     { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
>>
>> =>
>>
>>   /* PREFIX_EVEX_0F3AC2 */
>>   {
>>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },
>>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },
>>   },
>>
>> =>
>>
>>   /* EVEX_W_0F3AC2_P_0 */
>>   {
>>     { "vcmpph",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>>   },
>>   /* EVEX_W_0F3AC2_P_1 */
>>   {
>>     { "vcmpsh",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>>   },
>>
>> i.e. a total of 1 + 4 + 2 * 2 entries. Whereas decoding W first would yield 1
>> (evex) + 2 (evex_w) + 4 (prefix) entries.
>
> Hi Jan,
>
> Do you want me to change it like this?
>      { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
>
>  =>
>
>    /* PREFIX_EVEX_0F3AC2 */
>    {
>      { "vcmp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>      { "vcmp%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>    },
>
> "XH" => print 'ph', 'sh' depending on the EVEX.ll bit, if EVEX.W==W1 report bad code.
> if  (EVEX.LL== EVEX.LLIG)
>       print 'sh'
> else
>       print 'ph'


Not exactly, no. %XH was meant to parallel %XW, which prints 's' or 'd'
depending on VEX.W. %XH would print 'h' if EVEX.W is clear and produce
an appropriate indication of the encoding being bad if EVEX.W is set.
IOW something like

   /* PREFIX_EVEX_0F3AC2 */
   {
     { "vcmpp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
     { "vcmps%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
   },
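To illustrate the distinction drawn here: %XH contributes only the 'h' element-size letter, while the 'p'/'s' letter stays literal in each template, just as %XW contributes only 's' or 'd'. A minimal sketch of that expansion follows; expand_xh() is hypothetical, handles only a trailing %XH, and signals the invalid EVEX.W=1 case by returning NULL rather than through whatever mechanism i386-dis.c would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not i386-dis.c code: expand a mnemonic template
   whose last element is "%XH".  Mirroring %XW (which appends 's' or 'd'
   based on VEX.W), %XH appends 'h' when EVEX.W is clear.  When EVEX.W is
   set the encoding is invalid, signalled here by returning NULL.
   The result lives in a static buffer, so the function is not reentrant.  */
static const char *
expand_xh (const char *templ, bool evex_w)
{
  static char buf[32];
  const char *pct = strstr (templ, "%XH");
  size_t prefix_len;

  assert (pct != NULL && pct[3] == '\0');   /* "%XH" must end the template.  */
  if (evex_w)
    return NULL;                            /* bad encoding  */

  prefix_len = (size_t) (pct - templ);
  if (prefix_len + 2 > sizeof buf)          /* room for 'h' and the NUL  */
    return NULL;
  memcpy (buf, templ, prefix_len);
  buf[prefix_len] = 'h';
  buf[prefix_len + 1] = '\0';
  return buf;
}
```

With EVEX.W clear, "vcmpp%XH" expands to "vcmpph" and "vcmps%XH" to "vcmpsh"; with EVEX.W set, both are rejected.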

>> The delta is even larger for something
>> like MAP5_7D: 1 + 4 + 4 * 2 vs. 1 + 2 + 4. This also results in more related
>> entries ending up closer to one another.
>>
> I don't quite understand here, should I let all FP16 disassembler go through W_TABLE first? or just add something like %XH instead of going through W_TABLE? Thanks.


Where beneficial you will want to decode EVEX.W first, yes. Unless, as
per above, you can avoid that decoding step altogether by using %XH.

Jan
H.J. Lu via Binutils July 20, 2021, 11:13 a.m. | #21
> -----Original Message-----

> From: Jan Beulich <jbeulich@suse.com>

> Sent: Tuesday, July 20, 2021 4:46 PM

> To: Cui, Lili <lili.cui@intel.com>

> Cc: hjl.tools@gmail.com; binutils@sourceware.org

> Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16

> instructions

> 

> On 20.07.2021 09:08, Cui, Lili wrote:

> >

> >> -----Original Message-----

> >> From: Jan Beulich <jbeulich@suse.com>

> >> Sent: Wednesday, July 14, 2021 11:21 PM

> >> To: Cui, Lili <lili.cui@intel.com>

> >> Cc: hjl.tools@gmail.com; binutils@sourceware.org

> >> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16

> >> instructions

> >>

> >> On 13.07.2021 08:58, Cui, Lili wrote:

> >>

> >> Disassembler:

> >>

> >> d_scalar_mode looks to be unused.

> >>

> >> This

> >>

> >>   /* EVEX_W_MAP5_2A_P_1 */

> >>   {

> >>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Ed }, 0 },

> >>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Eq }, 0 },

> >>   },

> >>

> >> can imo be expressed without decoding EVEX.W, by using Edq instead of

> >> (separately) Ed and Eq. There's at least one similar case elsewhere.

> >> Interestingly in the 2si/2usi conversions you do use Gdq already,

> >> which I think handles the EVEX.W=1 case correctly outside of 64-bit

> >> mode (unlike Eq, which will unconditionally produce 64-bit register names

> afaict).

> >>

> >> As to a broader question on decoding EVEX.W: Did you consider

> >> introducing e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a

> >> valid encoding), to avoid this decode step for perhaps almost all

> >> entries? And if that's not an option, decoding EVEX.W first for all

> >> the opcodes which previously had no meaning at all would, in some

> >> cases, reduce the overall number of table entries (and in all other

> >> cases this would then merely be for consistency, as it also wouldn't

> increase the number of table entries). To give an example:

> >>

> >>     { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },

> >>

> >> =>

> >>

> >>   /* PREFIX_EVEX_0F3AC2 */

> >>   {

> >>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },

> >>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },

> >>   },

> >>

> >> =>

> >>

> >>   /* EVEX_W_0F3AC2_P_0 */

> >>   {

> >>     { "vcmpph",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },

> >>   },

> >>   /* EVEX_W_0F3AC2_P_1 */

> >>   {

> >>     { "vcmpsh",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },

> >>   },

> >>

> >> i.e. a total of 1 + 4 + 2 * 2 entries. Whereas decoding W first would

> >> yield 1

> >> (evex) + 2 (evex_w) + 4 (prefix) entries.

> >

> > Hi Jan,

> >

> > Do you want me to change it like this?

> >      { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },

> >

> >  =>

> >

> >    /* PREFIX_EVEX_0F3AC2 */

> >    {

> >      { "vcmp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },

> >      { "vcmp%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },

> >    },

> >

> > "XH" => print 'ph', 'sh' depending on the EVEX.ll bit, if EVEX.W==W1 report
> > bad code.

> > if  (EVEX.LL== EVEX.LLIG)

> >       print 'sh'

> > else

> >       print 'ph'

> 

> Not exactly, no. %XH was meant to parallel %XW, which prints 's' or 'd'

> depending on VEX.W. %XH would print 'h' if EVEX.W is clear and produce an

> appropriate indication of the encoding being bad if EVEX.W is set.

> IOW something like

> 

>    /* PREFIX_EVEX_0F3AC2 */

>    {

>      { "vcmpp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },

>      { "vcmps%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },

>    },

> 

> >> The delta is even larger for something like MAP5_7D: 1 + 4 + 4 * 2

> >> vs. 1 + 2 + 4. This also results in more related entries ending up

> >> closer to one another.

> >>

> > I don't quite understand here, should I let all FP16 disassembler go
> > through W_TABLE first? or just add something like %XH instead of going
> > through W_TABLE? Thanks.

> 

> Where beneficial you will want to decode EVEX.W first, yes. Unless, as per

> above, you can avoid that decoding step altogether by using %XH.

> 

Okay, it is clear to me, many thanks!

Lili
H.J. Lu via Binutils July 20, 2021, 11:26 a.m. | #22
> -----Original Message-----

> From: Jan Beulich <jbeulich@suse.com>

> Sent: Tuesday, July 20, 2021 4:46 PM

> To: Cui, Lili <lili.cui@intel.com>

> Cc: hjl.tools@gmail.com; binutils@sourceware.org

> Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16

> instructions

> 

> On 20.07.2021 09:08, Cui, Lili wrote:

> >

> >> -----Original Message-----

> >> From: Jan Beulich <jbeulich@suse.com>

> >> Sent: Wednesday, July 14, 2021 11:21 PM

> >> To: Cui, Lili <lili.cui@intel.com>

> >> Cc: hjl.tools@gmail.com; binutils@sourceware.org

> >> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16

> >> instructions

> >>

> >> On 13.07.2021 08:58, Cui, Lili wrote:

> >>

> >> Disassembler:

> >>

> >> d_scalar_mode looks to be unused.

> >>

> >> This

> >>

> >>   /* EVEX_W_MAP5_2A_P_1 */

> >>   {

> >>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Ed }, 0 },

> >>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Eq }, 0 },

> >>   },

> >>

> >> can imo be expressed without decoding EVEX.W, by using Edq instead of

> >> (separately) Ed and Eq. There's at least one similar case elsewhere.

> >> Interestingly in the 2si/2usi conversions you do use Gdq already,

> >> which I think handles the EVEX.W=1 case correctly outside of 64-bit

> >> mode (unlike Eq, which will unconditionally produce 64-bit register names
> >> afaict).

> >>

> >> As to a broader question on decoding EVEX.W: Did you consider

> >> introducing e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a

> >> valid encoding), to avoid this decode step for perhaps almost all

> >> entries? And if that's not an option, decoding EVEX.W first for all

> >> the opcodes which previously had no meaning at all would, in some

> >> cases, reduce the overall number of table entries (and in all other

> >> cases this would then merely be for consistency, as it also wouldn't
> >> increase the number of table entries). To give an example:

> >>

> >>     { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },

> >>

> >> =>

> >>

> >>   /* PREFIX_EVEX_0F3AC2 */

> >>   {

> >>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },

> >>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },

> >>   },

> >>

> >> =>

> >>

> >>   /* EVEX_W_0F3AC2_P_0 */

> >>   {

> >>     { "vcmpph",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },

> >>   },

> >>   /* EVEX_W_0F3AC2_P_1 */

> >>   {

> >>     { "vcmpsh",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },

> >>   },

> >>

> >> i.e. a total of 1 + 4 + 2 * 2 entries. Whereas decoding W first would
> >> yield 1 (evex) + 2 (evex_w) + 4 (prefix) entries.

> >

> > Hi Jan,

> >

> > Do you want me to change it like this?

> >      { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },

> >

> >  =>

> >

> >    /* PREFIX_EVEX_0F3AC2 */

> >    {

> >      { "vcmp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },

> >      { "vcmp%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },

> >    },

> >

> > "XH" => print 'ph', 'sh' depending on the EVEX.ll bit, if EVEX.W==W1 report
> > bad code.

> > if  (EVEX.LL== EVEX.LLIG)

> >       print 'sh'

> > else

> >       print 'ph'

> 

> Not exactly, no. %XH was meant to parallel %XW, which prints 's' or 'd'

> depending on VEX.W. %XH would print 'h' if EVEX.W is clear and produce an

> appropriate indication of the encoding being bad if EVEX.W is set.

> IOW something like

> 

>    /* PREFIX_EVEX_0F3AC2 */

>    {

>      { "vcmpp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },

>      { "vcmps%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },

>    },

> 

> >> The delta is even larger for something like MAP5_7D: 1 + 4 + 4 * 2

> >> vs. 1 + 2 + 4. This also results in more related entries ending up

> >> closer to one another.

> >>

> > I don't quite understand here, should I let all FP16 disassembler go
> > through W_TABLE first? or just add something like %XH instead of going
> > through W_TABLE? Thanks.

> 

> Where beneficial you will want to decode EVEX.W first, yes. Unless, as per

> above, you can avoid that decoding step altogether by using %XH.

> 

I prefer to decode EVEX.W first instead of using %XH.

Thanks,
Lili.
H.J. Lu via Binutils July 20, 2021, 1:02 p.m. | #23
On 20.07.2021 13:26, Cui, Lili wrote:
>> Where beneficial you will want to decode EVEX.W first, yes. Unless, as per

>> above, you can avoid that decoding step altogether by using %XH.

>>

> I prefer to decode EVEX.W first instead of using %XH.


Well, I can't stop you from avoiding %XH, but I did intentionally say
"Where beneficial you will want to ...". That is, I think that where
possible you should use %XH, and only where that's not suitable decode
EVEX.W (typically earlier than EVEX.pp).

Jan
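To make the %XH idea concrete, here is a minimal C sketch of the suffix logic Jan describes. %XW chooses 's' or 'd' from VEX.W. %XH has only one valid result, 'h', and EVEX.W=1 has to be reported as a bad encoding. The helper names `xh_suffix` and `append_xh` and the "(bad)" marker are hypothetical. This models the behaviour discussed above and is not the i386-dis.c implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a %XH mnemonic modifier.  %XW picks 's' or 'd'
   from VEX.W; %XH has a single valid result, because EVEX.W=1 is not a
   valid encoding for these AVX512_FP16 forms.  */
static const char *
xh_suffix (int evex_w)
{
  assert (evex_w == 0 || evex_w == 1);
  /* "(bad)" is an assumed marker for the invalid W=1 case.  */
  return evex_w ? "(bad)" : "h";
}

/* Append the %XH result to a mnemonic stem, e.g. "vcmpp" from the
   template "vcmpp%XH".  Returns 0 on success, -1 if BUF is too small.  */
static int
append_xh (char *buf, size_t len, const char *stem, int evex_w)
{
  const char *suffix = xh_suffix (evex_w);

  if (strlen (stem) + strlen (suffix) + 1 > len)
    return -1;
  strcpy (buf, stem);
  strcat (buf, suffix);
  return 0;
}
```

With EVEX.W clear, the stems "vcmpp" and "vcmps" become "vcmpph" and "vcmpsh". That is why neither table entry needs its own EVEX.W decoding step.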
H.J. Lu via Binutils July 20, 2021, 1:38 p.m. | #24
> -----Original Message-----

> From: Jan Beulich <jbeulich@suse.com>

> Sent: Tuesday, July 20, 2021 9:03 PM

> To: Cui, Lili <lili.cui@intel.com>

> Cc: hjl.tools@gmail.com; binutils@sourceware.org

> Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16

> instructions

> 

> On 20.07.2021 13:26, Cui, Lili wrote:

> > I prefer to decode EVEX.W first instead of using %XH.

> 

> Well, I can't stop you from avoiding %XH, but I did intentionally say "Where

> beneficial you will want to ...". That is, I think that where possible you should

> use %XH, and only where that's not suitable decode EVEX.W (typically earlier

> than EVEX.pp).

> 

OK, for instructions where EVEX.W cannot be decoded before EVEX.pp, I will use %XH.

Thanks,
Lili.
H.J. Lu via Binutils July 20, 2021, 2:15 p.m. | #25
On 20.07.2021 15:38, Cui, Lili wrote:
> OK, for instructions where EVEX.W cannot be decoded before EVEX.pp, I will use %XH.


I'm sorry, but no, it's the other way around. The first check would be whether
%XH can be used. Only then would you check whether decoding EVEX.W first is at
least no worse than decoding EVEX.pp first; I think there will be few if any
cases where decoding EVEX.pp first is beneficial.

Jan
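The entry counts quoted in this thread follow a simple pattern. For 0F3AC2 the counts are 1 + 4 + 2 * 2 versus 1 + 2 + 4, and for MAP5_7D they are 1 + 4 + 4 * 2 versus 1 + 2 + 4.

The C sketch below is a simplified model. It assumes that every EVEX.pp table has 4 slots and every EVEX.W table has 2, plus one entry in the opcode table itself. The function names are hypothetical. The sketch covers only the second step of the order Jan describes: once %XH has been ruled out, compare decoding EVEX.W first against decoding EVEX.pp first.

```c
#include <assert.h>

/* Simplified model: one entry in the opcode table, 4 slots per EVEX.pp
   table and 2 slots per EVEX.W table.  Both functions describe the same
   opcode, decoded in a different order.  */
enum { PP_SLOTS = 4, W_SLOTS = 2 };

/* EVEX.pp first: the opcode entry, one prefix table, plus one W table
   for every prefix value whose meaning still depends on EVEX.W.  */
static int
entries_pp_first (int w_tables)
{
  assert (w_tables >= 0 && w_tables <= PP_SLOTS);
  return 1 + PP_SLOTS + w_tables * W_SLOTS;
}

/* EVEX.W first: the opcode entry, one W table, plus one prefix table
   for every W value that is valid and still depends on EVEX.pp.  */
static int
entries_w_first (int pp_tables)
{
  assert (pp_tables >= 0 && pp_tables <= W_SLOTS);
  return 1 + W_SLOTS + pp_tables * PP_SLOTS;
}

/* Prefer decoding EVEX.W first when it is no worse than EVEX.pp first.  */
static int
prefer_w_first (int w_tables, int pp_tables)
{
  return entries_w_first (pp_tables) <= entries_pp_first (w_tables);
}
```

Consider 0F3AC2, where two prefix values need a W split and only W=0 needs a prefix split. The model gives 9 entries for pp-first and 7 for W-first. For MAP5_7D it gives 13 versus 7.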
H.J. Lu via Binutils July 20, 2021, 2:29 p.m. | #26
> -----Original Message-----

> From: Jan Beulich <jbeulich@suse.com>

> Sent: Tuesday, July 20, 2021 10:15 PM

> To: Cui, Lili <lili.cui@intel.com>

> Cc: hjl.tools@gmail.com; binutils@sourceware.org

> Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16

> instructions

> 

> On 20.07.2021 15:38, Cui, Lili wrote:

> > OK, for instructions where EVEX.W cannot be decoded before EVEX.pp, I
> > will use %XH.

> 

> I'm sorry, but no, it's the other way around. The first check would be

> whether %XH can be used. Only then would you check whether decoding

> EVEX.W first is at least no worse than decoding EVEX.pp first; I think there

> will be few if any cases where decoding EVEX.pp first is beneficial.

> 

Haha, I think I get it now, and I hope I've understood it correctly this time.

Thanks,
Lili.
H.J. Lu via Binutils July 21, 2021, 10:32 a.m. | #27
On 14.07.2021 17:21, Jan Beulich via Binutils wrote:
> On 13.07.2021 08:58, Cui, Lili wrote:

> 

> Disassembler:

> 

> d_scalar_mode looks to be unused.


In addition to this, as noticed while doing some of the cleanup that
I've just posted a series for, I think this

#define EXwScalarS { OP_EX, w_scalar_swap_mode }

wants to be

#define EXwS { OP_EX, w_swap_mode }

as the "scalar-ness" doesn't really matter here, and names would
then be better in line with others we've got already.

Jan
H.J. Lu via Binutils July 21, 2021, 2:29 p.m. | #28
On 21.07.2021 15:14, Cui, Lili wrote:
>> -----Original Message-----

>> From: Jan Beulich <jbeulich@suse.com>

>> Sent: Wednesday, July 14, 2021 12:25 AM

>> On 13.07.2021 08:58, Cui, Lili wrote:

>>

>> I don't think you've applied my comments regarding the cpu_flag_init[]

>> additions

> 

> Sorry, I forgot this one, I changed it in this patch.

> 

> { "CPU_AVX512_FP16_FLAGS",

>     "CPU_AVX512BW_FLAGS|CpuAVX512_FP16" },

> ...

>   { "CPU_ANY_AVX512BW_FLAGS",

>     "CpuAVX512BW|CpuAVX512_FP16" },

> ...


In the latter case you will want to use CPU_ANY_AVX512_FP16_FLAGS,
not CpuAVX512_FP16. Because this wasn't done properly for
CPU_ANY_AVX512F_FLAGS in the first place, you now also need to change
that one. There, preferably, you'd replace CpuAVX512BW by
CPU_ANY_AVX512BW_FLAGS, instead of explicitly adding CpuAVX512_FP16.
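The reasoning about the "ANY" flag sets can be illustrated with plain bit masks. In this minimal C sketch, the bit values and macro names are hypothetical stand-ins for the Cpu* flags and the cpu_flag_init[] entries, not the real i386-gen.c definitions.

Each feature's enable set pulls in its prerequisites. Each ANY_ set pulls in the ANY_ set of the features built on top of it. Because ANY_AVX512F_FLAGS is composed from ANY_AVX512BW_FLAGS, disabling AVX512F also disables AVX512_FP16 without that bit being listed explicitly.

```c
#include <assert.h>

/* Hypothetical bit assignments standing in for Cpu* feature flags.  */
enum
{
  CPU_AVX512F = 1 << 0,
  CPU_AVX512BW = 1 << 1,
  CPU_AVX512_FP16 = 1 << 2,
  CPU_AVX512DQ = 1 << 3
};

/* Enable sets: turning a feature on also turns on its prerequisites.  */
#define AVX512F_FLAGS     (CPU_AVX512F)
#define AVX512BW_FLAGS    (AVX512F_FLAGS | CPU_AVX512BW)
#define AVX512_FP16_FLAGS (AVX512BW_FLAGS | CPU_AVX512_FP16)

/* ANY_ sets: turning a feature off also turns off everything built on
   it.  Each one refers to the ANY_ set of its dependents instead of
   listing their bits, so FP16 reaches ANY_AVX512F_FLAGS transitively.  */
#define ANY_AVX512_FP16_FLAGS (CPU_AVX512_FP16)
#define ANY_AVX512BW_FLAGS    (CPU_AVX512BW | ANY_AVX512_FP16_FLAGS)
#define ANY_AVX512F_FLAGS     (CPU_AVX512F | CPU_AVX512DQ | ANY_AVX512BW_FLAGS)

/* Model of a "no<feature>" directive: clear the feature's ANY_ set.  */
static unsigned
disable_features (unsigned enabled, unsigned any_set)
{
  return enabled & ~any_set;
}
```

If a new leaf feature is added later, only the ANY_ set of its direct prerequisite needs updating. Every set built on that one picks it up automatically.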

>> I can't seem to find any pseudos of VCMP{P,S}H, despite the prior comment.

>>

> I misunderstood this last time, but I think I've got it now.
> I added it to the assembler and disassembler, and also added test cases for it.


Ah yes, albeit there's some confusion about the numbering of the
attached patches. I wanted to ask you anyway to send new versions of
the patches as new mail threads, instead of in reply to the prior
discussion. In fact, while the patch with the new tests is probably
indeed too large to send inline (but as said before, it's not
really reviewable anyway), I would much appreciate if you could
send the first patch inline instead of as attachment. This makes
commenting a lot easier.

As a purely cosmetic request, may I ask that you flip the order
of the two entries each for vcmp<avx_frel>ph and vcmpph, such that, like
everywhere else, the rounding/SAE forms come second? As mentioned
previously, keeping things as consistent as possible helps readers,
and makes future changes and spotting possible problems easier.

As to the testsuite additions, I'm afraid you've misunderstood
what H.J. and I are requesting for xmmword.{s,d}: You don't need to
add _all_ new insns there - that would be making an unreasonably
large test case going forward. Only insns with irregular operand
combinations permitting %xmmN but not "xmmword ptr ..." for a given
operand position need putting there. This is to prove (now and going
forward) that despite allowing %xmmN the (wrong size) memory form
gets rejected. Regular scalar insns don't need putting there, and
even less so VDIVPH; if at all, a single random example (not from
the FP16 set, but from more basic ones) would be sufficient. So what
I was after with my request was that all the non-scalar VCVT* forms
would be represented there, when their vector element sizes vary.

I'm sorry if the earlier request wasn't explicit enough.

The other remark on the testsuite addition patch is that _if_ you
maintain a ChangeLog entry (which you aren't required to anymore),
then you will want to keep it up-to-date with patch contents.

I'll try to get to look at the assembler and disassembler parts in
more detail later this week, as the following week I'll be OoO.

Jan
H.J. Lu via Binutils July 22, 2021, 7:05 a.m. | #29
> -----Original Message-----

> From: Jan Beulich <jbeulich@suse.com>

> Sent: Wednesday, July 21, 2021 10:29 PM

> To: Cui, Lili <lili.cui@intel.com>

> Cc: hjl.tools@gmail.com; binutils@sourceware.org

> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions

> 

> On 21.07.2021 15:14, Cui, Lili wrote:

> >> -----Original Message-----

> >> From: Jan Beulich <jbeulich@suse.com>

> >> Sent: Wednesday, July 14, 2021 12:25 AM
> >> On 13.07.2021 08:58, Cui, Lili wrote:

> >>

> >> I don't think you've applied my comments regarding the

> >> cpu_flag_init[] additions

> >

> > Sorry, I forgot this one, I changed it in this patch.

> >

> > { "CPU_AVX512_FP16_FLAGS",
> >     "CPU_AVX512BW_FLAGS|CpuAVX512_FP16" },
> > ...

> >   { "CPU_ANY_AVX512BW_FLAGS",

> >     "CpuAVX512BW|CpuAVX512_FP16" },

> > ...

> 

> In the latter case you will want to use CPU_ANY_AVX512_FP16_FLAGS, not

> CpuAVX512_FP16. This not originally having got done properly for

> CPU_ANY_AVX512F_FLAGS is why you now also need to change that one.

> There preferably you'd replace CpuAVX512BW by

> CPU_ANY_AVX512BW_FLAGS, instead of explicitly adding CpuAVX512_FP16.

> 

Done.
 
> Ah yes, albeit there's some confusion about the numbering of the attached
> patches. I wanted to ask you anyway to send new versions of the patches as
> new mail threads, instead of in reply to the prior discussion. In fact, while the
> patch with the new tests is probably indeed too large to send inline (but as
> said before, it's not really reviewable anyway), I would much appreciate if
> you could send the first patch inline instead of as attachment. This makes
> commenting a lot easier.

Sorry, I thought the source patch would be too big to put inline; I will send the patches inline next time.

> As a purely cosmetic request - may I ask that you flip the order of the two
> each vcmp<avx_frel>ph and vcmpph entries, such that - like everywhere else
> - the rounding/SAE forms come second? As mentioned previously, keeping
> things as consistent as possible helps readers as well as making future
> changes or spotting possible problems.

Done.

> As to the testsuite additions, I'm afraid you've misunderstood what H.J. and I
> are requesting for xmmword.{s,d}: You don't need to add _all_ new insns
> there - that would be making an unreasonably large test case going forward.
> Only insns with irregular operand combinations permitting %xmmN but not
> "xmmword ptr ..." for a given operand position need putting there. This is to
> prove (now and going forward) that despite allowing %xmmN the (wrong size)
> memory form gets rejected. Regular scalar insns don't need putting there,
> and even less so VDIVPH; if at all, a single random example (not from the
> FP16 set, but from more basic ones) would be sufficient. So what I was after
> with my request was that all the non-scalar VCVT* forms would be
> represented there, when their vector element sizes vary.

Done
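The irregular forms Jan describes can be worked out from element counts. For a widening conversion the destination fixes the number of elements, so the memory source shrinks with the vector length while, at the smaller lengths, the register source can still be an %xmm register. The sketch below (function names are hypothetical, broadcast forms ignored) computes that memory size; for example VCVTPH2PD with a %ymm destination accepts an %xmm register but only a 64-bit memory operand, so `xmmword ptr` must be rejected there, whereas with a %zmm destination the memory operand is exactly 128 bits.

```c
#include <assert.h>

/* Size in bytes of the (non-broadcast) memory source of a widening
   vector conversion.  dst_bits is the destination vector length;
   src_elem and dst_elem are the element sizes in bytes.  The element
   count is set by the destination, so the source is narrower.  */
static unsigned int
cvt_mem_bytes (unsigned int dst_bits, unsigned int src_elem,
               unsigned int dst_elem)
{
  assert (dst_bits == 128 || dst_bits == 256 || dst_bits == 512);
  assert (src_elem != 0 && src_elem < dst_elem);
  return dst_bits / 8 / dst_elem * src_elem;
}

/* Nonzero when "xmmword ptr [...]" is the right spelling for the
   memory source, i.e. when it is exactly 16 bytes.  Forms where this
   is zero although an %xmmN register is accepted are the candidates
   for the xmmword test.  */
static int
xmmword_mem_ok (unsigned int dst_bits, unsigned int src_elem,
                unsigned int dst_elem)
{
  return cvt_mem_bytes (dst_bits, src_elem, dst_elem) == 16;
}
```

For instance, VCVTPH2PD (2-byte source elements, 8-byte destination elements) reads 4, 8 and 16 bytes for xmm, ymm and zmm destinations, while VCVTPH2DQ (2 to 4 bytes) reads 8, 16 and 32.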

> I'm sorry if the earlier request wasn't explicit enough.

No, that was my fault. Thanks again for your patience.

> The other remark on the testsuite addition patch is that _if_ you maintain a
> ChangeLog entry (which you aren't required to anymore), then you will want
> to keep it up-to-date with patch contents.

I updated the ChangeLog of PATCH 2/2.

> I'll try to get to look at the assembler and disassembler parts in more detail
> later this week, as the following week I'll be OoO.

Thank you for spending so much time helping review this big patch, and I hope you have a good vacation.

Lili.
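The `build_evex_prefix` hunk in the patch below only needs to widen an assertion, because the map number is ORed directly into the low bits of the first EVEX payload byte (the byte after 0x62). A stand-alone sketch of that byte, assuming the usual layout (inverted R, X, B in bits 7..5, inverted R' in bit 4, map in bits 2..0); the enum and function names are made up here and are not gas's SPACE_* constants:

```c
#include <assert.h>
#include <stdint.h>

/* Opcode map numbers as encoded in EVEX.mmm.  Names are hypothetical;
   gas uses SPACE_0F ... SPACE_EVEXMAP6.  */
enum evex_map
{
  MAP_0F = 1,
  MAP_0F38 = 2,
  MAP_0F3A = 3,
  MAP_5 = 5,
  MAP_6 = 6
};

/* First EVEX payload byte (P0).  rex carries REX.R/X/B in bits 2..0;
   r_hi is the extra high bit of the ModRM.reg register number (R').
   R, X, B and R' are all stored inverted.  */
static uint8_t
evex_p0 (unsigned int rex, unsigned int r_hi, enum evex_map map)
{
  unsigned int p0;

  /* Same range check as the widened gas_assert in the patch.  */
  assert (map >= MAP_0F && map <= MAP_6);
  p0 = ((~rex & 0x7u) << 5) | (unsigned int) map;
  if (!r_hi)
    p0 |= 0x10;                 /* inverted R' */
  return (uint8_t) p0;
}
```

With no REX bits and no high register, map 5 gives 0xF5 and map 6 gives 0xF6; for example `vaddph %zmm0, %zmm0, %zmm0` is encoded as 62 f5 7c 48 58 c0.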

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 168f7d5ba75..ec9f18879d6 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -380,7 +380,7 @@  struct _i386_insn
        expresses the broadcast factor.  */
     struct Broadcast_Operation
     {
-      /* Type of broadcast: {1to2}, {1to4}, {1to8}, or {1to16}.  */
+      /* Type of broadcast: {1to2}, {1to4}, {1to8}, {1to16} or {1to32}.  */
       unsigned int type;
 
       /* Index of broadcasted operand.  */
@@ -1237,6 +1237,8 @@  static const arch_entry cpu_arch[] =
     CPU_UINTR_FLAGS, 0 },
   { STRING_COMMA_LEN (".hreset"), PROCESSOR_UNKNOWN,
     CPU_HRESET_FLAGS, 0 },
+  { STRING_COMMA_LEN (".avx512_fp16"), PROCESSOR_UNKNOWN,
+    CPU_AVX512_FP16_FLAGS, 0 },
 };
 
 static const noarch_entry cpu_noarch[] =
@@ -1292,6 +1294,7 @@  static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("nowidekl"), CPU_ANY_WIDEKL_FLAGS },
   { STRING_COMMA_LEN ("nouintr"), CPU_ANY_UINTR_FLAGS },
   { STRING_COMMA_LEN ("nohreset"), CPU_ANY_HRESET_FLAGS },
+  { STRING_COMMA_LEN ("noavx512_fp16"), CPU_ANY_AVX512_FP16_FLAGS },
 };
 
 #ifdef I386COFF
@@ -3263,7 +3266,7 @@  pte (insn_template *t)
 {
   static const unsigned char opc_pfx[] = { 0, 0x66, 0xf3, 0xf2 };
   static const char *const opc_spc[] = {
-    NULL, "0f", "0f38", "0f3a", NULL, NULL, NULL, NULL,
+    NULL, "0f", "0f38", "0f3a", NULL, "evexmap5", "evexmap6", NULL,
     "XOP08", "XOP09", "XOP0A",
   };
   unsigned int j;
@@ -3858,7 +3861,7 @@  build_evex_prefix (void)
   /* The high 3 bits of the second EVEX byte are 1's compliment of RXB
      bits from REX.  */
   gas_assert (i.tm.opcode_modifier.opcodespace >= SPACE_0F);
-  gas_assert (i.tm.opcode_modifier.opcodespace <= SPACE_0F3A);
+  gas_assert (i.tm.opcode_modifier.opcodespace <= SPACE_EVEXMAP6);
   i.vex.bytes[1] = (~i.rex & 0x7) << 5 | i.tm.opcode_modifier.opcodespace;
 
   /* The fifth bit of the second EVEX byte is 1's compliment of the
@@ -10514,6 +10517,12 @@  check_VecOperations (char *op_string)
 		  bcst_type = 16;
 		  op_string++;
 		}
+	      else if (*op_string == '3'
+		       && *(op_string+1) == '2')
+		{
+		  bcst_type = 32;
+		  op_string++;
+		}
 	      else
 		{
 		  as_bad (_("Unsupported broadcast: `%s'"), saved);
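The new `{1to32}` case is only ever valid for 16-bit elements: 32 elements of 2 bytes fill a 64-byte %zmm register, which is why the factor could not occur before FP16. A simplified sketch of accepting a broadcast suffix and checking the factor against the element and vector size; it is not gas's `check_VecOperations()`, which parses the suffix character by character as the hunk above shows, and the function name is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Accept a "{1toN}" suffix for an operand with elem_bytes-sized
   elements in a vec_bits-wide vector.  Returns N, or 0 when the suffix
   is malformed, N is not one of 2/4/8/16/32, or N elements do not
   exactly fill the vector.  */
static unsigned int
parse_bcst (const char *s, unsigned int vec_bits, unsigned int elem_bytes)
{
  static const unsigned int factors[] = { 2, 4, 8, 16, 32 };
  unsigned int i, n = 0;

  if (strncmp (s, "{1to", 4) != 0)
    return 0;
  for (s += 4; *s >= '0' && *s <= '9'; s++)
    {
      n = n * 10 + (unsigned int) (*s - '0');
      if (n > 32)
        return 0;               /* no factor above 32 exists */
    }
  if (s[0] != '}' || s[1] != '\0')
    return 0;
  for (i = 0; i < sizeof factors / sizeof factors[0]; i++)
    if (factors[i] == n)
      return n * elem_bytes * 8 == vec_bits ? n : 0;
  return 0;
}
```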
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index c987dc03782..9058ad444b0 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -214,6 +214,7 @@  accept various extension mnemonics.  For example,
 @code{tdx},
 @code{avx512_bf16},
 @code{avx_vnni},
+@code{avx512_fp16},
 @code{noavx512f},
 @code{noavx512cd},
 @code{noavx512er},
@@ -233,6 +234,7 @@  accept various extension mnemonics.  For example,
 @code{notdx},
 @code{noavx512_bf16},
 @code{noavx_vnni},
+@code{noavx512_fp16},
 @code{noenqcmd},
 @code{noserialize},
 @code{notsxldtrk},
@@ -1519,7 +1521,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.avx512vbmi} @tab @samp{.avx512_4fmaps} @tab @samp{.avx512_4vnniw}
 @item @samp{.avx512_vpopcntdq} @tab @samp{.avx512_vbmi2} @tab @samp{.avx512_vnni}
 @item @samp{.avx512_bitalg} @tab @samp{.avx512_bf16} @tab @samp{.avx512_vp2intersect}
-@item @samp{.tdx} @tab @samp{.avx_vnni}
+@item @samp{.tdx} @tab @samp{.avx_vnni} @tab @samp{.avx512_fp16}
 @item @samp{.clwb} @tab @samp{.rdpid} @tab @samp{.ptwrite} @tab @item @samp{.ibt}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
diff --git a/opcodes/i386-dis-evex-mod.h b/opcodes/i386-dis-evex-mod.h
index a1cd69a1c9e..57a5db719e6 100644
--- a/opcodes/i386-dis-evex-mod.h
+++ b/opcodes/i386-dis-evex-mod.h
@@ -87,3 +87,13 @@ 
     /* MOD_EVEX_0F38C7 */
     { EVEX_LEN_TABLE (EVEX_LEN_0F38C7_M_0) },
   },
+  {
+    /* MOD_EVEX_MAP510_PREFIX_1 */
+    { VEX_W_TABLE (EVEX_W_MAP510_P_1_M_0) },
+    { VEX_W_TABLE (EVEX_W_MAP510_P_1_M_1) },
+  },
+  {
+    /* MOD_EVEX_MAP511_PREFIX_1 */
+    { VEX_W_TABLE (EVEX_W_MAP511_P_1_M_0) },
+    { VEX_W_TABLE (EVEX_W_MAP511_P_1_M_1) },
+  },
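The two new MOD_TABLE rows split F3.MAP5 opcodes 0x10 and 0x11 on ModRM.mod: entry 0 is taken for a memory operand and entry 1 for a register operand. That is how the two-operand `vmovsh` load from m16 and the three-operand register form share one opcode. A sketch of the selection (helper names are hypothetical; the form strings follow the SDM notation):

```c
#include <assert.h>

/* MOD_TABLE rows: entry 0 when ModRM.mod != 3 (memory operand),
   entry 1 when ModRM.mod == 3 (register operand).  */
static unsigned int
mod_table_index (unsigned char modrm)
{
  return (modrm >> 6) == 3 ? 1 : 0;
}

/* The two F3.MAP5 0x10 forms, written as in the SDM.  */
static const char *const vmovsh_0x10_forms[2] = {
  "vmovsh xmm1{k1}{z}, m16",
  "vmovsh xmm1{k1}{z}, xmm2, xmm3"
};

/* Pick the form the disassembler would use for a given ModRM byte.  */
static const char *
vmovsh_0x10_form (unsigned char modrm)
{
  return vmovsh_0x10_forms[mod_table_index (modrm)];
}
```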
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 50a11f417ad..e204d4d4c9b 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -375,3 +375,215 @@ 
     { "vfmsub213s%XW",	{ XMScalar, VexScalar, EXVexWdqScalar, EXxEVexR }, 0 },
     { "v4fnmaddss",	{ XMScalar, VexScalar, Mxmm }, 0 },
   },
+  /* PREFIX_EVEX_0F3A08 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A08_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F3A08_P_2) },
+  },
+  /* PREFIX_EVEX_0F3A0A */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A0A_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F3A0A_P_2) },
+  },
+  /* PREFIX_EVEX_0F3A26 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A26_P_0) },
+    { Bad_Opcode },
+    { "vgetmantp%XW",	{ XM, EXx, EXxEVexS, Ib }, 0 },
+  },
+  /* PREFIX_EVEX_0F3A27 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A27_P_0) },
+    { Bad_Opcode },
+    { "vgetmants%XW",	{ XMScalar, VexScalar, EXVexWdqScalar, EXxEVexS, Ib }, 0 },
+  },
+  /* PREFIX_EVEX_0F3A56 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A56_P_0) },
+    { Bad_Opcode },
+    { "vreducep%XW",	{ XM, EXx, EXxEVexS, Ib }, 0 },
+  },
+  /* PREFIX_EVEX_0F3A57 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A57_P_0) },
+    { Bad_Opcode },
+    { "vreduces%XW",	{ XMScalar, VexScalar, EXVexWdqScalar, EXxEVexS, Ib }, 0 },
+  },
+  /* PREFIX_EVEX_0F3A66 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A66_P_0) },
+    { Bad_Opcode },
+    { "vfpclassp%XW%XZ",	{ XMask, EXx, Ib }, 0 },
+  },
+  /* PREFIX_EVEX_0F3A67 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3A67_P_0) },
+    { Bad_Opcode },
+    { "vfpclasss%XW",	{ XMask, EXVexWdqScalar, Ib }, 0 },
+  },
+  /* PREFIX_EVEX_0F3AC2 */
+  {
+    { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },
+    { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },
+  },
+  /* PREFIX_EVEX_MAP510 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP510_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP511 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_EVEX_MAP511_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP51D */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP51D_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP51D_P_2) },
+  },
+  /* PREFIX_EVEX_MAP52A */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP52A_P_1) },
+  },
+  /* PREFIX_EVEX_MAP52C */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP52C_P_1) },
+  },
+  /* PREFIX_EVEX_MAP52D */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP52D_P_1) },
+  },
+  /* PREFIX_EVEX_MAP52E */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP52E_P_0) },
+  },
+  /* PREFIX_EVEX_MAP52F */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP52F_P_0) },
+  },
+  /* PREFIX_EVEX_MAP551 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP551_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP551_P_1) },
+  },
+  /* PREFIX_EVEX_MAP558 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP558_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP558_P_1) },
+  },
+  /* PREFIX_EVEX_MAP559 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP559_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP559_P_1) },
+  },
+  /* PREFIX_EVEX_MAP55A */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP55A_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP55A_P_1) },
+    { VEX_W_TABLE (EVEX_W_MAP55A_P_2) },
+    { VEX_W_TABLE (EVEX_W_MAP55A_P_3) },
+  },
+  /* PREFIX_EVEX_MAP55B */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP55B_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP55B_P_1) },
+    { VEX_W_TABLE (EVEX_W_MAP55B_P_2) },
+  },
+  /* PREFIX_EVEX_MAP55C */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP55C_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP55C_P_1) },
+  },
+  /* PREFIX_EVEX_MAP55D */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP55D_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP55D_P_1) },
+  },
+  /* PREFIX_EVEX_MAP55E */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP55E_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP55E_P_1) },
+  },
+  /* PREFIX_EVEX_MAP55F */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP55F_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP55F_P_1) },
+  },
+  /* PREFIX_EVEX_MAP578 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP578_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP578_P_1) },
+    { VEX_W_TABLE (EVEX_W_MAP578_P_2) },
+  },
+  /* PREFIX_EVEX_MAP579 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP579_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP579_P_1) },
+    { VEX_W_TABLE (EVEX_W_MAP579_P_2) },
+  },
+  /* PREFIX_EVEX_MAP57A */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP57A_P_2) },
+    { VEX_W_TABLE (EVEX_W_MAP57A_P_3) },
+  },
+  /* PREFIX_EVEX_MAP57B */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP57B_P_1) },
+    { VEX_W_TABLE (EVEX_W_MAP57B_P_2) },
+  },
+  /* PREFIX_EVEX_MAP57C */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP57C_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP57C_P_2) },
+  },
+  /* PREFIX_EVEX_MAP57D */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP57D_P_0) },
+    { VEX_W_TABLE (EVEX_W_MAP57D_P_1) },
+    { VEX_W_TABLE (EVEX_W_MAP57D_P_2) },
+    { VEX_W_TABLE (EVEX_W_MAP57D_P_3) },
+  },
+  /* PREFIX_EVEX_MAP613 */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP613_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP613_P_2) },
+  },
+  /* PREFIX_EVEX_MAP656 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP656_P_1) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP656_P_3) },
+  },
+  /* PREFIX_EVEX_MAP657 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP657_P_1) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP657_P_3) },
+  },
+  /* PREFIX_EVEX_MAP6D6 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP6D6_P_1) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP6D6_P_3) },
+  },
+  /* PREFIX_EVEX_MAP6D7 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP6D7_P_1) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP6D7_P_3) },
+  },
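Note how the PREFIX_TABLE rows above are indexed. EVEX.pp encodes the implied SIMD prefix as 0 = none, 1 = 0x66, 2 = 0xF3, 3 = 0xF2, but the rows are ordered none, F3, 66, F2 (PREFIX_EVEX_MAP55A, for instance, lists vcvtph2pd, vcvtsh2sd, vcvtpd2ph, vcvtsd2sh in that order). A minimal sketch of that mapping, with hypothetical helper names and the W-table level elided:

```c
#include <assert.h>
#include <string.h>

/* EVEX.pp encodes the implied SIMD prefix: 0 none, 1 0x66, 2 0xF3,
   3 0xF2.  PREFIX_TABLE rows are ordered none, F3, 66, F2, so the
   values 1 and 2 swap when picking a row entry.  */
static unsigned int
evex_pp_to_prefix_index (unsigned int pp)
{
  static const unsigned char idx[4] = { 0, 2, 1, 3 };

  assert (pp < 4);
  return idx[pp];
}

/* Mnemonics of the PREFIX_EVEX_MAP55A row (W-table level elided).  */
static const char *const map5_5a[4] = {
  "vcvtph2pd", "vcvtsh2sd", "vcvtpd2ph", "vcvtsd2sh"
};

/* Hypothetical lookup: mnemonic for MAP5 opcode 0x5A given EVEX.pp.  */
static const char *
map5_5a_mnemonic (unsigned int pp)
{
  return map5_5a[evex_pp_to_prefix_index (pp)];
}
```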
diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h
index 637ab846562..0d3f517e2c4 100644
--- a/opcodes/i386-dis-evex-w.h
+++ b/opcodes/i386-dis-evex-w.h
@@ -560,18 +560,26 @@ 
     { Bad_Opcode },
     { "vpermilpd",	{ XM, EXx, Ib }, PREFIX_DATA },
   },
-  /* EVEX_W_0F3A08 */
+  /* EVEX_W_0F3A08_P_0 */
   {
-    { "vrndscaleps",	{ XM, EXx, EXxEVexS, Ib }, PREFIX_DATA },
+    { "vrndscaleph",	{ XM, EXxh, EXxEVexS, Ib }, 0 },
+  },
+  /* EVEX_W_0F3A08_P_2 */
+  {
+    { "vrndscaleps",	{ XM, EXx, EXxEVexS, Ib }, 0 },
   },
   /* EVEX_W_0F3A09 */
   {
     { Bad_Opcode },
     { "vrndscalepd",	{ XM, EXx, EXxEVexS, Ib }, PREFIX_DATA },
   },
-  /* EVEX_W_0F3A0A */
+  /* EVEX_W_0F3A0A_P_0 */
+  {
+    { "vrndscalesh",	{ XMScalar, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
+  },
+  /* EVEX_W_0F3A0A_P_2 */
   {
-    { "vrndscaless",	{ XMScalar, VexScalar, EXxmm_md, EXxEVexS, Ib }, PREFIX_DATA },
+    { "vrndscaless",	{ XMScalar, VexScalar, EXxmm_md, EXxEVexS, Ib }, 0 },
   },
   /* EVEX_W_0F3A0B */
   {
@@ -607,6 +615,14 @@ 
     { "vshuff32x4",	{ XM, Vex, EXx, Ib }, PREFIX_DATA },
     { "vshuff64x2",	{ XM, Vex, EXx, Ib }, PREFIX_DATA },
   },
+  /* EVEX_W_0F3A26_P_0 */
+  {
+    { "vgetmantph",	{ XM, EXxh, EXxEVexS, Ib }, 0 },
+  },
+  /* EVEX_W_0F3A27_P_0 */
+  {
+    { "vgetmantsh",	{ XMScalar, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
+  },
   /* EVEX_W_0F3A38_L_n */
   {
     { "vinserti32x4",	{ XM, Vex, EXxmm, Ib }, PREFIX_DATA },
@@ -636,6 +652,22 @@ 
     { "vshufi32x4",	{ XM, Vex, EXx, Ib }, PREFIX_DATA },
     { "vshufi64x2",	{ XM, Vex, EXx, Ib }, PREFIX_DATA },
   },
+  /* EVEX_W_0F3A56_P_0 */
+  {
+    { "vreduceph",	{ XM, EXxh, EXxEVexS, Ib }, 0 },
+  },
+  /* EVEX_W_0F3A57_P_0 */
+  {
+    { "vreducesh",	{ XMScalar, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
+  },
+  /* EVEX_W_0F3A66_P_0 */
+  {
+    { "vfpclassph%XZ",	{ XMask, EXxh, Ib }, 0 },
+  },
+  /* EVEX_W_0F3A67_P_0 */
+  {
+    { "vfpclasssh",	{ XMask, EXxmm_mw, Ib }, 0 },
+  },
   /* EVEX_W_0F3A70 */
   {
     { Bad_Opcode },
@@ -646,3 +678,405 @@ 
     { Bad_Opcode },
     { "vpshrdw",   { XM, Vex, EXx, Ib }, 0 },
   },
+  /* EVEX_W_0F3AC2_P_0 */
+  {
+    { "vcmpph",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
+  },
+  /* EVEX_W_0F3AC2_P_1 */
+  {
+    { "vcmpsh",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
+  },
+  /* EVEX_W_MAP510_P_1_M_0 */
+  {
+    { "vmovsh",	{ XMScalar, EXwScalarS }, 0 },
+  },
+  /* EVEX_W_MAP510_P_1_M_1 */
+  {
+    { "vmovsh",	{ XMScalar, VexScalar, EXxmm_md }, 0 },
+  },
+  /* EVEX_W_MAP511_P_1_M_0 */
+  {
+    { "vmovsh",	{ EXwScalarS, XMScalar }, 0 },
+  },
+  /* EVEX_W_MAP511_P_1_M_1 */
+  {
+    { "vmovsh",	{ EXxS, Vex, XMScalar }, 0 },
+  },
+  /* EVEX_W_MAP51D_P_0 */
+  {
+    { "vcvtss2sh",	{ XMM, VexScalar, EXxmm_md, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP51D_P_2 */
+  {
+    { "vcvtps2phx%XY",	{ XMxmmq, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP52A_P_1 */
+  {
+    { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, Ed, EXxEVexR }, 0 },
+    { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, Eq, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP52C_P_1 */
+  {
+    { "vcvttsh2si",	{ Gdq, EXxmm_mw, EXxEVexS }, 0 },
+    { "vcvttsh2si",	{ Gdq, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP52D_P_1 */
+  {
+    { "vcvtsh2si",	{ Gdq, EXxmm_mw, EXxEVexR }, 0 },
+    { "vcvtsh2si",	{ Gdq, EXxmm_mw, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP52E_P_0 */
+  {
+    { "vucomish",	{ XMScalar, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP52F_P_0 */
+  {
+    { "vcomish",	{ XMScalar, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP551_P_0 */
+  {
+    { "vsqrtph",	{ XM, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP551_P_1 */
+  {
+    { "vsqrtsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP558_P_0 */
+  {
+    { "vaddph",	{ XM, Vex, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP558_P_1 */
+  {
+    { "vaddsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP559_P_0 */
+  {
+    { "vmulph",	{ XM, Vex, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP559_P_1 */
+  {
+    { "vmulsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55A_P_0 */
+  {
+    { "vcvtph2pd",	{ XM, EXxmmqdh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP55A_P_1 */
+  {
+    { "vcvtsh2sd",	{ XMM, VexScalar, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP55A_P_2 */
+  {
+    { Bad_Opcode },
+    { "vcvtpd2ph%XZ",	{ XMM, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55A_P_3 */
+  {
+    { Bad_Opcode },
+    { "vcvtsd2sh",	{ XMM, VexScalar, EXxmm_mq, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55B_P_0 */
+  {
+    { "vcvtdq2ph%XY",	{ XMxmmq, EXx, EXxEVexR }, 0 },
+    { "vcvtqq2ph%XZ",	{ XMM, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55B_P_1 */
+  {
+    { "vcvttph2dq",	{ XM, EXxmmqh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP55B_P_2 */
+  {
+    { "vcvtph2dq",	{ XM, EXxmmqh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55C_P_0 */
+  {
+    { "vsubph",	{ XM, Vex, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55C_P_1 */
+  {
+    { "vsubsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55D_P_0 */
+  {
+    { "vminph",	{ XM, Vex, EXxh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP55D_P_1 */
+  {
+    { "vminsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP55E_P_0 */
+  {
+    { "vdivph",	{ XM, Vex, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55E_P_1 */
+  {
+    { "vdivsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP55F_P_0 */
+  {
+    { "vmaxph",	{ XM, Vex, EXxh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP55F_P_1 */
+  {
+    { "vmaxsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP578_P_0 */
+  {
+    { "vcvttph2udq",	{ XM, EXxmmqh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP578_P_1 */
+  {
+    { "vcvttsh2usi",	{ Gdq, EXxmm_mw, EXxEVexS }, 0 },
+    { "vcvttsh2usi",	{ Gdq, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP578_P_2 */
+  {
+    { "vcvttph2uqq",	{ XM, EXxmmqdh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP579_P_0 */
+  {
+    { "vcvtph2udq",	{ XM, EXxmmqh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP579_P_1 */
+  {
+    { "vcvtsh2usi",	{ Gdq, EXxmm_mw, EXxEVexR }, 0 },
+    { "vcvtsh2usi",	{ Gdq, EXxmm_mw, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP579_P_2 */
+  {
+    { "vcvtph2uqq",	{ XM, EXxmmqdh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP57A_P_2 */
+  {
+    { "vcvttph2qq",	{ XM, EXxmmqdh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP57A_P_3 */
+  {
+    { "vcvtudq2ph%XY",	{ XMxmmq, EXx, EXxEVexR }, 0 },
+    { "vcvtuqq2ph%XZ",	{ XMM, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP57B_P_1 */
+  {
+    { "vcvtusi2sh{%LQ|}",	{ XMScalar, VexScalar, Ed, EXxEVexR }, 0 },
+    { "vcvtusi2sh{%LQ|}",	{ XMScalar, VexScalar, Eq, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP57B_P_2 */
+  {
+    { "vcvtph2qq",	{ XM, EXxmmqdh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP57C_P_0 */
+  {
+    { "vcvttph2uw",	{ XM, EXxh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP57C_P_2 */
+  {
+    { "vcvttph2w",	{ XM, EXxh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP57D_P_0 */
+  {
+    { "vcvtph2uw",	{ XM, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP57D_P_1 */
+  {
+    { "vcvtw2ph",	{ XM, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP57D_P_2 */
+  {
+    { "vcvtph2w",	{ XM, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP57D_P_3 */
+  {
+    { "vcvtuw2ph",	{ XM, EXxh, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP613_P_0 */
+  {
+    { "vcvtsh2ss",	{ XMM, VexScalar, EXxmm_mw, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP613_P_2 */
+  {
+    { "vcvtph2psx",	{ XM, EXxmmqh, EXxEVexS }, 0 },
+  },
+  /* EVEX_W_MAP62C */
+  {
+    { "vscalefph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP62D */
+  {
+    { "vscalefsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP642 */
+  {
+    { "vgetexpph",	{ XM, EXxh, EXxEVexS }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP643 */
+  {
+    { "vgetexpsh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexS }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP64C */
+  {
+    { "vrcpph",	{ XM, EXxh }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP64D */
+  {
+    { "vrcpsh",	{ XMM, VexScalar, EXxmm_mw }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP64E */
+  {
+    { "vrsqrtph",	{ XM, EXxh }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP64F */
+  {
+    { "vrsqrtsh",	{ XMM, VexScalar, EXxmm_mw }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP656_P_1 */
+  {
+    { "vfmaddcph",	{ XM, Vex, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP656_P_3 */
+  {
+    { "vfcmaddcph",	{ XM, Vex, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP657_P_1 */
+  {
+    { "vfmaddcsh",	{ XMM, VexScalar, EXxmm_md, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP657_P_3 */
+  {
+    { "vfcmaddcsh",	{ XMM, VexScalar, EXxmm_md, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP696 */
+  {
+    { "vfmaddsub132ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP697 */
+  {
+    { "vfmsubadd132ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP698 */
+  {
+    { "vfmadd132ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP699 */
+  {
+    { "vfmadd132sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP69A */
+  {
+    { "vfmsub132ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP69B */
+  {
+    { "vfmsub132sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP69C */
+  {
+    { "vfnmadd132ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP69D */
+  {
+    { "vfnmadd132sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP69E */
+  {
+    { "vfnmsub132ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP69F */
+  {
+    { "vfnmsub132sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6A6 */
+  {
+    { "vfmaddsub213ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6A7 */
+  {
+    { "vfmsubadd213ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6A8 */
+  {
+    { "vfmadd213ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6A9 */
+  {
+    { "vfmadd213sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6AA */
+  {
+    { "vfmsub213ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6AB */
+  {
+    { "vfmsub213sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6AC */
+  {
+    { "vfnmadd213ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6AD */
+  {
+    { "vfnmadd213sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6AE */
+  {
+    { "vfnmsub213ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6AF */
+  {
+    { "vfnmsub213sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6B6 */
+  {
+    { "vfmaddsub231ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6B7 */
+  {
+    { "vfmsubadd231ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6B8 */
+  {
+    { "vfmadd231ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6B9 */
+  {
+    { "vfmadd231sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6BA */
+  {
+    { "vfmsub231ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6BB */
+  {
+    { "vfmsub231sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6BC */
+  {
+    { "vfnmadd231ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6BD */
+  {
+    { "vfnmadd231sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6BE */
+  {
+    { "vfnmsub231ph",	{ XM, Vex, EXxh, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6BF */
+  {
+    { "vfnmsub231sh",	{ XMM, VexScalar, EXxmm_mw, EXxEVexR }, PREFIX_DATA },
+  },
+  /* EVEX_W_MAP6D6_P_1 */
+  {
+    { "vfmulcph",	{ XM, Vex, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP6D6_P_3 */
+  {
+    { "vfcmulcph",	{ XM, Vex, EXx, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP6D7_P_1 */
+  {
+    { "vfmulcsh",	{ XMM, VexScalar, EXxmm_md, EXxEVexR }, 0 },
+  },
+  /* EVEX_W_MAP6D7_P_3 */
+  {
+    { "vfcmulcsh",	{ XMM, VexScalar, EXxmm_md, EXxEVexR }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 151f61d95a4..d55bc78e479 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -593,9 +593,9 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 08 */
-    { VEX_W_TABLE (EVEX_W_0F3A08) },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A08) },
     { VEX_W_TABLE (EVEX_W_0F3A09) },
-    { VEX_W_TABLE (EVEX_W_0F3A0A) },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A0A) },
     { VEX_W_TABLE (EVEX_W_0F3A0B) },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -626,8 +626,8 @@  static const struct dis386 evex_table[][256] = {
     { EVEX_LEN_TABLE (EVEX_LEN_0F3A23) },
     { Bad_Opcode },
     { "vpternlog%DQ",	{ XM, Vex, EXx, Ib }, PREFIX_DATA },
-    { "vgetmantp%XW",	{ XM, EXx, EXxEVexS, Ib }, PREFIX_DATA },
-    { "vgetmants%XW",	{ XMScalar, VexScalar, EXVexWdqScalar, EXxEVexS, Ib }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A26) },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A27) },
     /* 28 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -680,8 +680,8 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { "vfixupimmp%XW",	{ XM, Vex, EXx, EXxEVexS, Ib }, PREFIX_DATA },
     { "vfixupimms%XW",	{ XMScalar, VexScalar, EXVexWdqScalar, EXxEVexS, Ib }, PREFIX_DATA },
-    { "vreducep%XW",	{ XM, EXx, EXxEVexS, Ib }, PREFIX_DATA },
-    { "vreduces%XW",	{ XMScalar, VexScalar, EXVexWdqScalar, EXxEVexS, Ib }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A56) },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A57) },
     /* 58 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -698,8 +698,8 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { "vfpclassp%XW%XZ",	{ XMask, EXx, Ib }, PREFIX_DATA },
-    { "vfpclasss%XW",	{ XMask, EXVexWdqScalar, Ib }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A66) },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A67) },
     /* 68 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -802,7 +802,7 @@  static const struct dis386 evex_table[][256] = {
     /* C0 */
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -872,4 +872,586 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
   },
+  /* EVEX_MAP5 */
+  {
+    /* 00 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 08 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 10 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP510) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP511) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 18 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP51D) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 20 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 28 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP52A) },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP52C) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP52D) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP52E) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP52F) },
+    /* 30 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 38 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 40 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 48 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 50 */
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP551) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 58 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP558) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP559) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP55A) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP55B) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP55C) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP55D) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP55E) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP55F) },
+    /* 60 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 68 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "vmovw", { XMScalar, Edw }, PREFIX_DATA },
+    { Bad_Opcode },
+    /* 70 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 78 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP578) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP579) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP57A) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP57B) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP57C) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP57D) },
+    { "vmovw", { Edw, XMScalar }, PREFIX_DATA },
+    { Bad_Opcode },
+    /* 80 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 88 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 90 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 98 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
+  /* EVEX_MAP6 */
+  {
+    /* 00 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 08 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 10 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP613) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 18 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 20 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 28 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP62C) },
+    { VEX_W_TABLE (EVEX_W_MAP62D) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 30 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 38 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 40 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP642) },
+    { VEX_W_TABLE (EVEX_W_MAP643) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 48 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP64C) },
+    { VEX_W_TABLE (EVEX_W_MAP64D) },
+    { VEX_W_TABLE (EVEX_W_MAP64E) },
+    { VEX_W_TABLE (EVEX_W_MAP64F) },
+    /* 50 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP656) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP657) },
+    /* 58 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 60 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 68 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 70 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 78 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 80 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 88 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 90 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP696) },
+    { VEX_W_TABLE (EVEX_W_MAP697) },
+    /* 98 */
+    { VEX_W_TABLE (EVEX_W_MAP698) },
+    { VEX_W_TABLE (EVEX_W_MAP699) },
+    { VEX_W_TABLE (EVEX_W_MAP69A) },
+    { VEX_W_TABLE (EVEX_W_MAP69B) },
+    { VEX_W_TABLE (EVEX_W_MAP69C) },
+    { VEX_W_TABLE (EVEX_W_MAP69D) },
+    { VEX_W_TABLE (EVEX_W_MAP69E) },
+    { VEX_W_TABLE (EVEX_W_MAP69F) },
+    /* A0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP6A6) },
+    { VEX_W_TABLE (EVEX_W_MAP6A7) },
+    /* A8 */
+    { VEX_W_TABLE (EVEX_W_MAP6A8) },
+    { VEX_W_TABLE (EVEX_W_MAP6A9) },
+    { VEX_W_TABLE (EVEX_W_MAP6AA) },
+    { VEX_W_TABLE (EVEX_W_MAP6AB) },
+    { VEX_W_TABLE (EVEX_W_MAP6AC) },
+    { VEX_W_TABLE (EVEX_W_MAP6AD) },
+    { VEX_W_TABLE (EVEX_W_MAP6AE) },
+    { VEX_W_TABLE (EVEX_W_MAP6AF) },
+    /* B0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP6B6) },
+    { VEX_W_TABLE (EVEX_W_MAP6B7) },
+    /* B8 */
+    { VEX_W_TABLE (EVEX_W_MAP6B8) },
+    { VEX_W_TABLE (EVEX_W_MAP6B9) },
+    { VEX_W_TABLE (EVEX_W_MAP6BA) },
+    { VEX_W_TABLE (EVEX_W_MAP6BB) },
+    { VEX_W_TABLE (EVEX_W_MAP6BC) },
+    { VEX_W_TABLE (EVEX_W_MAP6BD) },
+    { VEX_W_TABLE (EVEX_W_MAP6BE) },
+    { VEX_W_TABLE (EVEX_W_MAP6BF) },
+    /* C0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP6D6) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP6D7) },
+    /* D8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
 };
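The EVEX_MAP5 and EVEX_MAP6 tables above are only reached when EVEX.mmm selects map 5 or 6. The get_valid_dis386 change in this patch maps those values (case 0x5 and case 0x6) to EVEX_MAP5 and EVEX_MAP6.

A minimal C sketch of that selection follows. It assumes mmm occupies bits 2:0 of the first payload byte after 0x62 (P0), and it treats every other mmm value as invalid. The function name evex_map_from_p0 and the EVEX_INVALID marker are hypothetical, and bit 3 of P0 is deliberately not checked.

```c
#include <stdint.h>

/* Table indices in the order this patch enumerates them.
   EVEX_INVALID is a hypothetical marker used only by this sketch.  */
enum evex_table_index
{
  EVEX_0F = 0,
  EVEX_0F38,
  EVEX_0F3A,
  EVEX_MAP5,
  EVEX_MAP6,
  EVEX_INVALID
};

/* Pick the opcode table from P0, the first payload byte after 0x62.
   Bits 2:0 hold EVEX.mmm; the R, X, B and R' bits (7:4) are ignored.  */
static enum evex_table_index
evex_map_from_p0 (uint8_t p0)
{
  switch (p0 & 0x7)
    {
    case 0x1:
      return EVEX_0F;
    case 0x2:
      return EVEX_0F38;
    case 0x3:
      return EVEX_0F3A;
    case 0x5:
      return EVEX_MAP5;		/* New with AVX512-FP16.  */
    case 0x6:
      return EVEX_MAP6;		/* New with AVX512-FP16.  */
    default:
      /* mmm = 0, 4 and 7 select no table in this sketch.  */
      return EVEX_INVALID;
    }
}
```

The opcode byte that follows the payload then indexes the selected table, so map 6 opcode 0x56 lands on the PREFIX_EVEX_MAP656 entry above.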
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 21e40850544..0f4f953ecb1 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -358,14 +358,17 @@  fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXw { OP_EX, w_mode }
 #define EXd { OP_EX, d_mode }
 #define EXdS { OP_EX, d_swap_mode }
+#define EXwScalarS { OP_EX, w_scalar_swap_mode }
 #define EXq { OP_EX, q_mode }
 #define EXqS { OP_EX, q_swap_mode }
 #define EXx { OP_EX, x_mode }
+#define EXxh { OP_EX, xh_mode }
 #define EXxS { OP_EX, x_swap_mode }
 #define EXxmm { OP_EX, xmm_mode }
 #define EXymm { OP_EX, ymm_mode }
 #define EXtmm { OP_EX, tmm_mode }
 #define EXxmmq { OP_EX, xmmq_mode }
+#define EXxmmqh { OP_EX, xmmqh_mode }
 #define EXEvexHalfBcstXmmq { OP_EX, evex_half_bcst_xmmq_mode }
 #define EXxmm_mb { OP_EX, xmm_mb_mode }
 #define EXxmm_mw { OP_EX, xmm_mw_mode }
@@ -373,6 +376,7 @@  fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXxmm_mq { OP_EX, xmm_mq_mode }
 #define EXxmmdw { OP_EX, xmmdw_mode }
 #define EXxmmqd { OP_EX, xmmqd_mode }
+#define EXxmmqdh { OP_EX, xmmqdh_mode }
 #define EXymmq { OP_EX, ymmq_mode }
 #define EXVexWdqScalar { OP_EX, vex_scalar_w_dq_mode }
 #define EXEvexXGscat { OP_EX, evex_x_gscat_mode }
@@ -484,6 +488,9 @@  enum
   /* Similar to x_mode, but with operands swapped and disabled broadcast
      in EVEX.  */
   x_swap_mode,
+  /* 16-byte XMM, 32-byte YMM or 64-byte ZMM operand.  In EVEX, 16-bit
+     broadcast is allowed.  */
+  xh_mode,
   /* 16-byte XMM operand */
   xmm_mode,
   /* XMM, XMM or YMM register operand, or quad word, xmmword or ymmword
@@ -492,6 +499,9 @@  enum
   xmmq_mode,
   /* Same as xmmq_mode, but broadcast is allowed.  */
   evex_half_bcst_xmmq_mode,
+  /* XMM, XMM or YMM register operand, or quad word, xmmword or ymmword
+     memory operand (depending on vector length).  16-bit broadcast.  */
+  xmmqh_mode,
   /* XMM register or byte memory operand */
   xmm_mb_mode,
   /* XMM register or word memory operand */
@@ -504,6 +514,9 @@  enum
   xmmdw_mode,
   /* 16-byte XMM, double word, quad word operand or xmm word operand.  */
   xmmqd_mode,
+  /* 16-byte XMM, double word, quad word operand or xmm word operand.
+     16-bit broadcast is allowed.  */
+  xmmqdh_mode,
   /* 32-byte YMM operand */
   ymm_mode,
   /* quad word, ymmword or zmmword memory operand.  */
@@ -562,6 +575,10 @@  enum
 
   /* scalar, ignore vector length.  */
   scalar_mode,
+  /* like w_mode, with operands swapped; ignore vector length.  */
+  w_scalar_swap_mode,
+  /* like d_mode, ignore vector length.  */
+  d_scalar_mode,
   /* like vex_mode, ignore vector length.  */
   vex_scalar_mode,
   /* Operand size depends on the VEX.W bit, ignore vector length.  */
@@ -865,7 +882,9 @@  enum
   MOD_EVEX_0F387B_W_0,
   MOD_EVEX_0F387C,
   MOD_EVEX_0F38C6,
-  MOD_EVEX_0F38C7
+  MOD_EVEX_0F38C7,
+  MOD_EVEX_MAP510_PREFIX_1,
+  MOD_EVEX_MAP511_PREFIX_1
 };
 
 enum
@@ -1102,6 +1121,45 @@  enum
   PREFIX_EVEX_0F389B,
   PREFIX_EVEX_0F38AA,
   PREFIX_EVEX_0F38AB,
+  PREFIX_EVEX_0F3A08,
+  PREFIX_EVEX_0F3A0A,
+  PREFIX_EVEX_0F3A26,
+  PREFIX_EVEX_0F3A27,
+  PREFIX_EVEX_0F3A56,
+  PREFIX_EVEX_0F3A57,
+  PREFIX_EVEX_0F3A66,
+  PREFIX_EVEX_0F3A67,
+  PREFIX_EVEX_0F3AC2,
+
+  PREFIX_EVEX_MAP510,
+  PREFIX_EVEX_MAP511,
+  PREFIX_EVEX_MAP51D,
+  PREFIX_EVEX_MAP52A,
+  PREFIX_EVEX_MAP52C,
+  PREFIX_EVEX_MAP52D,
+  PREFIX_EVEX_MAP52E,
+  PREFIX_EVEX_MAP52F,
+  PREFIX_EVEX_MAP551,
+  PREFIX_EVEX_MAP558,
+  PREFIX_EVEX_MAP559,
+  PREFIX_EVEX_MAP55A,
+  PREFIX_EVEX_MAP55B,
+  PREFIX_EVEX_MAP55C,
+  PREFIX_EVEX_MAP55D,
+  PREFIX_EVEX_MAP55E,
+  PREFIX_EVEX_MAP55F,
+  PREFIX_EVEX_MAP578,
+  PREFIX_EVEX_MAP579,
+  PREFIX_EVEX_MAP57A,
+  PREFIX_EVEX_MAP57B,
+  PREFIX_EVEX_MAP57C,
+  PREFIX_EVEX_MAP57D,
+
+  PREFIX_EVEX_MAP613,
+  PREFIX_EVEX_MAP656,
+  PREFIX_EVEX_MAP657,
+  PREFIX_EVEX_MAP6D6,
+  PREFIX_EVEX_MAP6D7
 };
 
 enum
@@ -1183,7 +1241,9 @@  enum
 {
   EVEX_0F = 0,
   EVEX_0F38,
-  EVEX_0F3A
+  EVEX_0F3A,
+  EVEX_MAP5,
+  EVEX_MAP6
 };
 
 enum
@@ -1595,9 +1655,11 @@  enum
   EVEX_W_0F3883,
 
   EVEX_W_0F3A05,
-  EVEX_W_0F3A08,
+  EVEX_W_0F3A08_P_0,
+  EVEX_W_0F3A08_P_2,
   EVEX_W_0F3A09,
-  EVEX_W_0F3A0A,
+  EVEX_W_0F3A0A_P_0,
+  EVEX_W_0F3A0A_P_2,
   EVEX_W_0F3A0B,
   EVEX_W_0F3A18_L_n,
   EVEX_W_0F3A19_L_n,
@@ -1605,14 +1667,120 @@  enum
   EVEX_W_0F3A1B_L_2,
   EVEX_W_0F3A21,
   EVEX_W_0F3A23_L_n,
+  EVEX_W_0F3A26_P_0,
+  EVEX_W_0F3A27_P_0,
   EVEX_W_0F3A38_L_n,
   EVEX_W_0F3A39_L_n,
   EVEX_W_0F3A3A_L_2,
   EVEX_W_0F3A3B_L_2,
   EVEX_W_0F3A42,
   EVEX_W_0F3A43_L_n,
+  EVEX_W_0F3A56_P_0,
+  EVEX_W_0F3A57_P_0,
+  EVEX_W_0F3A66_P_0,
+  EVEX_W_0F3A67_P_0,
   EVEX_W_0F3A70,
   EVEX_W_0F3A72,
+  EVEX_W_0F3AC2_P_0,
+  EVEX_W_0F3AC2_P_1,
+
+  EVEX_W_MAP510_P_1_M_0,
+  EVEX_W_MAP510_P_1_M_1,
+  EVEX_W_MAP511_P_1_M_0,
+  EVEX_W_MAP511_P_1_M_1,
+  EVEX_W_MAP51D_P_0,
+  EVEX_W_MAP51D_P_2,
+  EVEX_W_MAP52A_P_1,
+  EVEX_W_MAP52C_P_1,
+  EVEX_W_MAP52D_P_1,
+  EVEX_W_MAP52E_P_0,
+  EVEX_W_MAP52F_P_0,
+  EVEX_W_MAP551_P_0,
+  EVEX_W_MAP551_P_1,
+  EVEX_W_MAP558_P_0,
+  EVEX_W_MAP558_P_1,
+  EVEX_W_MAP559_P_0,
+  EVEX_W_MAP559_P_1,
+  EVEX_W_MAP55A_P_0,
+  EVEX_W_MAP55A_P_1,
+  EVEX_W_MAP55A_P_2,
+  EVEX_W_MAP55A_P_3,
+  EVEX_W_MAP55B_P_0,
+  EVEX_W_MAP55B_P_1,
+  EVEX_W_MAP55B_P_2,
+  EVEX_W_MAP55C_P_0,
+  EVEX_W_MAP55C_P_1,
+  EVEX_W_MAP55D_P_0,
+  EVEX_W_MAP55D_P_1,
+  EVEX_W_MAP55E_P_0,
+  EVEX_W_MAP55E_P_1,
+  EVEX_W_MAP55F_P_0,
+  EVEX_W_MAP55F_P_1,
+  EVEX_W_MAP578_P_0,
+  EVEX_W_MAP578_P_1,
+  EVEX_W_MAP578_P_2,
+  EVEX_W_MAP579_P_0,
+  EVEX_W_MAP579_P_1,
+  EVEX_W_MAP579_P_2,
+  EVEX_W_MAP57A_P_2,
+  EVEX_W_MAP57A_P_3,
+  EVEX_W_MAP57B_P_1,
+  EVEX_W_MAP57B_P_2,
+  EVEX_W_MAP57C_P_0,
+  EVEX_W_MAP57C_P_2,
+  EVEX_W_MAP57D_P_0,
+  EVEX_W_MAP57D_P_1,
+  EVEX_W_MAP57D_P_2,
+  EVEX_W_MAP57D_P_3,
+
+  EVEX_W_MAP613_P_0,
+  EVEX_W_MAP613_P_2,
+  EVEX_W_MAP62C,
+  EVEX_W_MAP62D,
+  EVEX_W_MAP642,
+  EVEX_W_MAP643,
+  EVEX_W_MAP64C,
+  EVEX_W_MAP64D,
+  EVEX_W_MAP64E,
+  EVEX_W_MAP64F,
+  EVEX_W_MAP656_P_1,
+  EVEX_W_MAP656_P_3,
+  EVEX_W_MAP657_P_1,
+  EVEX_W_MAP657_P_3,
+  EVEX_W_MAP696,
+  EVEX_W_MAP697,
+  EVEX_W_MAP698,
+  EVEX_W_MAP699,
+  EVEX_W_MAP69A,
+  EVEX_W_MAP69B,
+  EVEX_W_MAP69C,
+  EVEX_W_MAP69D,
+  EVEX_W_MAP69E,
+  EVEX_W_MAP69F,
+  EVEX_W_MAP6A6,
+  EVEX_W_MAP6A7,
+  EVEX_W_MAP6A8,
+  EVEX_W_MAP6A9,
+  EVEX_W_MAP6AA,
+  EVEX_W_MAP6AB,
+  EVEX_W_MAP6AC,
+  EVEX_W_MAP6AD,
+  EVEX_W_MAP6AE,
+  EVEX_W_MAP6AF,
+  EVEX_W_MAP6B6,
+  EVEX_W_MAP6B7,
+  EVEX_W_MAP6B8,
+  EVEX_W_MAP6B9,
+  EVEX_W_MAP6BA,
+  EVEX_W_MAP6BB,
+  EVEX_W_MAP6BC,
+  EVEX_W_MAP6BD,
+  EVEX_W_MAP6BE,
+  EVEX_W_MAP6BF,
+  EVEX_W_MAP6D6_P_1,
+  EVEX_W_MAP6D6_P_3,
+  EVEX_W_MAP6D7_P_1,
+  EVEX_W_MAP6D7_P_3
 };
 
 typedef void (*op_rtn) (int bytemode, int sizeflag);
@@ -9277,6 +9445,12 @@  get_valid_dis386 (const struct dis386 *dp, disassemble_info *info)
 	case 0x3:
 	  vex_table_index = EVEX_0F3A;
 	  break;
+	case 0x5:
+	  vex_table_index = EVEX_MAP5;
+	  break;
+	case 0x6:
+	  vex_table_index = EVEX_MAP6;
+	  break;
 	}
 
       /* The second byte after 0x62.  */
@@ -10967,15 +11141,24 @@  print_displacement (char *buf, bfd_vma disp)
 static void
 intel_operand_size (int bytemode, int sizeflag)
 {
-  if (vex.b
-      && (bytemode == x_mode
-	  || bytemode == evex_half_bcst_xmmq_mode))
+  if (vex.b)
     {
-      if (vex.w)
-	oappend ("QWORD PTR ");
-      else
-	oappend ("DWORD PTR ");
-      return;
+      switch (bytemode)
+	{
+	case x_mode:
+	case evex_half_bcst_xmmq_mode:
+	  if (vex.w)
+	    oappend ("QWORD PTR ");
+	  else
+	    oappend ("DWORD PTR ");
+	  break;
+	case xh_mode:
+	case xmmqh_mode:
+	case xmmqdh_mode:
+	  oappend ("WORD PTR ");
+	  break;
+	}
+      return;
     }
   switch (bytemode)
     {
@@ -10986,6 +11169,7 @@  intel_operand_size (int bytemode, int sizeflag)
       oappend ("BYTE PTR ");
       break;
     case w_mode:
+    case w_scalar_swap_mode:
     case dw_mode:
     case dqw_mode:
       oappend ("WORD PTR ");
@@ -11068,6 +11252,7 @@  intel_operand_size (int bytemode, int sizeflag)
       oappend ("TBYTE PTR ");
       break;
     case x_mode:
+    case xh_mode:
     case x_swap_mode:
     case evex_x_gscat_mode:
     case evex_x_nobcst_mode:
@@ -11099,6 +11284,7 @@  intel_operand_size (int bytemode, int sizeflag)
       oappend ("YMMWORD PTR ");
       break;
     case xmmq_mode:
+    case xmmqh_mode:
     case evex_half_bcst_xmmq_mode:
       if (!need_vex)
 	abort ();
@@ -11198,6 +11384,7 @@  intel_operand_size (int bytemode, int sizeflag)
 	}
       break;
     case xmmqd_mode:
+    case xmmqdh_mode:
       if (!need_vex)
 	abort ();
 
@@ -11433,6 +11620,9 @@  OP_E_memory (int bytemode, int sizeflag)
       /* In EVEX, if operand doesn't allow broadcast, vex.b should be 0.  */
       if (vex.b
 	  && bytemode != x_mode
+	  && bytemode != xh_mode
+	  && bytemode != xmmqh_mode
+	  && bytemode != xmmqdh_mode
 	  && bytemode != evex_half_bcst_xmmq_mode)
 	{
 	  BadOp ();
@@ -11443,6 +11633,7 @@  OP_E_memory (int bytemode, int sizeflag)
 	case dqw_mode:
 	case dw_mode:
 	case xmm_mw_mode:
+	case w_scalar_swap_mode:
 	  shift = 1;
 	  break;
 	case dqb_mode:
@@ -11467,6 +11658,15 @@  OP_E_memory (int bytemode, int sizeflag)
 	case evex_x_gscat_mode:
 	  shift = vex.w ? 3 : 2;
 	  break;
+	case xh_mode:
+	case xmmqh_mode:
+	case xmmqdh_mode:
+	  if (vex.b)
+	    {
+	      shift = vex.w ? 2 : 1;
+	      break;
+	    }
+	  /* Fall through.  */
 	case x_mode:
 	case evex_half_bcst_xmmq_mode:
 	  if (vex.b)
@@ -11497,10 +11697,12 @@  OP_E_memory (int bytemode, int sizeflag)
 	    }
 	  /* Make necessary corrections to shift for modes that need it.  */
 	  if (bytemode == xmmq_mode
+	      || bytemode == xmmqh_mode
 	      || bytemode == evex_half_bcst_xmmq_mode
 	      || (bytemode == ymmq_mode && vex.length == 128))
 	    shift -= 1;
-	  else if (bytemode == xmmqd_mode)
+	  else if (bytemode == xmmqd_mode
+	           || bytemode == xmmqdh_mode)
 	    shift -= 2;
 	  else if (bytemode == xmmdw_mode)
 	    shift -= 3;
@@ -11881,9 +12083,36 @@  OP_E_memory (int bytemode, int sizeflag)
     }
   if (vex.b
       && (bytemode == x_mode
+	  || bytemode == xh_mode
+	  || bytemode == xmmqh_mode
+	  || bytemode == xmmqdh_mode
 	  || bytemode == evex_half_bcst_xmmq_mode))
     {
-      if (vex.w
+      if (bytemode == xh_mode)
+	{
+	  if (vex.w)
+	    {
+	    }
+	  else
+	    {
+	      switch (vex.length)
+		{
+		case 128:
+		  oappend ("{1to8}");
+		  break;
+		case 256:
+		  oappend ("{1to16}");
+		  break;
+		case 512:
+		  oappend ("{1to32}");
+		  break;
+		default:
+		  abort ();
+		}
+	    }
+	}
+      else if (vex.w
+	  || bytemode == xmmqdh_mode
 	  || bytemode == evex_half_bcst_xmmq_mode)
 	{
 	  switch (vex.length)
@@ -12793,6 +13022,7 @@  OP_EX (int bytemode, int sizeflag)
 
   if ((sizeflag & SUFFIX_ALWAYS)
       && (bytemode == x_swap_mode
+	  || bytemode == w_scalar_swap_mode
 	  || bytemode == d_swap_mode
 	  || bytemode == q_swap_mode))
     swap_operand ();
@@ -12807,8 +13037,12 @@  OP_EX (int bytemode, int sizeflag)
       && bytemode != xmm_mq_mode
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
+      && bytemode != xmmqh_mode
+      && bytemode != xmmqdh_mode
       && bytemode != ymm_mode
       && bytemode != tmm_mode
+      && bytemode != w_scalar_swap_mode
+      && bytemode != d_scalar_mode
       && bytemode != vex_scalar_w_dq_mode)
     {
       switch (vex.length)
@@ -12827,6 +13061,7 @@  OP_EX (int bytemode, int sizeflag)
 	}
     }
   else if (bytemode == xmmq_mode
+	   || bytemode == xmmqh_mode
 	   || bytemode == evex_half_bcst_xmmq_mode)
     {
       switch (vex.length)
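The OP_E_memory changes above encode two rules for the new half-precision modes:

- With EVEX.b set, the disp8 scale N is one broadcast element, which is 2 bytes for EVEX.W0.
- Without EVEX.b, N is the memory operand size: the full vector for xh_mode, half of it for xmmqh_mode and a quarter for xmmqdh_mode.

The {1toN} count printed for a 16-bit broadcast is then that operand size divided by 2.

The standalone C sketch below restates both rules. The enum and function names are hypothetical. It covers only EVEX.W0, since the patch prints no broadcast suffix for xh_mode with EVEX.W1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for xh_mode, xmmqh_mode and xmmqdh_mode.  */
enum fp16_mode { FP16_XH, FP16_XMMQH, FP16_XMMQDH };

/* Return the disp8*N shift, so the displacement is disp8 << shift.
   vl is the EVEX vector length in bits (128, 256 or 512).  */
static int
fp16_disp8_shift (enum fp16_mode mode, bool evex_b, bool evex_w, int vl)
{
  int shift;

  /* Broadcast: N is the size of a single broadcast element.  */
  if (evex_b)
    return evex_w ? 2 : 1;

  assert (vl == 128 || vl == 256 || vl == 512);
  /* Full vector: 16, 32 or 64 bytes.  */
  shift = vl == 128 ? 4 : vl == 256 ? 5 : 6;

  if (mode == FP16_XMMQH)
    shift -= 1;			/* Half of the vector length.  */
  else if (mode == FP16_XMMQDH)
    shift -= 2;			/* A quarter of the vector length.  */
  return shift;
}

/* N in {1toN} for a 16-bit (EVEX.W0) broadcast: the non-broadcast
   memory operand size divided by the 2-byte element size.  */
static int
fp16_bcst_count (enum fp16_mode mode, int vl)
{
  return (1 << fp16_disp8_shift (mode, false, false, vl)) / 2;
}
```

For example, assuming vaddph's memory source is decoded with xh_mode, `vaddph 0x40(%rax),%zmm1,%zmm2` can use disp8 = 1, because N is 64 for a 512-bit full-vector operand.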
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 27ddad49528..5611ba5a5be 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -243,6 +243,8 @@  static initializer cpu_flag_init[] =
     "CPU_AVX512F_FLAGS|CpuAVX512_BITALG" },
   { "CPU_AVX512_BF16_FLAGS",
     "CPU_AVX512F_FLAGS|CpuAVX512_BF16" },
+  { "CPU_AVX512_FP16_FLAGS",
+    "CPU_AVX512F_FLAGS|CpuAVX512_FP16" },
   { "CPU_L1OM_FLAGS",
     "unknown" },
   { "CPU_K1OM_FLAGS",
@@ -374,7 +376,7 @@  static initializer cpu_flag_init[] =
   { "CPU_ANY_AVX2_FLAGS",
     "CPU_ANY_AVX512F_FLAGS|CpuAVX2" },
   { "CPU_ANY_AVX512F_FLAGS",
-    "CpuAVX512F|CpuAVX512CD|CpuAVX512ER|CpuAVX512PF|CpuAVX512DQ|CpuAVX512BW|CpuAVX512VL|CpuAVX512IFMA|CpuAVX512VBMI|CpuAVX512_4FMAPS|CpuAVX512_4VNNIW|CpuAVX512_VPOPCNTDQ|CpuAVX512_VBMI2|CpuAVX512_VNNI|CpuAVX512_BITALG|CpuAVX512_BF16|CpuAVX512_VP2INTERSECT" },
+    "CpuAVX512F|CpuAVX512CD|CpuAVX512ER|CpuAVX512PF|CpuAVX512DQ|CpuAVX512BW|CpuAVX512VL|CpuAVX512IFMA|CpuAVX512VBMI|CpuAVX512_4FMAPS|CpuAVX512_4VNNIW|CpuAVX512_VPOPCNTDQ|CpuAVX512_VBMI2|CpuAVX512_VNNI|CpuAVX512_BITALG|CpuAVX512_BF16|CpuAVX512_VP2INTERSECT|CpuAVX512_FP16" },
   { "CPU_ANY_AVX512CD_FLAGS",
     "CpuAVX512CD" },
   { "CPU_ANY_AVX512ER_FLAGS",
@@ -439,6 +441,8 @@  static initializer cpu_flag_init[] =
     "CpuWideKL" },
   { "CPU_ANY_HRESET_FLAGS",
     "CpuHRESET" },
+  { "CPU_ANY_AVX512_FP16_FLAGS",
+    "CpuAVX512_FP16" },
 };
 
 static initializer operand_type_init[] =
@@ -645,6 +649,7 @@  static bitfield cpu_flags[] =
   BITFIELD (CpuAVX512_VP2INTERSECT),
   BITFIELD (CpuTDX),
   BITFIELD (CpuAVX_VNNI),
+  BITFIELD (CpuAVX512_FP16),
   BITFIELD (CpuMWAITX),
   BITFIELD (CpuCLZERO),
   BITFIELD (CpuOSPKE),
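The i386-gen.c hunks above make three changes to the flag sets:

- CPU_AVX512_FP16_FLAGS is defined as AVX512F plus CpuAVX512_FP16.
- CpuAVX512_FP16 is added to CPU_ANY_AVX512F_FLAGS.
- CpuAVX512_FP16 gets its own CPU_ANY_AVX512_FP16_FLAGS.

Assume gas ORs in the plain set for `.arch .avx512_fp16` and clears the ANY set for `.arch .noavx512_fp16` (and for `.noavx512f`). Under that assumption the effect can be modelled with three bits. The macro and function names here are hypothetical, and the real flag sets carry many more features.

```c
#include <stdint.h>

/* Toy feature bits; the real sets are much larger.  */
#define BIT_AVX512F      (1u << 0)
#define BIT_AVX512_BF16  (1u << 1)
#define BIT_AVX512_FP16  (1u << 2)

/* Enable sets: the feature plus what it depends on.  */
#define AVX512F_FLAGS      BIT_AVX512F
#define AVX512_FP16_FLAGS  (AVX512F_FLAGS | BIT_AVX512_FP16)

/* ANY sets: everything that must go when the feature is disabled.  */
#define ANY_AVX512F_FLAGS \
  (BIT_AVX512F | BIT_AVX512_BF16 | BIT_AVX512_FP16)
#define ANY_AVX512_FP16_FLAGS  BIT_AVX512_FP16

/* Model of ".arch .ext": turn on the extension and its prerequisites.  */
static uint32_t
arch_enable (uint32_t cur, uint32_t set)
{
  return cur | set;
}

/* Model of ".arch .noext": turn off the extension and its dependents.  */
static uint32_t
arch_disable (uint32_t cur, uint32_t any)
{
  return cur & ~any;
}
```

In this model, enabling FP16 pulls in AVX512F. Disabling AVX512F also drops FP16 (and BF16), while disabling only FP16 leaves AVX512F in place.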
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 8f0479b937b..077d936c793 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -211,6 +211,8 @@  enum
   CpuTDX,
   /* Intel AVX VNNI Instructions support required.  */
   CpuAVX_VNNI,
+  /* Intel AVX-512 FP16 Instructions support required.  */
+  CpuAVX512_FP16,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -388,6 +390,7 @@  typedef union i386_cpu_flags
       unsigned int cpuavx512_vp2intersect:1;
       unsigned int cputdx:1;
       unsigned int cpuavx_vnni:1;
+      unsigned int cpuavx512_fp16:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
@@ -578,6 +581,8 @@  enum
      1: 0F opcode prefix / space.
      2: 0F38 opcode prefix / space.
      3: 0F3A opcode prefix / space.
+     5: EVEX map 5 opcode space.
+     6: EVEX map 6 opcode space.
      8: XOP 08 opcode space.
      9: XOP 09 opcode space.
      A: XOP 0A opcode space.
@@ -586,6 +591,8 @@  enum
 #define SPACE_0F	1
 #define SPACE_0F38	2
 #define SPACE_0F3A	3
+#define SPACE_EVEXMAP5	5
+#define SPACE_EVEXMAP6	6
 #define SPACE_XOP08	8
 #define SPACE_XOP09	9
 #define SPACE_XOP0A	0xA
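SPACE_EVEXMAP5 and SPACE_EVEXMAP6 use the same numbers (5 and 6) as the EVEX.mmm encodings 0b101 and 0b110. The i386-opc.tbl entries that follow keep any mandatory prefix in the byte above the opcode, for example 0xf358 for vaddsh and 0x661d for vcvtps2phx. That prefix ends up in EVEX.pp (0 none, 1 = 66, 2 = F3, 3 = F2).

The sketch below shows that split under two assumptions:

- the space number is written unchanged into mmm;
- the high byte, when present, is always 66, F3 or F2.

All names are hypothetical.

```c
#include <stdint.h>

/* EVEX.pp for a table opcode such as 0xf358, assuming the byte above
   the opcode is the mandatory prefix.  Returns -1 if unrecognised.  */
static int
evex_pp (uint32_t tbl_opcode)
{
  switch (tbl_opcode >> 8)
    {
    case 0x00:
      return 0;			/* No mandatory prefix.  */
    case 0x66:
      return 1;
    case 0xf3:
      return 2;
    case 0xf2:
      return 3;
    default:
      return -1;
    }
}

/* The opcode byte proper, which indexes the per-map decode table.  */
static uint8_t
evex_opcode_byte (uint32_t tbl_opcode)
{
  return (uint8_t) (tbl_opcode & 0xff);
}

/* Put an opcode space number into mmm (bits 2:0) of P0, keeping the
   R, X, B and R' bits and clearing bit 3, assumed here to be zero.  */
static uint8_t
evex_p0_with_map (uint8_t p0, unsigned int space)
{
  return (uint8_t) ((p0 & 0xf0) | (space & 0x7));
}
```

Under these assumptions vaddsh (0xf358, SpaceEVexMap5) gets pp = 2, mmm = 5 and opcode byte 0x58. vaddph (0x58) differs only in pp = 0, matching the commit message's point that FP16 forms share opcode and prefix bits with their FP32 counterparts.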
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index b0530e5fb82..207ef33acf7 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3929,3 +3929,379 @@  senduipi, 0xf30fc7, 6, CpuUINTR|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_
 hreset, 0xf30f3af0c0, None, CpuHRESET, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8 }
 
 // HRESET instructions end.
+
+// FP16 (HFNI) instructions.
+
+#define SpaceEVexMap5 OpcodeSpace=SPACE_EVEXMAP5
+#define SpaceEVexMap6 OpcodeSpace=SPACE_EVEXMAP6
+
+
+vaddph, 0x58, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vaddph, 0x58, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vaddsh, 0xf358, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vaddsh, 0xf358, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfcmaddcph, 0xf256, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfcmaddcph, 0xf256, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfcmaddcsh, 0xf257, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfcmaddcsh, 0xf257, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmaddcph, 0xf356, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmaddcph, 0xf356, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmaddcsh, 0xf357, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmaddcsh, 0xf357, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfcmulcph, 0xf2d6, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfcmulcph, 0xf2d6, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfcmulcsh, 0xf2d7, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfcmulcsh, 0xf2d7, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmulcph, 0xf3d6, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmulcph, 0xf3d6, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmulcsh, 0xf3d7, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmulcsh, 0xf3d7, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vcmpph, 0xc2, None, CpuAVX512_FP16, Modrm|EVex=1|Masking=2|Space0F3A|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegZMM, RegZMM, RegMask }
+
+vcmpph, 0xc2, None, CpuAVX512_FP16, Modrm|Masking=2|Space0F3A|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask }
+vcmpph, 0xc2, None, CpuAVX512_FP16, Modrm|EVex=1|Masking=2|Space0F3A|VexVVVV=1|VexW=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegZMM, RegZMM, RegMask }
+
+vcmpsh, 0xf3c2, None, CpuAVX512_FP16, Modrm|EVex128|Masking=2|Space0F3A|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegMask }
+vcmpsh, 0xf3c2, None, CpuAVX512_FP16, Modrm|EVex128|Masking=2|Space0F3A|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegXMM, RegXMM, RegMask }
+
+vcomish, 0x2f, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM }
+vcomish, 0x2f, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM }
+vucomish, 0x2e, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM }
+vucomish, 0x2e, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM }
+
+vcvtdq2ph, 0x5b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Dword|BaseIndex, RegXMM }
+vcvtdq2ph, 0x5b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM|Dword|Unspecified|BaseIndex, RegYMM }
+vcvtdq2phx, 0x5b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegXMM|Unspecified|BaseIndex, RegXMM }
+vcvtdq2phy, 0x5b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegYMM|Unspecified|BaseIndex, RegXMM }
+vcvtdq2ph, 0x5b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegYMM }
+
+vcvtps2phx, 0x661d, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM|Dword|Unspecified|BaseIndex, RegYMM }
+vcvtps2phx, 0x661d, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Dword|BaseIndex, RegXMM }
+vcvtps2phxx, 0x661d, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegXMM|Unspecified|BaseIndex, RegXMM }
+vcvtps2phxy, 0x661d, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegYMM|Unspecified|BaseIndex, RegXMM }
+vcvtps2phx, 0x661d, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegYMM }
+
+vcvtudq2ph, 0xf27a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM|Dword|Unspecified|BaseIndex, RegYMM }
+vcvtudq2ph, 0xf27a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|Dword|BaseIndex, RegXMM }
+vcvtudq2phx, 0xf27a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegXMM|Unspecified|BaseIndex, RegXMM }
+vcvtudq2phy, 0xf27a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegYMM|Unspecified|BaseIndex, RegXMM }
+vcvtudq2ph, 0xf27a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegYMM }
+
+vcvtpd2ph, 0x665a, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW=2|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Qword|BaseIndex, RegXMM }
+vcvtpd2phx, 0x665a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegXMM|Unspecified|BaseIndex, RegXMM }
+vcvtpd2phy, 0x665a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegYMM|Unspecified|BaseIndex, RegXMM }
+vcvtpd2phz, 0x665a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegZMM|Unspecified|BaseIndex, RegXMM }
+vcvtpd2ph, 0x665a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW=2|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegXMM }
+
+vcvtph2pd, 0x5a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Dword|Unspecified|BaseIndex, RegXMM }
+vcvtph2pd, 0x5a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Qword|Unspecified|BaseIndex, RegYMM }
+vcvtph2pd, 0x5a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegZMM }
+vcvtph2pd, 0x5a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegZMM }
+
+vcvtph2qq, 0x667b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Dword|Unspecified|BaseIndex, RegXMM }
+vcvtph2qq, 0x667b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegYMM }
+vcvtph2qq, 0x667b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegZMM }
+
+vcvtph2qq, 0x667b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegZMM }
+
+vcvtph2uqq, 0x6679, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Dword|Unspecified|BaseIndex, RegXMM }
+vcvtph2uqq, 0x6679, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegYMM }
+vcvtph2uqq, 0x6679, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegZMM }
+vcvtph2uqq, 0x6679, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegZMM }
+
+vcvttph2qq, 0x667a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Dword|Unspecified|BaseIndex, RegXMM }
+vcvttph2qq, 0x667a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegYMM }
+vcvttph2qq, 0x667a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegZMM }
+vcvttph2qq, 0x667a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegZMM }
+
+vcvttph2uqq, 0x6678, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Dword|Unspecified|BaseIndex, RegXMM }
+vcvttph2uqq, 0x6678, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegYMM }
+vcvttph2uqq, 0x6678, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegZMM }
+vcvttph2uqq, 0x6678, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegZMM }
+
+vcvtph2psx, 0x6613, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Qword|Unspecified|BaseIndex, RegXMM }
+vcvtph2psx, 0x6613, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Xmmword|Unspecified|BaseIndex, RegYMM }
+vcvtph2psx, 0x6613, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Ymmword|Unspecified|BaseIndex, RegZMM }
+
+
+vcvtph2psx, 0x6613, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|Masking=3|SpaceEVexMap6|VexW0|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM|RegYMM }
+vcvtph2psx, 0x6613, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegZMM }
+vcvtph2psx, 0x6613, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegYMM, RegZMM }
+
+vcvtph2dq, 0x665b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
+vcvtph2dq, 0x665b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegYMM }
+vcvtph2dq, 0x665b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegYMM|Unspecified|BaseIndex, RegZMM }
+
+vcvtph2dq, 0x665b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegYMM, RegZMM }
+
+vcvtph2udq, 0x79, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
+vcvtph2udq, 0x79, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegYMM }
+vcvtph2udq, 0x79, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegYMM|Unspecified|BaseIndex, RegZMM }
+
+vcvtph2udq, 0x79, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegYMM, RegZMM }
+
+vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
+vcvttph2dq, 0xf35b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegYMM }
+vcvttph2dq, 0xf35b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegYMM|Unspecified|BaseIndex, RegZMM }
+vcvttph2dq, 0xf35b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegYMM, RegZMM }
+
+vcvttph2udq, 0x78, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Qword|Unspecified|BaseIndex, RegXMM }
+vcvttph2udq, 0x78, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|Unspecified|BaseIndex, RegYMM }
+vcvttph2udq, 0x78, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegYMM|Unspecified|BaseIndex, RegZMM }
+vcvttph2udq, 0x78, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegYMM, RegZMM }
+
+vcvtph2w, 0x667d, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vcvtph2w, 0x667d, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM }
+
+vcvtph2uw, 0x7d, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vcvtph2uw, 0x7d, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vcvtph2uw, 0x7d, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM }
+
+vcvttph2w, 0x667c, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vcvttph2w, 0x667c, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegZMM }
+
+vcvttph2uw, 0x7c, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vcvttph2uw, 0x7c, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegZMM }
+
+vcvtqq2ph, 0x5b, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW=2|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Qword|BaseIndex, RegXMM }
+vcvtqq2phz, 0x5b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegZMM|Unspecified|BaseIndex, RegXMM }
+vcvtqq2phx, 0x5b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegXMM|Unspecified|BaseIndex, RegXMM }
+vcvtqq2phy, 0x5b, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegYMM|Unspecified|BaseIndex, RegXMM }
+vcvtqq2ph, 0x5b, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegXMM }
+
+vcvtuqq2ph, 0xf27a, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW=2|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Qword|BaseIndex, RegXMM }
+vcvtuqq2phz, 0xf27a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegZMM|Unspecified|BaseIndex, RegXMM }
+vcvtuqq2phx, 0xf27a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegXMM|Unspecified|BaseIndex, RegXMM }
+vcvtuqq2phy, 0xf27a, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=3|SpaceEVexMap5|VexW=2|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegYMM|Unspecified|BaseIndex, RegXMM }
+vcvtuqq2ph, 0xf27a, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegXMM }
+
+vcvtsd2sh, 0xf25a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW=2|Disp8MemShift=3|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsd2sh, 0xf25a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vcvtsh2si, 0xf32d, None, CpuAVX512_FP16|Cpu64, Modrm|EVexLIG|SpaceEVexMap5|VexW=2|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg64 }
+vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg32 }
+vcvtsh2si, 0xf32d, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, Reg32|Reg64 }
+
+vcvtsh2usi, 0xf379, None, CpuAVX512_FP16|Cpu64, Modrm|EVexLIG|SpaceEVexMap5|VexW=2|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg64 }
+vcvtsh2usi, 0xf379, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg32 }
+vcvtsh2usi, 0xf379, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, Reg32|Reg64 }
+
+vcvttsh2si, 0xf32c, None, CpuAVX512_FP16|Cpu64, Modrm|EVexLIG|SpaceEVexMap5|VexW=2|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg64 }
+vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg32 }
+vcvttsh2si, 0xf32c, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, Reg32|Reg64 }
+
+vcvttsh2usi, 0xf378, None, CpuAVX512_FP16|Cpu64, Modrm|EVexLIG|SpaceEVexMap5|VexW=2|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg64 }
+vcvttsh2usi, 0xf378, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, Reg32 }
+vcvttsh2usi, 0xf378, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, Reg32|Reg64 }
+
+vcvtsh2sd, 0xf35a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsh2sd, 0xf35a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vcvtsh2ss, 0x13, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsh2ss, 0x13, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vcvtsi2sh, 0xf32a, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sh, 0xf32a, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtsi2sh, 0xf32a, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64, Imm8, RegXMM, RegXMM }
+vcvtsi2sh, 0xf32a, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64, Imm8, RegXMM, RegXMM }
+
+vcvtss2sh, 0x1d, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=2|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtss2sh, 0x1d, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|ATTSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|IntelSyntax, { Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE|ATTSyntax, { Reg32|Reg64, Imm8, RegXMM, RegXMM }
+vcvtusi2sh, 0xf37b, None, CpuAVX512_FP16, Modrm|EVexLIG|SpaceEVexMap5|VexVVVV|Disp8ShiftVL|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|StaticRounding|SAE|IntelSyntax, { Reg32|Reg64, Imm8, RegXMM, RegXMM }
+
+vcvtw2ph, 0xf37d, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vcvtw2ph, 0xf37d, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM }
+
+vcvtuw2ph, 0xf27d, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vcvtuw2ph, 0xf27d, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM }
+
+vdivph, 0x5e, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vdivph, 0x5e, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vdivsh, 0xf35e, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vdivsh, 0xf35e, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmadd132ph, 0x6698, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmadd132ph, 0x6698, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmadd213ph, 0x66a8, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmadd213ph, 0x66a8, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmadd231ph, 0x66b8, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmadd231ph, 0x66b8, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmadd132sh, 0x6699, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmadd132sh, 0x6699, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmadd213sh, 0x66a9, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmadd213sh, 0x66a9, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmadd231sh, 0x66b9, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmadd231sh, 0x66b9, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmaddsub132ph, 0x6696, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmaddsub132ph, 0x6696, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmaddsub213ph, 0x66a6, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmaddsub213ph, 0x66a6, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmaddsub231ph, 0x66b6, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmaddsub231ph, 0x66b6, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmadd132ph, 0x669c, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmadd132ph, 0x669c, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmadd213ph, 0x66ac, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmadd213ph, 0x66ac, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmadd231ph, 0x66bc, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmadd231ph, 0x66bc, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmadd132sh, 0x669d, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfnmadd132sh, 0x669d, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfnmadd213sh, 0x66ad, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfnmadd213sh, 0x66ad, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfnmadd231sh, 0x66bd, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfnmadd231sh, 0x66bd, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfpclassph, 0x66, None, CpuAVX512_FP16, Modrm|Masking=2|Space0F3A|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|RegYMM|RegZMM|Word|BaseIndex, RegMask }
+vfpclassphz, 0x66, None, CpuAVX512_FP16, Modrm|EVex512|Masking=2|Space0F3A|VexW0|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Imm8, RegZMM|Unspecified|BaseIndex, RegMask }
+vfpclassphx, 0x66, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex128|Masking=2|Space0F3A|VexW0|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Imm8, RegXMM|Unspecified|BaseIndex, RegMask }
+vfpclassphy, 0x66, None, CpuAVX512_FP16|CpuAVX512VL, Modrm|EVex256|Masking=2|Space0F3A|VexW0|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Imm8, RegYMM|Unspecified|BaseIndex, RegMask }
+
+vfpclasssh, 0x67, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=2|Space0F3A|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Word|Unspecified|BaseIndex, RegMask }
+
+vgetmantph, 0x26, None, CpuAVX512_FP16, Modrm|Masking=3|Space0F3A|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vgetmantph, 0x26, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|Space0F3A|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegZMM, RegZMM }
+vgetmantsh, 0x27, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vgetmantsh, 0x27, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|Space0F3A|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegXMM, RegXMM, RegXMM }
+
+vmaxph, 0x5f, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vmaxph, 0x5f, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vmaxsh, 0xf35f, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmaxsh, 0xf35f, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vminph, 0x5d, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vminph, 0x5d, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vminsh, 0xf35d, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vminsh, 0xf35d, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vmovsh, 0xf310, None, CpuAVX512_FP16, D|Modrm|EVexLIG|Masking=3|SpaceEVexMap5|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|Unspecified|BaseIndex, RegXMM }
+vmovsh, 0xf310, None, CpuAVX512_FP16, D|Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
+
+vmovw, 0x666e, None, CpuAVX512_FP16, D|Modrm|EVex128|SpaceEVexMap5|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Reg64|Word|Unspecified|BaseIndex, RegXMM }
+
+vmulsh, 0xf359, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vmulsh, 0xf359, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmsub132ph, 0x669a, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsub132ph, 0x669a, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmsub213ph, 0x66aa, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsub213ph, 0x66aa, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmsub231ph, 0x66ba, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsub231ph, 0x66ba, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmsub132sh, 0x669b, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmsub132sh, 0x669b, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmsub213sh, 0x66ab, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmsub213sh, 0x66ab, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmsub231sh, 0x66bb, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfmsub231sh, 0x66bb, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfmsubadd132ph, 0x6697, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsubadd132ph, 0x6697, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmsubadd213ph, 0x66a7, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsubadd213ph, 0x66a7, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfmsubadd231ph, 0x66b7, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfmsubadd231ph, 0x66b7, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmsub132ph, 0x669e, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmsub132ph, 0x669e, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmsub213ph, 0x66ae, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmsub213ph, 0x66ae, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmsub231ph, 0x66be, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vfnmsub231ph, 0x66be, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vfnmsub132sh, 0x669f, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfnmsub132sh, 0x669f, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfnmsub213sh, 0x66af, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfnmsub213sh, 0x66af, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vfnmsub231sh, 0x66bf, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vfnmsub231sh, 0x66bf, None, CpuAVX512_FP16, Modrm|EVexLIG|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vgetexpph, 0x6642, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vgetexpph, 0x6642, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegZMM, RegZMM }
+
+vgetexpsh, 0x6643, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vgetexpsh, 0x6643, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vmulph, 0x59, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vmulph, 0x59, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vreduceph, 0x56, None, CpuAVX512_FP16, Modrm|Masking=3|Space0F3A|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vreduceph, 0x56, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|Space0F3A|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegZMM, RegZMM }
+
+vreducesh, 0x57, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vreducesh, 0x57, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|Space0F3A|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegXMM, RegXMM, RegXMM }
+
+vrndscaleph, 0x08, None, CpuAVX512_FP16, Modrm|Masking=3|Space0F3A|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vrndscaleph, 0x08, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|Space0F3A|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegZMM, RegZMM }
+
+vrndscalesh, 0x0a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|Space0F3A|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vrndscalesh, 0x0a, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|Space0F3A|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SAE, { Imm8, Imm8, RegXMM, RegXMM, RegXMM }
+
+vrcpph, 0x664c, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+
+vrcpsh, 0x664d, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+
+vrsqrtph, 0x664e, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+
+vrsqrtsh, 0x664f, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+
+vscalefph, 0x662c, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vscalefph, 0x662c, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap6|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vscalefsh, 0x662d, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vscalefsh, 0x662d, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap6|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vsqrtph, 0x51, None, CpuAVX512_FP16, Modrm|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }
+vsqrtph, 0x51, None, CpuAVX512_FP16, Modrm|EVex512|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM }
+
+vsqrtsh, 0xf351, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsqrtsh, 0xf351, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+vsubph, 0x5c, None, CpuAVX512_FP16, Modrm|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|Broadcast|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Word|RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
+vsubph, 0x5c, None, CpuAVX512_FP16, Modrm|EVex512|VexVVVV|Masking=3|SpaceEVexMap5|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegZMM, RegZMM, RegZMM }
+
+vsubsh, 0xf35c, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|Disp8MemShift=1|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
+vsubsh, 0xf35c, None, CpuAVX512_FP16, Modrm|EVexLIG|Masking=3|SpaceEVexMap5|VexVVVV|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|StaticRounding|SAE, { Imm8, RegXMM, RegXMM, RegXMM }
+
+// FP16 (HFNI) instructions end.
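The entries above pair an opcode-space name with an opcode value. Where a value has a high byte (0xf35f for vmaxsh, 0x669a for vfmsub132ph), that byte appears to be the mandatory prefix, which EVEX carries in its two-bit pp field. A one-byte value (0x5f for vmaxph, 0x26 for vgetmantph) appears to carry no prefix (NP). The sketch below decodes an entry into the pp bits, the EVEX map number and the opcode byte. It assumes this prefix-in-high-byte convention and the map numbering from the commit message (map 5 = 0b101, map 6 = 0b110). `decode_entry` and `describe` are hypothetical helpers for illustration, not part of binutils.

```python
from typing import Dict, Optional, Tuple

# Assumed mapping from the table's opcode-space names to EVEX map numbers
# (map 5 -> mmm 0b101, map 6 -> mmm 0b110, per the commit message).
SPACE_TO_MAP: Dict[str, int] = {
    "Space0F": 1,
    "Space0F38": 2,
    "Space0F3A": 3,
    "SpaceEVexMap5": 5,
    "SpaceEVexMap6": 6,
}

# EVEX.pp encodes the implied legacy prefix: none, 66, F3, F2.
PREFIX_TO_PP: Dict[Optional[int], int] = {None: 0b00, 0x66: 0b01, 0xF3: 0b10, 0xF2: 0b11}


def decode_entry(opcode: int, space: str) -> Tuple[int, int, int]:
    """Hypothetical helper: return (pp, map number, opcode byte) for a table entry.

    Assumes a two-byte opcode value stores the mandatory prefix in its high byte.
    """
    if opcode > 0xFF:
        prefix: Optional[int] = opcode >> 8
        op_byte = opcode & 0xFF
    else:
        prefix, op_byte = None, opcode
    if prefix not in PREFIX_TO_PP:
        raise ValueError(f"unexpected prefix byte {prefix:#x}")
    return PREFIX_TO_PP[prefix], SPACE_TO_MAP[space], op_byte


def describe(opcode: int, space: str) -> str:
    """Render the decoded fields as pp/mmm bit strings plus the opcode byte."""
    pp, mmm, op = decode_entry(opcode, space)
    return f"pp={pp:02b} mmm={mmm:03b} opcode={op:#04x}"


if __name__ == "__main__":
    # vmaxph (packed, no prefix) and vmaxsh (scalar, F3) share opcode 0x5f in map 5.
    print(describe(0x5F, "SpaceEVexMap5"))
    print(describe(0xF35F, "SpaceEVexMap5"))
    # vfmsub132ph: 66 prefix, map 6.
    print(describe(0x669A, "SpaceEVexMap6"))
```

The vmaxph/vmaxsh pair shows how packed and scalar forms share one opcode byte and differ only in pp. This matches the commit message's note that most FP16 instructions reuse the opcode and EVEX.pp bits of the related FP32 operation.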