riscv: print .2byte or .4byte before an unknown instruction encoding

Message ID 20210918102408.89285-1-andrew.burgess@embecosm.com
State New
Headers show
Series
  • riscv: print .2byte or .4byte before an unknown instruction encoding
Related show

Commit Message

Andrew Burgess Sept. 18, 2021, 10:24 a.m.
When the RISC-V disassembler encounters an unknown instruction, it
currently just prints the value of the bytes, like this:

  Dump of assembler code for function custom_insn:
     0x00010132 <+0>:	addi	sp,sp,-16
     0x00010134 <+2>:	sw	s0,12(sp)
     0x00010136 <+4>:	addi	s0,sp,16
     0x00010138 <+6>:	0x52018b
     0x0001013c <+10>:	0x9c45

My proposal, in this patch, is to change the behaviour to this:

  Dump of assembler code for function custom_insn:
     0x00010132 <+0>:	addi	sp,sp,-16
     0x00010134 <+2>:	sw	s0,12(sp)
     0x00010136 <+4>:	addi	s0,sp,16
     0x00010138 <+6>:	.4byte	0x52018b
     0x0001013c <+10>:	.2byte	0x9c45

Adding the .4byte and .2byte opcodes.  The benefit that I see here is
that in the patched version of the tools, the disassembler output can
be fed back into the assembler and it should assemble to the same
binary format.  Before the patch, the disassembler output is invalid
assembly.

I've started a RISC-V specific test file under binutils so that I can
add a test for this change.

binutils/ChangeLog:

	* testsuite/binutils-all/riscv/riscv.exp: New file.
	* testsuite/binutils-all/riscv/unknown.d: New file.
	* testsuite/binutils-all/riscv/unknown.s: New file.

opcodes/ChangeLog:

	* riscv-dis.c (riscv_disassemble_insn): Print a .%dbyte opcode
	before an unknown instruction, '%d' is replaced with the
	instruction length.
---
 binutils/ChangeLog                            |  6 ++++
 .../testsuite/binutils-all/riscv/riscv.exp    | 29 +++++++++++++++++++
 .../testsuite/binutils-all/riscv/unknown.d    | 11 +++++++
 .../testsuite/binutils-all/riscv/unknown.s    | 27 +++++++++++++++++
 opcodes/ChangeLog                             |  6 ++++
 opcodes/riscv-dis.c                           | 24 ++++++++++++++-
 6 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 binutils/testsuite/binutils-all/riscv/riscv.exp
 create mode 100644 binutils/testsuite/binutils-all/riscv/unknown.d
 create mode 100644 binutils/testsuite/binutils-all/riscv/unknown.s

-- 
2.25.4

Comments

Nelson Chu Sept. 18, 2021, 4:03 p.m. | #1
Hi Andrew,

On Sat, Sep 18, 2021 at 6:24 PM Andrew Burgess
<andrew.burgess@embecosm.com> wrote:
>

> When the RISC-V disassembler encounters an unknown instruction, it

> currently just prints the value of the bytes, like this:

>

>   Dump of assembler code for function custom_insn:

>      0x00010132 <+0>:   addi    sp,sp,-16

>      0x00010134 <+2>:   sw      s0,12(sp)

>      0x00010136 <+4>:   addi    s0,sp,16

>      0x00010138 <+6>:   0x52018b

>      0x0001013c <+10>:  0x9c45

>

> My proposal, in this patch, is to change the behaviour to this:

>

>   Dump of assembler code for function custom_insn:

>      0x00010132 <+0>:   addi    sp,sp,-16

>      0x00010134 <+2>:   sw      s0,12(sp)

>      0x00010136 <+4>:   addi    s0,sp,16

>      0x00010138 <+6>:   .4byte  0x52018b

>      0x0001013c <+10>:  .2byte  0x9c45

>

> Adding the .4byte and .2byte opcodes.  The benefit that I see here is

> that in the patched version of the tools, the disassembler output can

> be fed back into the assembler and it should assemble to the same

> binary format.  Before the patch, the disassembler output is invalid

> assembly.


Looks reasonable, this change is more friendly to tools, and should
get a better disassembler output.

> I've started a RISC-V specific test file under binutils so that I can

> add a test for this change.

>

> binutils/ChangeLog:

>

>         * testsuite/binutils-all/riscv/riscv.exp: New file.

>         * testsuite/binutils-all/riscv/unknown.d: New file.

>         * testsuite/binutils-all/riscv/unknown.s: New file.

>

> opcodes/ChangeLog:

>

>         * riscv-dis.c (riscv_disassemble_insn): Print a .%dbyte opcode

>         before an unknown instruction, '%d' is replaced with the

>         instruction length.

> ---

>  binutils/ChangeLog                            |  6 ++++

>  .../testsuite/binutils-all/riscv/riscv.exp    | 29 +++++++++++++++++++

>  .../testsuite/binutils-all/riscv/unknown.d    | 11 +++++++

>  .../testsuite/binutils-all/riscv/unknown.s    | 27 +++++++++++++++++

>  opcodes/ChangeLog                             |  6 ++++

>  opcodes/riscv-dis.c                           | 24 ++++++++++++++-

>  6 files changed, 102 insertions(+), 1 deletion(-)

>  create mode 100644 binutils/testsuite/binutils-all/riscv/riscv.exp

>  create mode 100644 binutils/testsuite/binutils-all/riscv/unknown.d

>  create mode 100644 binutils/testsuite/binutils-all/riscv/unknown.s

>

> diff --git a/binutils/testsuite/binutils-all/riscv/riscv.exp b/binutils/testsuite/binutils-all/riscv/riscv.exp

> new file mode 100644

> index 00000000000..8f0b7dc9766

> --- /dev/null

> +++ b/binutils/testsuite/binutils-all/riscv/riscv.exp

> @@ -0,0 +1,29 @@

> +# Copyright (C) 2021 Free Software Foundation, Inc.

> +

> +# This program is free software; you can redistribute it and/or modify

> +# it under the terms of the GNU General Public License as published by

> +# the Free Software Foundation; either version 3 of the License, or

> +# (at your option) any later version.

> +#

> +# This program is distributed in the hope that it will be useful,

> +# but WITHOUT ANY WARRANTY; without even the implied warranty of

> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> +# GNU General Public License for more details.

> +#

> +# You should have received a copy of the GNU General Public License

> +# along with this program; if not, write to the Free Software

> +# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.

> +

> +if ![istarget "riscv*-*-*"] then {

> +    return

> +}

> +

> +set tempfile tmpdir/riscvtemp.o

> +set copyfile tmpdir/riscvcopy

> +

> +set test_list [lsort [glob -nocomplain $srcdir/$subdir/*.d]]

> +foreach t $test_list {

> +    # We need to strip the ".d", but can leave the dirname.

> +    verbose [file rootname $t]

> +    run_dump_test [file rootname $t]

> +}

> diff --git a/binutils/testsuite/binutils-all/riscv/unknown.d b/binutils/testsuite/binutils-all/riscv/unknown.d

> new file mode 100644

> index 00000000000..64791169b2a

> --- /dev/null

> +++ b/binutils/testsuite/binutils-all/riscv/unknown.d

> @@ -0,0 +1,11 @@

> +#as: -march=rv32ic

> +#objdump: -d

> +# Test the disassembly of unknown instruction encodings, specifically,

> +# ensure that we generate a .?byte opcode.

> +

> +#...

> +Disassembly of section \.text:

> +

> +[0-9a-f]+ <\.text>:

> +   [0-9a-f]+:  0052018b                \.4byte 0x52018b

> +   [0-9a-f]+:  9c45                    \.2byte 0x9c45

> diff --git a/binutils/testsuite/binutils-all/riscv/unknown.s b/binutils/testsuite/binutils-all/riscv/unknown.s

> new file mode 100644

> index 00000000000..df929047fef

> --- /dev/null

> +++ b/binutils/testsuite/binutils-all/riscv/unknown.s

> @@ -0,0 +1,27 @@

> +/* Copyright (C) 2021 Free Software Foundation, Inc.

> +

> +   This program is free software; you can redistribute it and/or modify

> +   it under the terms of the GNU General Public License as published by

> +   the Free Software Foundation; either version 3 of the License, or

> +   (at your option) any later version.

> +

> +   This program is distributed in the hope that it will be useful,

> +   but WITHOUT ANY WARRANTY; without even the implied warranty of

> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> +   GNU General Public License for more details.

> +

> +   You should have received a copy of the GNU General Public License

> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

> +

> +        .text

> +       /* The following instruction is in the area set aside for

> +          custom instruction extensions.  As such it is unlikely that

> +           an upstream extension should ever clash with this.  */

> +       .insn r 0x0b, 0x0, 0x0, x3, x4, x5

> +        /* Unlike the above, the following is just a reserved

> +          instruction encoding.  This means that in the future an

> +          extension to the compressed instruction set might use this

> +          encoding.  If/when that happens we'll need to find a

> +          different unused encoding within the compressed instruction

> +          space.  */

> +       .insn ca 0x1, 0x27, 0x2, x8, x9


Fortunately there are few new compressed instructions, so this case
should be used for a while :)

> diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c

> index 57198c7b6b5..2e28ba77e60 100644

> --- a/opcodes/riscv-dis.c

> +++ b/opcodes/riscv-dis.c

> @@ -570,7 +570,29 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)

>

>    /* We did not find a match, so just print the instruction bits.  */

>    info->insn_type = dis_noninsn;

> -  (*info->fprintf_func) (info->stream, "0x%llx", (unsigned long long)word);

> +  switch (insnlen)

> +    {

> +    case 2:

> +    case 4:

> +    case 8:

> +      (*info->fprintf_func) (info->stream, ".%dbyte\t0x%llx",

> +                             insnlen, (unsigned long long) word);

> +      break;

> +    default:

> +      {

> +        int i;

> +        (*info->fprintf_func) (info->stream, ".byte\t");

> +        for (i = 0; i < insnlen; ++i)

> +          {

> +            if (i > 0)

> +              (*info->fprintf_func) (info->stream, ", ");

> +            (*info->fprintf_func) (info->stream, "0x%02x",

> +                                   (unsigned int) (word & 0xff));

> +            word >>= 8;

> +          }

> +      }

> +      break;

> +    }

>    return insnlen;

>  }


LGTM.  Please commit when you think it's time.

Thank you very much
Nelson
Andrew Burgess Sept. 20, 2021, 8:46 a.m. | #2
* Nelson Chu <nelson.chu@sifive.com> [2021-09-19 00:03:56 +0800]:

> Hi Andrew,

> 

> On Sat, Sep 18, 2021 at 6:24 PM Andrew Burgess

> <andrew.burgess@embecosm.com> wrote:

> >

> > When the RISC-V disassembler encounters an unknown instruction, it

> > currently just prints the value of the bytes, like this:

> >

> >   Dump of assembler code for function custom_insn:

> >      0x00010132 <+0>:   addi    sp,sp,-16

> >      0x00010134 <+2>:   sw      s0,12(sp)

> >      0x00010136 <+4>:   addi    s0,sp,16

> >      0x00010138 <+6>:   0x52018b

> >      0x0001013c <+10>:  0x9c45

> >

> > My proposal, in this patch, is to change the behaviour to this:

> >

> >   Dump of assembler code for function custom_insn:

> >      0x00010132 <+0>:   addi    sp,sp,-16

> >      0x00010134 <+2>:   sw      s0,12(sp)

> >      0x00010136 <+4>:   addi    s0,sp,16

> >      0x00010138 <+6>:   .4byte  0x52018b

> >      0x0001013c <+10>:  .2byte  0x9c45

> >

> > Adding the .4byte and .2byte opcodes.  The benefit that I see here is

> > that in the patched version of the tools, the disassembler output can

> > be fed back into the assembler and it should assemble to the same

> > binary format.  Before the patch, the disassembler output is invalid

> > assembly.

> 

> Looks reasonable, this change is more friendly to tools, and should

> get a better disassembler output.

> 

> > I've started a RISC-V specific test file under binutils so that I can

> > add a test for this change.

> >

> > binutils/ChangeLog:

> >

> >         * testsuite/binutils-all/riscv/riscv.exp: New file.

> >         * testsuite/binutils-all/riscv/unknown.d: New file.

> >         * testsuite/binutils-all/riscv/unknown.s: New file.

> >

> > opcodes/ChangeLog:

> >

> >         * riscv-dis.c (riscv_disassemble_insn): Print a .%dbyte opcode

> >         before an unknown instruction, '%d' is replaced with the

> >         instruction length.

> > ---

> >  binutils/ChangeLog                            |  6 ++++

> >  .../testsuite/binutils-all/riscv/riscv.exp    | 29 +++++++++++++++++++

> >  .../testsuite/binutils-all/riscv/unknown.d    | 11 +++++++

> >  .../testsuite/binutils-all/riscv/unknown.s    | 27 +++++++++++++++++

> >  opcodes/ChangeLog                             |  6 ++++

> >  opcodes/riscv-dis.c                           | 24 ++++++++++++++-

> >  6 files changed, 102 insertions(+), 1 deletion(-)

> >  create mode 100644 binutils/testsuite/binutils-all/riscv/riscv.exp

> >  create mode 100644 binutils/testsuite/binutils-all/riscv/unknown.d

> >  create mode 100644 binutils/testsuite/binutils-all/riscv/unknown.s

> >

> > diff --git a/binutils/testsuite/binutils-all/riscv/riscv.exp b/binutils/testsuite/binutils-all/riscv/riscv.exp

> > new file mode 100644

> > index 00000000000..8f0b7dc9766

> > --- /dev/null

> > +++ b/binutils/testsuite/binutils-all/riscv/riscv.exp

> > @@ -0,0 +1,29 @@

> > +# Copyright (C) 2021 Free Software Foundation, Inc.

> > +

> > +# This program is free software; you can redistribute it and/or modify

> > +# it under the terms of the GNU General Public License as published by

> > +# the Free Software Foundation; either version 3 of the License, or

> > +# (at your option) any later version.

> > +#

> > +# This program is distributed in the hope that it will be useful,

> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of

> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> > +# GNU General Public License for more details.

> > +#

> > +# You should have received a copy of the GNU General Public License

> > +# along with this program; if not, write to the Free Software

> > +# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.

> > +

> > +if ![istarget "riscv*-*-*"] then {

> > +    return

> > +}

> > +

> > +set tempfile tmpdir/riscvtemp.o

> > +set copyfile tmpdir/riscvcopy

> > +

> > +set test_list [lsort [glob -nocomplain $srcdir/$subdir/*.d]]

> > +foreach t $test_list {

> > +    # We need to strip the ".d", but can leave the dirname.

> > +    verbose [file rootname $t]

> > +    run_dump_test [file rootname $t]

> > +}

> > diff --git a/binutils/testsuite/binutils-all/riscv/unknown.d b/binutils/testsuite/binutils-all/riscv/unknown.d

> > new file mode 100644

> > index 00000000000..64791169b2a

> > --- /dev/null

> > +++ b/binutils/testsuite/binutils-all/riscv/unknown.d

> > @@ -0,0 +1,11 @@

> > +#as: -march=rv32ic

> > +#objdump: -d

> > +# Test the disassembly of unknown instruction encodings, specifically,

> > +# ensure that we generate a .?byte opcode.

> > +

> > +#...

> > +Disassembly of section \.text:

> > +

> > +[0-9a-f]+ <\.text>:

> > +   [0-9a-f]+:  0052018b                \.4byte 0x52018b

> > +   [0-9a-f]+:  9c45                    \.2byte 0x9c45

> > diff --git a/binutils/testsuite/binutils-all/riscv/unknown.s b/binutils/testsuite/binutils-all/riscv/unknown.s

> > new file mode 100644

> > index 00000000000..df929047fef

> > --- /dev/null

> > +++ b/binutils/testsuite/binutils-all/riscv/unknown.s

> > @@ -0,0 +1,27 @@

> > +/* Copyright (C) 2021 Free Software Foundation, Inc.

> > +

> > +   This program is free software; you can redistribute it and/or modify

> > +   it under the terms of the GNU General Public License as published by

> > +   the Free Software Foundation; either version 3 of the License, or

> > +   (at your option) any later version.

> > +

> > +   This program is distributed in the hope that it will be useful,

> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of

> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> > +   GNU General Public License for more details.

> > +

> > +   You should have received a copy of the GNU General Public License

> > +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

> > +

> > +        .text

> > +       /* The following instruction is in the area set aside for

> > +          custom instruction extensions.  As such it is unlikely that

> > +           an upstream extension should ever clash with this.  */

> > +       .insn r 0x0b, 0x0, 0x0, x3, x4, x5

> > +        /* Unlike the above, the following is just a reserved

> > +          instruction encoding.  This means that in the future an

> > +          extension to the compressed instruction set might use this

> > +          encoding.  If/when that happens we'll need to find a

> > +          different unused encoding within the compressed instruction

> > +          space.  */

> > +       .insn ca 0x1, 0x27, 0x2, x8, x9

> 

> Fortunately there are few new compressed instructions, so this case

> should be used for a while :)

> 

> > diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c

> > index 57198c7b6b5..2e28ba77e60 100644

> > --- a/opcodes/riscv-dis.c

> > +++ b/opcodes/riscv-dis.c

> > @@ -570,7 +570,29 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)

> >

> >    /* We did not find a match, so just print the instruction bits.  */

> >    info->insn_type = dis_noninsn;

> > -  (*info->fprintf_func) (info->stream, "0x%llx", (unsigned long long)word);

> > +  switch (insnlen)

> > +    {

> > +    case 2:

> > +    case 4:

> > +    case 8:

> > +      (*info->fprintf_func) (info->stream, ".%dbyte\t0x%llx",

> > +                             insnlen, (unsigned long long) word);

> > +      break;

> > +    default:

> > +      {

> > +        int i;

> > +        (*info->fprintf_func) (info->stream, ".byte\t");

> > +        for (i = 0; i < insnlen; ++i)

> > +          {

> > +            if (i > 0)

> > +              (*info->fprintf_func) (info->stream, ", ");

> > +            (*info->fprintf_func) (info->stream, "0x%02x",

> > +                                   (unsigned int) (word & 0xff));

> > +            word >>= 8;

> > +          }

> > +      }

> > +      break;

> > +    }

> >    return insnlen;

> >  }

> 

> LGTM.  Please commit when you think it's time.


Thanks, I pushed this.

Andrew

Patch

diff --git a/binutils/testsuite/binutils-all/riscv/riscv.exp b/binutils/testsuite/binutils-all/riscv/riscv.exp
new file mode 100644
index 00000000000..8f0b7dc9766
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/riscv.exp
@@ -0,0 +1,29 @@ 
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
+
+if ![istarget "riscv*-*-*"] then {
+    return
+}
+
+set tempfile tmpdir/riscvtemp.o
+set copyfile tmpdir/riscvcopy
+
+set test_list [lsort [glob -nocomplain $srcdir/$subdir/*.d]]
+foreach t $test_list {
+    # We need to strip the ".d", but can leave the dirname.
+    verbose [file rootname $t]
+    run_dump_test [file rootname $t]
+}
diff --git a/binutils/testsuite/binutils-all/riscv/unknown.d b/binutils/testsuite/binutils-all/riscv/unknown.d
new file mode 100644
index 00000000000..64791169b2a
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/unknown.d
@@ -0,0 +1,11 @@ 
+#as: -march=rv32ic
+#objdump: -d
+# Test the disassembly of unknown instruction encodings, specifically,
+# ensure that we generate a .?byte opcode.
+
+#...
+Disassembly of section \.text:
+
+[0-9a-f]+ <\.text>:
+   [0-9a-f]+:	0052018b          	\.4byte	0x52018b
+   [0-9a-f]+:	9c45                	\.2byte	0x9c45
diff --git a/binutils/testsuite/binutils-all/riscv/unknown.s b/binutils/testsuite/binutils-all/riscv/unknown.s
new file mode 100644
index 00000000000..df929047fef
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/unknown.s
@@ -0,0 +1,27 @@ 
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+        .text
+	/* The following instruction is in the area set aside for
+	   custom instruction extensions.  As such it is unlikely that
+           an upstream extension should ever clash with this.  */
+	.insn r 0x0b, 0x0, 0x0, x3, x4, x5
+        /* Unlike the above, the following is just a reserved
+	   instruction encoding.  This means that in the future an
+	   extension to the compressed instruction set might use this
+	   encoding.  If/when that happens we'll need to find a
+	   different unused encoding within the compressed instruction
+	   space.  */
+	.insn ca 0x1, 0x27, 0x2, x8, x9
diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 57198c7b6b5..2e28ba77e60 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -570,7 +570,29 @@  riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 
   /* We did not find a match, so just print the instruction bits.  */
   info->insn_type = dis_noninsn;
-  (*info->fprintf_func) (info->stream, "0x%llx", (unsigned long long)word);
+  switch (insnlen)
+    {
+    case 2:
+    case 4:
+    case 8:
+      (*info->fprintf_func) (info->stream, ".%dbyte\t0x%llx",
+                             insnlen, (unsigned long long) word);
+      break;
+    default:
+      {
+        int i;
+        (*info->fprintf_func) (info->stream, ".byte\t");
+        for (i = 0; i < insnlen; ++i)
+          {
+            if (i > 0)
+              (*info->fprintf_func) (info->stream, ", ");
+            (*info->fprintf_func) (info->stream, "0x%02x",
+                                   (unsigned int) (word & 0xff));
+            word >>= 8;
+          }
+      }
+      break;
+    }
   return insnlen;
 }