asm non-code template parts (alternative to asm inline)

Message ID alpine.LNX.2.20.13.1810142227590.13914@monopod.intra.ispras.ru
State New
Headers show
Series
  • asm non-code template parts (alternative to asm inline)
Related show

Commit Message

Alexander Monakov Oct. 14, 2018, 8:07 p.m.
Hello,

This is an alternative proposal to the "asm inline" feature.

Kernel developers have reported suboptimal optimization where use of asm
statements such as

  asm("ud2\n"
      ".pushsection foo\n"
      ...
      ".popsection\n" : : ...)

impacts inlining decisions badly, since GCC assumes cost of the asm to be
high, even though it emits just one instruction to the text section. I'd
like to point out that branch range optimization is also negatively affected.

I suggest we give asm writers a way to mark portions of the asm template
that should be ignored in cost estimation. This is a more fine-grained
mechanism compared to 'asm inline', and it also helps branch range optimization.

Specifically, I propose that in Extended asms, percent-backtick (%`) is
recognized as such region boundary. Percent sign is of course always special
in Extended asms, and backtick sign is not claimed by any backend.

For Basic asms, no similar mechanism is necessary since they are antithetical
to efficiency in the first place.

Kernels developers can then use this extension via

[if gcc-9 or compatible]
#define ASM_NONTEXT_BEGIN "%`\n"
[else]
#define ASM_NONTEXT_BEGIN "\n"
[endif]

#define ASM_NONTEXT_END ASM_NONTEXT_BEGIN

  asm("ud2\n"
      ASM_NONTEXT_BEGIN
      ".pushsection foo\n"
      ...
      ".popsection\n"
      ASM_NONTEXT_END : : ...)

How does this look?

	* doc/extend.texi (Extended Asm): Document %` in template.
	(Size of an Asm): Document intended use of %`.
	* final.c (asm_insn_count): Adjust.
	(asm_str_count): Add argument to distinguish basic and extended asms.
	In extended asms, ignore separators inside of %` ... %`.
	(output_asm_insn): Handle %`.
	* rtl.h (asm_str_count): Adjust prototype.
	* tree-inline.c (estimate_num_insns): Adjust.
	* config/arm/arm.c (arm_rtx_costs_internal): Adjust.

Comments

Richard Biener Oct. 15, 2018, 9:27 a.m. | #1
On Sun, Oct 14, 2018 at 10:07 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>

> Hello,

>

> This is an alternative proposal to the "asm inline" feature.

>

> Kernel developers have reported suboptimal optimization where use of asm

> statements such as

>

>   asm("ud2\n"

>       ".pushsection foo\n"

>       ...

>       ".popsection\n" : : ...)

>

> impacts inlining decisions badly, since GCC assumes cost of the asm to be

> high, even though it emits just one instruction to the text section. I'd

> like to point out that branch range optimization is also negatively affected.

>

> I suggest we give asm writers a way to mark portions of the asm template

> that should be ignored in cost estimation. This is a more fine-grained

> mechanism compared to 'asm inline', and it also helps branch range optimization.

>

> Specifically, I propose that in Extended asms, percent-backtick (%`) is

> recognized as such region boundary. Percent sign is of course always special

> in Extended asms, and backtick sign is not claimed by any backend.

>

> For Basic asms, no similar mechanism is necessary since they are antithetical

> to efficiency in the first place.

>

> Kernels developers can then use this extension via

>

> [if gcc-9 or compatible]

> #define ASM_NONTEXT_BEGIN "%`\n"

> [else]

> #define ASM_NONTEXT_BEGIN "\n"

> [endif]

>

> #define ASM_NONTEXT_END ASM_NONTEXT_BEGIN

>

>   asm("ud2\n"

>       ASM_NONTEXT_BEGIN

>       ".pushsection foo\n"

>       ...

>       ".popsection\n"

>       ASM_NONTEXT_END : : ...)

>

> How does this look?


I think it's sound but also note that I think it is logically independent of
asm inline ().  While it may work for the inlining issue for some kernel
examples to asm inline () is sth similar to always_inline for functions,
that is, even though an asm _does_ have non-negligible .text size
we do want to ignore that for the purpose of inlining (but not for the
purpose of branch size estimation).

Your idea is good to make convoluted asms more precise.

Note in your docs you refer to "non-code" sections but it should
equally apply to .text sections that are not the section of the asm
context (.text.cold, for example).  So better wording there would
be appreciated.

Note that I'm concerned about ignoring %` regions for inlining
purposes since iff .text.cold or .data stuff is ignored that sections
might explode in size.  In this context inlining is _not_ free.
So the question is whether we should add an argument to
asm_insn_count () for whether to ignore non-"code" parts
or not and I'd say we should _not_ ignore them for inlining.

Then there's the question if we want people to start writing

 "    %`.1:\n"
 "%`        jne .1\n"

thus, make GCC ignore lines with just labels or other
asm directives.  Or if we should add some (target / assembler)
specific magic to ignore those that are free.

Oh, and I personally find %` ugly ;)  What non-alnum chars
are taken by backends?

Richard.

>

>         * doc/extend.texi (Extended Asm): Document %` in template.

>         (Size of an Asm): Document intended use of %`.

>         * final.c (asm_insn_count): Adjust.

>         (asm_str_count): Add argument to distinguish basic and extended asms.

>         In extended asms, ignore separators inside of %` ... %`.

>         (output_asm_insn): Handle %`.

>         * rtl.h (asm_str_count): Adjust prototype.

>         * tree-inline.c (estimate_num_insns): Adjust.

>         * config/arm/arm.c (arm_rtx_costs_internal): Adjust.

>

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi

> index cfe6a8e5bb8..798d310061c 100644

> --- a/gcc/doc/extend.texi

> +++ b/gcc/doc/extend.texi

> @@ -8613,6 +8613,11 @@ generates multiple assembler instructions.

>  Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively)

>  into the assembler code.  When unescaped, these characters have special

>  meaning to indicate multiple assembler dialects, as described below.

> +

> +@item %`

> +Signifies a boundary of a region where instruction separators are not

> +counted towards its cost (@pxref{Size of an asm}). Must be followed by

> +a whitespace character.

>  @end table

>

>  @subsubheading Multiple assembler dialects in @code{asm} templates

> @@ -9821,7 +9826,7 @@ does this by counting the number of instructions in the pattern of the

>  @code{asm} and multiplying that by the length of the longest

>  instruction supported by that processor.  (When working out the number

>  of instructions, it assumes that any occurrence of a newline or of

> -whatever statement separator character is supported by the assembler --

> +whatever statement separator character is supported by the assembler ---

>  typically @samp{;} --- indicates the end of an instruction.)

>

>  Normally, GCC's estimate is adequate to ensure that correct

> @@ -9832,6 +9837,15 @@ space in the object file than is needed for a single instruction.

>  If this happens then the assembler may produce a diagnostic saying that

>  a label is unreachable.

>

> +Likewise, it is possible for GCC to significantly over-estimate the

> +number of instructions in an @code{asm}, resulting in suboptimal decisions

> +when the estimate is used during inlining and branch range optimization.

> +This can happen if the @code{asm} template has many lines that do not

> +correspond to instructions, for example when the @samp{.pushsection}

> +directive is used to emit auxiliary data in a non-code section.

> +For Extended @code{asm} statements, you can improve the estimate by

> +wrapping the non-code portion in @samp{%` ... %`} delimiters.

> +

>  @node Alternate Keywords

>  @section Alternate Keywords

>  @cindex alternate keywords

> diff --git a/gcc/final.c b/gcc/final.c

> index 6943c073d9b..dc474d7f6e1 100644

> --- a/gcc/final.c

> +++ b/gcc/final.c

> @@ -1408,29 +1408,40 @@ static int

>  asm_insn_count (rtx body)

>  {

>    const char *templ;

> +  bool basic = GET_CODE (body) == ASM_INPUT;

>

> -  if (GET_CODE (body) == ASM_INPUT)

> +  if (basic)

>      templ = XSTR (body, 0);

>    else

>      templ = decode_asm_operands (body, NULL, NULL, NULL, NULL, NULL);

>

> -  return asm_str_count (templ);

> +  return asm_str_count (templ, basic);

>  }

>

>  /* Return the number of machine instructions likely to be generated for the

> -   inline-asm template. */

> +   inline-asm template.  BASIC indicates if it is used in a basic asm.  */

>  int

> -asm_str_count (const char *templ)

> +asm_str_count (const char *templ, bool basic)

>  {

>    int count = 1;

> +  bool in_backticks = false;

>

>    if (!*templ)

>      return 0;

>

>    for (; *templ; templ++)

> -    if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)

> -       || *templ == '\n')

> -      count++;

> +    if (*templ == '%' && !basic)

> +      {

> +       templ++;

> +       if (!*templ)

> +         break;

> +       /* Separators inside %` ... %` are not counted.  */

> +       if (*templ == '`')

> +         in_backticks = !in_backticks;

> +      }

> +    else if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)

> +            || *templ == '\n')

> +      count += !in_backticks;

>

>    return count;

>  }

> @@ -3836,6 +3847,7 @@ output_asm_insn (const char *templ, rtx *operands)

>    int oporder[MAX_RECOG_OPERANDS];

>    char opoutput[MAX_RECOG_OPERANDS];

>    int ops = 0;

> +  bool in_backticks = false;

>

>    /* An insn may return a null string template

>       in a case where no assembler code is needed.  */

> @@ -3902,6 +3914,17 @@ output_asm_insn (const char *templ, rtx *operands)

>             p++;

>             fprintf (asm_out_file, "%d", insn_counter);

>           }

> +       /* %` guards parts of the template that should not participate in

> +          estimating its cost, e.g. where it switches to a non-text section.

> +          It does not result in any output.  */

> +       else if (*p == '`')

> +         {

> +           p++;

> +           in_backticks = !in_backticks;

> +           /* Leave room for future extensions.  */

> +           if (*p && !ISSPACE (*p))

> +             output_operand_lossage ("%%` must be followed by whitespace");

> +         }

>         /* % followed by a letter and some digits

>            outputs an operand in a special way depending on the letter.

>            Letters `acln' are implemented directly.

> @@ -3995,6 +4018,9 @@ output_asm_insn (const char *templ, rtx *operands)

>      output_asm_name ();

>

>    putc ('\n', asm_out_file);

> +

> +  if (in_backticks)

> +    output_operand_lossage ("missing closing %%`");

>  }

>

>  /* Output a LABEL_REF, or a bare CODE_LABEL, as an assembler symbol.  */

> diff --git a/gcc/rtl.h b/gcc/rtl.h

> index 68d3ceab29f..d29229d2817 100644

> --- a/gcc/rtl.h

> +++ b/gcc/rtl.h

> @@ -4288,7 +4288,7 @@ extern void simplify_using_condition (rtx, rtx *, bitmap);

>  /* In final.c  */

>  extern unsigned int compute_alignments (void);

>  extern void update_alignments (vec<rtx> &);

> -extern int asm_str_count (const char *templ);

> +extern int asm_str_count (const char *, bool);

>

>  struct rtl_hooks

>  {

> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c

> index 9352acc8af6..890df5f86de 100644

> --- a/gcc/tree-inline.c

> +++ b/gcc/tree-inline.c

> @@ -4103,7 +4103,9 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)

>

>      case GIMPLE_ASM:

>        {

> -       int count = asm_str_count (gimple_asm_string (as_a <gasm *> (stmt)));

> +       gasm *asm_stmt = as_a <gasm *> (stmt);

> +       const char *templ = gimple_asm_string (asm_stmt);

> +       int count = asm_str_count (templ, gimple_asm_input_p (asm_stmt));

>         /* 1000 means infinity. This avoids overflows later

>            with very long asm statements.  */

>         if (count > 1000)

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index 6332e68df05..f48129ae076 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -11026,7 +11026,8 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,

>        /* Just a guess.  Guess number of instructions in the asm

>           plus one insn per input.  Always a minimum of COSTS_N_INSNS (1)

>           though (see PR60663).  */

> -        int asm_length = MAX (1, asm_str_count (ASM_OPERANDS_TEMPLATE (x)));

> +        const char *templ = ASM_OPERANDS_TEMPLATE (x);

> +        int asm_length = MAX (1, asm_str_count (templ, false));

>          int num_operands = ASM_OPERANDS_INPUT_LENGTH (x);

>

>          *cost = COSTS_N_INSNS (asm_length + num_operands);
Alexander Monakov Oct. 15, 2018, 10:36 a.m. | #2
On Mon, 15 Oct 2018, Richard Biener wrote:
> 

> Oh, and I personally find %` ugly ;)  What non-alnum chars

> are taken by backends?


I think only double quote, backslash, backtick remain unclaimed. And of course
ASCII \0 through \040 and \177 ;)

Alexander
Jakub Jelinek Oct. 15, 2018, 10:45 a.m. | #3
On Mon, Oct 15, 2018 at 01:36:36PM +0300, Alexander Monakov wrote:
> On Mon, 15 Oct 2018, Richard Biener wrote:

> > 

> > Oh, and I personally find %` ugly ;)  What non-alnum chars

> > are taken by backends?

> 

> I think only double quote, backslash, backtick remain unclaimed. And of course

> ASCII \0 through \040 and \177 ;)


As has been said, the way microblaze claims non-alnum characters it doesn't
support is just bogus, so we shouldn't consider them to be taken.

	Jakub
Alexander Monakov Oct. 15, 2018, 10:53 a.m. | #4
On Mon, 15 Oct 2018, Jakub Jelinek wrote:

> On Mon, Oct 15, 2018 at 01:36:36PM +0300, Alexander Monakov wrote:

> > On Mon, 15 Oct 2018, Richard Biener wrote:

> > > 

> > > Oh, and I personally find %` ugly ;)  What non-alnum chars

> > > are taken by backends?

> > 

> > I think only double quote, backslash, backtick remain unclaimed. And of course

> > ASCII \0 through \040 and \177 ;)

> 

> As has been said, the way microblaze claims non-alnum characters it doesn't

> support is just bogus, so we shouldn't consider them to be taken.


I understand - I've made an effort to manually go through the backends and
find characters they meaningfully handle in their print_operand hooks. In
particular MIPS handles all of []()<> (but %[ is special anyway, for
%[name] substitution).

Alexander
Jakub Jelinek Oct. 15, 2018, 10:56 a.m. | #5
On Mon, Oct 15, 2018 at 01:53:09PM +0300, Alexander Monakov wrote:
> On Mon, 15 Oct 2018, Jakub Jelinek wrote:

> 

> > On Mon, Oct 15, 2018 at 01:36:36PM +0300, Alexander Monakov wrote:

> > > On Mon, 15 Oct 2018, Richard Biener wrote:

> > > > 

> > > > Oh, and I personally find %` ugly ;)  What non-alnum chars

> > > > are taken by backends?

> > > 

> > > I think only double quote, backslash, backtick remain unclaimed. And of course

> > > ASCII \0 through \040 and \177 ;)

> > 

> > As has been said, the way microblaze claims non-alnum characters it doesn't

> > support is just bogus, so we shouldn't consider them to be taken.

> 

> I understand - I've made an effort to manually go through the backends and

> find characters they meaningfully handle in their print_operand hooks. In

> particular MIPS handles all of []()<> (but %[ is special anyway, for

> %[name] substitution).


Ugh.  Wonder how %[name] then works on mips or if its %[ something %] works.

	Jakub
Richard Sandiford Oct. 15, 2018, 11:28 a.m. | #6
Jakub Jelinek <jakub@redhat.com> writes:

> On Mon, Oct 15, 2018 at 01:53:09PM +0300, Alexander Monakov wrote:

>> On Mon, 15 Oct 2018, Jakub Jelinek wrote:

>> 

>> > On Mon, Oct 15, 2018 at 01:36:36PM +0300, Alexander Monakov wrote:

>> > > On Mon, 15 Oct 2018, Richard Biener wrote:

>> > > > 

>> > > > Oh, and I personally find %` ugly ;)  What non-alnum chars

>> > > > are taken by backends?

>> > > 

>> > > I think only double quote, backslash, backtick remain

>> > > unclaimed. And of course

>> > > ASCII \0 through \040 and \177 ;)

>> > 

>> > As has been said, the way microblaze claims non-alnum characters it doesn't

>> > support is just bogus, so we shouldn't consider them to be taken.

>> 

>> I understand - I've made an effort to manually go through the backends and

>> find characters they meaningfully handle in their print_operand hooks. In

>> particular MIPS handles all of []()<> (but %[ is special anyway, for

>> %[name] substitution).

>

> Ugh.  Wonder how %[name] then works on mips or if its %[ something %] works.


It's only provided for .md patterns (and probably predates the
named operands in inline asms), so in practice there's no problem.

Richard
Richard Biener Oct. 15, 2018, 1:31 p.m. | #7
On Mon, Oct 15, 2018 at 12:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>

> On Mon, 15 Oct 2018, Richard Biener wrote:

> >

> > Oh, and I personally find %` ugly ;)  What non-alnum chars

> > are taken by backends?

>

> I think only double quote, backslash, backtick remain unclaimed. And of course

> ASCII \0 through \040 and \177 ;)


I see.  Apart from using some of the traditional begin-end sequences we
could use %; or similar on each line to "comment" it?

Richard.

> Alexander
Alexander Monakov Oct. 15, 2018, 2:01 p.m. | #8
> > I think only double quote, backslash, backtick remain unclaimed. And of course

> > ASCII \0 through \040 and \177 ;)

> 

> I see.  Apart from using some of the traditional begin-end sequences we

> could use %; or similar on each line to "comment" it?


I guess in theory we could define percent-backslash-separator to not count,
but wouldn't that go a bit into micro-management territory?  In the
kernel usecase the main goal would be to "comment" one block of lines, not
meticulously mark up each and every non-insn line.

Alexander
Segher Boessenkool Oct. 15, 2018, 5:33 p.m. | #9
On Sun, Oct 14, 2018 at 11:07:20PM +0300, Alexander Monakov wrote:
> impacts inlining decisions badly, since GCC assumes cost of the asm to be

> high, even though it emits just one instruction to the text section. I'd

> like to point out that branch range optimization is also negatively affected.


The "length" of an asm is currently calculated as a conservatively correct
number (which you can trick of course, but that aside).  This is on purpose.
This is also documented.  And you are destroying this.

> Kernels developers can then use this extension via

> 

> [if gcc-9 or compatible]

> #define ASM_NONTEXT_BEGIN "%`\n"

> [else]

> #define ASM_NONTEXT_BEGIN "\n"

> [endif]

> 

> #define ASM_NONTEXT_END ASM_NONTEXT_BEGIN


Have you tested this?  This counts everything as being longer than you
supposedly want.

> +@item %`

> +Signifies a boundary of a region where instruction separators are not

> +counted towards its cost (@pxref{Size of an asm}). Must be followed by

> +a whitespace character.

>  @end table


It's not cost.

> +Likewise, it is possible for GCC to significantly over-estimate the

> +number of instructions in an @code{asm}, resulting in suboptimal decisions

> +when the estimate is used during inlining and branch range optimization.


GCC estimates the size of an asm conservatively.  Without this you get
ICEs all over the place.  Any proposed "adjust inlining cost of inline asm"
extension that adjusts instruction length instead (and can change it to too
low) will cause us too much trouble, and users a bad experience using the
compiler.  IMO of course.

> +	    /* Leave room for future extensions.  */

> +	    if (*p && !ISSPACE (*p))

> +	      output_operand_lossage ("%%` must be followed by whitespace");


What does that mean?  And, why?

This isn't documented either.

> +  if (in_backticks)

> +    output_operand_lossage ("missing closing %%`");


Why?


Segher
Segher Boessenkool Oct. 15, 2018, 6:21 p.m. | #10
On Sun, Oct 14, 2018 at 11:07:20PM +0300, Alexander Monakov wrote:
> For Basic asms, no similar mechanism is necessary since they are antithetical

> to efficiency in the first place.


I missed this part.

  asm("bla");

means almost the same as

  asm("bla" : );

and there is nothing in there that is bad for optimisation.  It's not like
adding some garbage input arg will help.

Also, very many inline asm "in the field" are written as basic inline asm.
Including many of the problem cases.


Segher
Alexander Monakov Oct. 15, 2018, 7:45 p.m. | #11
On Mon, 15 Oct 2018, Segher Boessenkool wrote:
> On Sun, Oct 14, 2018 at 11:07:20PM +0300, Alexander Monakov wrote:

> > For Basic asms, no similar mechanism is necessary since they are antithetical

> > to efficiency in the first place.

> 

> I missed this part.

> 

>   asm("bla");

> 

> means almost the same as

> 

>   asm("bla" : );

> 

> and there is nothing in there that is bad for optimisation.


The extended asm does not clobber all memory, unlike its basic counterpart.

Alexander
Segher Boessenkool Oct. 15, 2018, 8:15 p.m. | #12
On Mon, Oct 15, 2018 at 10:45:08PM +0300, Alexander Monakov wrote:
> On Mon, 15 Oct 2018, Segher Boessenkool wrote:

> > On Sun, Oct 14, 2018 at 11:07:20PM +0300, Alexander Monakov wrote:

> > > For Basic asms, no similar mechanism is necessary since they are antithetical

> > > to efficiency in the first place.

> > 

> > I missed this part.

> > 

> >   asm("bla");

> > 

> > means almost the same as

> > 

> >   asm("bla" : );

> > 

> > and there is nothing in there that is bad for optimisation.

> 

> The extended asm does not clobber all memory, unlike its basic counterpart.


Yeah, that's new in GCC 7, and I keep forgetting.  I'm still in the
denial phase for this.


Segher
Alexander Monakov Oct. 31, 2018, 3:39 p.m. | #13
On Mon, 15 Oct 2018, Richard Biener wrote:
> I think it's sound but also note that I think it is logically independent of

> asm inline ().  While it may work for the inlining issue for some kernel

> examples to asm inline () is sth similar to always_inline for functions,

> that is, even though an asm _does_ have non-negligible .text size

> we do want to ignore that for the purpose of inlining (but not for the

> purpose of branch size estimation).


My understanding is that kernel folks are not demanding "always_inline"
semantics though; they were unhappy with inlining cost mis-estimation.

FWIW they went ahead and worked around the problem on their end via
assembler macros (and Borislav's ping has got no response so far).

> Note in your docs you refer to "non-code" sections but it should

> equally apply to .text sections that are not the section of the asm

> context (.text.cold, for example).  So better wording there would

> be appreciated.


I've tried to address this with the following incremental change. I think
it was the only suggestion, so the rest of the patch is unchanged.

@@ -9849,11 +9849,11 @@ a label is unreachable.
 Likewise, it is possible for GCC to significantly over-estimate the
 number of instructions in an @code{asm}, resulting in suboptimal decisions
 when the estimate is used during inlining and branch range optimization.
-This can happen if the @code{asm} template has many lines that do not
-correspond to instructions, for example when the @samp{.pushsection}
-directive is used to emit auxiliary data in a non-code section.
+For instance, this can happen if a @samp{.pushsection} directive is used to
+temporarily switch to another section and emit auxiliary code or data there.
 For Extended @code{asm} statements, you can improve the estimate by
-wrapping the non-code portion in @samp{%` ... %`} delimiters.
+wrapping the parts where newlines and separators should not be counted in
+@samp{%` ... %`} delimiters.


        * doc/extend.texi (Extended Asm): Document %` in template.
        (Size of an Asm): Document intended use of %`.
        * final.c (asm_insn_count): Adjust.
        (asm_str_count): Add argument to distinguish basic and extended asms.
        In extended asms, ignore separators inside of %` ... %`.
        (output_asm_insn): Handle %`.
        * rtl.h (asm_str_count): Adjust prototype.
        * tree-inline.c (estimate_num_insns): Adjust.
        * config/arm/arm.c (arm_rtx_costs_internal): Adjust.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 34ecc4f8d14..dac4728375b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8622,6 +8622,11 @@ generates multiple assembler instructions.
 Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively)
 into the assembler code.  When unescaped, these characters have special
 meaning to indicate multiple assembler dialects, as described below.
+
+@item %`
+Signifies a boundary of a region where instruction separators are not
+counted towards its size (@pxref{Size of an asm}). Must be followed by
+a whitespace character.
 @end table
 
 @subsubheading Multiple assembler dialects in @code{asm} templates
@@ -9830,7 +9835,7 @@ does this by counting the number of instructions in the pattern of the
 @code{asm} and multiplying that by the length of the longest
 instruction supported by that processor.  (When working out the number
 of instructions, it assumes that any occurrence of a newline or of
-whatever statement separator character is supported by the assembler --
+whatever statement separator character is supported by the assembler ---
 typically @samp{;} --- indicates the end of an instruction.)
 
 Normally, GCC's estimate is adequate to ensure that correct
@@ -9841,6 +9846,15 @@ space in the object file than is needed for a single instruction.
 If this happens then the assembler may produce a diagnostic saying that
 a label is unreachable.
 
+Likewise, it is possible for GCC to significantly over-estimate the
+number of instructions in an @code{asm}, resulting in suboptimal decisions
+when the estimate is used during inlining and branch range optimization.
+For instance, this can happen if a @samp{.pushsection} directive is used to
+temporarily switch to another section and emit auxiliary code or data there.
+For Extended @code{asm} statements, you can improve the estimate by
+wrapping the parts where newlines and separators should not be counted in
+@samp{%` ... %`} delimiters.
+
 @node Alternate Keywords
 @section Alternate Keywords
 @cindex alternate keywords
diff --git a/gcc/final.c b/gcc/final.c
index 6e61f1e17a8..1953293d379 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -1408,29 +1408,40 @@ static int
 asm_insn_count (rtx body)
 {
   const char *templ;
+  bool basic = GET_CODE (body) == ASM_INPUT;
 
-  if (GET_CODE (body) == ASM_INPUT)
+  if (basic)
     templ = XSTR (body, 0);
   else
     templ = decode_asm_operands (body, NULL, NULL, NULL, NULL, NULL);
 
-  return asm_str_count (templ);
+  return asm_str_count (templ, basic);
 }
 
 /* Return the number of machine instructions likely to be generated for the
-   inline-asm template. */
+   inline-asm template.  BASIC indicates if it is used in a basic asm.  */
 int
-asm_str_count (const char *templ)
+asm_str_count (const char *templ, bool basic)
 {
   int count = 1;
+  bool in_backticks = false;
 
   if (!*templ)
     return 0;
 
   for (; *templ; templ++)
-    if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)
-	|| *templ == '\n')
-      count++;
+    if (*templ == '%' && !basic)
+      {
+	templ++;
+	if (!*templ)
+	  break;
+	/* Separators inside %` ... %` are not counted.  */
+	if (*templ == '`')
+	  in_backticks = !in_backticks;
+      }
+    else if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)
+	     || *templ == '\n')
+      count += !in_backticks;
 
   return count;
 }
@@ -3825,6 +3836,7 @@ output_asm_insn (const char *templ, rtx *operands)
   int oporder[MAX_RECOG_OPERANDS];
   char opoutput[MAX_RECOG_OPERANDS];
   int ops = 0;
+  bool in_backticks = false;
 
   /* An insn may return a null string template
      in a case where no assembler code is needed.  */
@@ -3891,6 +3903,17 @@ output_asm_insn (const char *templ, rtx *operands)
 	    p++;
 	    fprintf (asm_out_file, "%d", insn_counter);
 	  }
+	/* %` guards parts of the template that should not participate in
+	   estimating its cost, e.g. where it switches to a non-text section.
+	   It does not result in any output.  */
+	else if (*p == '`')
+	  {
+	    p++;
+	    in_backticks = !in_backticks;
+	    /* Leave room for future extensions.  */
+	    if (*p && !ISSPACE (*p))
+	      output_operand_lossage ("%%` must be followed by whitespace");
+	  }
 	/* % followed by a letter and some digits
 	   outputs an operand in a special way depending on the letter.
 	   Letters `acln' are implemented directly.
@@ -3984,6 +4007,9 @@ output_asm_insn (const char *templ, rtx *operands)
     output_asm_name ();
 
   putc ('\n', asm_out_file);
+
+  if (in_backticks)
+    output_operand_lossage ("missing closing %%`");
 }
 
 /* Output a LABEL_REF, or a bare CODE_LABEL, as an assembler symbol.  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 68d3ceab29f..d29229d2817 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4288,7 +4288,7 @@ extern void simplify_using_condition (rtx, rtx *, bitmap);
 /* In final.c  */
 extern unsigned int compute_alignments (void);
 extern void update_alignments (vec<rtx> &);
-extern int asm_str_count (const char *templ);
+extern int asm_str_count (const char *, bool);
 
 struct rtl_hooks
 {
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 913425394e0..70004528339 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4103,7 +4103,9 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
 
     case GIMPLE_ASM:
       {
-	int count = asm_str_count (gimple_asm_string (as_a <gasm *> (stmt)));
+	gasm *asm_stmt = as_a <gasm *> (stmt);
+	const char *templ = gimple_asm_string (asm_stmt);
+	int count = asm_str_count (templ, gimple_asm_input_p (asm_stmt));
 	/* 1000 means infinity. This avoids overflows later
 	   with very long asm statements.  */
 	if (count > 1000)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..63c35400dc9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -11026,7 +11026,8 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
       /* Just a guess.  Guess number of instructions in the asm
          plus one insn per input.  Always a minimum of COSTS_N_INSNS (1)
          though (see PR60663).  */
-        int asm_length = MAX (1, asm_str_count (ASM_OPERANDS_TEMPLATE (x)));
+        const char *templ = ASM_OPERANDS_TEMPLATE (x);
+        int asm_length = MAX (1, asm_str_count (templ, false));
         int num_operands = ASM_OPERANDS_INPUT_LENGTH (x);
 
         *cost = COSTS_N_INSNS (asm_length + num_operands);
Segher Boessenkool Oct. 31, 2018, 10:26 p.m. | #14
On Wed, Oct 31, 2018 at 06:39:31PM +0300, Alexander Monakov wrote:
> FWIW they went ahead and worked around the problem on their end via

> assembler macros


They have to anyway; they have to support GCC < 9 for a while.  At the
very least until GCC 9 is released!  ;-)

> (and Borislav's ping has got no response so far).


What do you want to see?  I've talked to him, and I've posted a new version
of the asm inline patch.  Is there any open issue?


Segher
Alexander Monakov Nov. 29, 2018, 4:44 p.m. | #15
> On Mon, 15 Oct 2018, Richard Biener wrote:

> > I think it's sound but also note that I think it is logically independent of

> > asm inline ().  While it may work for the inlining issue for some kernel

> > examples to asm inline () is sth similar to always_inline for functions,

> > that is, even though an asm _does_ have non-negligible .text size

> > we do want to ignore that for the purpose of inlining (but not for the

> > purpose of branch size estimation).

> 

> My understanding is that kernel folks are not demanding "always_inline"

> semantics though; they were unhappy with inlining cost mis-estimation.


Ping - as I think this approach addresses the root of the problem, I wouldn't
like it to be forgotten.

> > Note in your docs you refer to "non-code" sections but it should

> > equally apply to .text sections that are not the section of the asm

> > context (.text.cold, for example).  So better wording there would

> > be appreciated.

> 

> I've tried to address this with the following incremental change. I think

> it was the only suggestion, so the rest of the patch is unchanged.

> 

> @@ -9849,11 +9849,11 @@ a label is unreachable.

>  Likewise, it is possible for GCC to significantly over-estimate the

>  number of instructions in an @code{asm}, resulting in suboptimal decisions

>  when the estimate is used during inlining and branch range optimization.

> -This can happen if the @code{asm} template has many lines that do not

> -correspond to instructions, for example when the @samp{.pushsection}

> -directive is used to emit auxiliary data in a non-code section.

> +For instance, this can happen if a @samp{.pushsection} directive is used to

> +temporarily switch to another section and emit auxiliary code or data there.

>  For Extended @code{asm} statements, you can improve the estimate by

> -wrapping the non-code portion in @samp{%` ... %`} delimiters.

> +wrapping the parts where newlines and separators should not be counted in

> +@samp{%` ... %`} delimiters.

> 

> 

>         * doc/extend.texi (Extended Asm): Document %` in template.

>         (Size of an Asm): Document intended use of %`.

>         * final.c (asm_insn_count): Adjust.

>         (asm_str_count): Add argument to distinguish basic and extended asms.

>         In extended asms, ignore separators inside of %` ... %`.

>         (output_asm_insn): Handle %`.

>         * rtl.h (asm_str_count): Adjust prototype.

>         * tree-inline.c (estimate_num_insns): Adjust.

>         * config/arm/arm.c (arm_rtx_costs_internal): Adjust.

> 

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi

> index 34ecc4f8d14..dac4728375b 100644

> --- a/gcc/doc/extend.texi

> +++ b/gcc/doc/extend.texi

> @@ -8622,6 +8622,11 @@ generates multiple assembler instructions.

>  Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively)

>  into the assembler code.  When unescaped, these characters have special

>  meaning to indicate multiple assembler dialects, as described below.

> +

> +@item %`

> +Signifies a boundary of a region where instruction separators are not

> +counted towards its size (@pxref{Size of an asm}). Must be followed by

> +a whitespace character.

>  @end table

>  

>  @subsubheading Multiple assembler dialects in @code{asm} templates

> @@ -9830,7 +9835,7 @@ does this by counting the number of instructions in the pattern of the

>  @code{asm} and multiplying that by the length of the longest

>  instruction supported by that processor.  (When working out the number

>  of instructions, it assumes that any occurrence of a newline or of

> -whatever statement separator character is supported by the assembler --

> +whatever statement separator character is supported by the assembler ---

>  typically @samp{;} --- indicates the end of an instruction.)

>  

>  Normally, GCC's estimate is adequate to ensure that correct

> @@ -9841,6 +9846,15 @@ space in the object file than is needed for a single instruction.

>  If this happens then the assembler may produce a diagnostic saying that

>  a label is unreachable.

>  

> +Likewise, it is possible for GCC to significantly over-estimate the

> +number of instructions in an @code{asm}, resulting in suboptimal decisions

> +when the estimate is used during inlining and branch range optimization.

> +For instance, this can happen if a @samp{.pushsection} directive is used to

> +temporarily switch to another section and emit auxiliary code or data there.

> +For Extended @code{asm} statements, you can improve the estimate by

> +wrapping the parts where newlines and separators should not be counted in

> +@samp{%` ... %`} delimiters.

> +

>  @node Alternate Keywords

>  @section Alternate Keywords

>  @cindex alternate keywords

> diff --git a/gcc/final.c b/gcc/final.c

> index 6e61f1e17a8..1953293d379 100644

> --- a/gcc/final.c

> +++ b/gcc/final.c

> @@ -1408,29 +1408,40 @@ static int

>  asm_insn_count (rtx body)

>  {

>    const char *templ;

> +  bool basic = GET_CODE (body) == ASM_INPUT;

>  

> -  if (GET_CODE (body) == ASM_INPUT)

> +  if (basic)

>      templ = XSTR (body, 0);

>    else

>      templ = decode_asm_operands (body, NULL, NULL, NULL, NULL, NULL);

>  

> -  return asm_str_count (templ);

> +  return asm_str_count (templ, basic);

>  }

>  

>  /* Return the number of machine instructions likely to be generated for the

> -   inline-asm template. */

> +   inline-asm template.  BASIC indicates if it is used in a basic asm.  */

>  int

> -asm_str_count (const char *templ)

> +asm_str_count (const char *templ, bool basic)

>  {

>    int count = 1;

> +  bool in_backticks = false;

>  

>    if (!*templ)

>      return 0;

>  

>    for (; *templ; templ++)

> -    if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)

> -	|| *templ == '\n')

> -      count++;

> +    if (*templ == '%' && !basic)

> +      {

> +	templ++;

> +	if (!*templ)

> +	  break;

> +	/* Separators inside %` ... %` are not counted.  */

> +	if (*templ == '`')

> +	  in_backticks = !in_backticks;

> +      }

> +    else if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)

> +	     || *templ == '\n')

> +      count += !in_backticks;

>  

>    return count;

>  }

> @@ -3825,6 +3836,7 @@ output_asm_insn (const char *templ, rtx *operands)

>    int oporder[MAX_RECOG_OPERANDS];

>    char opoutput[MAX_RECOG_OPERANDS];

>    int ops = 0;

> +  bool in_backticks = false;

>  

>    /* An insn may return a null string template

>       in a case where no assembler code is needed.  */

> @@ -3891,6 +3903,17 @@ output_asm_insn (const char *templ, rtx *operands)

>  	    p++;

>  	    fprintf (asm_out_file, "%d", insn_counter);

>  	  }

> +	/* %` guards parts of the template that should not participate in

> +	   estimating its cost, e.g. where it switches to a non-text section.

> +	   It does not result in any output.  */

> +	else if (*p == '`')

> +	  {

> +	    p++;

> +	    in_backticks = !in_backticks;

> +	    /* Leave room for future extensions.  */

> +	    if (*p && !ISSPACE (*p))

> +	      output_operand_lossage ("%%` must be followed by whitespace");

> +	  }

>  	/* % followed by a letter and some digits

>  	   outputs an operand in a special way depending on the letter.

>  	   Letters `acln' are implemented directly.

> @@ -3984,6 +4007,9 @@ output_asm_insn (const char *templ, rtx *operands)

>      output_asm_name ();

>  

>    putc ('\n', asm_out_file);

> +

> +  if (in_backticks)

> +    output_operand_lossage ("missing closing %%`");

>  }

>  

>  /* Output a LABEL_REF, or a bare CODE_LABEL, as an assembler symbol.  */

> diff --git a/gcc/rtl.h b/gcc/rtl.h

> index 68d3ceab29f..d29229d2817 100644

> --- a/gcc/rtl.h

> +++ b/gcc/rtl.h

> @@ -4288,7 +4288,7 @@ extern void simplify_using_condition (rtx, rtx *, bitmap);

>  /* In final.c  */

>  extern unsigned int compute_alignments (void);

>  extern void update_alignments (vec<rtx> &);

> -extern int asm_str_count (const char *templ);

> +extern int asm_str_count (const char *, bool);

>  

>  struct rtl_hooks

>  {

> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c

> index 913425394e0..70004528339 100644

> --- a/gcc/tree-inline.c

> +++ b/gcc/tree-inline.c

> @@ -4103,7 +4103,9 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)

>  

>      case GIMPLE_ASM:

>        {

> -	int count = asm_str_count (gimple_asm_string (as_a <gasm *> (stmt)));

> +	gasm *asm_stmt = as_a <gasm *> (stmt);

> +	const char *templ = gimple_asm_string (asm_stmt);

> +	int count = asm_str_count (templ, gimple_asm_input_p (asm_stmt));

>  	/* 1000 means infinity. This avoids overflows later

>  	   with very long asm statements.  */

>  	if (count > 1000)

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index 8810df53aa3..63c35400dc9 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -11026,7 +11026,8 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,

>        /* Just a guess.  Guess number of instructions in the asm

>           plus one insn per input.  Always a minimum of COSTS_N_INSNS (1)

>           though (see PR60663).  */

> -        int asm_length = MAX (1, asm_str_count (ASM_OPERANDS_TEMPLATE (x)));

> +        const char *templ = ASM_OPERANDS_TEMPLATE (x);

> +        int asm_length = MAX (1, asm_str_count (templ, false));

>          int num_operands = ASM_OPERANDS_INPUT_LENGTH (x);

>  

>          *cost = COSTS_N_INSNS (asm_length + num_operands);

>
Richard Biener Nov. 30, 2018, 7:34 a.m. | #16
On Thu, Nov 29, 2018 at 5:44 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>

> > On Mon, 15 Oct 2018, Richard Biener wrote:

> > > I think it's sound but also note that I think it is logically independent of

> > > asm inline ().  While it may work for the inlining issue for some kernel

> > > examples to asm inline () is sth similar to always_inline for functions,

> > > that is, even though an asm _does_ have non-negligible .text size

> > > we do want to ignore that for the purpose of inlining (but not for the

> > > purpose of branch size estimation).

> >

> > My understanding is that kernel folks are not demanding "always_inline"

> > semantics though; they were unhappy with inlining cost mis-estimation.

>

> Ping - as I think this approach addresses the root of the problem, I wouldn't

> like it to be forgotten.


I agree this is also useful but it addresses another issue (that may appear to
be related).  asm inline is really a hint to the inliner estimates (with no way
to get semantics botched) while marking off-section parts is making the
asm text more precise also affecting code generation and thus has the
possibility to cause correctness issues (if you say mark everything as
off-section just to make it inline better).

I'm sympathtetic to both patches but clearly the kernel folks have shown
need for the inline hint (arguably the kernel folks are the ones we could
expect to get usages of %` correct ...).  I haven't seen any comments
from possible users of %`

Richard.

> > > Note in your docs you refer to "non-code" sections but it should

> > > equally apply to .text sections that are not the section of the asm

> > > context (.text.cold, for example).  So better wording there would

> > > be appreciated.

> >

> > I've tried to address this with the following incremental change. I think

> > it was the only suggestion, so the rest of the patch is unchanged.

> >

> > @@ -9849,11 +9849,11 @@ a label is unreachable.

> >  Likewise, it is possible for GCC to significantly over-estimate the

> >  number of instructions in an @code{asm}, resulting in suboptimal decisions

> >  when the estimate is used during inlining and branch range optimization.

> > -This can happen if the @code{asm} template has many lines that do not

> > -correspond to instructions, for example when the @samp{.pushsection}

> > -directive is used to emit auxiliary data in a non-code section.

> > +For instance, this can happen if a @samp{.pushsection} directive is used to

> > +temporarily switch to another section and emit auxiliary code or data there.

> >  For Extended @code{asm} statements, you can improve the estimate by

> > -wrapping the non-code portion in @samp{%` ... %`} delimiters.

> > +wrapping the parts where newlines and separators should not be counted in

> > +@samp{%` ... %`} delimiters.

> >

> >

> >         * doc/extend.texi (Extended Asm): Document %` in template.

> >         (Size of an Asm): Document intended use of %`.

> >         * final.c (asm_insn_count): Adjust.

> >         (asm_str_count): Add argument to distinguish basic and extended asms.

> >         In extended asms, ignore separators inside of %` ... %`.

> >         (output_asm_insn): Handle %`.

> >         * rtl.h (asm_str_count): Adjust prototype.

> >         * tree-inline.c (estimate_num_insns): Adjust.

> >         * config/arm/arm.c (arm_rtx_costs_internal): Adjust.

> >

> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi

> > index 34ecc4f8d14..dac4728375b 100644

> > --- a/gcc/doc/extend.texi

> > +++ b/gcc/doc/extend.texi

> > @@ -8622,6 +8622,11 @@ generates multiple assembler instructions.

> >  Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively)

> >  into the assembler code.  When unescaped, these characters have special

> >  meaning to indicate multiple assembler dialects, as described below.

> > +

> > +@item %`

> > +Signifies a boundary of a region where instruction separators are not

> > +counted towards its size (@pxref{Size of an asm}). Must be followed by

> > +a whitespace character.

> >  @end table

> >

> >  @subsubheading Multiple assembler dialects in @code{asm} templates

> > @@ -9830,7 +9835,7 @@ does this by counting the number of instructions in the pattern of the

> >  @code{asm} and multiplying that by the length of the longest

> >  instruction supported by that processor.  (When working out the number

> >  of instructions, it assumes that any occurrence of a newline or of

> > -whatever statement separator character is supported by the assembler --

> > +whatever statement separator character is supported by the assembler ---

> >  typically @samp{;} --- indicates the end of an instruction.)

> >

> >  Normally, GCC's estimate is adequate to ensure that correct

> > @@ -9841,6 +9846,15 @@ space in the object file than is needed for a single instruction.

> >  If this happens then the assembler may produce a diagnostic saying that

> >  a label is unreachable.

> >

> > +Likewise, it is possible for GCC to significantly over-estimate the

> > +number of instructions in an @code{asm}, resulting in suboptimal decisions

> > +when the estimate is used during inlining and branch range optimization.

> > +For instance, this can happen if a @samp{.pushsection} directive is used to

> > +temporarily switch to another section and emit auxiliary code or data there.

> > +For Extended @code{asm} statements, you can improve the estimate by

> > +wrapping the parts where newlines and separators should not be counted in

> > +@samp{%` ... %`} delimiters.

> > +

> >  @node Alternate Keywords

> >  @section Alternate Keywords

> >  @cindex alternate keywords

> > diff --git a/gcc/final.c b/gcc/final.c

> > index 6e61f1e17a8..1953293d379 100644

> > --- a/gcc/final.c

> > +++ b/gcc/final.c

> > @@ -1408,29 +1408,40 @@ static int

> >  asm_insn_count (rtx body)

> >  {

> >    const char *templ;

> > +  bool basic = GET_CODE (body) == ASM_INPUT;

> >

> > -  if (GET_CODE (body) == ASM_INPUT)

> > +  if (basic)

> >      templ = XSTR (body, 0);

> >    else

> >      templ = decode_asm_operands (body, NULL, NULL, NULL, NULL, NULL);

> >

> > -  return asm_str_count (templ);

> > +  return asm_str_count (templ, basic);

> >  }

> >

> >  /* Return the number of machine instructions likely to be generated for the

> > -   inline-asm template. */

> > +   inline-asm template.  BASIC indicates if it is used in a basic asm.  */

> >  int

> > -asm_str_count (const char *templ)

> > +asm_str_count (const char *templ, bool basic)

> >  {

> >    int count = 1;

> > +  bool in_backticks = false;

> >

> >    if (!*templ)

> >      return 0;

> >

> >    for (; *templ; templ++)

> > -    if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)

> > -     || *templ == '\n')

> > -      count++;

> > +    if (*templ == '%' && !basic)

> > +      {

> > +     templ++;

> > +     if (!*templ)

> > +       break;

> > +     /* Separators inside %` ... %` are not counted.  */

> > +     if (*templ == '`')

> > +       in_backticks = !in_backticks;

> > +      }

> > +    else if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)

> > +          || *templ == '\n')

> > +      count += !in_backticks;

> >

> >    return count;

> >  }

> > @@ -3825,6 +3836,7 @@ output_asm_insn (const char *templ, rtx *operands)

> >    int oporder[MAX_RECOG_OPERANDS];

> >    char opoutput[MAX_RECOG_OPERANDS];

> >    int ops = 0;

> > +  bool in_backticks = false;

> >

> >    /* An insn may return a null string template

> >       in a case where no assembler code is needed.  */

> > @@ -3891,6 +3903,17 @@ output_asm_insn (const char *templ, rtx *operands)

> >           p++;

> >           fprintf (asm_out_file, "%d", insn_counter);

> >         }

> > +     /* %` guards parts of the template that should not participate in

> > +        estimating its cost, e.g. where it switches to a non-text section.

> > +        It does not result in any output.  */

> > +     else if (*p == '`')

> > +       {

> > +         p++;

> > +         in_backticks = !in_backticks;

> > +         /* Leave room for future extensions.  */

> > +         if (*p && !ISSPACE (*p))

> > +           output_operand_lossage ("%%` must be followed by whitespace");

> > +       }

> >       /* % followed by a letter and some digits

> >          outputs an operand in a special way depending on the letter.

> >          Letters `acln' are implemented directly.

> > @@ -3984,6 +4007,9 @@ output_asm_insn (const char *templ, rtx *operands)

> >      output_asm_name ();

> >

> >    putc ('\n', asm_out_file);

> > +

> > +  if (in_backticks)

> > +    output_operand_lossage ("missing closing %%`");

> >  }

> >

> >  /* Output a LABEL_REF, or a bare CODE_LABEL, as an assembler symbol.  */

> > diff --git a/gcc/rtl.h b/gcc/rtl.h

> > index 68d3ceab29f..d29229d2817 100644

> > --- a/gcc/rtl.h

> > +++ b/gcc/rtl.h

> > @@ -4288,7 +4288,7 @@ extern void simplify_using_condition (rtx, rtx *, bitmap);

> >  /* In final.c  */

> >  extern unsigned int compute_alignments (void);

> >  extern void update_alignments (vec<rtx> &);

> > -extern int asm_str_count (const char *templ);

> > +extern int asm_str_count (const char *, bool);

> >

> >  struct rtl_hooks

> >  {

> > diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c

> > index 913425394e0..70004528339 100644

> > --- a/gcc/tree-inline.c

> > +++ b/gcc/tree-inline.c

> > @@ -4103,7 +4103,9 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)

> >

> >      case GIMPLE_ASM:

> >        {

> > -     int count = asm_str_count (gimple_asm_string (as_a <gasm *> (stmt)));

> > +     gasm *asm_stmt = as_a <gasm *> (stmt);

> > +     const char *templ = gimple_asm_string (asm_stmt);

> > +     int count = asm_str_count (templ, gimple_asm_input_p (asm_stmt));

> >       /* 1000 means infinity. This avoids overflows later

> >          with very long asm statements.  */

> >       if (count > 1000)

> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> > index 8810df53aa3..63c35400dc9 100644

> > --- a/gcc/config/arm/arm.c

> > +++ b/gcc/config/arm/arm.c

> > @@ -11026,7 +11026,8 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,

> >        /* Just a guess.  Guess number of instructions in the asm

> >           plus one insn per input.  Always a minimum of COSTS_N_INSNS (1)

> >           though (see PR60663).  */

> > -        int asm_length = MAX (1, asm_str_count (ASM_OPERANDS_TEMPLATE (x)));

> > +        const char *templ = ASM_OPERANDS_TEMPLATE (x);

> > +        int asm_length = MAX (1, asm_str_count (templ, false));

> >          int num_operands = ASM_OPERANDS_INPUT_LENGTH (x);

> >

> >          *cost = COSTS_N_INSNS (asm_length + num_operands);

> >
Alexander Monakov Nov. 30, 2018, 8:33 a.m. | #17
On Fri, 30 Nov 2018, Richard Biener wrote:
> > Ping - as I think this approach addresses the root of the problem, I wouldn't

> > like it to be forgotten.

> 

> I agree this is also useful but it addresses another issue (that may appear to

> be related).  asm inline is really a hint to the inliner estimates (with no way

> to get semantics botched) while marking off-section parts is making the

> asm text more precise also affecting code generation and thus has the

> possibility to cause correctness issues (if you say mark everything as

> off-section just to make it inline better).


I don't think that's true: if the user marks too much of the template as
off-section and makes GCC under-estimate branch ranges, they may receive an
error from the assembler. But that's a build failure, not a correctness
issue. Surely build error is a reasonable outcome from misuse of inline asm.

> I'm sympathtetic to both patches but clearly the kernel folks have shown

> need for the inline hint (arguably the kernel folks are the ones we could

> expect to get usages of %` correct ...).  I haven't seen any comments

> from possible users of %`


FWIW, when I raised the idea in the kernel thread, the response from Borislav
was:

>> I don't mind it but I see you guys are still discussing what would be

>> the better solution here, on the gcc ML. And we, as one user, are a

>> happy camper as long as it does what it is meant to do. But how the

>> feature looks like syntactically is something for gcc folk to decide as

>> they're going to support it for the foreseeable future and I'm very well

>> aware of how important it is for a supportable feature to be designed

>> properly.


Alexander
Richard Biener Nov. 30, 2018, 1:06 p.m. | #18
On Fri, Nov 30, 2018 at 9:33 AM Alexander Monakov <amonakov@ispras.ru> wrote:
>

> On Fri, 30 Nov 2018, Richard Biener wrote:

> > > Ping - as I think this approach addresses the root of the problem, I wouldn't

> > > like it to be forgotten.

> >

> > I agree this is also useful but it addresses another issue (that may appear to

> > be related).  asm inline is really a hint to the inliner estimates (with no way

> > to get semantics botched) while marking off-section parts is making the

> > asm text more precise also affecting code generation and thus has the

> > possibility to cause correctness issues (if you say mark everything as

> > off-section just to make it inline better).

>

> I don't think that's true: if the user marks too much of the template as

> off-section and makes GCC under-estimate branch ranges, they may receive an

> error from the assembler. But that's a build failure, not a correctness

> issue. Surely build error is a reasonable outcome from misuse of inline asm.


Yes, but I say they can't mark all of the asm test off-section to make it more
likely be inlined.  Because that might fire back (only on some weird targets
in weird circumstances).  But they can use asm inline for that.

> > I'm sympathtetic to both patches but clearly the kernel folks have shown

> > need for the inline hint (arguably the kernel folks are the ones we could

> > expect to get usages of %` correct ...).  I haven't seen any comments

> > from possible users of %`

>

> FWIW, when I raised the idea in the kernel thread, the response from Borislav

> was:

>

> >> I don't mind it but I see you guys are still discussing what would be

> >> the better solution here, on the gcc ML. And we, as one user, are a

> >> happy camper as long as it does what it is meant to do. But how the

> >> feature looks like syntactically is something for gcc folk to decide as

> >> they're going to support it for the foreseeable future and I'm very well

> >> aware of how important it is for a supportable feature to be designed

> >> properly.

>

> Alexander

Patch

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cfe6a8e5bb8..798d310061c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8613,6 +8613,11 @@  generates multiple assembler instructions.
 Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively)
 into the assembler code.  When unescaped, these characters have special
 meaning to indicate multiple assembler dialects, as described below.
+
+@item %`
+Signifies a boundary of a region where instruction separators are not
+counted towards its cost (@pxref{Size of an asm}). Must be followed by
+a whitespace character.
 @end table
 
 @subsubheading Multiple assembler dialects in @code{asm} templates
@@ -9821,7 +9826,7 @@  does this by counting the number of instructions in the pattern of the
 @code{asm} and multiplying that by the length of the longest
 instruction supported by that processor.  (When working out the number
 of instructions, it assumes that any occurrence of a newline or of
-whatever statement separator character is supported by the assembler --
+whatever statement separator character is supported by the assembler ---
 typically @samp{;} --- indicates the end of an instruction.)
 
 Normally, GCC's estimate is adequate to ensure that correct
@@ -9832,6 +9837,15 @@  space in the object file than is needed for a single instruction.
 If this happens then the assembler may produce a diagnostic saying that
 a label is unreachable.
 
+Likewise, it is possible for GCC to significantly over-estimate the
+number of instructions in an @code{asm}, resulting in suboptimal decisions
+when the estimate is used during inlining and branch range optimization.
+This can happen if the @code{asm} template has many lines that do not
+correspond to instructions, for example when the @samp{.pushsection}
+directive is used to emit auxiliary data in a non-code section.
+For Extended @code{asm} statements, you can improve the estimate by
+wrapping the non-code portion in @samp{%` ... %`} delimiters.
+
 @node Alternate Keywords
 @section Alternate Keywords
 @cindex alternate keywords
diff --git a/gcc/final.c b/gcc/final.c
index 6943c073d9b..dc474d7f6e1 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -1408,29 +1408,40 @@  static int
 asm_insn_count (rtx body)
 {
   const char *templ;
+  bool basic = GET_CODE (body) == ASM_INPUT;
 
-  if (GET_CODE (body) == ASM_INPUT)
+  if (basic)
     templ = XSTR (body, 0);
   else
     templ = decode_asm_operands (body, NULL, NULL, NULL, NULL, NULL);
 
-  return asm_str_count (templ);
+  return asm_str_count (templ, basic);
 }
 
 /* Return the number of machine instructions likely to be generated for the
-   inline-asm template. */
+   inline-asm template.  BASIC indicates if it is used in a basic asm.  */
 int
-asm_str_count (const char *templ)
+asm_str_count (const char *templ, bool basic)
 {
   int count = 1;
+  bool in_backticks = false;
 
   if (!*templ)
     return 0;
 
   for (; *templ; templ++)
-    if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)
-	|| *templ == '\n')
-      count++;
+    if (*templ == '%' && !basic)
+      {
+	templ++;
+	if (!*templ)
+	  break;
+	/* Separators inside %` ... %` are not counted.  */
+	if (*templ == '`')
+	  in_backticks = !in_backticks;
+      }
+    else if (IS_ASM_LOGICAL_LINE_SEPARATOR (*templ, templ)
+	     || *templ == '\n')
+      count += !in_backticks;
 
   return count;
 }
@@ -3836,6 +3847,7 @@  output_asm_insn (const char *templ, rtx *operands)
   int oporder[MAX_RECOG_OPERANDS];
   char opoutput[MAX_RECOG_OPERANDS];
   int ops = 0;
+  bool in_backticks = false;
 
   /* An insn may return a null string template
      in a case where no assembler code is needed.  */
@@ -3902,6 +3914,17 @@  output_asm_insn (const char *templ, rtx *operands)
 	    p++;
 	    fprintf (asm_out_file, "%d", insn_counter);
 	  }
+	/* %` guards parts of the template that should not participate in
+	   estimating its cost, e.g. where it switches to a non-text section.
+	   It does not result in any output.  */
+	else if (*p == '`')
+	  {
+	    p++;
+	    in_backticks = !in_backticks;
+	    /* Leave room for future extensions.  */
+	    if (*p && !ISSPACE (*p))
+	      output_operand_lossage ("%%` must be followed by whitespace");
+	  }
 	/* % followed by a letter and some digits
 	   outputs an operand in a special way depending on the letter.
 	   Letters `acln' are implemented directly.
@@ -3995,6 +4018,9 @@  output_asm_insn (const char *templ, rtx *operands)
     output_asm_name ();
 
   putc ('\n', asm_out_file);
+
+  if (in_backticks)
+    output_operand_lossage ("missing closing %%`");
 }
 
 /* Output a LABEL_REF, or a bare CODE_LABEL, as an assembler symbol.  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 68d3ceab29f..d29229d2817 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4288,7 +4288,7 @@  extern void simplify_using_condition (rtx, rtx *, bitmap);
 /* In final.c  */
 extern unsigned int compute_alignments (void);
 extern void update_alignments (vec<rtx> &);
-extern int asm_str_count (const char *templ);
+extern int asm_str_count (const char *, bool);
 
 struct rtl_hooks
 {
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 9352acc8af6..890df5f86de 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4103,7 +4103,9 @@  estimate_num_insns (gimple *stmt, eni_weights *weights)
 
     case GIMPLE_ASM:
       {
-	int count = asm_str_count (gimple_asm_string (as_a <gasm *> (stmt)));
+	gasm *asm_stmt = as_a <gasm *> (stmt);
+	const char *templ = gimple_asm_string (asm_stmt);
+	int count = asm_str_count (templ, gimple_asm_input_p (asm_stmt));
 	/* 1000 means infinity. This avoids overflows later
 	   with very long asm statements.  */
 	if (count > 1000)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6332e68df05..f48129ae076 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -11026,7 +11026,8 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
       /* Just a guess.  Guess number of instructions in the asm
          plus one insn per input.  Always a minimum of COSTS_N_INSNS (1)
          though (see PR60663).  */
-        int asm_length = MAX (1, asm_str_count (ASM_OPERANDS_TEMPLATE (x)));
+        const char *templ = ASM_OPERANDS_TEMPLATE (x);
+        int asm_length = MAX (1, asm_str_count (templ, false));
         int num_operands = ASM_OPERANDS_INPUT_LENGTH (x);
 
         *cost = COSTS_N_INSNS (asm_length + num_operands);