[4/5] MSP430: Implement TARGET_INSN_COST

Message ID 20200723155614.jqjgkqtdrnwjahxo@jozef-acer-manjaro
State New
Headers show
Series
  • MSP430: Implement macros to describe relative costs of operations
Related show

Commit Message

Jozef Lawrynowicz July 23, 2020, 3:56 p.m.
The length of an insn can be used to calculate its cost, when optimizing for
size. When optimizing for speed, this is a good estimate, since the cycle cost
of an MSP430 instruction increases with its length.
From e4c5f9c3f567489f89b41a0d96e321acb5d18152 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz <jozef.l@mittosystems.com>

Date: Thu, 16 Jul 2020 11:34:50 +0100
Subject: [PATCH 4/5] MSP430: Implement TARGET_INSN_COST

The length of an insn can be used to calculate its cost, when optimizing for
size. When optimizing for speed, this is a good estimate, since the cycle cost
of an MSP430 instruction increases with its length.

gcc/ChangeLog:

	* config/msp430/msp430.c (TARGET_INSN_COST): Define.
	(msp430_insn_cost): New function.
	* config/msp430/msp430.opt: Add -mdebug-insn-costs option.

gcc/testsuite/ChangeLog:

	* gcc.target/msp430/rtx-cost-O3-default.c: New test.
	* gcc.target/msp430/rtx-cost-O3-f5series.c: New test.
	* gcc.target/msp430/rtx-cost-Os-default.c: New test.
	* gcc.target/msp430/rtx-cost-Os-f5series.c: New test.
---
 gcc/config/msp430/msp430.c                    | 29 +++++++++++++
 gcc/config/msp430/msp430.opt                  |  4 ++
 .../gcc.target/msp430/rtx-cost-O3-default.c   | 42 ++++++++++++++++++
 .../gcc.target/msp430/rtx-cost-O3-f5series.c  | 38 ++++++++++++++++
 .../gcc.target/msp430/rtx-cost-Os-default.c   | 43 +++++++++++++++++++
 .../gcc.target/msp430/rtx-cost-Os-f5series.c  | 38 ++++++++++++++++
 6 files changed, 194 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c

-- 
2.27.0

Comments

Segher Boessenkool July 23, 2020, 6:34 p.m. | #1
Hi!

On Thu, Jul 23, 2020 at 04:56:14PM +0100, Jozef Lawrynowicz wrote:
> +static int

> +msp430_insn_cost (rtx_insn *insn, bool speed ATTRIBUTE_UNUSED)

> +{

> +  int cost;

> +

> +  if (recog_memoized (insn) < 0)

> +    return 0;

> +

> +  cost = get_attr_length (insn);

> +  if (TARGET_DEBUG_INSN_COSTS)

> +    {

> +      fprintf (stderr, "cost %d for insn:\n", cost);

> +      debug_rtx (insn);

> +    }

> +

> +  /* The returned cost must be relative to COSTS_N_INSNS (1). An insn with a

> +     length of 2 bytes is the smallest possible size and so must be equivalent

> +     to COSTS_N_INSNS (1).  */

> +  return COSTS_N_INSNS (cost) / (2 * COSTS_N_INSNS (1));


This is the same as "cost / 2", so "length / 2" here, which doesn't look
right.  The returned value should have the same "unit" as COSTS_N_INSNS
does, so maybe you want  COSTS_N_INSNS (length / 2)  ?

> +  /* FIXME Add more detailed costs when optimizing for speed.

> +     For now the length of the instruction is a good approximiation and roughly

> +     correlates with cycle cost.  *


COSTS_N_INSNS (1) is 4, so that you can make things cost 5, 6, 7 to be a
cost intermediate to COSTS_N_INSNS (1) and COSTS_N_INSNS (2).  This is
very useful, scaling down the costs destroys that.

> +mdebug-insn-costs

> +Target Report Mask(DEBUG_INSN_COSTS)

> +Print insns and their costs as calculated by TARGET_INSN_COSTS.


It is already printed in the generated asm with -dp?  Not sure if you
want more detail than that.

     '-dp'
          Annotate the assembler output with a comment indicating which
          pattern and alternative is used.  The length and cost of each
          instruction are also printed.


Segher
Jozef Lawrynowicz July 24, 2020, 11:50 a.m. | #2
Hi Segher,

Thanks for having a look at the patch.

On Thu, Jul 23, 2020 at 01:34:22PM -0500, Segher Boessenkool wrote:
> Hi!

> 

> On Thu, Jul 23, 2020 at 04:56:14PM +0100, Jozef Lawrynowicz wrote:

> > +static int

> > +msp430_insn_cost (rtx_insn *insn, bool speed ATTRIBUTE_UNUSED)

> > +{

> > +  int cost;

> > +

> > +  if (recog_memoized (insn) < 0)

> > +    return 0;

> > +

> > +  cost = get_attr_length (insn);

> > +  if (TARGET_DEBUG_INSN_COSTS)

> > +    {

> > +      fprintf (stderr, "cost %d for insn:\n", cost);

> > +      debug_rtx (insn);

> > +    }

> > +

> > +  /* The returned cost must be relative to COSTS_N_INSNS (1). An insn with a

> > +     length of 2 bytes is the smallest possible size and so must be equivalent

> > +     to COSTS_N_INSNS (1).  */

> > +  return COSTS_N_INSNS (cost) / (2 * COSTS_N_INSNS (1));

> 

> This is the same as "cost / 2", so "length / 2" here, which doesn't look

> right.  The returned value should have the same "unit" as COSTS_N_INSNS

> does, so maybe you want  COSTS_N_INSNS (length / 2)  ?


Indeed it looks like I made a thinko in that calculation in TARGET_INSN_COSTS;
trying to make it verbose to show the thought behind the calculation backfired
:)

Fixing it to return "COSTS_N_INSNS (length / 2)" actually made codesize
noticeably worse for most of my benchmarks.
I had to define BRANCH_COST to indicate branches are not cheap.

In the original patch a cheap instruction would have a cost of 1.
When using the default BRANCH_COST of 1 to calculate the cost of a branch
compared to an insn (e.g. in ifcvt.c), BRANCH_COST would be wrapped in
COSTS_N_INSNS, scaling the cost to 4, which suitably disparaged
it vs the cheap insn cost of 1.

With the fixed insn_cost calculation, a cheap instruction would cost 4
with the COSTS_N_INSNS scaling, and a branch would cost the same, which is not
right.

Fixed in the attached patch.

> 

> > +  /* FIXME Add more detailed costs when optimizing for speed.

> > +     For now the length of the instruction is a good approximiation and roughly

> > +     correlates with cycle cost.  *

> 

> COSTS_N_INSNS (1) is 4, so that you can make things cost 5, 6, 7 to be a

> cost intermediate to COSTS_N_INSNS (1) and COSTS_N_INSNS (2).  This is

> very useful, scaling down the costs destroys that.

> 

> > +mdebug-insn-costs

> > +Target Report Mask(DEBUG_INSN_COSTS)

> > +Print insns and their costs as calculated by TARGET_INSN_COSTS.

> 

> It is already printed in the generated asm with -dp?  Not sure if you

> want more detail than that.

> 

>      '-dp'

>           Annotate the assembler output with a comment indicating which

>           pattern and alternative is used.  The length and cost of each

>           instruction are also printed.

> 


During development I found it useful to see the insns in RTL format and their
costs alongside that.  In hindsight, it doesn't really have much use in the
finalized patch, so I've removed it.

Thanks!
Jozef

> 

> Segher
From 8b69c5a38006d30d001561d47f7ecbd9bd3ead78 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz <jozef.l@mittosystems.com>

Date: Thu, 16 Jul 2020 11:34:50 +0100
Subject: [PATCH 4/5] MSP430: Implement TARGET_INSN_COST

The length of an insn can be used to calculate its cost, when optimizing for
size. When optimizing for speed, this is a good estimate, since the cycle cost
of an MSP430 instruction increases with its length.

gcc/ChangeLog:

	* config/msp430/msp430.c (TARGET_INSN_COST): Define.
	(msp430_insn_cost): New function.
	* config/msp430/msp430.opt: Add -mdebug-insn-costs option.
	* config/msp430/msp430.h (BRANCH_COST): Define.
	(LOGICAL_OP_NON_SHORT_CIRCUIT): Define.

gcc/testsuite/ChangeLog:

	* gcc.target/msp430/rtx-cost-O3-default.c: New test.
	* gcc.target/msp430/rtx-cost-O3-f5series.c: New test.
	* gcc.target/msp430/rtx-cost-Os-default.c: New test.
	* gcc.target/msp430/rtx-cost-Os-f5series.c: New test.
---
 gcc/config/msp430/msp430.c                    | 25 ++++++++---
 gcc/config/msp430/msp430.h                    |  8 ++++
 .../gcc.target/msp430/rtx-cost-O3-default.c   | 42 ++++++++++++++++++
 .../gcc.target/msp430/rtx-cost-O3-f5series.c  | 38 ++++++++++++++++
 .../gcc.target/msp430/rtx-cost-Os-default.c   | 43 +++++++++++++++++++
 .../gcc.target/msp430/rtx-cost-Os-f5series.c  | 38 ++++++++++++++++
 6 files changed, 189 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index b7daafcc11a..35ccd2817ca 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -1195,11 +1195,6 @@ msp430_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
   return 2 * cost;
 }
 
-/* BRANCH_COST
-   Changing from the default of 1 doesn't affect code generation, presumably
-   because there are no conditional move insns - when a condition is involved,
-   the only option is to use a cbranch.  */
-
 /* For X, which must be a MEM RTX, return TRUE if it is an indirect memory
    reference, @Rn or @Rn+.  */
 static bool
@@ -1711,6 +1706,26 @@ msp430_rtx_costs (rtx x,
       return false;
     }
 }
+
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST msp430_insn_cost
+
+static int
+msp430_insn_cost (rtx_insn *insn, bool speed ATTRIBUTE_UNUSED)
+{
+  if (recog_memoized (insn) < 0)
+    return 0;
+
+  /* The returned cost must be relative to COSTS_N_INSNS (1). An insn with a
+     length of 2 bytes is the smallest possible size and so must be equivalent
+     to COSTS_N_INSNS (1).  */
+  return COSTS_N_INSNS (get_attr_length (insn) / 2);
+
+  /* FIXME Add more detailed costs when optimizing for speed.
+     For now the length of the instruction is a good approximiation and roughly
+     correlates with cycle cost.  */
+}
+
 
 /* Function Entry and Exit */
 
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index b813e825311..830101408a5 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -245,6 +245,14 @@ extern const char *msp430_get_linker_devices_include_path (int, const char **);
 #define HAS_LONG_COND_BRANCH		0
 #define HAS_LONG_UNCOND_BRANCH		0
 
+/* The cost of a branch sequence is roughly 3 "cheap" instructions.  */
+#define BRANCH_COST(speed_p, predictable_p) 3
+
+/* Override the default BRANCH_COST heuristic to indicate that it is preferable
+   to retain short-circuit operations, this results in significantly better
+   codesize and performance.  */
+#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
+
 #define LOAD_EXTEND_OP(M)		ZERO_EXTEND
 #define WORD_REGISTER_OPERATIONS	1
 
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c
new file mode 100644
index 00000000000..1bd6a142002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and hwmult (none), when compiling at -O3.  */
+
+char arr[2];
+char a;
+char *ptr;
+
+/*
+** foo:
+** ...
+**	MOV.B	\&a, \&arr\+1
+**	MOV.*	#arr\+2, \&ptr
+** ...
+*/
+
+void
+foo (void)
+{
+  arr[1] = a;
+  ptr = arr + 2;
+}
+
+extern void ext (void);
+
+/*
+** bar:
+** ...
+**	CALL.*	#ext
+**	CALL.*	#ext
+** ...
+*/
+
+void
+bar (void)
+{
+  ext ();
+  ext ();
+}
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c
new file mode 100644
index 00000000000..1e48625f2e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mhwmult=f5series" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and f5series hwmult, when compiling at -O3.  */
+
+volatile unsigned long a;
+volatile unsigned int b;
+volatile unsigned long c;
+unsigned long res1;
+unsigned long res2;
+unsigned long res3;
+
+/*
+** foo:
+** ...
+**	MOV.B	#16, R14
+**	CALL.*	#__mspabi_slll
+** ...
+**	MOV.*	\&res2.*
+** ...
+**	RLA.*RLC.*
+** ...
+**	MOV.*	\&res3.*
+** ...
+**	RLA.*RLC.*
+** ...
+*/
+void foo (void)
+{
+  /* Use the shift library function for this.  */
+  res1 = (a << 16) | b;
+  /* Emit 7 inline shifts for this.  */
+  res2 *= 128;
+  /* Perform this multiplication inline, using addition and shifts.  */
+  res3 *= 100;
+}
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c
new file mode 100644
index 00000000000..8f3d1b28049
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and hwmult (none), when compiling at -Os.  */
+
+char arr[2];
+char a;
+char *ptr;
+
+/*
+** foo:
+** ...
+**	MOV.B	\&a, \&arr\+1
+**	MOV.*	#arr\+2, \&ptr
+** ...
+*/
+
+void
+foo (void)
+{
+  arr[1] = a;
+  ptr = arr + 2;
+}
+
+extern void ext (void);
+
+/*
+** bar:
+** ...
+**	MOV.*	#ext, R10
+**	CALL.*	R10
+**	CALL.*	R10
+** ...
+*/
+
+void
+bar (void)
+{
+  ext ();
+  ext ();
+}
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c
new file mode 100644
index 00000000000..bb37f9083d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -mhwmult=f5series" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and f5series hwmult, when compiling at -Os.  */
+
+volatile unsigned long a;
+volatile unsigned int b;
+volatile unsigned long c;
+unsigned long res1;
+unsigned long res2;
+unsigned long res3;
+
+/*
+** foo:
+** ...
+**	MOV.B	#16, R14
+**	CALL.*	#__mspabi_slll
+** ...
+**	MOV.B	#7, R14
+**	CALL.*	#__mspabi_slll
+** ...
+**	MOV.B	#100, R14
+**	MOV.B	#0, R15
+** ...
+**	CALL.*	#__mulsi2_f5
+** ...
+*/
+void foo (void)
+{
+  /* Use the shift library function for this.  */
+  res1 = (a << 16) | b;
+  /* Likewise.  */
+  res2 *= 128;
+  /* Use the hardware multiply library function for this.  */
+  res3 *= 100;
+}
-- 
2.27.0
Segher Boessenkool July 24, 2020, 12:25 p.m. | #3
Hi Jozef,

On Fri, Jul 24, 2020 at 12:50:48PM +0100, Jozef Lawrynowicz wrote:
> On Thu, Jul 23, 2020 at 01:34:22PM -0500, Segher Boessenkool wrote:

> > On Thu, Jul 23, 2020 at 04:56:14PM +0100, Jozef Lawrynowicz wrote:

> > > +  /* The returned cost must be relative to COSTS_N_INSNS (1). An insn with a

> > > +     length of 2 bytes is the smallest possible size and so must be equivalent

> > > +     to COSTS_N_INSNS (1).  */

> > > +  return COSTS_N_INSNS (cost) / (2 * COSTS_N_INSNS (1));

> > 

> > This is the same as "cost / 2", so "length / 2" here, which doesn't look

> > right.  The returned value should have the same "unit" as COSTS_N_INSNS

> > does, so maybe you want  COSTS_N_INSNS (length / 2)  ?

> 

> Indeed it looks like I made a thinko in that calculation in TARGET_INSN_COSTS;

> trying to make it verbose to show the thought behind the calculation backfired

> :)

> 

> Fixing it to return "COSTS_N_INSNS (length / 2)" actually made codesize

> noticeably worse for most of my benchmarks.

> I had to define BRANCH_COST to indicate branches are not cheap.

> 

> In the original patch a cheap instruction would have a cost of 1.

> When using the default BRANCH_COST of 1 to calculate the cost of a branch

> compared to an insn (e.g. in ifcvt.c), BRANCH_COST would be wrapped in

> COSTS_N_INSNS, scaling the cost to 4, which suitably disparaged

> it vs the cheap insn cost of 1.

> 

> With the fixed insn_cost calculation, a cheap instruction would cost 4

> with the COSTS_N_INSNS scaling, and a branch would cost the same, which is not

> right.


There isn't much you can do to battle the "default" cost of 4 -- this is
pervasive throughout the compiler -- so it is much easier to go with the
flow.

> > It is already printed in the generated asm with -dp?  Not sure if you

> > want more detail than that.

> > 

> >      '-dp'

> >           Annotate the assembler output with a comment indicating which

> >           pattern and alternative is used.  The length and cost of each

> >           instruction are also printed.

> > 

> 

> During development I found it useful to see the insns in RTL format and their

> costs alongside that.  In hindsight, it doesn't really have much use in the

> finalized patch, so I've removed it.


There is -dP for that (capital P) :-)  It isn't very pretty, not sure
how that could be improved?

> +/* The cost of a branch sequence is roughly 3 "cheap" instructions.  */

> +#define BRANCH_COST(speed_p, predictable_p) 3

> +

> +/* Override the default BRANCH_COST heuristic to indicate that it is preferable

> +   to retain short-circuit operations, this results in significantly better

> +   codesize and performance.  */

> +#define LOGICAL_OP_NON_SHORT_CIRCUIT 0


That looks just fine :-)


Segher

Patch

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index b7daafcc11a..7ef651fb324 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -1711,6 +1711,35 @@  msp430_rtx_costs (rtx x,
       return false;
     }
 }
+
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST msp430_insn_cost
+
+static int
+msp430_insn_cost (rtx_insn *insn, bool speed ATTRIBUTE_UNUSED)
+{
+  int cost;
+
+  if (recog_memoized (insn) < 0)
+    return 0;
+
+  cost = get_attr_length (insn);
+  if (TARGET_DEBUG_INSN_COSTS)
+    {
+      fprintf (stderr, "cost %d for insn:\n", cost);
+      debug_rtx (insn);
+    }
+
+  /* The returned cost must be relative to COSTS_N_INSNS (1). An insn with a
+     length of 2 bytes is the smallest possible size and so must be equivalent
+     to COSTS_N_INSNS (1).  */
+  return COSTS_N_INSNS (cost) / (2 * COSTS_N_INSNS (1));
+
+  /* FIXME Add more detailed costs when optimizing for speed.
+     For now the length of the instruction is a good approximiation and roughly
+     correlates with cycle cost.  */
+}
+
 
 /* Function Entry and Exit */
 
diff --git a/gcc/config/msp430/msp430.opt b/gcc/config/msp430/msp430.opt
index 8134ca7ac95..448c41aa81c 100644
--- a/gcc/config/msp430/msp430.opt
+++ b/gcc/config/msp430/msp430.opt
@@ -115,3 +115,7 @@  Target RejectNegative Joined UInteger IntegerRange(0,65) Var(msp430_max_inline_s
 For shift operations by a constant amount, which require an individual instruction to shift by one
 position, set the maximum number of inline shift instructions (maximum value 64) to emit instead of using the corresponding __mspabi helper function.
 The default value is 4.
+
+mdebug-insn-costs
+Target Report Mask(DEBUG_INSN_COSTS)
+Print insns and their costs as calculated by TARGET_INSN_COSTS.
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c
new file mode 100644
index 00000000000..1bd6a142002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c
@@ -0,0 +1,42 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and hwmult (none), when compiling at -O3.  */
+
+char arr[2];
+char a;
+char *ptr;
+
+/*
+** foo:
+** ...
+**	MOV.B	\&a, \&arr\+1
+**	MOV.*	#arr\+2, \&ptr
+** ...
+*/
+
+void
+foo (void)
+{
+  arr[1] = a;
+  ptr = arr + 2;
+}
+
+extern void ext (void);
+
+/*
+** bar:
+** ...
+**	CALL.*	#ext
+**	CALL.*	#ext
+** ...
+*/
+
+void
+bar (void)
+{
+  ext ();
+  ext ();
+}
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c
new file mode 100644
index 00000000000..1e48625f2e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -mhwmult=f5series" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and f5series hwmult, when compiling at -O3.  */
+
+volatile unsigned long a;
+volatile unsigned int b;
+volatile unsigned long c;
+unsigned long res1;
+unsigned long res2;
+unsigned long res3;
+
+/*
+** foo:
+** ...
+**	MOV.B	#16, R14
+**	CALL.*	#__mspabi_slll
+** ...
+**	MOV.*	\&res2.*
+** ...
+**	RLA.*RLC.*
+** ...
+**	MOV.*	\&res3.*
+** ...
+**	RLA.*RLC.*
+** ...
+*/
+void foo (void)
+{
+  /* Use the shift library function for this.  */
+  res1 = (a << 16) | b;
+  /* Emit 7 inline shifts for this.  */
+  res2 *= 128;
+  /* Perform this multiplication inline, using addition and shifts.  */
+  res3 *= 100;
+}
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c
new file mode 100644
index 00000000000..8f3d1b28049
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c
@@ -0,0 +1,43 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and hwmult (none), when compiling at -Os.  */
+
+char arr[2];
+char a;
+char *ptr;
+
+/*
+** foo:
+** ...
+**	MOV.B	\&a, \&arr\+1
+**	MOV.*	#arr\+2, \&ptr
+** ...
+*/
+
+void
+foo (void)
+{
+  arr[1] = a;
+  ptr = arr + 2;
+}
+
+extern void ext (void);
+
+/*
+** bar:
+** ...
+**	MOV.*	#ext, R10
+**	CALL.*	R10
+**	CALL.*	R10
+** ...
+*/
+
+void
+bar (void)
+{
+  ext ();
+  ext ();
+}
diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c
new file mode 100644
index 00000000000..bb37f9083d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Os -mhwmult=f5series" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Verify the MSP430 cost model is working as expected for the default ISA
+   (msp430x) and f5series hwmult, when compiling at -Os.  */
+
+volatile unsigned long a;
+volatile unsigned int b;
+volatile unsigned long c;
+unsigned long res1;
+unsigned long res2;
+unsigned long res3;
+
+/*
+** foo:
+** ...
+**	MOV.B	#16, R14
+**	CALL.*	#__mspabi_slll
+** ...
+**	MOV.B	#7, R14
+**	CALL.*	#__mspabi_slll
+** ...
+**	MOV.B	#100, R14
+**	MOV.B	#0, R15
+** ...
+**	CALL.*	#__mulsi2_f5
+** ...
+*/
+void foo (void)
+{
+  /* Use the shift library function for this.  */
+  res1 = (a << 16) | b;
+  /* Likewise.  */
+  res2 *= 128;
+  /* Use the hardware multiply library function for this.  */
+  res3 *= 100;
+}