[v3] ipa-cp: Fix PGO regression caused by r278808

Message ID 6dd026fb-6d6a-ac06-04f8-9ad05ccfab31@linux.ibm.com
State New
Headers show
Series
  • [v3] ipa-cp: Fix PGO regression caused by r278808
Related show

Commit Message

luoxhu Jan. 14, 2020, 6:45 a.m.
Hi,

On 2020/1/3 00:58, Jan Hubicka wrote:
>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

>> index 14064ae0034..947bf7c7199 100644

>> --- a/gcc/ipa-cp.c

>> +++ b/gcc/ipa-cp.c

>> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,

>>   					      false);

>>     new_sum = stats.count_sum;

>>   

>> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

>> +  if (info && info->node_is_self_scc)

>> +    {

>> +      profile_count self_recursive_count;

>> +

>> +      /* The self recursive edge is on the orig_node.  */

>> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

>> +	if (ipa_edge_within_scc (cs))

>> +	  {

>> +	    cgraph_node *callee = cs->callee->function_symbol ();

>> +	    if (callee && cs->caller == cs->callee)

>> +	      {

>> +		self_recursive_count = cs->count;

>> +		break;

>> +	      }

>> +	  }

> 

> What happens here when there are multiple self recursive calls or when

> the SCC has two mutually recursive functions?

> 

> I am still confused by the logic of this function.  I will take a deeper

> look at your previous email.

>> +

>> +      /* Proportion count for self recursive node from all callers.  */

>> +      new_sum

>> +	= (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

>> +

>> +      /* Proportion count for specialized cloned node.  */

>> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

>> +    }

>> +

>>     if (orig_node_count < orig_sum + new_sum)

>>       {

>>         if (dump_file)

>> diff --git a/gcc/params.opt b/gcc/params.opt

>> index d88ae0c468b..40a03b04580 100644

>> --- a/gcc/params.opt

>> +++ b/gcc/params.opt

>> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param

>>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.

>>   

>>   -param=ipa-cp-max-recursive-depth=

>> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param

>> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param

>>   Maximum depth of recursive cloning for self-recursive function.

> 

> The values stats from 0 but I also wonder why 10 is a good upper bound?

> If the function calls itself with one type of value (like depth-1) then

> we may clone well over 10 times, if it calls itself with two different

> sets then 8 is already quite high I would say...

> 

> I suppose the call probabilities will eventually drop to be very low,

> but I am not quite sure about that because we do not handle any IP

> frequency propagation.  Do we have some way to treat wide trees? Or do

> we clone only all self recursive calls are the same?

> 

> Honza

>>   


Update v3 patch.  This regression could only be reproduced when built with
"-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param
ipa-cp-unit-growth=80" on exchange_2, on some platforms -fno-inline may be
also needed, I attached 3 files (compressed to exchange.tar.gz)
exchange2_gcc.use.orig.wpa.071i.cp.old, exchange2_gcc.use.orig.wpa.071i.cp.new
and cp.diff to show the profile count difference of specialized node
digits_2.constprop/152 to digits_2.constprop/159 without/with this patch.

Profile count is decreasing slowly with this patch instead of keeping very
small from the first clone (very low count as cold will cause complete unroll
not working), it still differs from really execution (exchange2.png), but this
average model takes the recursive edge as feedback.  Thanks.


v3:
1. Enable proportion orig_sum to the new nodes for self recursive node (k means
   multiple self recursive calls):
   new_sum = (orig_sum + new_sum) * 1 / k \
   * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
2. Two mutually recursive functions are not supported in self recursive
   clone yet so also not supported in update_profiling_info here.
3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8).
   If it calls itself two different sets, usually recursive boudary limit
   will stop the specialize first, otherwise it is slow even without
   recursive specialize.

The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

	2020-01-14  Luo Xiong Hu  <luoxhu@linux.ibm.com>

	* ipa-cp.c (update_profiling_info): Check self_scc node.
	* params.opt (ipa-cp-max-recursive-depth): Set param range.
---
 gcc/ipa-cp.c   | 33 ++++++++++++++++++++++++++++++++-
 gcc/params.opt |  2 +-
 2 files changed, 33 insertions(+), 2 deletions(-)

-- 
2.21.0.777.g83232e3864

Comments

luoxhu Feb. 10, 2020, 3:17 a.m. | #1
Ping,

attachment of https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00764/exchange2.tar.gz
shows the profile count difference on cloned nodes digits_2.constprop.[0...8]
without/with this patch.  Thanks!

Xiong Hu


On 2020/1/14 14:45, luoxhu wrote:
> Hi,

> 

> On 2020/1/3 00:58, Jan Hubicka wrote:

>>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

>>> index 14064ae0034..947bf7c7199 100644

>>> --- a/gcc/ipa-cp.c

>>> +++ b/gcc/ipa-cp.c

>>> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node 

>>> *orig_node,

>>>                             false);

>>>     new_sum = stats.count_sum;

>>> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

>>> +  if (info && info->node_is_self_scc)

>>> +    {

>>> +      profile_count self_recursive_count;

>>> +

>>> +      /* The self recursive edge is on the orig_node.  */

>>> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

>>> +    if (ipa_edge_within_scc (cs))

>>> +      {

>>> +        cgraph_node *callee = cs->callee->function_symbol ();

>>> +        if (callee && cs->caller == cs->callee)

>>> +          {

>>> +        self_recursive_count = cs->count;

>>> +        break;

>>> +          }

>>> +      }

>>

>> What happens here when there are multiple self recursive calls or when

>> the SCC has two mutually recursive functions?

>>

>> I am still confused by the logic of this function.  I will take a deeper

>> look at your previous email.

>>> +

>>> +      /* Proportion count for self recursive node from all callers.  */

>>> +      new_sum

>>> +    = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

>>> +

>>> +      /* Proportion count for specialized cloned node.  */

>>> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

>>> +    }

>>> +

>>>     if (orig_node_count < orig_sum + new_sum)

>>>       {

>>>         if (dump_file)

>>> diff --git a/gcc/params.opt b/gcc/params.opt

>>> index d88ae0c468b..40a03b04580 100644

>>> --- a/gcc/params.opt

>>> +++ b/gcc/params.opt

>>> @@ -199,7 +199,7 @@ Common Joined UInteger 

>>> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param

>>>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds 

>>> or strides known.

>>>   -param=ipa-cp-max-recursive-depth=

>>> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param

>>> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) 

>>> IntegerRange(1, 10) Param

>>>   Maximum depth of recursive cloning for self-recursive function.

>>

>> The values stats from 0 but I also wonder why 10 is a good upper bound?

>> If the function calls itself with one type of value (like depth-1) then

>> we may clone well over 10 times, if it calls itself with two different

>> sets then 8 is already quite high I would say...

>>

>> I suppose the call probabilities will eventually drop to be very low,

>> but I am not quite sure about that because we do not handle any IP

>> frequency propagation.  Do we have some way to treat wide trees? Or do

>> we clone only all self recursive calls are the same?

>>

>> Honza

> 

> Update v3 patch.  This regression could only be reproduced when built with

> "-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param

> ipa-cp-unit-growth=80" on exchange_2, on some platforms -fno-inline may be

> also needed, I attached 3 files (compressed to exchange.tar.gz)

> exchange2_gcc.use.orig.wpa.071i.cp.old, exchange2_gcc.use.orig.wpa.071i.cp.new

> and cp.diff to show the profile count difference of specialized node

> digits_2.constprop/152 to digits_2.constprop/159 without/with this patch.

> 

> Profile count is decreasing slowly with this patch instead of keeping very

> small from the first clone (very low count as cold will cause complete unroll

> not working), it still differs from really execution (exchange2.png), but this

> average model takes the recursive edge as feedback.  Thanks.

> 

> 

> v3:

> 1. Enable proportion orig_sum to the new nodes for self recursive node (k 

> means

>    multiple self recursive calls):

>    new_sum = (orig_sum + new_sum) * 1 / k \

>    * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).

> 2. Two mutually recursive functions are not supported in self recursive

>    clone yet so also not supported in update_profiling_info here.

> 3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8).

>    If it calls itself two different sets, usually recursive boudary limit

>    will stop the specialize first, otherwise it is slow even without

>    recursive specialize.

> 

> The performance of exchange2 built with PGO will decrease ~28% by r278808

> due to profile count set incorrectly.  The cloned nodes are updated to a

> very small count caused later pass cunroll fail to unroll the recursive

> function in exchange2.

> 

> digits_2 ->

> digits_2.constprop.0, digits_2.constprop.1, etc.

> 

> gcc/ChangeLog:

> 

>      2020-01-14  Luo Xiong Hu  <luoxhu@linux.ibm.com>

> 

>      * ipa-cp.c (update_profiling_info): Check self_scc node.

>      * params.opt (ipa-cp-max-recursive-depth): Set param range.

> ---

> gcc/ipa-cp.c   | 33 ++++++++++++++++++++++++++++++++-

> gcc/params.opt |  2 +-

> 2 files changed, 33 insertions(+), 2 deletions(-)

> 

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

> index 17da1d8e8a7..8b759168e24 100644

> --- a/gcc/ipa-cp.c

> +++ b/gcc/ipa-cp.c

> @@ -2041,7 +2041,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,

>        /* Recursively generate lattice values with a limited count.  */

>        FOR_EACH_VEC_ELT (val_seeds, i, src_val)

>      {

> -      for (int j = 1; j < max_recursive_depth; j++)

> +      for (int j = 0; j < max_recursive_depth; j++)

>          {

>            tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,

>                               src_val, res_type);

> @@ -4313,6 +4313,37 @@ update_profiling_info (struct cgraph_node *orig_node,

>                            false);

>    new_sum = stats.count_sum;

> 

> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

> +  if (info && info->node_is_self_scc)

> +    {

> +      profile_count self_recursive_count = profile_count::zero ();

> +      unsigned k = 0;

> +

> +      /* The self recursive edge is on the orig_node.  */

> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

> +    if (ipa_edge_within_scc (cs))

> +      {

> +        cgraph_node *callee = cs->callee->function_symbol ();

> +        if (callee && cs->caller == cs->callee)

> +          {

> +        self_recursive_count += cs->count;

> +        k++;

> +          }

> +      }

> +

> +      /* Proportion count for multiple self recursive node.  */

> +      if (k)

> +    self_recursive_count.apply_scale (1, k);

> +

> +      /* Proportion count for self recursive node from all callers.  */

> +      new_sum

> +    = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

> +

> +      /* Proportion count for specialized cloned node.  */

> +      if (param_ipa_cp_max_recursive_depth)

> +    new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

> +    }

> +

>    if (orig_node_count < orig_sum + new_sum)

>      {

>        if (dump_file)

> diff --git a/gcc/params.opt b/gcc/params.opt

> index 31cc20031b1..9eceee5871f 100644

> --- a/gcc/params.opt

> +++ b/gcc/params.opt

> @@ -199,7 +199,7 @@ Common Joined UInteger 

> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza

> Compile-time bonus IPA-CP assigns to candidates which make loop bounds or 

> strides known.

> 

> -param=ipa-cp-max-recursive-depth=

> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param 

> Optimization

> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7) 

> IntegerRange(0, 8) Param Optimization

> Maximum depth of recursive cloning for self-recursive function.

> 

> -param=ipa-cp-min-recursive-probability=
luoxhu March 4, 2020, 2:57 a.m. | #2
Hi Honza,

The performance regression still exists. For exchange2, the performance
is about 28% slower for option:
"-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param ipa-cp-unit-growth=80 -fno-inline".

r278808:
commit ad06966f6677d55c11214d9c7b6d5518f915e341
Author: hubicka <hubicka@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Nov 28 14:16:29 2019 +0000

            * ipa-cp.c (update_profiling_info): Fix scaling.


Fix v3 patch and logs are here:
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00764.html


Thanks
Xionghu

On 2020/1/14 14:45, luoxhu wrote:
> Hi,

> 

> On 2020/1/3 00:58, Jan Hubicka wrote:

>>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

>>> index 14064ae0034..947bf7c7199 100644

>>> --- a/gcc/ipa-cp.c

>>> +++ b/gcc/ipa-cp.c

>>> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node 

>>> *orig_node,

>>>                             false);

>>>     new_sum = stats.count_sum;

>>> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

>>> +  if (info && info->node_is_self_scc)

>>> +    {

>>> +      profile_count self_recursive_count;

>>> +

>>> +      /* The self recursive edge is on the orig_node.  */

>>> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

>>> +    if (ipa_edge_within_scc (cs))

>>> +      {

>>> +        cgraph_node *callee = cs->callee->function_symbol ();

>>> +        if (callee && cs->caller == cs->callee)

>>> +          {

>>> +        self_recursive_count = cs->count;

>>> +        break;

>>> +          }

>>> +      }

>>

>> What happens here when there are multiple self recursive calls or when

>> the SCC has two mutually recursive functions?

>>

>> I am still confused by the logic of this function.  I will take a deeper

>> look at your previous email.

>>> +

>>> +      /* Proportion count for self recursive node from all callers.  */

>>> +      new_sum

>>> +    = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

>>> +

>>> +      /* Proportion count for specialized cloned node.  */

>>> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

>>> +    }

>>> +

>>>     if (orig_node_count < orig_sum + new_sum)

>>>       {

>>>         if (dump_file)

>>> diff --git a/gcc/params.opt b/gcc/params.opt

>>> index d88ae0c468b..40a03b04580 100644

>>> --- a/gcc/params.opt

>>> +++ b/gcc/params.opt

>>> @@ -199,7 +199,7 @@ Common Joined UInteger 

>>> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param

>>>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds 

>>> or strides known.

>>>   -param=ipa-cp-max-recursive-depth=

>>> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param

>>> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) 

>>> IntegerRange(1, 10) Param

>>>   Maximum depth of recursive cloning for self-recursive function.

>>

>> The values stats from 0 but I also wonder why 10 is a good upper bound?

>> If the function calls itself with one type of value (like depth-1) then

>> we may clone well over 10 times, if it calls itself with two different

>> sets then 8 is already quite high I would say...

>>

>> I suppose the call probabilities will eventually drop to be very low,

>> but I am not quite sure about that because we do not handle any IP

>> frequency propagation.  Do we have some way to treat wide trees? Or do

>> we clone only all self recursive calls are the same?

>>

>> Honza

> 

> Update v3 patch.  This regression could only be reproduced when built with

> "-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param

> ipa-cp-unit-growth=80" on exchange_2, on some platforms -fno-inline may be

> also needed, I attached 3 files (compressed to exchange.tar.gz)

> exchange2_gcc.use.orig.wpa.071i.cp.old, exchange2_gcc.use.orig.wpa.071i.cp.new

> and cp.diff to show the profile count difference of specialized node

> digits_2.constprop/152 to digits_2.constprop/159 without/with this patch.

> 

> Profile count is decreasing slowly with this patch instead of keeping very

> small from the first clone (very low count as cold will cause complete unroll

> not working), it still differs from really execution (exchange2.png), but this

> average model takes the recursive edge as feedback.  Thanks.

> 

> 

> v3:

> 1. Enable proportion orig_sum to the new nodes for self recursive node (k 

> means

>    multiple self recursive calls):

>    new_sum = (orig_sum + new_sum) * 1 / k \

>    * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).

> 2. Two mutually recursive functions are not supported in self recursive

>    clone yet so also not supported in update_profiling_info here.

> 3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8).

>    If it calls itself two different sets, usually recursive boudary limit

>    will stop the specialize first, otherwise it is slow even without

>    recursive specialize.

> 

> The performance of exchange2 built with PGO will decrease ~28% by r278808

> due to profile count set incorrectly.  The cloned nodes are updated to a

> very small count caused later pass cunroll fail to unroll the recursive

> function in exchange2.

> 

> digits_2 ->

> digits_2.constprop.0, digits_2.constprop.1, etc.

> 

> gcc/ChangeLog:

> 

>      2020-01-14  Luo Xiong Hu  <luoxhu@linux.ibm.com>

> 

>      * ipa-cp.c (update_profiling_info): Check self_scc node.

>      * params.opt (ipa-cp-max-recursive-depth): Set param range.

> ---

> gcc/ipa-cp.c   | 33 ++++++++++++++++++++++++++++++++-

> gcc/params.opt |  2 +-

> 2 files changed, 33 insertions(+), 2 deletions(-)

> 

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

> index 17da1d8e8a7..8b759168e24 100644

> --- a/gcc/ipa-cp.c

> +++ b/gcc/ipa-cp.c

> @@ -2041,7 +2041,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,

>        /* Recursively generate lattice values with a limited count.  */

>        FOR_EACH_VEC_ELT (val_seeds, i, src_val)

>      {

> -      for (int j = 1; j < max_recursive_depth; j++)

> +      for (int j = 0; j < max_recursive_depth; j++)

>          {

>            tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,

>                               src_val, res_type);

> @@ -4313,6 +4313,37 @@ update_profiling_info (struct cgraph_node *orig_node,

>                            false);

>    new_sum = stats.count_sum;

> 

> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

> +  if (info && info->node_is_self_scc)

> +    {

> +      profile_count self_recursive_count = profile_count::zero ();

> +      unsigned k = 0;

> +

> +      /* The self recursive edge is on the orig_node.  */

> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

> +    if (ipa_edge_within_scc (cs))

> +      {

> +        cgraph_node *callee = cs->callee->function_symbol ();

> +        if (callee && cs->caller == cs->callee)

> +          {

> +        self_recursive_count += cs->count;

> +        k++;

> +          }

> +      }

> +

> +      /* Proportion count for multiple self recursive node.  */

> +      if (k)

> +    self_recursive_count.apply_scale (1, k);

> +

> +      /* Proportion count for self recursive node from all callers.  */

> +      new_sum

> +    = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

> +

> +      /* Proportion count for specialized cloned node.  */

> +      if (param_ipa_cp_max_recursive_depth)

> +    new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

> +    }

> +

>    if (orig_node_count < orig_sum + new_sum)

>      {

>        if (dump_file)

> diff --git a/gcc/params.opt b/gcc/params.opt

> index 31cc20031b1..9eceee5871f 100644

> --- a/gcc/params.opt

> +++ b/gcc/params.opt

> @@ -199,7 +199,7 @@ Common Joined UInteger 

> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza

> Compile-time bonus IPA-CP assigns to candidates which make loop bounds or 

> strides known.

> 

> -param=ipa-cp-max-recursive-depth=

> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param 

> Optimization

> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7) 

> IntegerRange(0, 8) Param Optimization

> Maximum depth of recursive cloning for self-recursive function.

> 

> -param=ipa-cp-min-recursive-probability=

Patch

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 17da1d8e8a7..8b759168e24 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2041,7 +2041,7 @@  propagate_vals_across_arith_jfunc (cgraph_edge *cs,
       /* Recursively generate lattice values with a limited count.  */
       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
 	{
-	  for (int j = 1; j < max_recursive_depth; j++)
+	  for (int j = 0; j < max_recursive_depth; j++)
 	    {
 	      tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
 						     src_val, res_type);
@@ -4313,6 +4313,37 @@  update_profiling_info (struct cgraph_node *orig_node,
 					      false);
   new_sum = stats.count_sum;
 
+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    {
+      profile_count self_recursive_count = profile_count::zero ();
+      unsigned k = 0;
+
+      /* The self recursive edge is on the orig_node.  */
+      for (cs = orig_node->callees; cs; cs = cs->next_callee)
+	if (ipa_edge_within_scc (cs))
+	  {
+	    cgraph_node *callee = cs->callee->function_symbol ();
+	    if (callee && cs->caller == cs->callee)
+	      {
+		self_recursive_count += cs->count;
+		k++;
+	      }
+	  }
+
+      /* Proportion count for multiple self recursive node.  */
+      if (k)
+	self_recursive_count.apply_scale (1, k);
+
+      /* Proportion count for self recursive node from all callers.  */
+      new_sum
+	= (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
+
+      /* Proportion count for specialized cloned node.  */
+      if (param_ipa_cp_max_recursive_depth)
+	new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
+    }
+
   if (orig_node_count < orig_sum + new_sum)
     {
       if (dump_file)
diff --git a/gcc/params.opt b/gcc/params.opt
index 31cc20031b1..9eceee5871f 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -199,7 +199,7 @@  Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
 
 -param=ipa-cp-max-recursive-depth=
-Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param Optimization
+Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7) IntegerRange(0, 8) Param Optimization
 Maximum depth of recursive cloning for self-recursive function.
 
 -param=ipa-cp-min-recursive-probability=