[v2] ipa-cp: Fix PGO regression caused by r278808

Message ID 4d99f162-084c-12d0-1205-7e790a597bc6@linux.ibm.com
State New
Headers show
Series
  • [v2] ipa-cp: Fix PGO regression caused by r278808
Related show

Commit Message

luoxhu Dec. 30, 2019, 8:11 a.m.
v2 Changes:
1. Enable proportion orig_sum to the new nodes for self recursive node:
   new_sum = (orig_sum + new_sum) \
   * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
2. Add value range for param_ipa_cp_max_recursive_depth.

The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

	2019-12-26  Luo Xiong Hu  <luoxhu@linux.ibm.com>

	* ipa-cp.c (update_profiling_info): Check self_scc node.
	* params.opt (ipa-cp-max-recursive-depth): Set param range.
---
 gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
 gcc/params.opt |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

-- 
2.21.0.777.g83232e3864

Comments

Feng Xue OS Dec. 31, 2019, 6:43 a.m. | #1
One comment: it's better to allow zero value for param_ipa_cp_max_recursive_depth,
this can be used to disable recursive cloning.

Feng
________________________________________
From: luoxhu <luoxhu@linux.ibm.com>

Sent: Monday, December 30, 2019 4:11 PM
To: Jan Hubicka; Martin Jambor
Cc: Martin Liška; gcc-patches@gcc.gnu.org; segher@kernel.crashing.org; wschmidt@linux.ibm.com; guojiufu@linux.ibm.com; linkw@gcc.gnu.org; Feng Xue OS
Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

v2 Changes:
1. Enable proportion orig_sum to the new nodes for self recursive node:
   new_sum = (orig_sum + new_sum) \
   * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
2. Add value range for param_ipa_cp_max_recursive_depth.

The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

        2019-12-26  Luo Xiong Hu  <luoxhu@linux.ibm.com>

        * ipa-cp.c (update_profiling_info): Check self_scc node.
        * params.opt (ipa-cp-max-recursive-depth): Set param range.
---
 gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
 gcc/params.opt |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 14064ae0034..947bf7c7199 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
                                              false);
   new_sum = stats.count_sum;

+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    {
+      profile_count self_recursive_count;
+
+      /* The self recursive edge is on the orig_node.  */
+      for (cs = orig_node->callees; cs; cs = cs->next_callee)
+       if (ipa_edge_within_scc (cs))
+         {
+           cgraph_node *callee = cs->callee->function_symbol ();
+           if (callee && cs->caller == cs->callee)
+             {
+               self_recursive_count = cs->count;
+               break;
+             }
+         }
+
+      /* Proportion count for self recursive node from all callers.  */
+      new_sum
+       = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
+
+      /* Proportion count for specialized cloned node.  */
+      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
+    }
+
   if (orig_node_count < orig_sum + new_sum)
     {
       if (dump_file)
diff --git a/gcc/params.opt b/gcc/params.opt
index d88ae0c468b..40a03b04580 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.

 -param=ipa-cp-max-recursive-depth=
-Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
+Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
 Maximum depth of recursive cloning for self-recursive function.

 -param=ipa-cp-min-recursive-probability=
--
2.21.0.777.g83232e3864
luoxhu Dec. 31, 2019, 7:43 a.m. | #2
On 2019/12/31 14:43, Feng Xue OS wrote:
> One comment: it's better to allow zero value for param_ipa_cp_max_recursive_depth,

> this can be used to disable recursive cloning.


Thanks, "1" means no recursive cloning but only constant propagation from
caller to callee in your code?

ipa-cp.c, line 2019:

       /* Recursively generate lattice values with a limited count.  */
       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
	{
	  for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++)
	    {
	      tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
						     src_val, res_type);
	      if (!cstval)
		break;

	      ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
					  src_offset, &src_val, true);
	      gcc_checking_assert (src_val);
	    }
	}

XiongHu

> 

> Feng

> ________________________________________

> From: luoxhu <luoxhu@linux.ibm.com>

> Sent: Monday, December 30, 2019 4:11 PM

> To: Jan Hubicka; Martin Jambor

> Cc: Martin Liška; gcc-patches@gcc.gnu.org; segher@kernel.crashing.org; wschmidt@linux.ibm.com; guojiufu@linux.ibm.com; linkw@gcc.gnu.org; Feng Xue OS

> Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

> 

> v2 Changes:

> 1. Enable proportion orig_sum to the new nodes for self recursive node:

>     new_sum = (orig_sum + new_sum) \

>     * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).

> 2. Add value range for param_ipa_cp_max_recursive_depth.

> 

> The performance of exchange2 built with PGO will decrease ~28% by r278808

> due to profile count set incorrectly.  The cloned nodes are updated to a

> very small count caused later pass cunroll fail to unroll the recursive

> function in exchange2.

> 

> digits_2 ->

> digits_2.constprop.0, digits_2.constprop.1, etc.

> 

> gcc/ChangeLog:

> 

>          2019-12-26  Luo Xiong Hu  <luoxhu@linux.ibm.com>

> 

>          * ipa-cp.c (update_profiling_info): Check self_scc node.

>          * params.opt (ipa-cp-max-recursive-depth): Set param range.

> ---

>   gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++

>   gcc/params.opt |  2 +-

>   2 files changed, 26 insertions(+), 1 deletion(-)

> 

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

> index 14064ae0034..947bf7c7199 100644

> --- a/gcc/ipa-cp.c

> +++ b/gcc/ipa-cp.c

> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,

>                                                false);

>     new_sum = stats.count_sum;

> 

> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

> +  if (info && info->node_is_self_scc)

> +    {

> +      profile_count self_recursive_count;

> +

> +      /* The self recursive edge is on the orig_node.  */

> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

> +       if (ipa_edge_within_scc (cs))

> +         {

> +           cgraph_node *callee = cs->callee->function_symbol ();

> +           if (callee && cs->caller == cs->callee)

> +             {

> +               self_recursive_count = cs->count;

> +               break;

> +             }

> +         }

> +

> +      /* Proportion count for self recursive node from all callers.  */

> +      new_sum

> +       = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

> +

> +      /* Proportion count for specialized cloned node.  */

> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

> +    }

> +

>     if (orig_node_count < orig_sum + new_sum)

>       {

>         if (dump_file)

> diff --git a/gcc/params.opt b/gcc/params.opt

> index d88ae0c468b..40a03b04580 100644

> --- a/gcc/params.opt

> +++ b/gcc/params.opt

> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param

>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.

> 

>   -param=ipa-cp-max-recursive-depth=

> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param

> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param

>   Maximum depth of recursive cloning for self-recursive function.

> 

>   -param=ipa-cp-min-recursive-probability=

> --

> 2.21.0.777.g83232e3864

>
Feng Xue OS Dec. 31, 2019, 8:51 a.m. | #3
The first level is ordinary clone, corresponding to a non-self caller,
and the param_ipa_cp_max_recursive_depth-1 is for recursive cloning.
Then it's ok that the least value is 1, with which disabling does happen.

Feng

________________________________________
From: luoxhu <luoxhu@linux.ibm.com>

Sent: Tuesday, December 31, 2019 3:43 PM
To: Feng Xue OS; Jan Hubicka; Martin Jambor
Cc: Martin Liška; gcc-patches@gcc.gnu.org; segher@kernel.crashing.org; wschmidt@linux.ibm.com; guojiufu@linux.ibm.com; linkw@gcc.gnu.org
Subject: Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808


On 2019/12/31 14:43, Feng Xue OS wrote:
> One comment: it's better to allow zero value for param_ipa_cp_max_recursive_depth,

> this can be used to disable recursive cloning.


Thanks, "1" means no recursive cloning but only constant propagation from
caller to callee in your code?

ipa-cp.c, line 2019:

       /* Recursively generate lattice values with a limited count.  */
       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
        {
          for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++)
            {
              tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
                                                     src_val, res_type);
              if (!cstval)
                break;

              ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
                                          src_offset, &src_val, true);
              gcc_checking_assert (src_val);
            }
        }

XiongHu

>

> Feng

> ________________________________________

> From: luoxhu <luoxhu@linux.ibm.com>

> Sent: Monday, December 30, 2019 4:11 PM

> To: Jan Hubicka; Martin Jambor

> Cc: Martin Liška; gcc-patches@gcc.gnu.org; segher@kernel.crashing.org; wschmidt@linux.ibm.com; guojiufu@linux.ibm.com; linkw@gcc.gnu.org; Feng Xue OS

> Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

>

> v2 Changes:

> 1. Enable proportion orig_sum to the new nodes for self recursive node:

>     new_sum = (orig_sum + new_sum) \

>     * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).

> 2. Add value range for param_ipa_cp_max_recursive_depth.

>

> The performance of exchange2 built with PGO will decrease ~28% by r278808

> due to profile count set incorrectly.  The cloned nodes are updated to a

> very small count caused later pass cunroll fail to unroll the recursive

> function in exchange2.

>

> digits_2 ->

> digits_2.constprop.0, digits_2.constprop.1, etc.

>

> gcc/ChangeLog:

>

>          2019-12-26  Luo Xiong Hu  <luoxhu@linux.ibm.com>

>

>          * ipa-cp.c (update_profiling_info): Check self_scc node.

>          * params.opt (ipa-cp-max-recursive-depth): Set param range.

> ---

>   gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++

>   gcc/params.opt |  2 +-

>   2 files changed, 26 insertions(+), 1 deletion(-)

>

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

> index 14064ae0034..947bf7c7199 100644

> --- a/gcc/ipa-cp.c

> +++ b/gcc/ipa-cp.c

> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,

>                                                false);

>     new_sum = stats.count_sum;

>

> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

> +  if (info && info->node_is_self_scc)

> +    {

> +      profile_count self_recursive_count;

> +

> +      /* The self recursive edge is on the orig_node.  */

> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

> +       if (ipa_edge_within_scc (cs))

> +         {

> +           cgraph_node *callee = cs->callee->function_symbol ();

> +           if (callee && cs->caller == cs->callee)

> +             {

> +               self_recursive_count = cs->count;

> +               break;

> +             }

> +         }

> +

> +      /* Proportion count for self recursive node from all callers.  */

> +      new_sum

> +       = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

> +

> +      /* Proportion count for specialized cloned node.  */

> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

> +    }

> +

>     if (orig_node_count < orig_sum + new_sum)

>       {

>         if (dump_file)

> diff --git a/gcc/params.opt b/gcc/params.opt

> index d88ae0c468b..40a03b04580 100644

> --- a/gcc/params.opt

> +++ b/gcc/params.opt

> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param

>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.

>

>   -param=ipa-cp-max-recursive-depth=

> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param

> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param

>   Maximum depth of recursive cloning for self-recursive function.

>

>   -param=ipa-cp-min-recursive-probability=

> --

> 2.21.0.777.g83232e3864

>
Jan Hubicka Jan. 2, 2020, 4:58 p.m. | #4
> v2 Changes:

> 1. Enable proportion orig_sum to the new nodes for self recursive node:

>    new_sum = (orig_sum + new_sum) \

>    * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).

> 2. Add value range for param_ipa_cp_max_recursive_depth.

> 

> The performance of exchange2 built with PGO will decrease ~28% by r278808

> due to profile count set incorrectly.  The cloned nodes are updated to a

> very small count caused later pass cunroll fail to unroll the recursive

> function in exchange2.

> 

> digits_2 ->

> digits_2.constprop.0, digits_2.constprop.1, etc.

> 

> gcc/ChangeLog:

> 

> 	2019-12-26  Luo Xiong Hu  <luoxhu@linux.ibm.com>

> 

> 	* ipa-cp.c (update_profiling_info): Check self_scc node.

> 	* params.opt (ipa-cp-max-recursive-depth): Set param range.

> ---

>  gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++

>  gcc/params.opt |  2 +-

>  2 files changed, 26 insertions(+), 1 deletion(-)

> 

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

> index 14064ae0034..947bf7c7199 100644

> --- a/gcc/ipa-cp.c

> +++ b/gcc/ipa-cp.c

> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,

>  					      false);

>    new_sum = stats.count_sum;

>  

> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

> +  if (info && info->node_is_self_scc)

> +    {

> +      profile_count self_recursive_count;

> +

> +      /* The self recursive edge is on the orig_node.  */

> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

> +	if (ipa_edge_within_scc (cs))

> +	  {

> +	    cgraph_node *callee = cs->callee->function_symbol ();

> +	    if (callee && cs->caller == cs->callee)

> +	      {

> +		self_recursive_count = cs->count;

> +		break;

> +	      }

> +	  }


What happens here when there are multiple self recursive calls or when
the SCC has two mutually recursive functions?

I am still confused by the logic of this function.  I will take a deeper
look at your previous email.
> +

> +      /* Proportion count for self recursive node from all callers.  */

> +      new_sum

> +	= (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

> +

> +      /* Proportion count for specialized cloned node.  */

> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

> +    }

> +

>    if (orig_node_count < orig_sum + new_sum)

>      {

>        if (dump_file)

> diff --git a/gcc/params.opt b/gcc/params.opt

> index d88ae0c468b..40a03b04580 100644

> --- a/gcc/params.opt

> +++ b/gcc/params.opt

> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param

>  Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.

>  

>  -param=ipa-cp-max-recursive-depth=

> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param

> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param

>  Maximum depth of recursive cloning for self-recursive function.


The values stats from 0 but I also wonder why 10 is a good upper bound?
If the function calls itself with one type of value (like depth-1) then
we may clone well over 10 times, if it calls itself with two different
sets then 8 is already quite high I would say...

I suppose the call probabilities will eventually drop to be very low,
but I am not quite sure about that because we do not handle any IP
frequency propagation.  Do we have some way to treat wide trees? Or do
we clone only all self recursive calls are the same?

Honza
>  

>  -param=ipa-cp-min-recursive-probability=

> -- 

> 2.21.0.777.g83232e3864

>

Patch

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 14064ae0034..947bf7c7199 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4272,6 +4272,31 @@  update_profiling_info (struct cgraph_node *orig_node,
 					      false);
   new_sum = stats.count_sum;
 
+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    {
+      profile_count self_recursive_count;
+
+      /* The self recursive edge is on the orig_node.  */
+      for (cs = orig_node->callees; cs; cs = cs->next_callee)
+	if (ipa_edge_within_scc (cs))
+	  {
+	    cgraph_node *callee = cs->callee->function_symbol ();
+	    if (callee && cs->caller == cs->callee)
+	      {
+		self_recursive_count = cs->count;
+		break;
+	      }
+	  }
+
+      /* Proportion count for self recursive node from all callers.  */
+      new_sum
+	= (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
+
+      /* Proportion count for specialized cloned node.  */
+      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
+    }
+
   if (orig_node_count < orig_sum + new_sum)
     {
       if (dump_file)
diff --git a/gcc/params.opt b/gcc/params.opt
index d88ae0c468b..40a03b04580 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -199,7 +199,7 @@  Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
 
 -param=ipa-cp-max-recursive-depth=
-Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
+Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
 Maximum depth of recursive cloning for self-recursive function.
 
 -param=ipa-cp-min-recursive-probability=