Message ID  20200106070340.853861luoxhu@linux.ibm.com 

State  New 
Headers  show 
Series 

Related  show 
On Mon, 20200106 at 01:03 0600, Xiong Hu Luo wrote: > Inline should return failure either (newsize > param_large_function_insns) > OR (newsize > limit). Sometimes newsize is larger than > param_large_function_insns, but smaller than limit, inline doesn't return > failure even if the new function is a large function. > Therefore, param_large_function_insns and param_large_function_growth should be > OR instead of AND, otherwise param largefunctiongrowth won't > work correctly with param largefunctioninsns. > > gcc/ChangeLog: > > 20200106 Luo Xiong Hu <luoxhu@linux.ibm.com> > > ipainlineanalysis.c (estimate_growth): Fix typo. > ipainline.c (caller_growth_limits): Use OR instead of AND. OK jeff
On 2020/1/7 02:01, Jeff Law wrote: > On Mon, 20200106 at 01:03 0600, Xiong Hu Luo wrote: >> Inline should return failure either (newsize > param_large_function_insns) >> OR (newsize > limit). Sometimes newsize is larger than >> param_large_function_insns, but smaller than limit, inline doesn't return >> failure even if the new function is a large function. >> Therefore, param_large_function_insns and param_large_function_growth should be >> OR instead of AND, otherwise param largefunctiongrowth won't >> work correctly with param largefunctioninsns. >> >> gcc/ChangeLog: >> >> 20200106 Luo Xiong Hu <luoxhu@linux.ibm.com> >> >> ipainlineanalysis.c (estimate_growth): Fix typo. >> ipainline.c (caller_growth_limits): Use OR instead of AND. > OK > jeff Thanks, committed in r279942. XiongHu >
> On Mon, 20200106 at 01:03 0600, Xiong Hu Luo wrote: > > Inline should return failure either (newsize > param_large_function_insns) > > OR (newsize > limit). Sometimes newsize is larger than > > param_large_function_insns, but smaller than limit, inline doesn't return > > failure even if the new function is a large function. > > Therefore, param_large_function_insns and param_large_function_growth should be > > OR instead of AND, otherwise param largefunctiongrowth won't > > work correctly with param largefunctioninsns. > > > > gcc/ChangeLog: > > > > 20200106 Luo Xiong Hu <luoxhu@linux.ibm.com> > > > > ipainlineanalysis.c (estimate_growth): Fix typo. > > ipainline.c (caller_growth_limits): Use OR instead of AND. > OK This patch also seems wrong. Inliner should return failure when newsize > param_large_function_insns && newsize > limit The reason is same as with large_unit_insns. We used to have only growth limits but it has turned out that this gets too restrictive/random for very small units/growths. So the logic is meant to be that inlining is OK either  if function is reasonably small (large_function_insns limit is not met)  or it does not grow too much (largefunctiongrowth isnot met) @item largefunctioninsns The limit specifying really large functions. For functions larger than this limit after inlining, inlining is constrained by @option{param largefunctiongrowth}. This parameter is useful primarily to avoid extreme compilation time caused by nonlinear algorithms used by the back end. Honza > jeff >
On 1/7/20 9:40 AM, Jan Hubicka wrote: >> On Mon, 20200106 at 01:03 0600, Xiong Hu Luo wrote: >>> Inline should return failure either (newsize > param_large_function_insns) >>> OR (newsize > limit). Sometimes newsize is larger than >>> param_large_function_insns, but smaller than limit, inline doesn't return >>> failure even if the new function is a large function. >>> Therefore, param_large_function_insns and param_large_function_growth should be >>> OR instead of AND, otherwise param largefunctiongrowth won't >>> work correctly with param largefunctioninsns. >>> >>> gcc/ChangeLog: >>> >>> 20200106 Luo Xiong Hu <luoxhu@linux.ibm.com> >>> >>> ipainlineanalysis.c (estimate_growth): Fix typo. >>> ipainline.c (caller_growth_limits): Use OR instead of AND. >> OK > > This patch also seems wrong. Inliner should return failure when > newsize > param_large_function_insns && newsize > limit > The reason is same as with large_unit_insns. We used to have only > growth limits but it has turned out that this gets too > restrictive/random for very small units/growths. > > So the logic is meant to be that inlining is OK either >  if function is reasonably small (large_function_insns limit is not > met) >  or it does not grow too much (largefunctiongrowth isnot met) > > @item largefunctioninsns > The limit specifying really large functions. For functions larger than > this limit after inlining, inlining is constrained by @option{param > largefunctiongrowth}. This parameter is useful primarily to avoid > extreme compilation time caused by nonlinear algorithms used by the > back end. Hello. The patch was already installed and caused quite significant changes in SPEC benchmarks: https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report May I revert the patch? Thanks, Martin > > > Honza >> jeff >>
On 2020/1/7 16:40, Jan Hubicka wrote: >> On Mon, 20200106 at 01:03 0600, Xiong Hu Luo wrote: >>> Inline should return failure either (newsize > param_large_function_insns) >>> OR (newsize > limit). Sometimes newsize is larger than >>> param_large_function_insns, but smaller than limit, inline doesn't return >>> failure even if the new function is a large function. >>> Therefore, param_large_function_insns and param_large_function_growth should be >>> OR instead of AND, otherwise param largefunctiongrowth won't >>> work correctly with param largefunctioninsns. >>> >>> gcc/ChangeLog: >>> >>> 20200106 Luo Xiong Hu <luoxhu@linux.ibm.com> >>> >>> ipainlineanalysis.c (estimate_growth): Fix typo. >>> ipainline.c (caller_growth_limits): Use OR instead of AND. >> OK > > This patch also seems wrong. Inliner should return failure when > newsize > param_large_function_insns && newsize > limit > The reason is same as with large_unit_insns. We used to have only > growth limits but it has turned out that this gets too > restrictive/random for very small units/growths. > > So the logic is meant to be that inlining is OK either >  if function is reasonably small (large_function_insns limit is not > met) >  or it does not grow too much (largefunctiongrowth isnot met) Sorry for checking too fast. Will revert once confirmed. large_function_insns is default to 2700, and largefunctiongrowth is default to 100, so inline a large function with insn 2700 to another large function with insn 2700 is allowed here (limit reach to 5400), I observed HUGE performance drop in some cases for this behavior(large function inline large function due to register spilling) compared with not inlined, that's why this patch comes out. From performance point of view，it seems benefitial to inline small function to large function, but negative when inline large function to large function(causing register spill)? 2700 seems too large for defining a function to be large and and 5400 also seems too big for growth limit? This inline happens at "Inlining one function called once", it's out of inline_small_functions and since newsize > param_large_function_insns, so I suppose it won't change behavior for inlining small functions? > > @item largefunctioninsns > The limit specifying really large functions. For functions larger than > this limit after inlining, inlining is constrained by @option{param > largefunctiongrowth}. This parameter is useful primarily to avoid > extreme compilation time caused by nonlinear algorithms used by the > back end. > > > Honza >> jeff >>
> > > On 2020/1/7 16:40, Jan Hubicka wrote: > >> On Mon, 20200106 at 01:03 0600, Xiong Hu Luo wrote: > >>> Inline should return failure either (newsize > param_large_function_insns) > >>> OR (newsize > limit). Sometimes newsize is larger than > >>> param_large_function_insns, but smaller than limit, inline doesn't return > >>> failure even if the new function is a large function. > >>> Therefore, param_large_function_insns and param_large_function_growth should be > >>> OR instead of AND, otherwise param largefunctiongrowth won't > >>> work correctly with param largefunctioninsns. > >>> > >>> gcc/ChangeLog: > >>> > >>> 20200106 Luo Xiong Hu <luoxhu@linux.ibm.com> > >>> > >>> ipainlineanalysis.c (estimate_growth): Fix typo. > >>> ipainline.c (caller_growth_limits): Use OR instead of AND. > >> OK > > > > This patch also seems wrong. Inliner should return failure when > > newsize > param_large_function_insns && newsize > limit > > The reason is same as with large_unit_insns. We used to have only > > growth limits but it has turned out that this gets too > > restrictive/random for very small units/growths. > > > > So the logic is meant to be that inlining is OK either > >  if function is reasonably small (large_function_insns limit is not > > met) > >  or it does not grow too much (largefunctiongrowth isnot met) > > Sorry for checking too fast. Will revert once confirmed. Yes, i can confirm that the origina test was as intended and it is OK to revert the patch (I am currently traveling) > large_function_insns is default to 2700, and largefunctiongrowth is > default to 100, so inline a large function with insn 2700 to another large > function with insn 2700 is allowed here (limit reach to 5400), I observed > HUGE performance drop in some cases for this behavior(large function inline > large function due to register spilling) compared with not inlined, that's > why this patch comes out. > > From performance point of view，it seems benefitial to inline small function to > large function, but negative when inline large function to large function(causing > register spill)? 2700 seems too large for defining a function to be large and and > 5400 also seems too big for growth limit? > > This inline happens at "Inlining one function called once", it's out of > inline_small_functions and since newsize > param_large_function_insns, so I > suppose it won't change behavior for inlining small functions? largefunctiongrowth/insns is not meant to improve performance. The reason why they exists is the fact that compiler is nonlinar in some cases and thus producing very large functions makes it slow / or give up on some optimizations. The problem that sometimes inlining a larger function into a cold path of other function leads to slowdown is rather old problem (see PR49194). In general most of logic in inliner is about determining when inlining is going to be useful. It is hard to tell when it is *not* going to be useful, so I am bit unsure what to do about these cases in general. Recently I experimented path which disables inlining functions called once when callee and caller frequency differ (i.e. one is hot other is cold) etc, but it did not lead to consistent perormance improvements. What situation did you run into? Honza
On 2020/1/7 23:40, Jan Hubicka wrote: >> >> >> On 2020/1/7 16:40, Jan Hubicka wrote: >>>> On Mon, 20200106 at 01:03 0600, Xiong Hu Luo wrote: >>>>> Inline should return failure either (newsize > param_large_function_insns) >>>>> OR (newsize > limit). Sometimes newsize is larger than >>>>> param_large_function_insns, but smaller than limit, inline doesn't return >>>>> failure even if the new function is a large function. >>>>> Therefore, param_large_function_insns and param_large_function_growth should be >>>>> OR instead of AND, otherwise param largefunctiongrowth won't >>>>> work correctly with param largefunctioninsns. >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> 20200106 Luo Xiong Hu <luoxhu@linux.ibm.com> >>>>> >>>>> ipainlineanalysis.c (estimate_growth): Fix typo. >>>>> ipainline.c (caller_growth_limits): Use OR instead of AND. >>>> OK >>> >>> This patch also seems wrong. Inliner should return failure when >>> newsize > param_large_function_insns && newsize > limit >>> The reason is same as with large_unit_insns. We used to have only >>> growth limits but it has turned out that this gets too >>> restrictive/random for very small units/growths. >>> >>> So the logic is meant to be that inlining is OK either >>>  if function is reasonably small (large_function_insns limit is not >>> met) >>>  or it does not grow too much (largefunctiongrowth isnot met) >> >> Sorry for checking too fast. Will revert once confirmed. > > Yes, i can confirm that the origina test was as intended and it is OK to > revert the patch (I am currently traveling) Partially reverted in r279986 and remained the typo fix as it is obvious. > >> large_function_insns is default to 2700, and largefunctiongrowth is >> default to 100, so inline a large function with insn 2700 to another large >> function with insn 2700 is allowed here (limit reach to 5400), I observed >> HUGE performance drop in some cases for this behavior(large function inline >> large function due to register spilling) compared with not inlined, that's >> why this patch comes out. >> >> From performance point of view，it seems benefitial to inline small function to >> large function, but negative when inline large function to large function(causing >> register spill)? 2700 seems too large for defining a function to be large and and >> 5400 also seems too big for growth limit? >> >> This inline happens at "Inlining one function called once", it's out of >> inline_small_functions and since newsize > param_large_function_insns, so I >> suppose it won't change behavior for inlining small functions? > > largefunctiongrowth/insns is not meant to improve performance. The > reason why they exists is the fact that compiler is nonlinar > in some cases and thus producing very large functions makes it slow / or > give up on some optimizations. > > The problem that sometimes inlining a larger function into a cold path > of other function leads to slowdown is rather old problem (see PR49194). > In general most of logic in inliner is about determining when inlining > is going to be useful. It is hard to tell when it is *not* going to be > useful, so I am bit unsure what to do about these cases in general. > > Recently I experimented path which disables inlining functions called > once when callee and caller frequency differ (i.e. one is hot other is > cold) etc, but it did not lead to consistent perormance improvements. > > What situation did you run into? Thanks. So caller could be {hot, cold} + {large, small}, same for callee. It may produce up to 4 * 4 = 16 combinations. Agree that hard to define useful, and useful really doesn't reflect performance improvements certainly. :) My case is A1(1) calls A2(2), A2(2) calls A3(3). A1, A2, A3 are specialized cloned nodes with different input, they are all hot, called once and each have about 1000+ insns. By default, largefunctiongrowth/insns are both not reached, so A1 will inline A2, A2 will inline A3, which is 40% slower than noinline. If adjust the largefunctiongrowth/insns to allow A2 inline A3 only, the performance is 20% slower then noinline. All the inlinings are generated in functions called once. Xionghu > > Honza >
> > Thanks. So caller could be {hot, cold} + {large, small}, same for callee. It may > produce up to 4 * 4 = 16 combinations. Agree that hard to define useful, > and useful really doesn't reflect performance improvements certainly. :) > > My case is A1(1) calls A2(2), A2(2) calls A3(3). A1, A2, A3 are > specialized cloned nodes with different input, they are all hot, called once > and each have about 1000+ insns. By default, largefunctiongrowth/insns are > both not reached, so A1 will inline A2, A2 will inline A3, which is 40% slower than noinline. > > If adjust the largefunctiongrowth/insns to allow > A2 inline A3 only, the performance is 20% slower then noinline. > All the inlinings are generated in functions called once. I see, I assume that this is exchange. What is difficult for GCC in exchange is the large loop nest. GCC generally assumes that what is inside of deeper loop nest is more iportant and if I recall correctly there are 10 nested loops wrapping the recursie call. Basic observation is that for every self recursive function the combined frequency of all self recursive calls must be less than entry block frequency or the recursion tree will never end. Some time ago we added PRED_LOOP_EXIT_WITH_RECURSION, PRED_RECURSIVE_CALL, PRED_LOOP_GUARD_WITH_RECURSION which makes loops leading to recursion less likely to iterate. But this may not be enough to get profile correct. I wonder if we can not help the situation by extending esitmate_bb_frequencies to simply sum the frequencies of recursive calls and if they exceeds entry block forcingly scale down corresponding BBs accordingly (which would leave profile locally inconsistent, but I am not sure how to do much better  one could identif control dependencies and drop probabilities but after that one would need repropagate the loop nest I guess. This may 1) make inliner less eager to perform the inline 2) make tree optimizers to produce less damage on the outer loops if inlining happens. Honza > > > Xionghu > > > > > Honza > > >
diff git a/gcc/ipainlineanalysis.c b/gcc/ipainlineanalysis.c index 711b85bd1a9..0e584a5a401 100644  a/gcc/ipainlineanalysis.c +++ b/gcc/ipainlineanalysis.c @@ 462,7 +462,7 @@ offline_size (struct cgraph_node *node, ipa_size_summary *info) return 0; } /* Estimate the growth caused by inlining NODE into all callees. */ +/* Estimate the growth caused by inlining NODE into all callers. */ int estimate_growth (struct cgraph_node *node) diff git a/gcc/ipainline.c b/gcc/ipainline.c index 0f87c476dde..33945538def 100644  a/gcc/ipainline.c +++ b/gcc/ipainline.c @@ 184,8 +184,8 @@ caller_growth_limits (struct cgraph_edge *e) the function to shrink if it went over the limits by forced inlining. */ newsize = estimate_size_after_inlining (to, e); if (newsize >= ipa_size_summaries>get (what)>size  && newsize > opt_for_fn (to>decl, param_large_function_insns)  && newsize > limit) + && (newsize > opt_for_fn (to>decl, param_large_function_insns) +  newsize > limit)) { e>inline_failed = CIF_LARGE_FUNCTION_GROWTH_LIMIT; return false;