Thread: Top-N sorts verses parallelism


Top-N sorts verses parallelism

From:
Thomas Munro
Date:
Hi hackers,

The start-up cost of bounded (top-N) sorts is sensitive at the small
end of N, and the
comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
seem to correspond to reality.  Here's a contrived example that comes
from a benchmark:

  create table foo as
    select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
  analyze foo;
  select * from foo order by b limit N;

Estimated start-up costs for different N:

  limit 1 = 19425.00
  limit 2 = 24425.00
  limit 3 = 27349.81
  limit 4 = 29425.00
  limit 10 = 36034.64
  limit 50 = 47644.28
  limit 100 = 52644.28
  limit 133 = 54701.41
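
Those estimates come straight out of that cost curve.  Here's a rough
standalone sketch that reproduces them, assuming the default
cpu_operator_cost of 0.0025 (taking comparison_cost as
2 * cpu_operator_cost) and the 14425.00 Seq Scan cost this table gets:

  #include <math.h>
  #include <stdio.h>

  int
  main(void)
  {
      /* Assumed inputs, copied from the example above rather than read
       * from a live server. */
      const double input_cost = 14425.00;           /* Seq Scan total cost */
      const double comparison_cost = 2.0 * 0.0025;  /* 2 * cpu_operator_cost */
      const double tuples = 1000000.0;
      const int    limits[] = {1, 2, 3, 4, 10, 50, 100, 133};

      for (int i = 0; i < 8; i++)
      {
          /* startup ~= input_cost + comparison_cost * tuples * log2(2 * N) */
          double startup = input_cost +
              comparison_cost * tuples * log2(2.0 * limits[i]);

          printf("limit %3d = %.2f\n", limits[i], startup);
      }
      return 0;
  }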

But the actual time doesn't really change on this random development
system: it's around 300ms.  I stopped at limit 133 because for this
particular query, at 134 a Gather Merge plan kicked in and I got
degree 3 parallel sorting and I got my answer just shy of three times
faster.  The speed up definitely paid for the parallel_setup_cost and
only one tuple was sent from each worker to the leader because the
limit was pushed down.

I want to get my answer three times as fast when I say limit 1 too!
But I can't because we seem to think that limit 1 is dramatically
cheaper than limit 134, so parallelism isn't worth bothering with.

With wider tuples the same effect can be seen, though it gets much
more difficult to convince Gather Merge to be selected at all for
reasons I haven't looked into yet.  Is Gather Merge for top-N sorts a
missed opportunity due to underestimation of top-N for small values of
N?  Or is there some other way to think about this problem?  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: Top-N sorts verses parallelism

From:
Robert Haas
Date:
On Wed, Dec 13, 2017 at 1:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hi hackers,
>
> The start-up cost of bounded (top-N) sorts is sensitive at the small
> end of N, and the
> comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
> seem to correspond to reality.  Here's a contrived example that comes
> from a benchmark:
>
>   create table foo as
>     select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
>   analyze foo;
>   select * from foo order by b limit N;
>
> But the actual time doesn't really change on this random development
> system: it's around 300ms.  I stopped at limit 133 because for this
> particular query, at 134 a Gather Merge plan kicked in and I got
> degree 3 parallel sorting and I got my answer just shy of three times
> faster.  The speed up definitely paid for the parallel_setup_cost and
> only one tuple was sent from each worker to the leader because the
> limit was pushed down.

I get:

rhaas=# explain analyze select * from foo order by b limit 1;
                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19425.00..19425.00 rows=1 width=8) (actual time=183.129..183.129 rows=1 loops=1)
   ->  Sort  (cost=19425.00..21925.00 rows=1000000 width=8) (actual time=183.128..183.128 rows=1 loops=1)
         Sort Key: b
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on foo  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.046..89.440 rows=1000000 loops=1)
 Planning time: 0.083 ms
 Execution time: 183.171 ms
(7 rows)

If we assume the costing of the Seq Scan node is right, then the
startup cost of the Sort node is too low.  For the Seq Scan node, each
1ms of execution time corresponds to 161.28 cost units.  If that were
accurate, then the 183.128 - 89.440 = 93.688 ms of additional time
attributable to the Sort ought to correspond to a startup cost of
14425.00 + (93.688 * 161.28) = 29535.00, but the actual cost is only
19425.00.  In other words,
the ratio of the startup cost of the sort to the total cost of the seq
scan ought to be about 2:1, to match the ratio of the execution times,
but it's actually more like 4:3.
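
The same arithmetic in code form, just re-deriving the numbers above
(nothing new measured; the figures are copied from the EXPLAIN ANALYZE
output):

  #include <stdio.h>

  int
  main(void)
  {
      double seq_scan_cost = 14425.00;   /* Seq Scan total cost */
      double seq_scan_ms   = 89.440;     /* Seq Scan actual total time */
      double sort_ms       = 183.128;    /* Sort actual startup time */
      double sort_cost     = 19425.00;   /* Sort startup cost as planned */

      double units_per_ms = seq_scan_cost / seq_scan_ms;         /* ~161.28 */
      double implied_cost = seq_scan_cost +
          (sort_ms - seq_scan_ms) * units_per_ms;                /* ~29535 */

      printf("implied sort startup cost %.2f vs actual %.2f\n",
             implied_cost, sort_cost);
      printf("implied ratio %.2f vs actual %.2f\n",
             implied_cost / seq_scan_cost, sort_cost / seq_scan_cost);
      return 0;
  }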

I also see that it switches to Gather Merge at LIMIT 134, but if I
lower random_page_cost and seq_page_cost from the default values to
0.1, then for me it switches earlier, at 63.  Of course, at that point
the sort cost is now 3.5x the Seq Scan cost, so now things have gone
too far in the other direction and we're still not getting the plan
right, but I am out of time to investigate this for the moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Top-N sorts verses parallelism

From:
Jeff Janes
Date:
On Tue, Dec 12, 2017 at 10:46 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Hi hackers,
>
> The start-up cost of bounded (top-N) sorts is sensitive at the small
> end of N, and the
> comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
> seem to correspond to reality.


Do we want the cost estimate to be accurate for the worst case (which is not terribly unlikely to happen: the input order is the opposite of the desired output order), or for the average case?

Cheers,

Jeff

Re: Top-N sorts verses parallelism

From:
Jeff Janes
Date:
On Tue, Dec 12, 2017 at 10:46 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Hi hackers,
>
> The start-up cost of bounded (top-N) sorts is sensitive at the small
> end of N, and the
> comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
> seem to correspond to reality.  Here's a contrived example that comes
> from a benchmark:
>
>   create table foo as
>     select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
>   analyze foo;
>   select * from foo order by b limit N;


This looks like a costing bug.  The estimated cost of sorting 416,667 estimated tuples in one parallel worker is almost identical to the estimated cost of sorting 1,000,000 tuples when parallelism is turned off.  Adding some logging to the cost_sort function, it looks like the limit does not get sent down for the parallel estimate:

NOTICE:  JJ cost_sort tuples 1000000.000000, limit 61.000000, sort_mem 65536
NOTICE:  JJ cost_sort tuples 416667.000000, limit -1.000000, sort_mem 65536

So the LIMIT gets passed down for the execution step, but not for the planning step.

(On my system, LIMIT 61 is the point at which it switches to a parallel plan.)
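
Something along these lines near the top of cost_sort() in costsize.c
is enough to produce that output (shown here as a sketch, not the exact
lines used):

  /* hypothetical debugging aid at the top of cost_sort() */
  elog(NOTICE, "JJ cost_sort tuples %f, limit %f, sort_mem %d",
       tuples, limit_tuples, sort_mem);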

In any case, if we "fixed" the top-N estimate to use the random case rather than the worst case, that would make the LIMIT 133 look more like the LIMIT 1, which is the opposite direction from what you want.

Cheers,

Jeff

Re: Top-N sorts verses parallelism

From:
Thomas Munro
Date:
On Fri, Dec 15, 2017 at 10:05 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Dec 12, 2017 at 10:46 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>>
>> Hi hackers,
>>
>> The start-up cost of bounded (top-N) sorts is sensitive at the small
>> end of N, and the
>> comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
>> seem to correspond to reality.  Here's a contrived example that comes
>> from a benchmark:
>>
>>   create table foo as
>>     select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
>>   analyze foo;
>>   select * from foo order by b limit N;
>
> This looks like a costing bug.  The estimated cost of sorting 416,667
> estimated tuples in one parallel worker is almost identical to the estimated
> cost of sorting 1,000,000 tuples when parallelism is turned off.  Adding
> some logging to the cost_sort function, it looks like the limit does not get
> sent down for the parallel estimate:
>
> NOTICE:  JJ cost_sort tuples 1000000.000000, limit 61.000000, sort_mem 65536
> NOTICE:  JJ cost_sort tuples 416667.000000, limit -1.000000, sort_mem 65536
>
> So the LIMIT gets passed down for the execution step, but not for the
> planning step.

Oh, well spotted.  I was looking in the wrong place.  Maybe the fix is
as simple as the attached?  It certainly helps in the cases I tested,
including with wider tables.
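
The gist of it (sketched here from the surrounding discussion rather
than pasted from the attachment) is to cost the sort built on the
cheapest partial path in create_ordered_paths() with the known bound
instead of -1, i.e. roughly:

  /* sketch of the change in create_ordered_paths() (planner.c):
   * pass the bound down when costing the sort under Gather Merge */
  path = (Path *) create_sort_path(root,
                                   ordered_rel,
                                   cheapest_partial_path,
                                   root->sort_pathkeys,
                                   limit_tuples);       /* was -1.0 */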

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: Top-N sorts verses parallelism

From:
Jeff Janes
Date:
On Thu, Dec 14, 2017 at 5:12 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>>
>> This looks like a costing bug.  The estimated cost of sorting 416,667
>> estimated tuples in one parallel worker is almost identical to the estimated
>> cost of sorting 1,000,000 tuples when parallelism is turned off.  Adding
>> some logging to the cost_sort function, it looks like the limit does not get
>> sent down for the parallel estimate:
>>
>> NOTICE:  JJ cost_sort tuples 1000000.000000, limit 61.000000, sort_mem 65536
>> NOTICE:  JJ cost_sort tuples 416667.000000, limit -1.000000, sort_mem 65536
>>
>> So the LIMIT gets passed down for the execution step, but not for the
>> planning step.

> Oh, well spotted.  I was looking in the wrong place.  Maybe the fix is
> as simple as the attached?  It certainly helps in the cases I tested,
> including with wider tables.

I had hit on the same change.  And was also surprised that it was located where it was.  With the change, it uses the parallel plan all the way down to LIMIT 1.

With the patch, it still satisfies make check, so if it introduces errors they are subtle ones.  If we can't actually do this and it needs to stay -1, then I think we need a comment to explain why.

Cheers,

Jeff

Re: Top-N sorts verses parallelism

From:
Robert Haas
Date:
On Fri, Dec 15, 2017 at 2:10 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I had hit on the same change.  And was also surprised that it was located
> where it was.  With the change, it uses the parallel plan all the way down
> to LIMIT 1.
>
> With the patch, it still satisfies make check, so if it introduces errors
> they are subtle ones.  If we can't actually do this and it needs to stay -1,
> then I think we need a comment to explain why.

Interesting.  I suspect this is correct now, but would not have been
before commit 3452dc5240da43e833118484e1e9b4894d04431c.  AFAICS, this
doesn't affect any execution-time behavior, just the cost estimate.
And, prior to that commit, the execution-time behavior was different:
there would not have been any way for the worker to do a top-N sort,
because the LIMIT was not pushed through the Gather.

Does that sound right, or am I still confused?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Top-N sorts verses parallelism

From:
Thomas Munro
Date:
On Sat, Dec 16, 2017 at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 2:10 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I had hit on the same change.  And was also surprised that it was located
>> where it was.  With the change, it uses the parallel plan all the way down
>> to LIMIT 1.
>>
>> With the patch, it still satisfies make check, so if it introduces errors
>> they are subtle ones.  If we can't actually do this and it needs to stay -1,
>> then I think we need a comment to explain why.
>
> Interesting.  I suspect this is correct now, but would not have been
> before commit 3452dc5240da43e833118484e1e9b4894d04431c.  AFAICS, this
> doesn't affect any execution-time behavior, just the cost estimate.
> And, prior to that commit, the execution-time behavior was different:
> there would not have been any way for the worker to do a top-N sort,
> because the LIMIT was not pushed through the Gather.
>
> Does that sound right, or am I still confused?

Looks right to me.  Commit 3452dc52 just forgot to tell the planner.
I'm pleased about that because it makes this a slam-dunk bug-fix and
not some confusing hard to justify costing problem.

BTW, with this patch, the same test on a smaller foo table with 250k
rows and limit 1 runs a parallel plan in 45ms here, while 220k rows and
limit 1 runs a non-parallel plan in 80ms.  But that's just a regular
costing problem: the point at which the curves cross over will be hard
to get right due to all kinds of other variables, so I guess that kind
of thing is expected.

PS Oops, I do actually know how to spell "versus".  Typos are so much
more embarrassing in subject lines!

-- 
Thomas Munro
http://www.enterprisedb.com


Re: Top-N sorts verses parallelism

From:
Robert Haas
Date:
On Fri, Dec 15, 2017 at 4:13 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Looks right to me.  Commit 3452dc52 just forgot to tell the planner.
> I'm pleased about that because it makes this a slam-dunk bug-fix and
> not some confusing hard to justify costing problem.

Jeff Janes inquired off-list about other places where cost_sort() gets called.

In costsize.c, it is called from initial_cost_mergejoin, which seems
like it does not present an opportunity for pushing down limits, since
we don't know how many rows we'll have to join to get a certain number
of outputs.

In createplan.c, it is called only from label_sort_with_costsize().
That is in turn called from create_merge_append_plan(), passing the tuple
limit from the path being converted to a plan, which seems
unimpeachable.  It's also called from make_unique_plan() and
create_mergejoin_plan() with -1, which seems OK since in neither case
do we know how many input rows we need to read.

In planner.c, it's called from plan_cluster_use_sort() with -1;
CLUSTER has to read the whole input, so that's fine.

In prepunion.c, it's called from choose_hashed_setop() with -1.  I
think set operations also need to read the whole input.

In pathnode.c, it's called from create_merge_append_path,
create_unique_path, create_gather_merge_path, create_groupingsets_path,
and create_sort_path.  create_merge_append_path passes any limit
applicable to the subpath.  create_unique_path passes -1.
create_gather_merge_path also passes -1, which as Jeff also pointed
out seems possibly wrong.  create_sort_path also passes -1, and so
does create_groupingsets_path.

I went through the callers to create_sort_path and the only one that
looks like it can pass a limit is the one you and Jeff already
identified.  So I think the question is just whether
create_gather_merge_path needs a similar fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Top-N sorts verses parallelism

From:
Thomas Munro
Date:
On Mon, Dec 18, 2017 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I went through the callers to create_sort_path and the only one that
> looks like it can pass a limit is the one you and Jeff already
> identified.  So I think the question is just whether
> create_gather_merge_path needs a similar fix.

I might be missing something, but it looks like there are no cases
where we have a limit_tuples value we could use AND we're relying on
create_gather_merge_path's own ability to create sort paths.  So I
suspect there is no reason to change create_gather_merge_path itself
to deal with tuple limits.  I looked at each of its callers:

1.  create_ordered_paths is the case the patch posted earlier covers:
it has a useful limit_tuples value but it creates the sort path itself
first, so there is no need for create_gather_merge_path to be aware of
it.

2.  create_grouping_paths doesn't have a limit_tuples value because
grouping always inhibits limits.

3.  generate_gather_paths is in turn called by:

3.1.  standard_join_search can't use limits (at least in general) since
it's dealing with a join.

3.2.  GEQO's merge_clump is also about joins, so ditto.

3.3.  set_rel_pathlist will consider only pathkeys from existing index
scans that set_plain_rel_pathlist found, not creating new pathkeys by
sorting.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: Top-N sorts verses parallelism

From:
Amit Kapila
Date:
On Tue, Dec 19, 2017 at 5:24 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Dec 18, 2017 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I went through the callers to create_sort_path and the only one that
>> looks like it can pass a limit is the one you and Jeff already
>> identified.  So I think the question is just whether
>> create_gather_merge_path needs a similar fix.
>
> I might be missing something, but it looks like there are no cases
> where we have a limit_tuples value we could use AND we're relying on
> create_gather_merge_path's own ability to create sort paths.  So I
> suspect there is no reason to change create_gather_merge_path itself
> to deal with tuple limits.
>

Exactly.  I was about to post the same thing, and my analysis results
are the same as yours.  I think this raises the question: do we really
need cost_sort at that place, and if so, for which case?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Top-N sorts verses parallelism

From:
Robert Haas
Date:
On Tue, Dec 19, 2017 at 6:54 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Dec 18, 2017 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I went through the callers to create_sort_path and the only one that
>> looks like it can pass a limit is the one you and Jeff already
>> identified.  So I think the question is just whether
>> create_gather_merge_path needs a similar fix.
>
> I might be missing something, but it looks like there are no cases
> where we have a limit_tuples value we could use AND we're relying on
> create_gather_merge_path's own ability to create sort paths.  So I
> suspect there is no reason to change create_gather_merge_path itself
> to deal with tuple limits.  I looked at each of its callers:
>
> 1.  create_ordered_paths is the case the patch posted earlier covers:
> it has a useful limit_tuples value but it creates the sort path itself
> first, so there is no need for create_gather_merge_path to be aware of
> it.
>
> 2.  create_grouping_paths doesn't have a limit_tuples value because
> grouping always inhibits limits.
>
> 3.  generate_gather_paths is in turn called by:
>
> 3.1.  standard_join_search can't use limits (at least in general) since
> it's dealing with a join.
>
> 3.2.  GEQO's merge_clump is also about joins, so ditto.
>
> 3.3.  set_rel_pathlist will consider only pathkeys from existing index
> scans that set_plain_rel_pathlist found, not creating new pathkeys by
> sorting.

Well, it might be good future-proofing, but at least it's good to know
that it's not an active bug.  I pushed your earlier fix, which turns
out to be a revert of commit
dc02c7bca4dccf7de278cdc6b3325a829e75b252.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company