Top-N sorts verses parallelism

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Top-N sorts verses parallelism
Дата
Msg-id CAEepm=1BWtC34vUroA0Uqjw02MaqdUrW+d6WD85_k8SLyPiKHQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Top-N sorts verses parallelism  (Robert Haas <robertmhaas@gmail.com>)
Re: Top-N sorts verses parallelism  (Jeff Janes <jeff.janes@gmail.com>)
Re: Top-N sorts verses parallelism  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-hackers
Hi hackers,

The start-up cost of bounded (top-N) sorts is sensitive at the small
end of N, and the
comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
seem to correspond to reality.  Here's a contrived example that comes
from a benchmark:

  create table foo as
    select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
  analyze foo;
  select * from foo order by b limit N;

Estimated start-up costs for different N:

  limit 1 = 19425.00
  limit 2 = 24425.00
  limit 3 = 27349.81
  limit 4 = 29425.00
  limit 10 = 36034.64
  limit 50 = 47644.28
  limit 100 = 52644.28
  limit 133 = 54701.41

But the actual time doesn't really change on this random development
system: it's around 300ms.  I stopped at limit 133 because for this
particular query, at 134 a Gather Merge plan kicked in and I got
degree 3 parallel sorting and I got my answer just shy of three times
faster.  The speed up definitely paid for the parallel_setup_cost and
only one tuple was sent from each worker to the leader because the
limit was pushed down.

I want to get my answer three times as fast when I say limit 1 too!
But I can't because we seem to think that limit 1 is dramatically
cheaper than limit 134, so parallelism isn't worth bothering with.

With wider tuples the same effect can be seen, though it gets much
more difficult to convince Gather Merge to be selected at all for
reasons I haven't looked into yet.  Is Gather Merge for top-N sorts a
missed opportunity due to underestimation of top-N for small values of
N?  Or is there some other way to think about this problem?  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: [HACKERS] parallel.c oblivion of worker-startup failures
Следующее
От: Ali Akbar
Дата:
Сообщение: Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" inchild table must be marked NOT NULL