Re: Support run-time partition pruning for hash join

From: Richard Guo
Subject: Re: Support run-time partition pruning for hash join
Date:
Msg-id: CAMbWs488_uMqmyQo0Cq-XrDSiEqiXCnfJtCyhXrWn_HdBen3Tw@mail.gmail.com
In reply to: Re: Support run-time partition pruning for hash join  (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Support run-time partition pruning for hash join  (Richard Guo <guofenglinux@gmail.com>)
List: pgsql-hackers

On Fri, Aug 25, 2023 at 11:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
I'd suggest writing some costing code which costs an execution of run-time
pruning.  With LIST and RANGE you probably want something like
cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
for the binary search over the sorted datum array. For HASH
partitions, something like cpu_operator_cost * npartcols once for each
hashed tuple.

You'll need to then come up with some counter costs to subtract from
the Append/MergeAppend.  This is tricky, as discussed.  Just come up
with something crude for now.

To start with, it could just be as crude as:

total_costs *= (Min(expected_outer_rows, n_append_subnodes) /
                n_append_subnodes);

i.e. assume that every outer joined row will require exactly 1 new
partition, up to the total number of partitions.  That's pretty much
the worst case, but it'll at least allow the optimisation to work for
cases where the hash table is expected to contain just a tiny number
of rows (fewer than the number of partitions).

To make it better, you might want to look at join selectivity
estimation and see if you can find something there that could inform
a better estimate.

I had a go at writing some costing code according to your suggestion;
that's compute_partprune_cost() in the v2 patch.

For the hash side, this function computes the pruning cost as
cpu_operator_cost * LOG2(nparts) * inner_rows for LIST and RANGE, and
cpu_operator_cost * nparts * inner_rows for HASH.
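
In rough C terms, that cost looks something like the sketch below (the
variable names here are stand-ins for illustration, not the actual code
from the patch):

    /*
     * Sketch of the hash-side pruning cost.  cpu_operator_cost and the
     * LOG2() macro come from the existing costing code; nparts and
     * inner_rows stand for the partition count and the estimated number
     * of hashed inner rows.
     */
    Cost        partprune_cost;

    if (part_strategy == PARTITION_STRATEGY_HASH)
        partprune_cost = cpu_operator_cost * nparts * inner_rows;
    else
        /* LIST/RANGE: binary search over the sorted bound array */
        partprune_cost = cpu_operator_cost * LOG2(nparts) * inner_rows;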

For the Append/MergeAppend side, this function first estimates the size
of the outer side that matches, using the same idea as we use to
estimate the joinrel size for JOIN_SEMI.  Then it assumes that each
outer joined row occupies one new partition (the worst case) and
computes how much cost can be saved by partition pruning.

If the cost saved on the Append/MergeAppend side is larger than the
pruning cost on the Hash side, then we consider partition pruning a
win.
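
Again as a rough sketch (jselec, append_total_cost and friends are
stand-ins for whatever the patch actually uses):

    /*
     * Sketch of the Append-side saving.  outer_rows * jselec mirrors
     * the JOIN_SEMI size estimation; the Min() encodes the worst case
     * where every matched outer row needs its own partition.
     */
    double      matched_rows = outer_rows * jselec;
    double      parts_needed = Min(matched_rows, (double) nparts);
    Cost        appendcost_saved;

    /* skipping a share of the subnodes saves that share of the cost */
    appendcost_saved = append_total_cost * (1.0 - parts_needed / nparts);

    if (appendcost_saved > partprune_cost)
        use_runtime_pruning = true;     /* pruning wins for this pair */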

Note that this costing logic runs for each Append-Hash pair, so it copes
with the case where we have multiple join levels.

With this costing logic added, I performed the same worst-case
performance comparisons as in [1], and here is what I got.

tuples  unpatched   patched     relative change
10000   44.66       44.37   -0.006493506
20000   52.41       52.29   -0.002289639
30000   61.11       61.12   +0.000163639
40000   67.87       68.24   +0.005451599
50000   74.51       74.75   +0.003221044
60000   82.3        81.55   -0.009113001
70000   87.16       86.98   -0.002065168
80000   93.49       93.89   +0.004278532
90000   101.52      100.83  -0.00679669
100000  108.34      108.56  +0.002030644

So the costing logic successfully avoids performing the partition
pruning in the worst case.

I also tested the cases where partition pruning is possible with
different sizes of the hash side.

tuples  unpatched   patched     relative change
100     36.86       2.4     -0.934888768
200     35.87       2.37    -0.933928074
300     35.95       2.55    -0.92906815
400     36.4        2.63    -0.927747253
500     36.39       2.85    -0.921681781
600     36.32       2.97    -0.918226872
700     36.6        3.23    -0.911748634
800     36.88       3.44    -0.906724512
900     37.02       3.46    -0.906537007
1000    37.25       37.21   -0.001073826

The first 9 rows show that the costing logic allows the partition
pruning to be performed and the pruning turns out to be a big win.  The
last row shows that the partition pruning is disallowed by the costing
logic because it thinks no partition can be pruned (we have 1000
partitions in total).
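
(Plugging into the crude formula quoted above: with 1000 partitions, a
hash side expected to match 1000 rows keeps Min(1000, 1000) / 1000 =
100% of the Append subnodes, so the predicted saving is zero; at 900
rows the prediction is 1 - 900/1000 = 10% of the Append cost, which is
evidently already enough to beat the pruning cost in this test.)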

So the new costing logic is quite crude and tends to be very
conservative, but it can help avoid the large overhead in the worst
cases.  I think this could be a good starting point for pushing this
patch forward.

Any thoughts or comments?

[1] https://www.postgresql.org/message-id/CAMbWs49%2Bp6hBxXJHFiSwOtPCSkAHwhJj3hTpCR_pmMiUUVLZ1Q%40mail.gmail.com

Thanks
Richard
Attachments
