Re: Support run-time partition pruning for hash join

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Support run-time partition pruning for hash join
Дата
Msg-id CAApHDvrTwpgYG9UGjBJpMrey36LMpwtEx_2RWrByxSq7NnKqdA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Support run-time partition pruning for hash join  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: Support run-time partition pruning for hash join  (Richard Guo <guofenglinux@gmail.com>)
Re: Support run-time partition pruning for hash join  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers
On Thu, 24 Aug 2023 at 21:27, Richard Guo <guofenglinux@gmail.com> wrote:
> I performed some performance comparisons of the worst case with two
> tables as below:
>
> 1. The partitioned table has 1000 children, and 100,000 tuples in total.
>
> 2. The other table is designed that
>     a) its tuples occupy every partition of the partitioned table so
>        that no partitions can be pruned during execution,
>     b) tuples belong to the same partition are placed together so that
>        we need to scan all its tuples before we could know that no
>        pruning would happen and we could stop trying to prune,
>     c) the tuples are unique on the hash key so as to minimize the cost
>        of hash probe, so that we can highlight the impact of the pruning
>        codes.
>
> Here is the execution time (ms) I get with different sizes of the other
> table.
>
> tuples  unpatched   patched
> 10000   45.74       53.46   (+0.17)
> 20000   54.48       70.18   (+0.29)
> 30000   62.57       85.18   (+0.36)
> 40000   69.14       99.19   (+0.43)
> 50000   76.46       111.09  (+0.45)
> 60000   82.68       126.37  (+0.53)
> 70000   92.69       137.89  (+0.49)
> 80000   94.49       151.46  (+0.60)
> 90000   101.53      164.93  (+0.62)
> 100000  107.22      178.44  (+0.66)
>
> So the overhead the pruning code adds to Hash Join is too large to be
> accepted :(.

Agreed.  Run-time pruning is pretty fast to execute, but so is
inserting a row into a hash table.

> I think we need to solve this problem first before we can
> make this new partition pruning mechanism some useful in practice, but
> how?  Some thoughts currently in my mind include
>
> 1) we try our best to estimate the cost of this partition pruning when
> creating hash join paths, and decide based on the cost whether to use it
> or not.  But this does not seem to be an easy task.

I think we need to consider another Hash Join path when we detect that
the outer side of the Hash Join involves scanning a partitioned table.

I'd suggest writing some cost which costs an execution of run-time
pruning.  With LIST and RANGE you probably want something like
cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
for the binary search over the sorted datum array. For HASH
partitions, something like cpu_operator_cost * npartcols once for each
hashed tuple.

You'll need to then come up with some counter costs to subtract from
the Append/MergeAppend.  This is tricky, as discussed.  Just come up
with something crude for now.

To start with, it could just be as crude as:

total_costs *= (Min(expected_outer_rows, n_append_subnodes)  /
n_append_subnodes);

i.e assume that every outer joined row will require exactly 1 new
partition up to the total number of partitions.  That's pretty much
worst-case, but it'll at least allow the optimisation to work for
cases like where the hash table is expected to contain just a tiny
number of rows (fewer than the number of partitions)

To make it better, you might want to look at join selectivity
estimation and see if you can find something there to influence
something better.

> 2) we use some heuristics when executing hash join, such as when we
> notice that a $threshold percentage of the partitions must be visited
> we just abort the pruning and assume that no partitions can be pruned.

You could likely code in something that checks
bms_num_members(jpstate->part_prune_result) to see if it still remains
below the total Append/MergeAppend subplans whenever, say whenever the
lower 8 bits of hashtable->totalTuples are all off.  You can just give
up doing any further pruning when all partitions are already required.

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Tristan Partin"
Дата:
Сообщение: Re: psql --no-connect option
Следующее
От: David Rowley
Дата:
Сообщение: Re: meson uses stale pg_config_paths.h left over from make