Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Дата
Msg-id CAFjFpRfAreXz2s+qOEjjPM7p5x_=GZLU474diNgrAgDZ9rA62g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Список pgsql-hackers
On Fri, Sep 15, 2017 at 5:29 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>>
>> Apart from these there is a regression case on a custom table, on head
>> query completes in 20s and with this patch it takes 27s. Please find
>> the attached .out and .sql file for the output and schema for the test
>> case respectively. I have reported this case before (sometime around
>> March this year) as well, but I am not sure if it was overlooked or is
>> an unimportant and expected behaviour for some reason.
>>
>
> Are you talking about [1]? I have explained about the regression in
> [2] and [3]. This looks like an issue with the existing costing model.
>

I debugged this case further. There are two partitioned tables being
joined prt (with partitions prt_p1, prt_p2 and so on) and prt2 (with
partitions prt2_p1, prt2_p2, and so on). When join is executed without
partition-wise join, prt2 is used to build hash table and prt is used
to probe that hash table. prt2 has lesser number of rows than prt. But
when partition-wise join is used, individual partitions are joined in
reverse join order i.e. partitions of prt are used to build the hash
table and partitions of prt2 are used to probe. This happens because
the path for the other join order (partition of prt2 used to build the
hash table and partition of prt used to probe) has huge cost compared
to the first one (74459 and 313109) and a portion worth 259094 comes
from lines 3226/7 of final_cost_hashjoin()
3215         /*
3216          * The number of tuple comparisons needed is the number of outer
3217          * tuples times the typical number of tuples in a hash
bucket, which
3218          * is the inner relation size times its bucketsize
fraction.  At each
3219          * one, we need to evaluate the hashjoin quals.  But actually,
3220          * charging the full qual eval cost at each tuple is pessimistic,
3221          * since we don't evaluate the quals unless the hash values match
3222          * exactly.  For lack of a better idea, halve the cost estimate to
3223          * allow for that.
3224          */
3225         startup_cost += hash_qual_cost.startup;
3226         run_cost += hash_qual_cost.per_tuple * outer_path_rows *
3227             clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

That's because for some reason innerbucketsize for partition of prt is
22 times more than that for partition of prt2. Looks like we have some
estimation error in estimating bucket sizes.

If I force partitions to be joined with the same order as partitioned
tables (without partition-wise join), child-joins execute faster and
in turn partition-wise join performs better than the
non-partition-wise join. So, this is clearly some estimation and
costing problem with regular joins.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] [COMMITTERS] pgsql: Fix bool/int type confusion
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: [HACKERS] Re: proposal - psql: possibility to specify sort fordescribe commands, when size is printed