Обсуждение: Partial join

Поиск
Список
Период
Сортировка

Partial join

От
Arne Roland
Дата:
Hello,

I attached one example of a partitioned table with multi column partition key. I also attached the output.
Disabling the hash_join is not really necessary, it just shows the more drastic result in the case of low work_mem.

Comparing the first and the second query I was surprised to see that SET enable_partitionwise_join could cause the costs to go up. Shouldn't the paths of the first query be generated as well?

The third query seems to have a different issue. That one is close to my original performance problem. It looks to me like the push down of the sl condition stops the optimizer considering a partial join.
If so would it be sane to keep a copy of the original quals to make the partial join possible? Do you have better ideas?


Regards
Arne
Вложения

Re: Partial join

От
Richard Guo
Дата:

On Thu, Aug 1, 2019 at 5:38 PM Arne Roland <A.Roland@index.de> wrote:
Hello,

I attached one example of a partitioned table with multi column partition key. I also attached the output.
Disabling the hash_join is not really necessary, it just shows the more drastic result in the case of low work_mem.

Comparing the first and the second query I was surprised to see that SET enable_partitionwise_join could cause the costs to go up. Shouldn't the paths of the first query be generated as well?

The third query seems to have a different issue. That one is close to my original performance problem. It looks to me like the push down of the sl condition stops the optimizer considering a partial join.
If so would it be sane to keep a copy of the original quals to make the partial join possible? Do you have better ideas?

For the third query,  a rough investigation shows that, the qual 'sl =
5' and 'sc.sl = sg.sl' will form an equivalence class and generate two
implied equalities: 'sc.sl = 5' and 'sg.sl = 5', which can be pushed
down to the base rels. One consequence of the deduction is when
constructing restrict lists for the joinrel, we lose the original
restrict 'sc.sl = sg.sl', and this would fail the check
have_partkey_equi_join(), which checks if there exists an equi-join
condition for each pair of partition keys. As a result, this joinrel
would not be considered as an input to further partitionwise joins.

We need to fix this.

Thanks
Richard

Re: Partial join

От
Arne Roland
Дата:

Hello Richard,

thanks for your quick reply.


> We need to fix this.


Do you have a better idea than just keeping the old quals - possibly just the ones that get eliminated - in a separate data structure? Is the push down of quals the only case of elimination of quals, only counting the ones which happen before the restrict lists are generated?


Regards

Arne


From: Richard Guo <riguo@pivotal.io>
Sent: Thursday, August 1, 2019 1:14:44 PM
To: Arne Roland
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Partial join
 

On Thu, Aug 1, 2019 at 5:38 PM Arne Roland <A.Roland@index.de> wrote:
Hello,

I attached one example of a partitioned table with multi column partition key. I also attached the output.
Disabling the hash_join is not really necessary, it just shows the more drastic result in the case of low work_mem.

Comparing the first and the second query I was surprised to see that SET enable_partitionwise_join could cause the costs to go up. Shouldn't the paths of the first query be generated as well?

The third query seems to have a different issue. That one is close to my original performance problem. It looks to me like the push down of the sl condition stops the optimizer considering a partial join.
If so would it be sane to keep a copy of the original quals to make the partial join possible? Do you have better ideas?

For the third query,  a rough investigation shows that, the qual 'sl =
5' and 'sc.sl = sg.sl' will form an equivalence class and generate two
implied equalities: 'sc.sl = 5' and 'sg.sl = 5', which can be pushed
down to the base rels. One consequence of the deduction is when
constructing restrict lists for the joinrel, we lose the original
restrict 'sc.sl = sg.sl', and this would fail the check
have_partkey_equi_join(), which checks if there exists an equi-join
condition for each pair of partition keys. As a result, this joinrel
would not be considered as an input to further partitionwise joins.

We need to fix this.

Thanks
Richard

Re: Partial join

От
Tom Lane
Дата:
Richard Guo <riguo@pivotal.io> writes:
> For the third query,  a rough investigation shows that, the qual 'sl =
> 5' and 'sc.sl = sg.sl' will form an equivalence class and generate two
> implied equalities: 'sc.sl = 5' and 'sg.sl = 5', which can be pushed
> down to the base rels. One consequence of the deduction is when
> constructing restrict lists for the joinrel, we lose the original
> restrict 'sc.sl = sg.sl', and this would fail the check
> have_partkey_equi_join(), which checks if there exists an equi-join
> condition for each pair of partition keys. As a result, this joinrel
> would not be considered as an input to further partitionwise joins.

> We need to fix this.

Uh ... why?  The pushed-down restrictions should result in pruning
away any prunable partitions at the scan level, leaving nothing for
the partitionwise join code to do.

            regards, tom lane



Re: Partial join

От
Arne Roland
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Uh ... why?  The pushed-down restrictions should result in pruning
> away any prunable partitions at the scan level, leaving nothing for
> the partitionwise join code to do.

It seems reasonable to me that the join condition can no longer be verified, since  'sc.sl = sg.sl' is now replaced by 'sg.sl = 5' so the join condition can no longer be validated.

It's true that the pruning would prune everything but one partition, in case we'd just have a single column partition key. But we don't. I don't see how pruning partitions should help in this case, since we are left with multiple partitions for both relations.

Regards
Arne


From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, August 1, 2019 4:14:54 PM
To: Richard Guo
Cc: Arne Roland; pgsql-hackers@lists.postgresql.org
Subject: Re: Partial join
 
Richard Guo <riguo@pivotal.io> writes:
> For the third query,  a rough investigation shows that, the qual 'sl =
> 5' and 'sc.sl = sg.sl' will form an equivalence class and generate two
> implied equalities: 'sc.sl = 5' and 'sg.sl = 5', which can be pushed
> down to the base rels. One consequence of the deduction is when
> constructing restrict lists for the joinrel, we lose the original
> restrict 'sc.sl = sg.sl', and this would fail the check
> have_partkey_equi_join(), which checks if there exists an equi-join
> condition for each pair of partition keys. As a result, this joinrel
> would not be considered as an input to further partitionwise joins.

> We need to fix this.

Uh ... why?  The pushed-down restrictions should result in pruning
away any prunable partitions at the scan level, leaving nothing for
the partitionwise join code to do.

                        regards, tom lane



Re: Partial join

От
Richard Guo
Дата:

On Thu, Aug 1, 2019 at 10:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <riguo@pivotal.io> writes:
> For the third query,  a rough investigation shows that, the qual 'sl =
> 5' and 'sc.sl = sg.sl' will form an equivalence class and generate two
> implied equalities: 'sc.sl = 5' and 'sg.sl = 5', which can be pushed
> down to the base rels. One consequence of the deduction is when
> constructing restrict lists for the joinrel, we lose the original
> restrict 'sc.sl = sg.sl', and this would fail the check
> have_partkey_equi_join(), which checks if there exists an equi-join
> condition for each pair of partition keys. As a result, this joinrel
> would not be considered as an input to further partitionwise joins.

> We need to fix this.

Uh ... why?  The pushed-down restrictions should result in pruning
away any prunable partitions at the scan level, leaving nothing for
the partitionwise join code to do.

Hmm..In the case of  multiple partition keys, for range partitioning, if
we have no clauses for a given key, any later keys would not be
considered for partition pruning.

That is to day, for table 'p partition by range (k1, k2)', quals like
'k2 = Const' would not prune partitions.

For query:

select * from p as t1 join p as t2 on t1.k1 = t2.k1 and t1.k2 = t2.k2
and t1.k2 = 2;

Since we don't consider ECs containing consts when generating join
clauses, we don't have restriction 't1.k2 = t2.k2' when building the
joinrel. As a result, partitionwise join is not considered as it
requires there existing an equi-join condition for each pair of
partition keys.

Is this a problem? What's your opinion?

Thanks
Richard

Re: Partial join

От
Richard Guo
Дата:


On Thu, Aug 1, 2019 at 7:46 PM Arne Roland <A.Roland@index.de> wrote:

Hello Richard,

thanks for your quick reply.


> We need to fix this.


Do you have a better idea than just keeping the old quals - possibly just the ones that get eliminated - in a separate data structure? Is the push down of quals the only case of elimination of quals, only counting the ones which happen before the restrict lists are generated?

In you case, the restriction 'sl = sl' is just not generated for the
join, because it forms an EC with const, which is not considered when
generating join clauses.

Please refer to the code snippet below:

@@ -1164,8 +1164,8 @@ generate_join_implied_equalities(PlannerInfo *root,
                List       *sublist = NIL;

                /* ECs containing consts do not need any further enforcement */
                if (ec->ec_has_const)
                        continue;

Thanks
Richard

Re: Partial join

От
Arne Roland
Дата:
Richard Guo <riguo@pivotal.io> wrote:
> Please refer to the code snippet below:
>
> @@ -1164,8 +1164,8 @@ generate_join_implied_equalities(PlannerInfo *root,
>                 List       *sublist = NIL;
>
>                 /* ECs containing consts do not need any further enforcement */
>                 if (ec->ec_has_const)
>                         continue;

Sorry, I'm quite busy currently. And thanks! That was a good read.

I might be wrong, but I think have_partkey_equi_join in joinrels.c should be aware of the const case. My naive approach would be keeping pointers to the first few constant clauses, which are referencing to a yet unmatched partition key, to keep the memory footprint feasible in manner similar to pk_has_clause. The question would be what to do, if there are a lot of const expressions on the part keys. One could palloc additional memory in that case, hoping that it will be quite rare. Or is there a different, better way to go about that?
Thank you for your feedback!

Regards
Arne