Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Поиск

Список

Период

Сортировка

От	Andy Fan
Тема	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Дата	8 мая 2020 г. 04:57:24
Msg-id	CAKU4AWpBXBoRzz3_CnUu-rg5E1kR8z1MCQQqs1gTmODj3-mUAQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey (Andy Fan <zhihui.fan1213@gmail.com>) Re: [PATCH] Keeps tracking the uniqueness with UniqueKey (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) Re: [PATCH] Keeps tracking the uniqueness with UniqueKey (Andy Fan <zhihui.fan1213@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi Ashutosh:

Appreciate for your comments!

On Thu, May 7, 2020 at 7:26 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

Hi Andy,
Sorry for delay in review.

I understand no one has obligation to do that, and it must take reviewer's time

and more, so really appreciated for it! Hope I can provide more kindly help like

this in future as well.

Your earlier patches are very large and it
requires some time to review those. I didn't finish reviewing those
but here are whatever comments I have till now on the previous set of
patches. Please see if any of those are useful to the new set.

Yes, I just realized the size as well, so I split them to smaller commit and

each commit and be build and run make check successfully.

All of your comments still valid except the last one (covert_subquery_uniquekeys)

which has been fixed v7 version.

+/*
+ * Return true iff there is an equal member in target for every
+ * member in members

Suggest reword: return true iff every entry in "members" list is also present
in the "target" list.

Will do, thanks!

This function doesn't care about multi-sets, so please
mention that in the prologue clearly.

+
+ if (root->parse->hasTargetSRFs)
+ return;

Why? A relation's uniqueness may be useful well before we work on SRFs.

Looks I misunderstand when the SRF function is executed. I will fix this in v8.

+
+ if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+ /*
+ * Set UniqueKey on member rel is useless, we have to recompute it at
+ * upper level, see populate_partitionedrel_uniquekeys for reference
+ */
+ return;

Handling these here might help in bottom up approach. We annotate each
partition here and then annotate partitioned table based on the individual
partitions. Same approach can be used for partitioned join produced by
partitionwise join.

+ /*
+ * If the unique index doesn't contain partkey, then it is unique
+ * on this partition only, so it is useless for us.
+ */

Not really. It could help partitionwise join.

+
+ /* Now we have the unique index list which as exactly same on all
childrels,
+ * Set the UniqueIndex just like it is non-partition table
+ */

I think it's better to annotate each partition with whatever unique index it
has whether or not global. That will help partitionwise join, partitionwise
aggregate/group etc.

Excellent catch! All the 3 items above is partitionwise join related, I need some time

to check how to handle that.

+ /* A Normal group by without grouping set */
+ if (parse->groupClause)
+ add_uniquekey_from_sortgroups(root,
+ grouprel,
+ root->parse->groupClause);

Those keys which are part of groupClause and also form unique keys in the input
relation, should be recorded as unique keys in group rel. Knowing the minimal
set of keys allows more optimizations.

This is a very valid point now. I ignored this because I wanted to remove the AggNode

totally if the part of groupClause is unique, However it doesn't happen later if there is

aggregation call in this query.

+
+ foreach(lc, unionrel->reltarget->exprs)
+ {
+ exprs = lappend(exprs, lfirst(lc));
+ colnos = lappend_int(colnos, i);
+ i++;
+ }

This should be only possible when it's not UNION ALL. We should add some assert
or protection for that.

OK, actually I called this function in generate_union_paths. which handle

UNION case only. I will add the Assert anyway.

+
+ /* Fast path */
+ if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+ return;
+
+ outer_is_onerow = relation_is_onerow(outerrel);
+ inner_is_onerow = relation_is_onerow(innerrel);
+
+ outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+ innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+ clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+ restrictlist, jointype);

Something similar happens in select_mergejoin_clauses().

I didn't realized this before. I will refactor this code accordingly if necessary

after reading that.

At least from the
first reading of this patch, I get an impression that all these functions which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful to
the optimizer.

+
+
+ if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+ {
+ foreach(lc, innerrel_ukey_ctx)
+ {
+ UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+ if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+ {
+ /* The UniqueKey on baserel is not useful on the joinrel */

A joining relation need not be a base rel always, it could be a join rel as
well.

good catch.

+ ctx->useful = false;
+ continue;
+ }
+ if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+ {
+ /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other functions in
this file need good comments explaining the logic esp. "why" instead of "what".

Exactly.

+ else if (inner_is_onerow)
+ {
+ /* Since rows in innerrel can't be duplicated AND if
innerrel is onerow,
+ * the join result will be onerow also as well. Note:
onerow implies
+ * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

I guess you may miss the onerow special case if I understand correctly.

inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".

innerrel can't be duplicated means: t1.y = t2.pk; so the finally result is onerow

as well. One of the overall query is SELECT .. FROM t1, t2 where t2.y = t2.pk;

I explained more about onerow in the v7 README.unqiuekey document, just copy

it here.

===

1. What is UniqueKey?

....

onerow is also a kind of UniqueKey which means the RelOptInfo will have 1 row at
most. it has a stronger semantic than others. like SELECT uk FROM t; uk is
normal unique key and may have different values.
SELECT colx FROM t WHERE uk = const. colx is unique AND we have only 1 value. This
field can used for innerrel_is_unique. and also be used as an optimization for
this case. We don't need to maintain multi UniqueKey, we just maintain one with
onerow = true and exprs = NIL.

onerow is set to true only for 2 cases right now. 1) SELECT .. FROM t WHERE uk =
1; 2). SELECT aggref(xx) from t; // Without group by.

===

2. How is it maintained?

....

More considerations about onerow:
1. If relation with one row and it can't be duplicated, it is still possible
contains mulit_nullvas after outer join.
2. If the either UniqueKey can be duplicated after join, the can get one row
only when both side is one row AND there is no outer join.
3. Whenever the onerow UniqueKey is not a valid any more, we need to convert one
row UniqueKey to normal unique key since we don't store exprs for one-row
relation. get_exprs_from_uniquekeys will be used here.

===

and 3. in the v7 implementation, the code struct is more clearer:)

+ }
+ /*
+ * Calculate max_colno in subquery. In fact we can check this with
+ * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+ * is not set on UPPERREL_FINAL relation, so do it this way
+ */

Should/can we use the same logic to convert an expression in the subquery into
a Var of the outer query as in convert_subquery_pathkeys().

Yes, my previous implementation is actually wrong. and should be fixed it in v7.

If the subquery doesn't have a reltarget set, we should be able to use reltarget

of any of its paths since all of those should match the positions across subquery

and the outer query.

but I think it should be rel->subroot->processed_tlist rather than reltarget? Actually I still

a bit of uneasy about rel->subroot->processed_tlist for some DML case, which the

processed_tlist is different and I still not figure out its impact.

Will continue reviewing your new set of patches as time permits.

Thank you! Actually there is no big difference between v6 and v7 regarding the

UniqueKey part except 2 bug fix. However v7 has some more documents,

comments improvement and code refactor/split, which may be helpful

for review. You may try v7 next time if v8 has not come yet:)

Best Regards

Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Fujii Masao
Дата: 08 мая 2020 г., 04:42:18
Сообщение: Re: Why are wait events not reported even though it reads/writes atimeline history file?

Следующее

От: Fujii Masao
Дата: 08 мая 2020 г., 05:31:42
Сообщение: Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Предыдущее

Следующее