Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Дата
Msg-id CAKU4AWpBXBoRzz3_CnUu-rg5E1kR8z1MCQQqs1gTmODj3-mUAQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Keeps tracking the uniqueness with UniqueKey  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы Re: [PATCH] Keeps tracking the uniqueness with UniqueKey  (Andy Fan <zhihui.fan1213@gmail.com>)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey  (Andy Fan <zhihui.fan1213@gmail.com>)
Список pgsql-hackers
Hi Ashutosh:

Appreciate for your comments!

On Thu, May 7, 2020 at 7:26 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Hi Andy,
Sorry for delay in review.

I  understand no one has obligation to do that,  and it must take reviewer's time 
and more, so really appreciated for it!  Hope I can provide more kindly help like
this in future as well. 
 
Your earlier patches are very large and it
requires some time to review those. I didn't finish reviewing those
but here are whatever comments I have till now on the previous set of
patches. Please see if any of those are useful to the new set.

Yes, I just realized the size as well, so I split them to smaller commit and 
each commit and be build and run make check successfully.  

All of your comments still valid except the last one (covert_subquery_uniquekeys) 
which has been fixed v7 version. 
 

+/*
+ * Return true iff there is an equal member in target for every
+ * member in members

Suggest reword: return true iff every entry in "members" list is also present
in the "target" list.

Will do, thanks!
 
This function doesn't care about multi-sets, so please
mention that in the prologue clearly.

+
+    if (root->parse->hasTargetSRFs)
+        return;

Why? A relation's uniqueness may be useful well before we work on SRFs.


Looks I misunderstand when the SRF function is executed.  I will fix this in v8. 

+
+    if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+        /*
+         * Set UniqueKey on member rel is useless, we have to recompute it at
+         * upper level, see populate_partitionedrel_uniquekeys for reference
+         */
+        return;

Handling these here might help in bottom up approach. We annotate each
partition here and then annotate partitioned table based on the individual
partitions. Same approach can be used for partitioned join produced by
partitionwise join.

+        /*
+         * If the unique index doesn't contain partkey, then it is unique
+         * on this partition only, so it is useless for us.
+         */

Not really. It could help partitionwise join.

+
+    /* Now we have the unique index list which as exactly same on all
childrels,
+     * Set the UniqueIndex just like it is non-partition table
+     */

I think it's better to annotate each partition with whatever unique index it
has whether or not global. That will help partitionwise join, partitionwise
aggregate/group etc.

Excellent catch! All the 3 items above is partitionwise join related, I need some time
to check how to handle that. 
 

+    /* A Normal group by without grouping set */
+    if (parse->groupClause)
+        add_uniquekey_from_sortgroups(root,
+                                      grouprel,
+                                      root->parse->groupClause);

Those keys which are part of groupClause and also form unique keys in the input
relation, should be recorded as unique keys in group rel. Knowing the minimal
set of keys allows more optimizations.

This is a very valid point now. I ignored this because I wanted to remove the AggNode
totally if the part of groupClause is unique,  However it doesn't happen later if there is
aggregation call in this query.


+
+    foreach(lc,  unionrel->reltarget->exprs)
+    {
+        exprs = lappend(exprs, lfirst(lc));
+        colnos = lappend_int(colnos, i);
+        i++;
+    }

This should be only possible when it's not UNION ALL. We should add some assert
or protection for that.

OK, actually I called this function in generate_union_paths. which handle
UNION case only.  I will add the Assert anyway. 
 

+
+    /* Fast path */
+    if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+        return;
+
+    outer_is_onerow = relation_is_onerow(outerrel);
+    inner_is_onerow = relation_is_onerow(innerrel);
+
+    outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+    innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+    clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+                                               restrictlist, jointype);

Something similar happens in select_mergejoin_clauses().

I didn't realized this before.  I will refactor this code accordingly if necessary
after reading that. 
 
At least from the
first reading of this patch, I get an impression that all these functions which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful to
the optimizer. 
+
+
+    if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+    {
+        foreach(lc, innerrel_ukey_ctx)
+        {
+            UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+            if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+            {
+                /* The UniqueKey on baserel is not useful on the joinrel */

A joining relation need not be a base rel always, it could be a join rel as
well.

good catch. 
 

+                ctx->useful = false;
+                continue;
+            }
+            if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+            {
+                /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other functions in
this file need good comments explaining the logic esp. "why" instead of "what".
 
Exactly.  
 
+            else if (inner_is_onerow)
+            {
+                /* Since rows in innerrel can't be duplicated AND if
innerrel is onerow,
+                 * the join result will be onerow also as well. Note:
onerow implies
+                 * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

I guess you may miss the onerow special case if I understand correctly. 
inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
innerrel can't be duplicated means:  t1.y = t2.pk;  so the finally result is onerow 
as well.  One of the overall query is  SELECT .. FROM t1, t2 where t2.y = t2.pk


I explained more about onerow in the v7 README.unqiuekey document, just copy 
it here. 

===
1. What is UniqueKey?
.... 
onerow is also a kind of UniqueKey which means the RelOptInfo will have 1 row at
most. it has a stronger semantic than others. like SELECT uk FROM t; uk is
normal unique key and may have different values.
SELECT colx FROM t WHERE uk = const.  colx is unique AND we have only 1 value. This
field can used for innerrel_is_unique. and also be used as an optimization for
this case. We don't need to maintain multi UniqueKey, we just maintain one with
onerow = true and exprs = NIL.

onerow is set to true only for 2 cases right now. 1) SELECT .. FROM t WHERE uk =
1; 2). SELECT aggref(xx) from t; // Without group by.
===

===
2. How is it maintained?
....
More considerations about onerow:
1. If relation with one row and it can't be duplicated, it is still possible
   contains mulit_nullvas after outer join.
2. If the either UniqueKey can be duplicated after join, the can get one row
   only when both side is one row AND there is no outer join.
3. Whenever the onerow UniqueKey is not a valid any more, we need to convert one
   row UniqueKey to normal unique key since we don't store exprs for one-row
   relation. get_exprs_from_uniquekeys will be used here.
===

and 3. in the v7 implementation,  the code struct is more clearer:)  

 

+    }
+    /*
+     * Calculate max_colno in subquery. In fact we can check this with
+     * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+     * is not set on UPPERREL_FINAL relation, so do it this way
+     */


Should/can we use the same logic to convert an expression in the subquery into
a Var of the outer query as in convert_subquery_pathkeys().

Yes,  my previous implementation is actually wrong. and should be  fixed it in v7. 
 
If the subquery doesn't have a reltarget set, we should be able to use reltarget 
of any of its paths since all of those should match the positions across subquery 
and the outer query.

but I think it should be rel->subroot->processed_tlist rather than reltarget?  Actually I still
a bit of uneasy about  rel->subroot->processed_tlist for some DML case, which the 
processed_tlist is different and I still not figure out its impact. 


Will continue reviewing your new set of patches as time permits.

Thank you!  Actually there is no big difference between v6 and v7 regarding the
 UniqueKey part except 2 bug fix.  However v7 has some more documents, 
comments improvement and code refactor/split, which may be helpful
for review. You may try v7 next time if v8 has not come yet:) 

Best Regards
Andy Fan 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: Why are wait events not reported even though it reads/writes atimeline history file?
Следующее
От: Fujii Masao
Дата:
Сообщение: Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators