Re: Using multiple extended statistics for estimates

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Using multiple extended statistics for estimates
Дата
Msg-id 20191107.133820.1542779476322302789.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Using multiple extended statistics for estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Using multiple extended statistics for estimates
Список pgsql-hackers
Hello.

At Wed, 6 Nov 2019 20:58:49 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in 
> >Here is a slightly polished v2 of the patch, the main difference being
> >that computing clause_attnums was moved to a separate function.
> >
> 
> This time with the attachment ;-)

This patch is a kind of straight-forward, which repeats what the
previous statext_mcv_clauselist_selectivity did as long as remaining
clauses matches any of MV-MCVs. Almost no regression in the cases
where zero or just one MV-MCV applies to the given clause list.

It applies cleanly on the current master and seems working as
expected.


I have some comments.

Could we have description in the documentation on what multiple
MV-MCVs are used in a query? And don't we need some regression tests?


+/*
+ * statext_mcv_clause_attnums
+ *        Recalculate attnums from compatible but not-yet-estimated clauses.

It returns attnums collected from multiple clause*s*. Is the name OK
with "clause_attnums"?

The comment says as if it checks the compatibility of each clause but
the work is done in the caller side. I'm not sure such strictness is
required, but it might be better that the comment represents what
exactly the function does.


+ */
+static Bitmapset *
+statext_mcv_clause_attnums(int nclauses, Bitmapset **estimatedclauses,
+                           Bitmapset **list_attnums)

The last two parameters are in the same type in notation but in
different actual types.. that is, one is a pointer to Bitmapset*, and
another is an array of Bitmaptset*. The code in the function itself
suggests that, but it would be helpful if a brief explanation of the
parameters is seen in the function comment.

+        /*
+         * Recompute attnums in the remaining clauses (we simply use the bitmaps
+         * computed earlier, so that we don't have to inspect the clauses again).
+         */
+        clauses_attnums = statext_mcv_clause_attnums(list_length(clauses),

Couldn't we avoid calling this function twice with the same parameters
at the first round in the loop?

+        foreach(l, clauses)
         {
-            stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
-            *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
+            /*
+             * If the clause is compatible with the selected statistics, mark it
+             * as estimated and add it to the list to estimate.
+             */
+            if (list_attnums[listidx] != NULL &&
+                bms_is_subset(list_attnums[listidx], stat->keys))
+            {
+                stat_clauses = lappend(stat_clauses, (Node *) lfirst(l));
+                *estimatedclauses = bms_add_member(*estimatedclauses, listidx);
+            }

The loop runs through all clauses every time. I agree that that is
better than using a copy of the clauses to avoid to step on already
estimated clauses, but maybe we need an Assertion that the listidx is
not a part of estimatedclauses to make sure no clauses are not
estimated twice.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: base backup client as auxiliary backend process
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Attempt to consolidate reading of XLOG page