Re: multivariate statistics / patch v7

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: multivariate statistics / patch v7
Дата
Msg-id 20150713.175144.45075432.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: multivariate statistics / patch v7  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: multivariate statistics / patch v7  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: multivariate statistics / patch v7  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
Hi, Thanks for the detailed explaination. I misunderstood the
code (more honest speaking, din't look so close there). Then I
looked it closer.


At Wed, 08 Jul 2015 03:03:16 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<559C76D4.2030805@2ndquadrant.com>
> FWIW this was a stupid bug in update_match_bitmap_histogram(), which
> initially handled only AND clauses, and thus assumed the "match" of a
> bucket can only decrease. But for OR clauses this is exactly the
> opposite (we assume no buckets match and add buckets matching at least
> one of the clauses).
> 
> With this fixed, the estimates look like this:
> 

> IMHO pretty accurate estimates - no issue with OR clauses.

Ok, I understood the diferrence between what I thought and what
you say. The code is actually concious of OR clause but is looks
somewhat confused.

Currently choosing mv stats in clauselist_selectivity can be
outlined as following,

1. find_stats finds candidate mv stats containing *all*  attributes appeared in the whole clauses regardless of and/or
exprsby walking whole the clause tree.
 
  Perhaps this is the measure to early bailout.

2.1. Within every disjunction elements, collect mv-related  attributes while checking whether the all leaf nodes (binop
or ifnull) are compatible by (eventually) walking whole the  clause tree.
 

2.2. Check if all the collected attribute are contained in  mv-stats columns.

3. Finally, clauseset_mv_selectivity_histogram() (and others).
  This funciton applies every ExprOp onto every attribute in  every histogram backes and (tries to) make the boolean
operationof the result bitmaps.
 

I have some comments on the implement and I also try to find the
solution for them.


1. The flow above looks doing  very similiar thins repeatedly.

2. I believe what the current code does can be simplified.

3. As you mentioned in comments, some additional infrastructure  needed.

After all, I think what we should do after this are as follows,
as the first step.

- Add the means to judge the selectivity operator(?) by other than oprrest of the op of ExprOp. (You missed neqsel
already)
 I suppose one solution for this is adding oprmvstats taking 'm', 'h' and 'f' and their combinations. Or for the
convenience,it would be a fixed-length string like this.
 
 oprname | oprmvstats =       | 'mhf' <>      | 'mhf' <       | 'mh-' >       | 'mh-' >=      | 'mh-' <=      | 'mh-'
 This would make the code in clause_is_mv_compatible like this.
 > oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */ > if (oprmvstats & types) > { >    *attnums =
bms_add_member(*attnums,var->varattno); >    return true; > } > return false;
 

- Current design just manage to work but it is too complicated and hardly have affinity with the existing estimation
framework.I proposed separation of finding stats phase and calculation phase, but I would like to propose transforming
RestrictInfo(andfinding mvstat) phase and running the transformed RestrictInfo phase after looking close to the patch.
 
 I think transforing RestrictInfo makes the situnation better. Since it nedds different information, maybe it is better
tohave new struct, say, RestrictInfoForEstimate (boo!). Then provide mvstatssel() to use in the new struct. The rough
lookingof the code would be like below. 
 
 clauselist_selectivity() {   ...   RestrictInfoForEstmate *esclause =     transformClauseListForEstimation(root,
clauses,varRelid);   ...
 
   return clause_selectivity(esclause): }
 clause_selectivity(RestrictInfoForEstmate *esclause) {   if (IsA(clause, RestrictInfo))...   if (IsA(clause,
RestrictInfoForEstimate))  {      RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate*) clause;      if
(ecl->selfunc)     {         sx = ecl->selfunc(root, ecl);      }   }   if (IsA(clause, Var))... }
 
  transformClauseListForEstimation(...) {   ...
   relid = collect_mvstats_info(root, clause, &attlist);   if (!relid) return;   if (get_mvstats_hook)        mvstats =
(*get_mvstats_hoook)(root, relid, attset);   else        mvstats = find_mv_stats(root, relid, attset)) } ...
 

> I've pushed this to github [1] but I need to do some additional
> fixes. I also had to remove some optimizations while fixing this, and
> will have to reimplement those.
> 
> That's not to say that the handling of OR-clauses is perfectly
> correct. After looking at clauselist_selectivity_or(), I believe it's
> a bit broken and will need a bunch of fixes, as explained in the
> FIXMEs I pushed to github.
> 
> [1] https://github.com/tvondra/postgres/tree/mvstats

I don't see whether it is doable or not, and I suppose you're
unwilling to change the big picture, so I will consider the idea
and will show you the result, if it turns out to be possible and
promising.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: TABLESAMPLE patch is really in pretty sad shape
Следующее
От: "Shulgin, Oleksandr"
Дата:
Сообщение: Re: [PATCH] Generalized JSON output functions