Re: [HACKERS] multivariate statistics (v19)

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: [HACKERS] multivariate statistics (v19)
Дата
Msg-id CAFiTN-sdJmUYiARd0qBQKBXqs73096qD0Hd1QrNs=JQ3F_0QGQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] multivariate statistics (v19)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] multivariate statistics (v19)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Thu, Jan 5, 2017 at 3:27 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Thanks. Those plans match my experiments with the TPC-H data set, although
> I've been playing with the smallest scale (1GB).
>
> It's not very difficult to make the estimation error arbitrary large, e.g.
> by using perfectly correlated (identical) columns.

I have done an initial review for ndistint and histogram patches,
there are few review comments.

ndistinct
---------
1. Duplicate statistics:
postgres=# create statistics s with (ndistinct) on (a,c) from t;
2017-01-07 16:21:54.575 IST [63817] ERROR:  duplicate key value
violates unique constraint "pg_mv_statistic_name_index"
2017-01-07 16:21:54.575 IST [63817] DETAIL:  Key (staname,
stanamespace)=(s, 2200) already exists.
2017-01-07 16:21:54.575 IST [63817] STATEMENT:  create statistics s
with (ndistinct) on (a,c) from t;
ERROR:  duplicate key value violates unique constraint
"pg_mv_statistic_name_index"
DETAIL:  Key (staname, stanamespace)=(s, 2200) already exists.

For duplicate statistics, I think we can check the existence of the
statistics and give more meaningful error code something statistics
"s" already exist.

2. Typo
+ /*
+ * Sort the attnums, which makes detecting duplicies somewhat
+ * easier, and it does not hurt (it does not affect the efficiency,
+ * onlike for indexes, for example).
+ */
/onlike/unlike

3. Typo
/** Find attnims of MV stats using the mvoid.*/
int2vector *
find_mv_attnums(Oid mvoid, Oid *relid)

/attnims/attnums


histograms
--------------
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;

Isn't it will be better that take some percentage of the bucket based
on the number of distinct element for partial matching buckets.


+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+  int2vector *stakeys,
+  MVSerializedHistogram mvhist,
+  int nmatches, char *matches,
+  bool is_or)
+{
+ int i;

For each clause we are processing all the buckets, can't we use some
data structure which can make multi-dimensions information searching
faster.
Something like HTree, RTree, Maybe storing histogram in these formats
will be difficult?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [HACKERS] Transactions involving multiple postgres foreign servers