Re: [HACKERS] extended statistics: n-distinct

From Alvaro Herrera
Subject Re: [HACKERS] extended statistics: n-distinct
Date
Msg-id 20170322210345.zoqj4tmdyoh23mxm@alvherre.pgsql
In response to Re: [HACKERS] extended statistics: n-distinct  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: extended statistics: n-distinct  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Kyotaro HORIGUCHI wrote:

> At Mon, 20 Mar 2017 16:02:20 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20170320190220.ixlaueanxegqd5gr@alvherre.pgsql>

> > This is a new thread to present a version of the n-distinct patch that
> > IMO is close enough to commit.  There are some work items still.
> > There's some discussion on the topic of cross-column statistics:
> > https://wiki.postgresql.org/wiki/Cross_Columns_Stats
> > 
> > This problem is important enough that Kyotaro Horiguchi submitted
> > another patch that does the same thing:
> > https://www.postgresql.org/message-id/flat/20150828.173334.114731693.horiguchi.kyotaro%40lab.ntt.co.jp
> > This patch aims to provide the same functionality, keeping the design
> > general enough that other kinds of statistics can be added later (such
> > as functional dependencies, histograms and MCVs, all of which have been
> > previously submitted as patches by Tomas).
> 
> I may be stupid but I don't get the picture here, specifically
> about the relation to Tomas's patch. Does this work as
> infrastructure for Tomas's mv patch? Or in some other
> relationship?

Well, this patch is Tomas' first patch, which I've reviewed and reworked
-- I changed some things that weren't properly finished, cleaned up the
code, made it all more robust, and made sure the sane cases work sanely
while the others are rejected promptly (rather than throwing bogus error
messages at a later time, or crashing).

I didn't review your own n-distinct patch.  I don't think there's any
common code, but it would be very useful if you could try your test
scenarios and make sure they are handled sanely by this patch.

Regarding your question:

> Are you planning to implement correct estimation of joins
> perplexed by strong correlations?

There is a later patch in Tomas' series which I would like to get to
before PG10 closes, but it's not this patch.  It needs to be rebased on
top of this one.

Attached is v30, which includes some more cleanup.  Detailed commits can
be seen here:
https://github.com/2ndQuadrant/postgres/commits/dev/mvstats-ndistinct
In particular, this includes code from Tomas to consider mixing
ndistinct estimates from multiple multivariate statistic objects, which
is better than the old approach of only using the estimate when a
perfect match was found.  However, I lobotomized Tomas' selfuncs.c code
and I need to revert that part before pushing -- essentially I
removed examine_variable() processing, which seemed a bit on the
expensive side for what we were doing, but that was a silly mistake.
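
To illustrate what "mixing" estimates means compared to requiring a
perfect match, here is a rough sketch.  It is not the code in the patch:
the function name, the greedy "largest covering statistic first" choice,
and the sample numbers are all assumptions made up for illustration.

```python
def estimate_groups(columns, per_column, multi_stats):
    """Hypothetical sketch: estimate the number of distinct groups.

    per_column  maps column name -> single-column ndistinct
    multi_stats maps frozenset of column names -> multi-column ndistinct
    """
    remaining = set(columns)
    estimate = 1.0
    while True:
        # Assumed strategy: use the multi-column estimate that covers
        # the most of the columns we have not accounted for yet.
        best = max(
            (cols for cols in multi_stats
             if len(cols) >= 2 and cols <= remaining),
            key=len,
            default=None,
        )
        if best is None:
            break
        estimate *= multi_stats[best]
        remaining -= best
    # Anything left over is treated as independent, as without the stats.
    for col in sorted(remaining):
        estimate *= per_column[col]
    return estimate


# Made-up numbers: a and b are perfectly correlated, c is independent.
per_column = {"a": 100, "b": 100, "c": 10}
multi_stats = {frozenset({"a", "b"}): 100}

mixed = estimate_groups(["a", "b", "c"], per_column, multi_stats)
independent = estimate_groups(["a", "b", "c"], per_column, {})
```

With a perfect-match-only rule, the statistic on (a, b) would not apply
to a grouping on (a, b, c) at all, and the estimate would fall back to
the product of the per-column values.  The mixed approach still uses it
for the (a, b) part and multiplies in c separately.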

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
