Re: PATCH: adaptive ndistinct estimator v4

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: PATCH: adaptive ndistinct estimator v4
Дата
Msg-id 551AEF45.7010700@2ndquadrant.com
обсуждение исходный текст
Ответ на PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3)  (Tomas Vondra <tv@fuzzy.cz>)
Ответы Re: PATCH: adaptive ndistinct estimator v4  (Greg Stark <stark@mit.edu>)
Re: PATCH: adaptive ndistinct estimator v4  (Jeff Janes <jeff.janes@gmail.com>)
Re: PATCH: adaptive ndistinct estimator v4  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi all,

attached is v4 of the patch implementing adaptive ndistinct estimator.

I've been looking into the strange estimates, mentioned on 2014/12/07:

>      values   current    adaptive
>      ------------------------------
>      106           99         107
>      106            8     6449190
>      1006          38     6449190
>      10006        327       42441

I suspected this might be some sort of rounding error in the numerical
optimization (looking for 'm' solving the equation from paper), but
turns out that's not the case.

The adaptive estimator is a bit unstable for skewed distributions, that
are not sufficiently smooth. Whenever f[1] or f[2] was 0 (i.e. there
were no values occuring exactly once or twice in the sample), the result
was rather off.

The simple workaround for this was adding a fallback to GEE when f[1] or
f[2] is 0. GEE is another estimator described in the paper, behaving
much better in those cases.

With the current version, I do get this (with statistics_target=10):

      values   current    adaptive
      ------------------------------
      106           99         108
      106            8         178
      1006          38        2083
      10006        327       11120

The results do change a bit based on the sample, but these values are a
good example of the values I'm getting.

The other examples (with skewed but smooth distributions) work as good
as before.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Bug #10432 failed to re-find parent key in index
Следующее
От: Tom Lane
Дата:
Сообщение: Re: GUC context information in the document.