Re: Yet another abort-early plan disaster on 9.3

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Re: Yet another abort-early plan disaster on 9.3
Дата	21 ноября 2014 г. 21:38:34
Msg-id	CAMkU=1yoiQt6+_XMACL=PTv-aHwc1E0BgiHGhKW3HL+ByN6zpw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Yet another abort-early plan disaster on 9.3 (Tomas Vondra <tv@fuzzy.cz>)
Ответы	Re: Yet another abort-early plan disaster on 9.3 (Tomas Vondra <tv@fuzzy.cz>)
Список	pgsql-performance

Дерево обсуждения

On Fri, Oct 10, 2014 at 10:53 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 10.10.2014 14:10, Tomas Vondra wrote:
> Dne 10 Říjen 2014, 13:16, Greg Stark napsal(a):
>> On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Yes, it's only intractable if you're wedded to the idea of a tiny,
>>> fixed-size sample. If we're allowed to sample, say, 1% of the table, we
>>> can get a MUCH more accurate n_distinct estimate using multiple
>>> algorithms, of which HLL is one. While n_distinct will still have some
>>> variance, it'll be over a much smaller range.
>>
>> I've gone looking for papers on this topic but from what I read this
>> isn't so. To get any noticeable improvement you need to read 10-50% of
>> the table and that's effectively the same as reading the entire table
>> -- and it still had pretty poor results. All the research I could find
>> went into how to analyze the whole table while using a reasonable
>> amount of scratch space and how to do it incrementally.
>
> I think it's really difficult to discuss the estimation without some basic
> agreement on what are the goals. Naturally, we can't get a perfect
> estimator with small samples (especially when the sample size is fixed and
> not scaling with the table). But maybe we can improve the estimates
> without scanning most of the table?
>
> FWIW I've been playing with the adaptive estimator described in [1] and
> the results looks really interesting, IMHO. So far I was testing it on
> synthetic datasets outside the database, but I plan to use it instead of
> our estimator, and do some more tests.

Attached is an experimental patch implementing the adaptive estimator.

It was fairly simple (although it's a bit messy). It only computes the
estimates for the "scalar" case (i.e. data types that we can sort).
Implementing this for the "minimal" case is possible, but requires a bit
more work.

It only computes the estimate and prints a WARNING with both the current
and new estimate, but the old estimate is stored.

When I run this patch on the regression database, I get a case where the current method is exact but the adaptive one is off:

WARNING: ndistinct estimate current=676.00 adaptive=906.00

select count(distinct stringu1) from onek;

676

It should be seeing every single row, so I don't know why the adaptive method is off. Seems like a bug.

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее

От: Tom Lane
Дата: 21 ноября 2014 г., 08:12:15
Сообщение: Re: A pessimistic planner

Следующее

От: Tomas Vondra
Дата: 23 ноября 2014 г., 23:20:07
Сообщение: Re: Yet another abort-early plan disaster on 9.3

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Yet another abort-early plan disaster on 9.3

Предыдущее

Следующее