Re: Yet another abort-early plan disaster on 9.3

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Yet another abort-early plan disaster on 9.3
Дата
Msg-id 542F343E.40906@fuzzy.cz
обсуждение исходный текст
Ответ на Re: Yet another abort-early plan disaster on 9.3  (Peter Geoghegan <peter.geoghegan86@gmail.com>)
Список pgsql-performance
On 3.10.2014 02:54, Peter Geoghegan wrote:
> On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Yes, it's only intractable if you're wedded to the idea of a tiny,
>> fixed-size sample. If we're allowed to sample, say, 1% of the
>> table, we can get a MUCH more accurate n_distinct estimate using
>> multiple algorithms, of which HLL is one. While n_distinct will
>> still have some variance, it'll be over a much smaller range.
>
> I think that HyperLogLog, as a streaming algorithm, will always
> require that the entire set be streamed. This doesn't need to be a
> big deal - in the case of my abbreviated key patch, it appears to
> basically be free because of the fact that we were streaming
> everything anyway. It's a very cool algorithm, with fixed overhead
> and constant memory usage. It makes very useful guarantees around
> accuracy.

I think you're mixing two things here - estimating the number of
distinct values in a sample (which can be done very efficiently using
HLL) and estimating the number of distinct values in the whole table.
For that HLL is not usable, unless you process all the data.

Sadly HLL is rather incompatible with the usual estimators, because the
ones I'm aware of need to know the number of occurences for the distinct
values etc.

But couldn't we just piggyback this on autovacuum? One of the nice HLL
features is that it's additive - you can build "partial counters" for
ranges of blocks (say, a few MBs per range), and then merge them when
needed. By keeping the parts it's possible to rebuild it separately.

But maybe this idea is way too crazy ...

regards
Tomas


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Yet another abort-early plan disaster on 9.3
Следующее
От: Rodrigo Barboza
Дата:
Сообщение: Re: auto vaccum is dying