Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H

From: Jeff Janes
Subject: Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
Date:
Msg-id: CAMkU=1wOk0WJjyUU0ZboZpdpSenGvhPXUW_K+vciorW9+tGaqg@mail.gmail.com
In reply to: Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
List: pgsql-hackers
On Sat, Jun 20, 2015 at 8:28 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Hi Tomas,


> I've lobotomized the sampling a bit to really produce a random set of blocks first, and that produces way better estimates:
>
>    statistics target    estimate             random
>    ---------------------------------------------------------
>    100                   429491 (10000x)     334430766 (14x)
>    1000                 4240418  (1000x)     439010499 (10x)
>
> Also, the number of sampled blocks is not that different. With target 100, the current sampler reads ~2900 blocks, while a completely random sampler uses 3000 blocks. So, where's the benefit?
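
(For reference on where those n_distinct numbers come from: as far as I remember, analyze.c feeds the sample through the Haas-Stokes "Duj1" estimator. A minimal Python sketch of that formula, just for illustration, not the actual code:)

    from collections import Counter

    def duj1_estimate(sample_values, total_rows):
        """Haas-Stokes Duj1: n*d / (n - f1 + f1*n/N), where n is the
        sample size, d the distinct values in the sample, f1 the values
        seen exactly once, and N the estimated total row count."""
        counts = Counter(sample_values)
        n = len(sample_values)
        d = len(counts)
        f1 = sum(1 for c in counts.values() if c == 1)
        return n * d / (n - f1 + f1 * n / total_rows)

The f1 term is what makes the estimator so sensitive to the sampling: a clustered sample sees the same values over and over, which drives f1 and d down, and the estimate down with them.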

I don't know what you did.  The block sampling is already random; unless there is some overlooked bug, it can't be made more random.  Could you post the patch?
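
(To illustrate what I mean: as I understand it, the block stage is plain selection sampling over the block numbers, i.e. Knuth's Algorithm S, so every block is equally likely to be read. A toy sketch under that assumption, not the actual analyze.c code:)

    import random

    def sample_block_numbers(nblocks, targblocks):
        """Knuth's Algorithm S: choose targblocks block numbers uniformly
        at random, without replacement, returned in ascending order."""
        chosen = []
        needed = targblocks
        for blk in range(nblocks):
            remaining = nblocks - blk
            # Keep this block with probability needed/remaining.
            if remaining * random.random() < needed:
                chosen.append(blk)
                needed -= 1
                if needed == 0:
                    break
        return chosen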

As I understand it, with a target of 100, it should be sampling exactly 30,000 blocks.  Some of those blocks will end up having no rows chosen from them (just by chance), but the blocks were still read.  If a few thousand of those blocks end up with no tuples in the final sample, that was not done to save IO; it is just an artifact of how the random sampling works.
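
(Here is a quick simulation of that artifact, sketched with a plain Algorithm R reservoir rather than the Vitter-style skipping I believe analyze.c uses, and assuming a made-up 100 rows per block:)

    import random

    def count_empty_blocks(nblocks=30000, rows_per_block=100, targrows=30000):
        """Read every row from each sampled block, keep a uniform
        reservoir of targrows rows, then count how many of the blocks
        that were read contribute no row to the final sample."""
        reservoir = []   # block number each sampled row came from
        seen = 0
        for blk in range(nblocks):
            for _ in range(rows_per_block):
                seen += 1
                if len(reservoir) < targrows:
                    reservoir.append(blk)
                else:
                    j = random.randrange(seen)   # Algorithm R replacement
                    if j < targrows:
                        reservoir[j] = blk
        return nblocks - len(set(reservoir))

    print(count_empty_blocks())   # thousands of blocks, read but unrepresented

Every one of those blocks was read in full before the row stage threw its tuples away, so dropping them from the final sample saves no IO.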

Thanks,

Jeff
