Re: A better way than tweaking NTUP_PER_BUCKET

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: A better way than tweaking NTUP_PER_BUCKET
Дата
Msg-id 20140125223346.GT31026@tamriel.snowman.net
обсуждение исходный текст
Ответ на Re: A better way than tweaking NTUP_PER_BUCKET  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: A better way than tweaking NTUP_PER_BUCKET  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > In the end, I believe we absolutely should do something about this.
> > Hashing a 64M-row table (requiring upwards of 8G) instead of hashing
> > a 2M-row table is really bad of us.
>
> Huh?  I don't see anything in the thread suggesting that we're doing that,
> or that changing NTUP_PER_BUCKET would fix it if we do.  Are you thinking
> of some other discussion?

This thread sprung from or was at least related to, as I recall, my
issues with NTUP_PER_BUCKET over the summer.  Perhaps I'm wrong, but
here's the thread I was referring to:

http://www.postgresql.org/message-id/20130404201612.GM4361@tamriel.snowman.net

Where I demonstrated that we decide to hash a much larger table, rather
than the smaller one, based on the estimated depth of the buckets and
including the costing from that, which is driven based on how big we
decide to make the hash table where we use NTUP_PER_BUCKET to pick a
table size much smaller than we really should be.

> AFAICT, there was no consensus in this thread on what to do, which
> probably has something to do with the lack of concrete performance
> tests presented to back up any particular proposal.

This I entirely agree with- more testing and more information on how
such a change impacts other workloads would be great.  Unfortunately,
while I've provided a couple of test cases and seen similar situations
on IRC, this is very data-dependent which makes it difficult to have
concrete answers for every workload.

Still, I'll try and spend some time w/ pg_bench's schema definition and
writing up some larger queries to run through it (aiui, the default set
of queries won't typically result in a hashjoin) and see what happens
there.
Thanks,
    Stephen

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Marco Atzeri
Дата:
Сообщение: Re: Postgresql for cygwin - 3rd
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: What is happening on buildfarm member crake?