Re: A better way than tweaking NTUP_PER_BUCKET

From: Pavel Stehule
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Date:
Msg-id: CAFj8pRA7d3G0KhRt6W=iB2GUqk8vb6ZqAvx9Hb15bOpGS10-kA@mail.gmail.com
In reply to: Re: A better way than tweaking NTUP_PER_BUCKET  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: A better way than tweaking NTUP_PER_BUCKET  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers



2014-01-27 Stephen Frost <sfrost@snowman.net>:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
> > I don't see anything for 9.4 in here now.

> Attached is what I was toying with (thought I had attached it previously
> somewhere..  perhaps not), but in re-testing, it doesn't appear to do
> enough to move things in the right direction in all cases.  I did play
> with this a fair bit yesterday and while it improved some cases by 20%
> (eg: a simple join between pgbench_accounts and pgbench_history), when
> we decide to *still* hash the larger side (as in my 'test_case2.sql'),
> it can cause a similarly-sized decrease in performance.  Of course, if
> we can push that case to hash the smaller side (which I did by hand with
> cpu_tuple_cost), then it goes back to being a win to use a larger number
> of buckets.
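(The cpu_tuple_cost nudge described above can be reproduced with standard planner GUCs against the pgbench tables; this is only a sketch, and the value 0.05 is an arbitrary choice, not one taken from the patch:)

```sql
-- Sketch: make per-tuple CPU work look more expensive so the planner
-- prefers hashing the smaller relation.  0.05 is an arbitrary value.
SHOW cpu_tuple_cost;        -- default is 0.01
SET cpu_tuple_cost = 0.05;
EXPLAIN
SELECT *
FROM pgbench_accounts a
JOIN pgbench_history h USING (aid);
RESET cpu_tuple_cost;
```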

> I definitely feel that there's room for improvement here but it's not an
> easily done thing, unfortunately.  To be honest, I was pretty surprised
> when I saw that the larger number of buckets performed worse, even if it
> was when we picked the "wrong" side to hash, and I plan to look into that
> more closely to try and understand what's happening.  My first guess
> would be what Tom had mentioned over the summer- if the size of the
> bucket array ends up being larger than the CPU cache, we can end up
> paying a great deal more to build the hash table than it costs to scan
> through the deeper buckets that we end up with as a result (particularly
> when we're scanning a smaller table).  Of course, choosing to hash the
> larger table makes that more likely..
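The tradeoff above can be illustrated with a toy chained-hash model (a sketch only; PostgreSQL's real sizing logic lives in src/backend/executor/nodeHash.c and also accounts for work_mem, skew, and batching). A lower NTUP_PER_BUCKET target gives shallower chains, but a bucket array roughly that many times larger, which is what can spill out of the CPU cache:

```python
# Toy model of NTUP_PER_BUCKET-style bucket sizing in a chained hash table.
# This is a simplification, not PostgreSQL's actual code.

def nbuckets_for(ntuples: int, ntup_per_bucket: int) -> int:
    """Pick a power-of-two bucket count targeting ntup_per_bucket
    tuples per bucket, in the spirit of the NTUP_PER_BUCKET heuristic."""
    target = max(1, ntuples // ntup_per_bucket)
    n = 1
    while n < target:
        n <<= 1
    return n

def avg_chain_length(keys, nbuckets: int) -> float:
    """Average length of the non-empty bucket chains."""
    buckets = [0] * nbuckets
    for k in keys:
        buckets[hash(k) % nbuckets] += 1
    used = [c for c in buckets if c]
    return sum(used) / len(used)

keys = range(100_000)
# NTUP_PER_BUCKET = 10 (the 9.3-era default) gives deep chains; 1 gives
# shallow chains but a bucket array ~10x larger.
for ntup in (10, 1):
    nb = nbuckets_for(len(keys), ntup)
    print(ntup, nb, round(avg_chain_length(keys, nb), 2))
```

The probe side wins from short chains, but if the larger bucket array no longer fits in cache, building the table gets slower, matching the effect described above.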

This topic is interesting - we saw very poor performance when hashing large tables with high work_mem. A MergeJoin with quicksort was significantly faster.

I didn't research it more deeply - virtualization overhead is one possible factor.
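(For what it's worth, a comparison like this can be forced with standard planner GUCs; a sketch only - big_a and big_b are hypothetical tables, and '1GB' stands in for whatever high work_mem setting was used:)

```sql
-- Hypothetical tables big_a/big_b; run both queries with the same work_mem.
SET work_mem = '1GB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM big_a a JOIN big_b b USING (id);  -- planner's choice

SET enable_hashjoin = off;   -- push the planner toward merge join + sort
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM big_a a JOIN big_b b USING (id);
RESET enable_hashjoin;
```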

Regards

Pavel
 

>         Thanks,
>
>                 Stephen
