Re: [HACKERS] Poor memory context performance in large hash joins

From: Jeff Janes
Subject: Re: [HACKERS] Poor memory context performance in large hash joins
Date:
Msg-id: CAMkU=1wCoM1bO3G2mNnEMWgGhcHqH_M5O3VuwUQZD8zRYDdK0g@mail.gmail.com
In reply to: Re: [HACKERS] Poor memory context performance in large hash joins  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On Thu, Feb 23, 2017 at 10:47 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-02-23 17:28:26 -0500, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
> > The number of new chunks can be almost as large as the number of old
> > chunks, especially if there is a very popular value.  The problem is that
> > every time an old chunk is freed, the code in aset.c around line 968 has to
> > walk over all the newly allocated chunks in the linked list before it can
> > find the old one being freed.  This is an N^2 operation, and I think it has
> > horrible CPU cache hit rates as well.
>
> Maybe it's time to convert that to a doubly-linked list.

Yes, I do think so. Given that we only have that for full blocks, not
for small chunks, the cost seems negligible.

That would also, partially, address the performance issue that
http://archives.postgresql.org/message-id/d15dff83-0b37-28ed-0809-95a5cc7292ad%402ndquadrant.com
addresses, in a more realistically backpatchable manner.

Jeff, do you have a handy demonstrator?


Not exactly "handy", as it takes about 45 minutes to set up and uses >50GB of disk and 16GB of RAM, but here is a demonstration.

create table foobar as
  select case when random() < (420.0/540.0) then 1
              else floor(random()*880000)::int
         end as titleid
  from generate_series(1,540000000);

create table foobar2 as select distinct titleid from foobar ;

analyze;

set work_mem TO '13GB';

select count(*) from foobar2 where not exists (select 1 from foobar t where t.titleid=foobar2.titleid);

With this distribution, roughly 420 million of the 540 million rows (420/540, about 78%) get titleid = 1, so the hash table ends up with one hugely skewed bucket. The query will run effectively forever, unless it gets killed or dies due to OOM first.  If I have other things consuming some of my 16GB of RAM, then it gets OOMed (which is not a complaint: it is what one should expect, given that I told it work_mem was 13GB). If I have no other demands on RAM, then I can't tell whether it would eventually OOM, because of how long it would take to get that far.


I ran into this while evaluating Tom's responding patch, but you don't need to apply that patch to run this example and see the effect.  I don't have an example of a case that demonstrates the problem in the absence of a degenerate hash bucket.  I think there should be non-degenerate cases that trigger it, but I haven't been able to produce one yet.
 
> Although if the
> hash code is producing a whole lot of requests that are only a bit bigger
> than the separate-block threshold, I'd say It's Doing It Wrong.  It should
> learn to aggregate them into larger requests.

That's probably right, but we can't really address that in the
back-branches.  And to me this sounds like something we should address
in the branches, not just in master.  Even if we'd also fix the
hash-aggregation logic, I think such an O(n^2) behaviour in the
allocator is a bad idea in general, and we should fix it anyway.
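
Purely as an illustration of the kind of aggregation Tom describes above (this is not PostgreSQL code; the type, function, and size names below are made up), the idea is to carve many small requests out of one big slab, so the memory context sees a handful of large blocks instead of a stream of requests sitting just over its separate-block threshold:

#include <stdlib.h>
#include <stddef.h>

#define SLAB_SIZE (1024 * 1024)        /* illustrative slab size */

typedef struct Slab
{
    struct Slab *next;                 /* chain of slabs, released in bulk later */
    size_t       used;                 /* bytes already handed out */
    char         data[SLAB_SIZE];
} Slab;

static Slab *slab_list = NULL;

/* Hand out small pieces from the current slab; start a new slab when full. */
static void *
slab_alloc(size_t size)
{
    size = (size + 7) & ~(size_t) 7;   /* keep results 8-byte aligned */
    if (size > SLAB_SIZE)
        return NULL;                   /* oversized requests not handled here */

    if (slab_list == NULL || slab_list->used + size > SLAB_SIZE)
    {
        Slab *s = malloc(sizeof(Slab));

        if (s == NULL)
            return NULL;
        s->next = slab_list;
        s->used = 0;
        slab_list = s;
    }

    void *p = slab_list->data + slab_list->used;

    slab_list->used += size;
    return p;
}

Each slab is allocated and freed as one unit, so the underlying allocator never has to track the individual requests at all.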

I don't know how important a back-patch would be.  This is a toy case for me; presumably it isn't for David Hinkle, except that he doesn't want a hash join in the first place, while I want one that finishes in a reasonable time. It might depend on what happens to Tom's OOM patch.

It would be great if the allocator were made bullet-proof, but I don't think adding reverse links (or anything else back-patchable) is going to be enough to do that.
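
To make "adding reverse links" concrete, here is a minimal sketch (simplified; these are not the real aset.c structures or names, nor a proposed patch) of the difference between the current head-to-tail scan and an O(1) unlink once each block carries a back pointer:

#include <stddef.h>

typedef struct BlockData
{
    struct BlockData *prev;    /* back link: the proposed addition */
    struct BlockData *next;    /* forward link, as in the current singly-linked list */
    /* ... block header fields and the chunk's space would follow ... */
} BlockData, *Block;

/*
 * Roughly what freeing an oversized chunk has to do today: scan from the
 * head of the block list until the block owning the chunk is found.  With
 * many newly allocated blocks sitting in front of the old ones, each free
 * is O(number of blocks), and the whole sequence of frees is O(N^2).
 */
static Block
find_block_by_scan(Block head, Block target)
{
    Block b;

    for (b = head; b != NULL; b = b->next)
    {
        if (b == target)
            return b;
    }
    return NULL;
}

/* With a back pointer, the block can be unlinked directly, no scan needed. */
static void
unlink_block(Block *head, Block block)
{
    if (block->prev != NULL)
        block->prev->next = block->next;
    else
        *head = block->next;
    if (block->next != NULL)
        block->next->prev = block->prev;
}

Each free then touches only the block's two neighbours, regardless of how many blocks were allocated ahead of it.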

Cheers,

Jeff
