Re: WIP: dynahash replacement for buffer table

From: Robert Haas
Subject: Re: WIP: dynahash replacement for buffer table
Date:
Msg-id: CA+Tgmoa80iLreNDPhFVu856dcivsWF9x2sUcgNQ6Uy=PS56rWQ@mail.gmail.com
In reply to: Re: WIP: dynahash replacement for buffer table  (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: WIP: dynahash replacement for buffer table
List: pgsql-hackers
On Thu, Oct 16, 2014 at 6:53 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> When using shared_buffers = 96GB there's a performance benefit, but not
> huge:
> master (f630b0dd5ea2de52972d456f5978a012436115e):               153621.8
> master + LW_SHARED + lockless StrategyGetBuffer():              477118.4
> master + LW_SHARED + lockless StrategyGetBuffer() + chash:      496788.6
> master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 499562.7
>
> But with shared_buffers = 16GB:
> master (f630b0dd5ea2de52972d456f5978a012436115e):               177302.9
> master + LW_SHARED + lockless StrategyGetBuffer():              206172.4
> master + LW_SHARED + lockless StrategyGetBuffer() + chash:      413344.1
> master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 426405.8

Very interesting.  This doesn't show that chash is the right solution,
but it definitely shows that doing nothing is the wrong solution.  It
shows that, even with the recent bump to 128 lock manager partitions,
and LW_SHARED on top of that, workloads that actually update the
buffer mapping table still produce a lot of contention there.  This
hasn't been obvious to me from profiling, but the numbers above make
it pretty clear.

It also seems to suggest that trying to get rid of the memory barriers
isn't a very useful optimization project.  We might get a couple of
percent out of it, but it's pretty small potatoes, so unless it can be
done more easily than I suspect, it's probably not worth bothering
with.  An approach I think might have more promise is to have bufmgr.c
call the CHash stuff directly instead of going through buf_table.c.
Right now, for example, BufferAlloc() creates and initializes a
BufferTag and passes a pointer to that buffer tag to BufTableLookup,
which copies it into a BufferLookupEnt.  But it would be just as easy
for BufferAlloc() to put the BufferLookupEnt on its own stack, and
then you wouldn't need to copy the data an extra time.  Now a 20-byte
copy isn't a lot, but it's completely unnecessary and looks easy to
get rid of.
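
The shape of that change can be sketched as follows. This is a minimal, self-contained illustration, not the actual patch: the struct layouts are simplified stand-ins for PostgreSQL's BufferTag and BufferLookupEnt, and the CHashSearch() calls are stubbed out with comments.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for PostgreSQL's buffer-table types; the real
 * BufferTag identifies a disk block by tablespace, database, relation,
 * fork, and block number (20 bytes total). */
typedef struct BufferTag
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
    int32_t  forkNum;
    uint32_t blockNum;
} BufferTag;

/* Hash-table entry: the tag plus the buffer id it maps to. */
typedef struct BufferLookupEnt
{
    BufferTag key;
    int       id;
} BufferLookupEnt;

/* Current shape (sketch): the caller builds a BufferTag, and the
 * buf_table.c wrapper copies it into a BufferLookupEnt before probing
 * the shared hash table. */
int
buf_table_lookup(const BufferTag *tag)
{
    BufferLookupEnt ent;

    memcpy(&ent.key, tag, sizeof(BufferTag));   /* the extra 20-byte copy */
    ent.id = -1;                /* a real CHashSearch() would fill this in */
    return ent.id;
}

/* Proposed shape (sketch): bufmgr.c fills in the BufferLookupEnt on
 * its own stack and passes it to the hash table directly, so the tag
 * is written exactly once and the intermediate copy disappears. */
int
buffer_alloc_lookup(uint32_t spc, uint32_t db, uint32_t rel,
                    int32_t fork, uint32_t blk)
{
    BufferLookupEnt ent;

    ent.key.spcNode  = spc;
    ent.key.dbNode   = db;
    ent.key.relNode  = rel;
    ent.key.forkNum  = fork;
    ent.key.blockNum = blk;
    ent.id = -1;                /* a real CHashSearch() would fill this in */
    return ent.id;
}
```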

> I had to play with setting max_connections+1 sometimes to get halfway
> comparable results for master - unaligned data otherwise causes weird
> results. Without doing that the performance gap between master
> 96/16G was even bigger. We really need to fix that...
>
> This is pretty awesome.

Thanks.  I wasn't quite sure how to test this or where the workloads
that it would benefit would be found, so I appreciate you putting time
into it.  And I'm really glad to hear that it delivers good results.

I think it would be useful to plumb the chash statistics into the
stats collector, or at least into a debugging dump of some kind for
testing.  They include a number of useful contention measures, and I'd
be interested to see what those look like on this workload.  (If we're
really desperate for every last ounce of performance, we could also
disable those statistics in production builds.  That's probably worth
testing at least once to see if it matters much, but I kind of hope it
doesn't.)
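
The compile-time disable mentioned above could look something like this. It's a hypothetical sketch: the counter names and the CHASH_STATISTICS / CHashBump macros are illustrative assumptions, not the actual API of the chash patch.

```c
#include <stdint.h>

/* Hypothetical contention counters for a concurrent hash table.  The
 * chash patch keeps its own statistics; the names and layout here are
 * illustrative assumptions, not its actual definitions. */
typedef struct CHashStats
{
    uint64_t searches;
    uint64_t search_restarts;   /* retries forced by concurrent changes */
    uint64_t inserts;
    uint64_t insert_retries;
} CHashStats;

#define CHASH_STATISTICS 1      /* enabled here for demonstration */

/* When statistics are disabled, the bookkeeping compiles away to
 * nothing, so production builds pay no cost for it. */
#ifdef CHASH_STATISTICS
#define CHashBump(stats, field) ((stats)->field++)
#else
#define CHashBump(stats, field) ((void) 0)
#endif

CHashStats chash_stats;

/* Stand-in for a hash-table search that records one probe. */
int
chash_demo_search(void)
{
    CHashBump(&chash_stats, searches);
    return 0;                   /* pretend the lookup missed */
}
```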

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


