Re: dynamic shared memory and locks

From: Robert Haas
Subject: Re: dynamic shared memory and locks
Date:
Msg-id: CA+TgmoZUdX2=7eou10gig5b=DTPm0foXCx4n9VGPw70pBe9GTQ@mail.gmail.com
In reply to: Re: dynamic shared memory and locks  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: dynamic shared memory and locks  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Mon, Jan 6, 2014 at 9:48 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> None of these ideas are a complete solution for LWLOCK_STATS.  In the
>> other three cases noted above, we only need an identifier for the lock
>> "instantaneously", so that we can pass it off to the logger or dtrace
>> or whatever.  But LWLOCK_STATS wants to hold on to data about the
>> locks that were visited until the end of the session, and it does that
>> using an array that is *indexed* by lwlockid.  I guess we could
>> replace that with a hash table.  Ugh.  Any suggestions?
>
> Yeah, that's not fun.  No good suggestions here offhand.

Replacing it with a hash table turns out not to be too bad, either in
terms of code complexity or performance, so I think that's the way to
go.  I did some test runs with pgbench -S (scale factor 300, 32
clients, shared_buffers = 8GB, five-minute runs) and got these results:

resultsr.lwlock-stats.32.300.300:tps = 195493.037962 (including connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 189985.964658 (including connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 197641.293892 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 193286.066063 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191832.100877 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191899.834211 (including connections establishing)
resultsr.master.32.300.300:tps = 197939.111998 (including connections establishing)
resultsr.master.32.300.300:tps = 198641.337720 (including connections establishing)
resultsr.master.32.300.300:tps = 198675.404349 (including connections establishing)

"master" is the master branch, commit
10a82cda67731941c18256e009edad4a784a2994.  "lwlock-stats" is the same,
but with LWLOCK_STATS defined.  "lwlock-stats-htab" is the same, with
the attached patch and LWLOCK_STATS defined.  The runs were
interleaved, but the results are shown here grouped by test run.  If
we assume that the 189k result is an outlier, then there's probably
some regression associated with the lwlock-stats-htab patch, but not a
lot.  Considering that this code isn't even compiled unless you have
LWLOCK_STATS defined, I think that's OK.
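
To make the shape of the change concrete, here is a minimal standalone
sketch of the idea (it is not the attached patch; the struct name, field
names, and table size are invented for illustration): per-backend counters
kept in a small chained hash table keyed by the lock's address instead of
an array indexed by lwlockid.

/*
 * Standalone sketch only: per-backend LWLOCK_STATS counters in a chained
 * hash table keyed by the lock's address.  Names and sizes are made up.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct lwlock_stats
{
    const void *lock;           /* hash key: address of the LWLock */
    uint64_t    sh_acquire;     /* shared acquisitions */
    uint64_t    ex_acquire;     /* exclusive acquisitions */
    uint64_t    block_count;    /* times we had to sleep */
    struct lwlock_stats *next;  /* collision chain */
} lwlock_stats;

#define STATS_NBUCKETS 1024     /* power of two, per-backend table */

static lwlock_stats *stats_buckets[STATS_NBUCKETS];

/* Trivial pointer hash; a real implementation could do better. */
static unsigned
hash_pointer(const void *p)
{
    uintptr_t   v = (uintptr_t) p;

    return (unsigned) ((v >> 4) * 2654435761u) & (STATS_NBUCKETS - 1);
}

/* Find or create the stats entry for a given lock. */
static lwlock_stats *
get_lwlock_stats(const void *lock)
{
    unsigned    bucket = hash_pointer(lock);
    lwlock_stats *s;

    for (s = stats_buckets[bucket]; s != NULL; s = s->next)
        if (s->lock == lock)
            return s;

    s = calloc(1, sizeof(lwlock_stats));
    s->lock = lock;
    s->next = stats_buckets[bucket];
    stats_buckets[bucket] = s;
    return s;
}

int
main(void)
{
    int         dummy[4];       /* stand-ins for four distinct locks */

    for (int i = 0; i < 1000; i++)
        get_lwlock_stats(&dummy[i % 4])->ex_acquire++;

    for (int i = 0; i < 4; i++)
        printf("lock %p: ex_acquire %llu\n", (void *) &dummy[i],
               (unsigned long long) get_lwlock_stats(&dummy[i])->ex_acquire);
    return 0;
}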

This is only part of the solution, of course: a complete solution will
involve making the hash table key something other than the lock ID.
What I'm thinking we can do is make the lock ID consist of two
unsigned 32-bit integers.  One of these will be stored in the lwlock
itself, which if my calculations are correct won't increase the size
of LWLockPadded on any common platform (a 64-bit integer would).
Let's call this the "tranche ID".  The other will be derived from the
LWLock address.  Let's call this the "instance ID".  We'll keep a
table of tranche IDs, which will be assigned consecutively starting
with 0.  We'll keep an array of metadata for tranches, indexed by
tranche ID, and each entry will have three associated pieces of
information: an array base, a stride length, and a printable name.
When we need to identify an lwlock in the log or to dtrace, we'll
fetch the tranche ID from the lwlock itself and use that to index into
the tranche metadata array.  We'll then take the address of the lwlock,
subtract the array base address for the tranche, and divide by the
stride length; the result is the instance ID.  When reporting to the
user, we can report either the tranche ID directly or the associated
name for that tranche; in either case, we'll also report the instance
ID.
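
As a rough illustration of that scheme (the type and field names below
are invented, not taken from any patch), the instance ID falls out of
simple pointer arithmetic against the tranche's base and stride:

/*
 * Illustrative sketch of the naming scheme described above; all names
 * here are stand-ins.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct LWLockTranche
{
    const void *array_base;     /* address of the first lock in the tranche */
    size_t      array_stride;   /* distance between consecutive locks */
    const char *name;           /* printable name, e.g. "main" */
} LWLockTranche;

typedef struct LWLock
{
    uint32_t    tranche;        /* 32-bit tranche ID stored in the lock */
    /* ... lock state would follow ... */
} LWLock;

/* Tranche metadata, indexed by tranche ID assigned from 0. */
static LWLockTranche tranches[16];

/* Derive the instance ID and report a human-readable identity. */
static void
identify_lwlock(const LWLock *lock)
{
    const LWLockTranche *t = &tranches[lock->tranche];
    size_t      instance;

    instance = ((const char *) lock - (const char *) t->array_base)
        / t->array_stride;
    printf("%s %zu\n", t->name, instance);
}

int
main(void)
{
    LWLock      main_lwlocks[4];

    /* Tranche 0: the main LWLock array. */
    tranches[0].array_base = main_lwlocks;
    tranches[0].array_stride = sizeof(LWLock);
    tranches[0].name = "main";

    for (int i = 0; i < 4; i++)
        main_lwlocks[i].tranche = 0;

    identify_lwlock(&main_lwlocks[2]);  /* prints "main 2" */
    return 0;
}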

So initially we'll probably just have tranche 0: the main LWLock array.
If we move buffer content and I/O locks to the buffer headers, we'll
define tranche 1 and tranche 2 with the same base address (the start of
the buffer descriptor array) and the same stride length (the size of a
buffer descriptor).  One will have the associated name "buffer content
lock" and the other "buffer I/O lock".  If we want, we can split the
main LWLock array into several tranches so that we can more easily
identify lock manager locks, predicate lock manager locks, and buffer
mapping locks.
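
Continuing the same illustrative style (again with stand-in type names,
including a fake buffer descriptor), the buffer case would register two
tranches that share a base address and stride, so the same arithmetic
names either lock for buffer N:

/*
 * Sketch only: buffer content and I/O locks embedded in the buffer
 * descriptors, exposed as two tranches sharing a base and stride.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct LWLock { uint32_t tranche; } LWLock;

typedef struct BufferDescStandIn
{
    int     buf_id;
    LWLock  content_lock;       /* tranche 1: "buffer content lock" */
    LWLock  io_lock;            /* tranche 2: "buffer I/O lock" */
} BufferDescStandIn;

typedef struct Tranche
{
    const void *base;
    size_t      stride;
    const char *name;
} Tranche;

#define NBUFFERS 8
static BufferDescStandIn buffers[NBUFFERS];
static Tranche tranches[3];     /* tranche 0 (main LWLock array) omitted */

int
main(void)
{
    /* Both buffer-lock tranches use the same base and stride. */
    tranches[1] = (Tranche) { buffers, sizeof(BufferDescStandIn),
                              "buffer content lock" };
    tranches[2] = (Tranche) { buffers, sizeof(BufferDescStandIn),
                              "buffer I/O lock" };

    for (int i = 0; i < NBUFFERS; i++)
    {
        buffers[i].content_lock.tranche = 1;
        buffers[i].io_lock.tranche = 2;
    }

    /* Identify buffer 5's I/O lock: same arithmetic as for any tranche. */
    const LWLock *lock = &buffers[5].io_lock;
    const Tranche *t = &tranches[lock->tranche];
    size_t instance = ((const char *) lock - (const char *) t->base)
        / t->stride;

    printf("%s %zu\n", t->name, instance);      /* prints "buffer I/O lock 5" */
    return 0;
}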

I like this system because it's very cheap - we only need a small
array of metadata and a couple of memory accesses to name a lock - but
it still lets us report data in a way that's actually *more*
human-readable than what we have now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments
