WALInsertLock contention

From: Robert Haas
Subject: WALInsertLock contention
Msg-id: AANLkTim6nRLc1PBMLtzfgu3Tb-diuckj0t2=Pm+z2aJw@mail.gmail.com
Replies: Re: WALInsertLock contention  (Tatsuo Ishii <ishii@postgresql.org>)
Re: WALInsertLock contention  (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-hackers

I've been thinking about the problem of $SUBJECT, and while I know
it's too early to think seriously about any 9.2 development, I want to
get my thoughts down in writing while they're fresh in my head.

It seems to me that there are two basic approaches to this problem.
We could either split up the WAL stream into several streams, say one
per database or one per tablespace or something of that sort, or we
could keep it as a single stream but try not to do so much locking
whilst in the process of getting it out the door.  Or we could try to
do both, and maybe ultimately we'll need to.  However, if the second
one is practical, it's got two major advantages: it'll probably be a
lot less invasive, and it won't add any extra fsync traffic.  In
thinking about how we might accomplish the goal of reducing lock
contention, it occurred to me there's probably no need for the final
WAL stream to reflect the exact order in which WAL is generated.

For example, suppose transaction T1 inserts a tuple into table A;
transaction T2 inserts a tuple into table B; T1 commits; T2 commits.
The commit records need to be in the right order, and all the actions
that are part of a given transaction need to precede the associated
commit record, but, for example, I don't think it would matter if you
emitted the commit record for T1 before T2's insert into B.  Or you
could switch the order in which you logged the inserts, since they're
not touching the same buffers.
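
To make the ordering rules above concrete, here is a minimal sketch of a checker that decides whether a reordered stream is still acceptable. Everything in it (the WalRec struct, the seq field recording generation order, the wal_order_ok function) is hypothetical and not PostgreSQL code. It assumes exactly three constraints: commit records keep their relative order, a transaction's records precede its commit, and records touching the same buffer keep their relative order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record kinds; not PostgreSQL's actual WAL record types. */
typedef enum { REC_INSERT, REC_COMMIT } RecKind;

typedef struct {
    int     xid;     /* transaction that generated the record */
    RecKind kind;
    int     buffer;  /* buffer touched by the record; -1 for a commit */
    int     seq;     /* position in the order the records were generated */
} WalRec;

/*
 * Return true if `stream` (records in their emitted order) respects:
 *   1. commit records keep their original relative order;
 *   2. a transaction's records are not emitted after its commit record;
 *   3. records touching the same buffer keep their original relative order.
 * Any other pair of records may be swapped freely.
 */
bool wal_order_ok(const WalRec *stream, size_t n)
{
    assert(stream != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            const WalRec *a = &stream[i];
            const WalRec *b = &stream[j];
            bool either_commit = a->kind == REC_COMMIT || b->kind == REC_COMMIT;
            bool constrained =
                (a->kind == REC_COMMIT && b->kind == REC_COMMIT) ||
                (a->xid == b->xid && either_commit) ||
                (a->buffer >= 0 && a->buffer == b->buffer);

            /* a is emitted before b, so it must also have been generated first. */
            if (constrained && a->seq > b->seq)
                return false;
        }
    }
    return true;
}
```

With the T1/T2 example, emitting T1's commit before T2's insert passes the check, as does swapping the two inserts; swapping the two commit records does not.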

So here's the basic idea.  Each backend, if it so desires, is
permitted to maintain a per-backend WAL buffer.  Per-backend WAL
buffers live in shared memory and can be accessed by any backend, but
the idea is that most of the time only one backend will be accessing
them, so that the locks won't be heavily contended.  Any WAL written
to a per-backend WAL buffer will eventually be transferred into the
main WAL buffers, and flushed.  When a process writes to a per-backend
WAL buffer, it writes (1) the actual WAL data and (2) the list of
buffers affected.  Those buffers are stamped with a fake LSN that
points back to the per-backend WAL buffer, and they can't be written
until the WAL has been moved from the per-backend WAL buffers to the
main WAL buffers.
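
As a rough illustration of what such a per-backend buffer might hold, the sketch below stores the WAL bytes plus the list of affected buffers, and hands back a fake LSN for stamping pages. The layout, the sizes, and the fake-LSN encoding (top bit set, backend id in the next bits, offset in the low 32 bits) are all assumptions made up for this example; nothing here is an existing PostgreSQL structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Lsn;

#define PB_DATA_SIZE   8192                 /* assumed per-backend capacity */
#define PB_MAX_BUFFERS 64                   /* assumed limit on tracked buffers */
#define FAKE_LSN_FLAG  (UINT64_C(1) << 63)  /* assumed marker for a fake LSN */

typedef struct {
    int     backend_id;
    size_t  used;                     /* bytes of WAL data stored so far */
    uint8_t data[PB_DATA_SIZE];       /* (1) the actual WAL data */
    int     nbuffers;
    int     buffers[PB_MAX_BUFFERS];  /* (2) the list of buffers affected */
} BackendWalBuf;

/* A fake LSN names the owning backend and an offset in its private buffer. */
static inline Lsn make_fake_lsn(int backend_id, size_t offset)
{
    assert(backend_id >= 0 && offset <= PB_DATA_SIZE);
    return FAKE_LSN_FLAG | ((Lsn) backend_id << 32) | (Lsn) offset;
}

static inline bool lsn_is_fake(Lsn lsn)
{
    return (lsn & FAKE_LSN_FLAG) != 0;
}

static inline int fake_lsn_owner(Lsn lsn)
{
    return (int) ((lsn & ~FAKE_LSN_FLAG) >> 32);
}

/*
 * Append one record that modifies `buffer_id`.  Returns the fake LSN to stamp
 * on that buffer, or 0 if there is no room, in which case the caller would
 * first have to move the contents to the main WAL buffers.
 */
Lsn pb_append(BackendWalBuf *pb, const void *rec, size_t len, int buffer_id)
{
    int i = 0;

    while (i < pb->nbuffers && pb->buffers[i] != buffer_id)
        i++;
    bool is_new_buffer = (i == pb->nbuffers);

    if (len > PB_DATA_SIZE - pb->used ||
        (is_new_buffer && pb->nbuffers == PB_MAX_BUFFERS))
        return 0;

    memcpy(pb->data + pb->used, rec, len);
    pb->used += len;
    if (is_new_buffer)
        pb->buffers[pb->nbuffers++] = buffer_id;

    /* Like a real LSN, the fake one points at the end of the record. */
    return make_fake_lsn(pb->backend_id, pb->used);
}
```

A page carrying an LSN for which lsn_is_fake() is true could not be written out until the owning backend's buffer had been moved to the main WAL buffers; fake_lsn_owner() tells whoever wants to write it which backend that is.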

So, if a buffer with a fake LSN needs to be (a) written back to the OS
or (b) modified by a backend other than the one that owns the fake
LSN, this triggers a flush of the per-backend WAL buffers to the main
WAL buffers.  When this happens, all the affected buffers get stamped
with a real LSN and the entries are discarded from the per-backend WAL
buffers.  Such a flush would also be needed when a backend commits or
otherwise needs an XLOG flush, or when there's no more per-backend
buffer space.  In theory, all of this taken together should mean that
WAL gets pushed out in larger chunks: a transaction that does three
inserts and commits should only need to grab WALInsertLock once,
instead of once per heap insert, once per index insert, and again for
the commit, though it'll have to write a bigger chunk of data when it
does get the lock.  It'll have to repeatedly grab the lock on its
per-backend WAL buffer, but ideally that's uncontended.
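
The lock-count argument can be sketched with a single-threaded simulation. All names below (MainWal, BackendWalBuf, pb_log, pb_flush, page_lsn) are hypothetical, the lock is only a counter standing in for WALInsertLock, and stamping every affected page with the end position of the whole chunk is a simplifying assumption (it is at least as large as each record's true position).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Lsn;

#define FAKE_LSN_FLAG (UINT64_C(1) << 63)  /* assumed marker for a fake LSN */
#define NBUFFERS      16                   /* toy number of shared buffers */
#define PB_SIZE       1024                 /* toy per-backend capacity */
#define MAIN_SIZE     4096                 /* toy main WAL buffer capacity */

typedef struct {
    uint8_t  data[MAIN_SIZE];
    size_t   insert_pos;          /* doubles as the "real" LSN here */
    unsigned lock_acquisitions;   /* counts simulated WALInsertLock grabs */
} MainWal;

typedef struct {
    uint8_t data[PB_SIZE];
    size_t  used;
    int     buffers[NBUFFERS];    /* distinct buffers touched since last flush */
    int     nbuffers;
} BackendWalBuf;

static Lsn page_lsn[NBUFFERS];    /* LSN stamped on each shared buffer */

/* Log a record privately; the main lock is not touched.  buf == -1 means the
 * record (e.g. a commit) modifies no buffer.  Returns -1 if a flush is needed. */
int pb_log(BackendWalBuf *pb, const void *rec, size_t len, int buf)
{
    assert(buf >= -1 && buf < NBUFFERS);
    if (len > PB_SIZE - pb->used)
        return -1;

    memcpy(pb->data + pb->used, rec, len);
    pb->used += len;

    if (buf >= 0) {
        int i = 0;
        while (i < pb->nbuffers && pb->buffers[i] != buf)
            i++;
        if (i == pb->nbuffers)
            pb->buffers[pb->nbuffers++] = buf;
        page_lsn[buf] = FAKE_LSN_FLAG | (Lsn) pb->used;
    }
    return 0;
}

/* Move everything to the main WAL buffers under one lock acquisition, then
 * replace the fake LSNs on the affected pages with a real one. */
Lsn pb_flush(BackendWalBuf *pb, MainWal *wal)
{
    if (pb->used == 0)
        return (Lsn) wal->insert_pos;

    wal->lock_acquisitions++;                 /* "acquire WALInsertLock" */
    assert(pb->used <= MAIN_SIZE - wal->insert_pos);
    memcpy(wal->data + wal->insert_pos, pb->data, pb->used);
    wal->insert_pos += pb->used;
    Lsn real = (Lsn) wal->insert_pos;         /* "release WALInsertLock" */

    for (int i = 0; i < pb->nbuffers; i++)
        page_lsn[pb->buffers[i]] = real;

    pb->used = 0;
    pb->nbuffers = 0;
    return real;
}
```

Logging a heap insert, an index insert, and a commit with pb_log() and then calling pb_flush() once leaves lock_acquisitions at 1, where logging each record directly would have taken the lock three times.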

A further refinement would be to try to jigger things so that as a
backend fills up per-backend WAL buffers, it somehow throws them over
the fence to one of the background processes to write out.  For
short-running transactions, that won't really make any difference,
since the commit will force the per-backend buffers out to the main
buffers anyway.  But for long-running transactions it seems like it
could be quite useful; in essence, the task of assembling the final
WAL stream from the WAL output of individual backends becomes a
background activity, and ideally the background process doing the work
is the only one touching the cache lines being shuffled around.  Of
course, to make this work, backends would need a steady supply of
available per-backend WAL buffers.  Maybe shared buffers could be used
for this purpose, with the buffer header being marked in some special
way to indicate that this is what the buffer's being used for.
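
One way to picture the hand-off is a small pool of buffers that cycle through free, filling, and full states. The sketch below is single-threaded and purely illustrative: the pool, its state names, and the hand-off sequence number are assumptions, and real code would need locking or atomics around every state change. The background side drains full buffers in the order they were handed off, so a backend's own records stay in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 4      /* toy pool size */
#define PB_SIZE   256    /* toy buffer capacity */

typedef enum { PB_FREE, PB_FILLING, PB_FULL } PbState;

typedef struct {
    PbState  state;
    int      owner;         /* backend currently filling the buffer */
    uint64_t handoff_seq;   /* order in which the buffer was handed off */
    size_t   used;
    uint8_t  data[PB_SIZE];
} PoolBuf;

typedef struct {
    PoolBuf  bufs[POOL_SIZE];
    uint64_t next_seq;
} WalBufPool;

/* Backend side: claim a free buffer, or NULL if the pool is exhausted. */
PoolBuf *pool_acquire(WalBufPool *pool, int backend_id)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        PoolBuf *b = &pool->bufs[i];
        if (b->state == PB_FREE) {
            b->state = PB_FILLING;
            b->owner = backend_id;
            b->used = 0;
            return b;
        }
    }
    return NULL;
}

/* Backend side: throw a filled buffer over the fence. */
void pool_hand_off(WalBufPool *pool, PoolBuf *b)
{
    assert(b->state == PB_FILLING);
    b->handoff_seq = pool->next_seq++;
    b->state = PB_FULL;
}

/* Background side: copy full buffers into `out` in hand-off order, free them,
 * and return the number of bytes written.  Stops when `out` has no room. */
size_t bg_drain(WalBufPool *pool, uint8_t *out, size_t out_cap)
{
    size_t written = 0;

    for (;;) {
        PoolBuf *oldest = NULL;

        for (int i = 0; i < POOL_SIZE; i++) {
            PoolBuf *b = &pool->bufs[i];
            if (b->state == PB_FULL &&
                (oldest == NULL || b->handoff_seq < oldest->handoff_seq))
                oldest = b;
        }
        if (oldest == NULL || oldest->used > out_cap - written)
            break;

        memcpy(out + written, oldest->data, oldest->used);
        written += oldest->used;
        oldest->state = PB_FREE;   /* available to backends again */
    }
    return written;
}
```

In this picture a backend only ever waits when pool_acquire() returns NULL, which is the "steady supply" problem mentioned above.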

One not-so-good property of this algorithm is that the operation of
moving per-backend WAL into the main WAL buffers requires relocking
all the buffers whose fake LSNs now need to be changed to "real" LSNs.
That could possibly be problematic from a performance standpoint, and
there are deadlock risks to worry about too.

Any thoughts?  Other ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

