Thread: xlog.c: WALInsertLock vs. WALWriteLock

xlog.c: WALInsertLock vs. WALWriteLock

From
fazool mein
Date:
Hello guys,

I'm writing a function that will read data from the buffer in xlog
(i.e. from XLogCtl->pages and XLogCtl->xlblocks). I want to make sure
that I am doing it correctly.
For reading from the buffer, do I need to lock WALInsertLock or
WALWriteLock? Also, can you explain a bit the usage of 'LW_SHARED'.
Can we use it for read purposes?

Thanks a lot.

Re: xlog.c: WALInsertLock vs. WALWriteLock

From
David Fetter
Date:
On Fri, Oct 22, 2010 at 12:08:54PM -0700, fazool mein wrote:
> Hello guys,
> 
> I'm writing a function that will read data from the buffer in xlog
> (i.e.  from XLogCtl->pages and XLogCtl->xlblocks).  I want to make
> sure that I am doing it correctly.

Got an example of what the function might look like?

> For reading from the buffer, do I need to lock WALInsertLock or
> WALWriteLock?  Also, can you explain a bit the usage of 'LW_SHARED'.
> Can we use it for read purposes?

Help me understand.  Do you foresee some kind of concurrency issue,
and if so, what?

Cheers,
David.
> 
> Thanks a lot.

-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Tallat Mahmood
Date:
>> I'm writing a function that will read data from the buffer in xlog
>> (i.e.  from XLogCtl->pages and XLogCtl->xlblocks).  I want to make
>> sure that I am doing it correctly.
>
> Got an example of what the function might look like?

Say something like this:
bool ReadLogFromBuffer(char *buf, int len, XLogRecPtr p)
which would mean that we want to read len bytes of log (records), starting at position p, into buf.

>> For reading from the buffer, do I need to lock WALInsertLock or
>> WALWriteLock?  Also, can you explain a bit the usage of 'LW_SHARED'.
>> Can we use it for read purposes?
>
> Help me understand.  Do you foresee some kind of concurrency issue,
> and if so, what?

Yes. For example, while a process is reading from the buffer, another process may insert new records into the buffer. To give a specific example, walsender might want to read data from the buffer instead of reading log from disk. In parallel, there might be transactions on the server that modify the buffer.

Regards,
Tallat



Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Robert Haas
Date:
On Fri, Oct 22, 2010 at 3:08 PM, fazool mein <fazoolmein@gmail.com> wrote:
> I'm writing a function that will read data from the buffer in xlog (i.e.
> from XLogCtl->pages and XLogCtl->xlblocks). I want to make sure that I am
> doing it correctly.
> For reading from the buffer, do I need to lock WALInsertLock or
> WALWriteLock? Also, can you explain a bit the usage of 'LW_SHARED'. Can we
> use it for read purposes?

Holding WALInsertLock in shared mode prevents other processes from
inserting WAL, or in other words it keeps the "end" position from
moving, while holding WALWriteLock in shared mode prevents other
processes from writing the WAL from the buffers out to the operating
system, or in other words it keeps the "start" position from moving.
So you could probably take WALInsertLock in shared mode, figure out
the current end of WAL position, release the lock; then take
WALWriteLock in shared mode, read any WAL before the end of WAL
position, and release the lock.  But note that this wouldn't guarantee
that you read all WAL as it's generated....

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Jeff Janes
Date:
On Mon, Oct 25, 2010 at 6:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 22, 2010 at 3:08 PM, fazool mein <fazoolmein@gmail.com> wrote:
>> I'm writing a function that will read data from the buffer in xlog (i.e.
>> from XLogCtl->pages and XLogCtl->xlblocks). I want to make sure that I am
>> doing it correctly.
>> For reading from the buffer, do I need to lock WALInsertLock or
>> WALWriteLock? Also, can you explain a bit the usage of 'LW_SHARED'. Can we
>> use it for read purposes?
>
> Holding WALInsertLock in shared mode prevents other processes from
> inserting WAL, or in other words it keeps the "end" position from
> moving, while holding WALWriteLock in shared mode prevents other
> processes from writing the WAL from the buffers out to the operating
> system, or in other words it keeps the "start" position from moving.
> So you could probably take WALInsertLock in shared mode, figure out
> the current end of WAL position, release the lock;

Once you release the WALInsertLock, someone else can grab it and
overwrite the part of the buffer you think you are reading.
So I think you have to hold WALInsertLock throughout the duration of
the operation.

Of course it couldn't be overwritten if the wal record itself is not
yet written from buffer to the OS/disk.  But since you are not yet
holding the WALWriteLock, this could be happening at any time.

> then take
> WALWriteLock in shared mode, read any WAL before the end of WAL
> position, and release the lock.  But note that this wouldn't guarantee
> that you read all WAL as it's generated....

I don't think that holding WALWriteLock accomplishes much.  It
prevents part of the buffer from being written out to OS/disk, and
thus becoming eligible for being overwritten in the buffer, but the
WALInsertLock prevents it from actually being overwritten.  And what
if the part of the buffer you want to read was already eligible for
overwriting but not yet actually overwritten?  WALWriteLock won't
allow you to safely access it, but WALInsertLock will (assuming you
have a safe way to identify the record in the first place).  For
either case, holding it in shared mode would be sufficient.


Jeff
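
The "safe way to identify the record" caveat can be illustrated with a small ring of pages. This is a sketch under assumptions, not the real xlog.c logic: `ToyWalBuffers`, `toy_install_page` and `toy_read_from_buffer` are hypothetical names, positions are plain 64-bit offsets, the slot is chosen by a simple modulo, and the caller is assumed to already hold whatever lock(s) the thread settles on. The `xlblocks` entry is taken to hold the position just past the end of the page currently in each slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ   8   /* tiny page size, for illustration only */
#define TOY_NBUFFERS 4

/* Toy stand-in for the pages/xlblocks arrays; NOT the real layout. */
typedef struct {
    char     pages[TOY_NBUFFERS][TOY_BLCKSZ];
    uint64_t xlblocks[TOY_NBUFFERS];  /* end position of the page in each slot, 0 = empty */
} ToyWalBuffers;

/* Install the page starting at page_start, recycling whatever the slot held. */
void toy_install_page(ToyWalBuffers *b, uint64_t page_start, const char *content)
{
    size_t slot;

    assert(page_start % TOY_BLCKSZ == 0);
    slot = (size_t) ((page_start / TOY_BLCKSZ) % TOY_NBUFFERS);
    memcpy(b->pages[slot], content, TOY_BLCKSZ);
    b->xlblocks[slot] = page_start + TOY_BLCKSZ;
}

/*
 * Copy len bytes starting at pos into buf, but only if the page holding
 * them is still in the ring. Returns false if the page was recycled, was
 * never loaded, or the request crosses a page boundary (not handled here).
 */
bool toy_read_from_buffer(const ToyWalBuffers *b, char *buf, size_t len, uint64_t pos)
{
    uint64_t page_start = pos - pos % TOY_BLCKSZ;
    size_t   slot = (size_t) ((page_start / TOY_BLCKSZ) % TOY_NBUFFERS);
    size_t   off = (size_t) (pos - page_start);

    if (len > TOY_BLCKSZ - off)
        return false;
    if (b->xlblocks[slot] != page_start + TOY_BLCKSZ)
        return false;   /* slot now holds a different page */
    memcpy(buf, b->pages[slot] + off, len);
    return true;
}
```

The check against `xlblocks` is only meaningful while the slot cannot be recycled underneath the copy, which is exactly what the locking question in this thread is about.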


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Alvaro Herrera
Date:
Excerpts from Jeff Janes's message of mar oct 26 12:22:38 -0300 2010:

> I don't think that holding WALWriteLock accomplishes much.  It
> prevents part of the buffer from being written out to OS/disk, and
> thus becoming eligible for being overwritten in the buffer, but the
> WALInsertLock prevents it from actually being overwritten.  And what
> if the part of the buffer you want to read was already eligible for
> overwriting but not yet actually overwritten?  WALWriteLock won't
> allow you to safely access it, but WALInsertLock will (assuming you
> have a safe way to identify the record in the first place).  For
> either case, holding it in shared mode would be sufficient.

And horrible for performance, I imagine.  Those locks are highly trafficked.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Jeff Janes's message of mar oct 26 12:22:38 -0300 2010:
>> I don't think that holding WALWriteLock accomplishes much.  It
>> prevents part of the buffer from being written out to OS/disk, and
>> thus becoming eligible for being overwritten in the buffer, but the
>> WALInsertLock prevents it from actually being overwritten.  And what
>> if the part of the buffer you want to read was already eligible for
>> overwriting but not yet actually overwritten?  WALWriteLock won't
>> allow you to safely access it, but WALInsertLock will (assuming you
>> have a safe way to identify the record in the first place).  For
>> either case, holding it in shared mode would be sufficient.

> And horrible for performance, I imagine.  Those locks are highly trafficked.

I think you might actually need *both* locks to ensure the WAL buffers
aren't changing underneath you.  If you don't have the walwriter locked
out, it is free to change the state of a buffer from "dirty" to
"written" and then to "prepared to receive next page of WAL".  If the
latter doesn't involve changing the content of the buffer today, it
still could tomorrow.

And on top of all that, there remains the problem that the piece of WAL
you want might already be gone from the buffers.

Might I suggest adopting the same technique walsender does, ie just read
the data back from disk?  There's a reason why we gave up trying to have
walsender read directly from the buffers.
        regards, tom lane
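
For comparison, reading the data back from disk mostly comes down to mapping a WAL position to a segment file and an offset within it. The sketch below assumes the 9.0-era conventions as I understand them: a two-part record pointer, 16 MB segments, and file names made of three 8-digit hex fields (timeline, log id, segment). `ToyRecPtr`, `toy_wal_locate` and `toy_wal_read_file` are hypothetical names, and the directory is passed in by the caller rather than assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_SEG_SIZE (16u * 1024u * 1024u)  /* assumed default segment size */

/* Toy two-part WAL position, modelled on the 9.0-era XLogRecPtr. */
typedef struct {
    uint32_t xlogid;
    uint32_t xrecoff;
} ToyRecPtr;

/* Compute the segment file name and the byte offset inside that segment. */
void toy_wal_locate(uint32_t tli, ToyRecPtr p,
                    char *fname, size_t fname_len, uint32_t *seg_off)
{
    uint32_t seg = p.xrecoff / TOY_SEG_SIZE;

    assert(fname_len >= 25);  /* 24 hex digits plus terminating NUL */
    *seg_off = p.xrecoff % TOY_SEG_SIZE;
    snprintf(fname, fname_len, "%08X%08X%08X",
             (unsigned) tli, (unsigned) p.xlogid, (unsigned) seg);
}

/*
 * Read up to len bytes at position p from the segment file under dir.
 * Returns the number of bytes read, or -1 if the file cannot be opened
 * or positioned. Reads that cross a segment boundary are not handled.
 */
long toy_wal_read_file(const char *dir, uint32_t tli, ToyRecPtr p,
                       char *buf, size_t len)
{
    char     fname[32];
    char     path[1024];
    uint32_t off;
    size_t   n;
    FILE    *f;

    if (strlen(dir) + 1 + 24 >= sizeof path)
        return -1;
    toy_wal_locate(tli, p, fname, sizeof fname, &off);
    snprintf(path, sizeof path, "%s/%s", dir, fname);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fseek(f, (long) off, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    n = fread(buf, 1, len, f);
    fclose(f);
    return (long) n;
}
```

No WAL locks appear anywhere in this path, which is the attraction: the file contents stay put, so the reader can go at its own pace.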


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
fazool mein
Date:
> Might I suggest adopting the same technique walsender does, ie just read
> the data back from disk?  There's a reason why we gave up trying to have
> walsender read directly from the buffers.

That is exactly what I do not want to do, i.e. read from disk, as long as
the piece of WAL is available in the buffers. Can you please describe why
walsender reading directly from the buffers was given up? To avoid a lot
of locking?
The locking issue might not be a problem considering synchronous
replication. In synchronous replication, the primary will anyways wait for
the standby to send a confirmation before it can do more WAL inserts.
Hence, reading from buffers might be better in this case.

So, as I understand from the emails, we need to lock both WALWriteLock and
WALInsertLock in exclusive mode for reading from buffers. Agreed?

Thanks.

Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Heikki Linnakangas
Date:
On 26.10.2010 21:03, fazool mein wrote:
>> Might I suggest adopting the same technique walsender does, ie just read
>> the data back from disk?  There's a reason why we gave up trying to have
>> walsender read directly from the buffers.
>>
> That is exactly what I do not want to do, i.e. read from disk, as long as
> the piece of WAL is available in the buffers.

Why not? If the reason is performance, I'd like to see some performance 
numbers to show that it's worth the trouble. You could perhaps do a 
quick and dirty hack that doesn't do the locking 100% correctly first, 
and do some benchmarking on that to get a ballpark number of how much 
potential there is. Or run oprofile on the current walsender 
implementation to see how much time is currently spent reading WAL from 
the kernel buffers.

> Can you please describe why
> walsender reading directly from the buffers was given up? To avoid a lot of
> locking?

To avoid locking yes, and complexity in general.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Robert Haas
Date:
On Tue, Oct 26, 2010 at 2:13 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 26.10.2010 21:03, fazool mein wrote:
>>>
>>> Might I suggest adopting the same technique walsender does, ie just read
>>> the data back from disk?  There's a reason why we gave up trying to have
>>> walsender read directly from the buffers.
>>>
>> That is exactly what I do not want to do, i.e. read from disk, as long as
>> the piece of WAL is available in the buffers.
>
> Why not? If the reason is performance, I'd like to see some performance
> numbers to show that it's worth the trouble. You could perhaps do a quick
> and dirty hack that doesn't do the locking 100% correctly first, and do some
> benchmarking on that to get a ballpark number of how much potential there
> is. Or run oprofile on the current walsender implementation to see how much
> time is currently spent reading WAL from the kernel buffers.
>
>> Can you please describe why
>> walsender reading directly from the buffers was given up? To avoid a lot
>> of
>> locking?
>
> To avoid locking yes, and complexity in general.

And the fact that it might allow the standby to get ahead of the
master, leading to silent database corruption.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
fazool mein
Date:

On Tue, Oct 26, 2010 at 11:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 2:13 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>
>>> Can you please describe why
>>> walsender reading directly from the buffers was given up? To avoid a lot
>>> of
>>> locking?
>>
>> To avoid locking yes, and complexity in general.
>
> And the fact that it might allow the standby to get ahead of the
> master, leading to silent database corruption.


I agree that the standby might get ahead, but this doesn't necessarily lead to database corruption. Here, the interesting case is what happens when the primary fails, which can lead to *either* of the following two cases:
1) The standby, due to some triggering mechanism, becomes the new primary. In this case, even if the standby was ahead, its fine.
2) The primary comes back as primary. In this case, the standby will connect again to the primary. At this point, *if* somehow we are able to detect that the standby is ahead, then we should abort the standby and create a standby from scratch.

I agree with Heikki that going through all this trouble only makes sense if there is a huge performance boost.
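
Case 2 hinges on a position comparison at reconnect time, which can be sketched as follows. Everything here is hypothetical: `ToyRecPtr`, `toy_recptr_cmp`, `toy_on_reconnect` and the action names are invented for illustration, and the sketch assumes both sides can report a valid position (a zero offset is treated as "unknown"). It only compares positions, so it would not catch histories that diverged while ending at the same or an earlier position.

```c
#include <assert.h>
#include <stdint.h>

/* Toy two-part WAL position, modelled on the 9.0-era XLogRecPtr. */
typedef struct {
    uint32_t xlogid;
    uint32_t xrecoff;
} ToyRecPtr;

/* Three-way comparison: negative if a < b, zero if equal, positive if a > b. */
static int toy_recptr_cmp(ToyRecPtr a, ToyRecPtr b)
{
    if (a.xlogid != b.xlogid)
        return a.xlogid < b.xlogid ? -1 : 1;
    if (a.xrecoff != b.xrecoff)
        return a.xrecoff < b.xrecoff ? -1 : 1;
    return 0;
}

typedef enum {
    TOY_RESUME_STREAMING,  /* standby is at or behind the primary */
    TOY_REBUILD_STANDBY    /* standby is ahead: discard it and re-create it */
} ToyReconnectAction;

/*
 * Decide what to do when the standby reconnects to a primary that has
 * come back after a failure. standby_received is the last position the
 * standby got; primary_end is where the primary's WAL ends after recovery.
 */
ToyReconnectAction toy_on_reconnect(ToyRecPtr standby_received, ToyRecPtr primary_end)
{
    /* This sketch requires both positions to be known (nonzero offset). */
    assert(standby_received.xrecoff != 0 && primary_end.xrecoff != 0);

    return toy_recptr_cmp(standby_received, primary_end) > 0
        ? TOY_REBUILD_STANDBY
        : TOY_RESUME_STREAMING;
}
```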




Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Robert Haas
Date:
On Tue, Oct 26, 2010 at 2:57 PM, fazool mein <fazoolmein@gmail.com> wrote:
>
> On Tue, Oct 26, 2010 at 11:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Tue, Oct 26, 2010 at 2:13 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>> >
>> >> Can you please describe why
>> >> walsender reading directly from the buffers was given up? To avoid a
>> >> lot
>> >> of
>> >> locking?
>> >
>> > To avoid locking yes, and complexity in general.
>>
>> And the fact that it might allow the standby to get ahead of the
>> master, leading to silent database corruption.
>>
>
> I agree that the standby might get ahead, but this doesn't necessarily lead
> to database corruption. Here, the interesting case is what happens when the
> primary fails, which can lead to *either* of the following two cases:
> 1) The standby, due to some triggering mechanism, becomes the new primary.
> In this case, even if the standby was ahead, its fine.

True.

> 2) The primary comes back as primary. In this case, the standby will connect
> again to the primary. At this point, *if* somehow we are able to detect that
> the standby is ahead, then we should abort the standby and create a standby
> from scratch.

Unless you set restart_after_crash=off, the master could
crash-and-restart before you can do anything about it.  But that
doesn't exist in the 9.0 branch.

> I agree with Heikki that going through all this trouble only makes sense if
> there is a huge performance boost.

There's probably quite a large performance boost in the sync rep case
from allowing the master and standby to fsync() in parallel, but first
we need to get something that works at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Josh Berkus
Date:
> I agree that the standby might get ahead, but this doesn't necessarily
> lead to database corruption. Here, the interesting case is what happens
> when the primary fails, which can lead to *either* of the following two
> cases:
> 1) The standby, due to some triggering mechanism, becomes the new
> primary. In this case, even if the standby was ahead, its fine.
> 2) The primary comes back as primary. In this case, the standby will
> connect again to the primary. At this point, *if* somehow we are able to
> detect that the standby is ahead, then we should abort the standby and
> create a standby from scratch.

Yes.  And we weren't able to implement that for 9.0.  It's worth
revisiting for 9.1.  In fact, the issue of "is the standby ahead of the
master" has come up repeatedly in potential failure scenarios; I think
we're going to need a fairly bulletproof method to determine this.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Robert Haas
Date:
On Tue, Oct 26, 2010 at 3:00 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> I agree that the standby might get ahead, but this doesn't necessarily
>> lead to database corruption. Here, the interesting case is what happens
>> when the primary fails, which can lead to *either* of the following two
>> cases:
>> 1) The standby, due to some triggering mechanism, becomes the new
>> primary. In this case, even if the standby was ahead, its fine.
>> 2) The primary comes back as primary. In this case, the standby will
>> connect again to the primary. At this point, *if* somehow we are able to
>> detect that the standby is ahead, then we should abort the standby and
>> create a standby from scratch.
>
> Yes.  And we weren't able to implement that for 9.0.  It's worth
> revisiting for 9.1.  In fact, the issue of "is the standby ahead of the
> master" has come up repeatedly in potential failure scenarios; I think
> we're going to need a fairly bulletproof method to determine this.

Agreed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Markus Wanner
Date:
On 10/26/2010 05:52 PM, Alvaro Herrera wrote:
> And horrible for performance, I imagine.  Those locks are highly trafficked.

Note, however, that offloading this to the file-system just moves
congestion there. So we are effectively saying that we expect
filesystems to do a better job (in that aspect) than our WAL implementation.

(Note that I'm not claiming that is or is not true - I didn't measure).

Regards

Markus Wanner


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Alvaro Herrera
Date:
Excerpts from Markus Wanner's message of mié oct 27 11:44:20 -0300 2010:
> On 10/26/2010 05:52 PM, Alvaro Herrera wrote:
> > And horrible for performance, I imagine.  Those locks are highly trafficked.
> 
> Note, however, that offloading this to the file-system just moves
> congestion there. So we are effectively saying that we expect
> filesystems to do a better job (in that aspect) than our WAL implementation.

Well, you can just read at your pace from the filesystem; the data is
going to stay there for a long time.  WAL buffers are constantly moving,
and aren't as big.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: xlog.c: WALInsertLock vs. WALWriteLock

From
Fujii Masao
Date:
On Wed, Oct 27, 2010 at 3:03 AM, fazool mein <fazoolmein@gmail.com> wrote:
>
>> Might I suggest adopting the same technique walsender does, ie just read
>> the data back from disk?  There's a reason why we gave up trying to have
>> walsender read directly from the buffers.
>>
>
> That is exactly what I do not want to do, i.e. read from disk, as long as
> the piece of WAL is available in the buffers.

I previously implemented a patch that makes walsender read WAL from the buffer
without holding either WALInsertLock or WALWriteLock. That might be helpful
for you. Please see the following post.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00661.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center