Обсуждение: Latch for the WAL writer - further reducing idle wake-ups.

Поиск
Список
Период
Сортировка

Latch for the WAL writer - further reducing idle wake-ups.

От
Peter Geoghegan
Дата:
Attached patch latches up the WAL Writer, reducing wake-ups and thus
saving electricity in a way that is more-or-less analogous to my work
on the BGWriter:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6d90eaaa89a007e0d365f49d6436f35d2392cfeb

I am hoping this gets into 9.2 . I am concious of the fact that this
is quite late, but it the patch addresses an open item, the concluding
part of a much wider feature. In any case, it is a useful patch, that
ought to be committed at some point. I should point out:

1. This functionality was covered by the group commit patch that I
worked on back in January, which was submitted in advance of the
commitfest deadline. However, an alternative implementation was
ultimately committed that did not consider WAL Writer wake-ups.

2. The WAL writer is the most important auxiliary process to latch-up.
Though it is tied with the BGWriter at 5 wake-ups per second by
default, I consider the WAL Writer to be more important than the
BGWriter because I find it much more plausible that the WAL Writer
really won't need to be around for much of the time, as with a
read-mostly work load. "Cloud" type deployments often have read-mostly
workloads, so we can still save some power even if the DB is actually
servicing lots of read queries. That being the case, it would be a
shame if we didn't get this last one in, as it adds a lot more value
than any of the other patches.

3. This is a fairly simple patch; as I've said, it works in a way that
is quite analogous to the BGWriter patch, applying lessons learned
there.

With this patch, my instrumentation shows that wake-ups when Postgres
reaches a fully idle state are just 2.7 per second for the entire
postgres process group, quite an improvement on the 7.6 per second in
HEAD. This is exactly what you'd expect from a reduction of 5 wake-ups
per second to 0.1 per second on average for the WAL Writer.

I have determined this with PowerTOP 1.13 on my Fedora 16 laptop. Here
is an example session, began after the cluster reached a fully idle
state, with this patch applied (if, alternatively, I want to see
things at per-process granularity, I can get that from PowerTOP 1.98
beta 1, which is available from my system's package manager):

[peter@peterlaptop powertop-1.13]$ sudo ./powertop -d --time=300
[sudo] password for peter:
PowerTOP 1.13   (C) 2007 - 2010 Intel Corporation

Collecting data for 300 seconds


Cn              Avg residency
C0 (cpu running)        ( 2.8%)
polling          0.0ms ( 0.0%)
C1 mwait      0.5ms ( 1.0%)
C2 mwait      0.9ms ( 0.6%)
C3 mwait      1.4ms ( 0.1%)
C4 mwait      6.7ms (95.4%)
P-states (frequencies)
  2.61 Ghz     5.7%
  1.80 Ghz     0.1%
  1200 Mhz     0.1%
  1000 Mhz     0.2%
   800 Mhz    93.5%
Wakeups-from-idle per second : 171.3    interval: 300.0s
no ACPI power usage estimate available
Top causes for wakeups:
  23.0% (134.5)   chrome
***SNIP***
   0.5% (  2.7)   postgres
***SNIP***

This is a rather low number, that will make us really competitive with
other RDBMSs in this area. Recall that we started from 11.5 wake-ups
for an idle Postgres cluster with a default configuration.

To put the 2.7 number in context, I measured MySQL's wake-ups at 2.2
last year (mysql-server version 5.1.56, Fedora 14), though I
subsequently saw much higher numbers (over 20 per second) for version
5.5.19 on Fedora 16, so you should probably take that with a grain of
salt - I don't know anything about MySQL, and so cannot really be sure
that I'm making an objective comparison in comparing that number with
the number of wake-ups Postgres has with a stock postgresql.conf.

I've employed the same trick used when a buffer is dirtied for the
BGWriter - most of the time, the SetLatch() calls will check a single
flag, and find it already set. We are careful to only "arm" the latch
with a call to ResetLatch() when it is really needed. Rather than
waiting for the clocksweep to be lapped, we wait for a set number of
iterations of consistent inactivity.

I've made the WAL Writer use its process latch, rather than the latch
that was previously within XLogCtl. This seems much more idiomatic, as
in doing so we reserve the right to register generic signal handlers.
With a non-process latch, we'd have to worry about signal invalidation
issues on an ongoing basis, since the handler wouldn't be calling
SetLatch() against the latch we waited on. I have also added a comment
in latch.h generally advising against ad-hoc shared latches where .

I took initial steps to quantify the performance hit from this patch.
A simple "insert.sql" pgbench-tools benchmark on my laptop, with a
generic configuration showed no problems, though I do not assume that
this conclusively proves the case. Results:

http://walwriterlatch.staticloud.com/

My choice of XLogInsert() as an additional site at which to call
SetLatch() was one that wasn't taken easily, and frankly I'm not
entirely confident that I couldn't have been just as effective while
placing the SetLatch() call in a less hot, perhaps higher-level
codepath. That said, MarkBufferDirty() is also a very hot code path,
and it's where one of the SetLatch() calls goes in the earlier
BGWriter patch, besides which I haven't been able to quantify any
performance hit as yet.

Thoughts?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Вложения

Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Attached patch latches up the WAL Writer, reducing wake-ups and thus
> saving electricity in a way that is more-or-less analogous to my work
> on the BGWriter:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6d90eaaa89a007e0d365f49d6436f35d2392cfeb
> I am hoping this gets into 9.2 . I am concious of the fact that this
> is quite late, but it the patch addresses an open item, the concluding
> part of a much wider feature.

It is getting a bit late to be considering such changes for 9.2, but
I'm willing to review and commit this if there's not anybody who feels
strongly that it's too late.  Personally I think it's in the nature of
cleanup and so fair game as long as we haven't formally started beta.
However I will confess to some bias about wanting to get the server's
idle wake-up rate down, because Fedora people have been bugging me
about that for a long time now.  So I'm probably not the best person to
objectively evaluate whether we should hold this for 9.3.  Comments?

Schedule questions aside, I'm disturbed by this bit:

> My choice of XLogInsert() as an additional site at which to call
> SetLatch() was one that wasn't taken easily, and frankly I'm not
> entirely confident that I couldn't have been just as effective while
> placing the SetLatch() call in a less hot, perhaps higher-level
> codepath.

Adding any contention at all to XLogInsert doesn't seem like a smart
idea, even if you failed to measure any problem in the specific tests
you made.  I wonder whether we could not improve matters by adding
an additional bool "wal_writer_needs_wakening" in the state that's
considered to be protected by WALInsertLock.  XLogInsert would check
this while still holding the lock, and only consider that it needs to do
a SetLatch if the flag was set, whereupon it would clear it before
releasing the lock.  In the normal case this would add one uncontended
fetch followed by boolean-test-and-jump to the work done while holding
the lock, which should be pretty negligible.  Then, the WAL writer would
need to take WALInsertLock to set that flag, but presumably it should
only be doing that when there is no contention for the lock.  (In fact,
we could have it do a ConditionalLockAcquire on WALInsertLock for the
purpose, and consider that failure means it shouldn't go to sleep after
all.)

Now this might sound pretty much equivalent to testing the latch's
is_set flag; perhaps it is and I'm worrying over nothing.  But I'm
thinking that the wal_writer_needs_wakening flag would be in a cache
line that an acquirer of WALInsertLock would have to get ownership of
anyway, if it is adjacent to variables that XLogInsert has to manipulate
anyway.  On the other hand, the WAL writer's process latch would be in
some other cache line that would also need to get passed around a lot,
if it's touched during every XLogInsert.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Robert Haas
Дата:
On Wed, May 2, 2012 at 7:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It is getting a bit late to be considering such changes for 9.2, but
> I'm willing to review and commit this if there's not anybody who feels
> strongly that it's too late.  Personally I think it's in the nature of
> cleanup and so fair game as long as we haven't formally started beta.
> However I will confess to some bias about wanting to get the server's
> idle wake-up rate down, because Fedora people have been bugging me
> about that for a long time now.  So I'm probably not the best person to
> objectively evaluate whether we should hold this for 9.3.  Comments?

Well, I feel that one of the weaknesses of our CommitFest process is
that changes like this (which are really pretty small) end up having
the same deadline as patches that are large (command triggers,
checksums, etc.); in fact, they sometimes end up having an earlier
deadline, because the people doing the big stuff end up continuing to
hack on it for another couple months while the door is shut to smaller
improvements.  So I'm not going to object if you feel like slipping
this one in.  I looked it over myself and I think it's broadly
reasonable, although I'm not too sure about the particular criteria
chosen for sending the WAL writer to sleep and waking it up again.
And like you I'd like to see some more improvement in this area.

> Adding any contention at all to XLogInsert doesn't seem like a smart
> idea, even if you failed to measure any problem in the specific tests
> you made.  I wonder whether we could not improve matters by adding
> an additional bool "wal_writer_needs_wakening" in the state that's
> considered to be protected by WALInsertLock.

I am skeptical about this, although it could be right.  It could also
be better the way Peter did it; a fetch of an uncontended cache line
is pretty cheap.  Another approach - which I think might be better
still - is to not bother kicking the WAL writer and let it wake up
when it wakes up.  Maybe have it hibernate for 3 seconds instead of
10, or something like that.  It seems unlikely to cause any real
problem if WAL writer takes a couple seconds to get with the program
after a long period of inactivity; note that an async commit will kick
it anyway, and a sync commit will probably half to flush WAL whether
the WAL writer wakes up or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> ... It seems unlikely to cause any real
> problem if WAL writer takes a couple seconds to get with the program
> after a long period of inactivity; note that an async commit will kick
> it anyway, and a sync commit will probably half to flush WAL whether
> the WAL writer wakes up or not.

That's a good point.  What about only kicking the WAL writer in code
paths where a backend found itself having to write/flush WAL for itself?
The added overhead is very surely negligible in such a situation.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Robert Haas
Дата:
On Wed, May 2, 2012 at 11:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> ... It seems unlikely to cause any real
>> problem if WAL writer takes a couple seconds to get with the program
>> after a long period of inactivity; note that an async commit will kick
>> it anyway, and a sync commit will probably half to flush WAL whether
>> the WAL writer wakes up or not.
>
> That's a good point.  What about only kicking the WAL writer in code
> paths where a backend found itself having to write/flush WAL for itself?
> The added overhead is very surely negligible in such a situation.

Yeah, I think that would make sense, though I'd probably still argue
for a hibernation period not quite so long as ten seconds.  Actually,
what I'd really like is for this to be adaptive: if we find that
there's no WAL to write, increase the time until the next wakeup by 10
ms until we hit the maximum of, say, 3 seconds.  If we find that there
is WAL to write, cut the time until the next wakeup in half until we
hit a minimum of, say, 20ms.  And, if we're forced to write/flush WAL
ourselves, or we async commit, kick the WAL writer in the pants and
wake him up right away.  That way we're willing to get
super-aggressive when needed, but we don't stay there very long once
the pounding ends.  Also, we avoid having a hard "cut" between regular
sleeps and deep hibernation; instead, we kind of gradually drift off.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Heikki Linnakangas
Дата:
On 03.05.2012 03:41, Robert Haas wrote:
> On Wed, May 2, 2012 at 7:21 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Adding any contention at all to XLogInsert doesn't seem like a smart
>> idea, even if you failed to measure any problem in the specific tests
>> you made.  I wonder whether we could not improve matters by adding
>> an additional bool "wal_writer_needs_wakening" in the state that's
>> considered to be protected by WALInsertLock.
>
> I am skeptical about this, although it could be right.  It could also
> be better the way Peter did it; a fetch of an uncontended cache line
> is pretty cheap.

I'm very wary of adding any extra shared memory accesses to XLogInsert. 
I spent a lot of time trying to eliminate them in my XLogInsert scaling 
patch. It might be ok if the flag is usually not modified, and we don't 
add any extra barrier instructions in there, but it would be better to 
avoid it.

One simple idea would be to only try to set the latch every 100 
XLogInsert calls in the backend. That would cut whatever contention it 
might cause by a factor of 100, making it negligible.

>  Another approach - which I think might be better
> still - is to not bother kicking the WAL writer and let it wake up
> when it wakes up.  Maybe have it hibernate for 3 seconds instead of
> 10, or something like that.  It seems unlikely to cause any real
> problem if WAL writer takes a couple seconds to get with the program
> after a long period of inactivity; note that an async commit will kick
> it anyway, and a sync commit will probably half to flush WAL whether
> the WAL writer wakes up or not.

Yeah, that'd be even simpler.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Magnus Hagander
Дата:
On Thu, May 3, 2012 at 2:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 2, 2012 at 7:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It is getting a bit late to be considering such changes for 9.2, but
>> I'm willing to review and commit this if there's not anybody who feels
>> strongly that it's too late.  Personally I think it's in the nature of
>> cleanup and so fair game as long as we haven't formally started beta.
>> However I will confess to some bias about wanting to get the server's
>> idle wake-up rate down, because Fedora people have been bugging me
>> about that for a long time now.  So I'm probably not the best person to
>> objectively evaluate whether we should hold this for 9.3.  Comments?
>
> Well, I feel that one of the weaknesses of our CommitFest process is
> that changes like this (which are really pretty small) end up having
> the same deadline as patches that are large (command triggers,
> checksums, etc.); in fact, they sometimes end up having an earlier
> deadline, because the people doing the big stuff end up continuing to
> hack on it for another couple months while the door is shut to smaller
> improvements.  So I'm not going to object if you feel like slipping
> this one in.  I looked it over myself and I think it's broadly
> reasonable, although I'm not too sure about the particular criteria
> chosen for sending the WAL writer to sleep and waking it up again.
> And like you I'd like to see some more improvement in this area.

I agree that it's ok to slip it in given that it's "finishing off a
patch from earlier". I think it's reasonable to hold it to a little
bit higher review stadards since it's that late in the cycle though,
such as two people reviewing it before it goes in (or 1 reviewer + 1
committer - and of course, unless it's a truly trivial patch). Which
it seems you both are doing now, so that makes it ok ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Peter Geoghegan
Дата:
On 3 May 2012 10:56, Magnus Hagander <magnus@hagander.net> wrote:
> I agree that it's ok to slip it in given that it's "finishing off a
> patch from earlier". I think it's reasonable to hold it to a little
> bit higher review stadards since it's that late in the cycle though,
> such as two people reviewing it before it goes in (or 1 reviewer + 1
> committer - and of course, unless it's a truly trivial patch). Which
> it seems you both are doing now, so that makes it ok ;)

Right. It's a simple, largely mechanical patch, that doesn't have any
behavioural changes, and is of strategic importance, so I thought it
was worthy of special consideration, without actually expecting it.

Attached patch removes the questionable SetLatch() call, under the
assumption that it's okay if the WALWriter, having entered hibernation
due to sustained inactivity (10 seconds) subsequently takes up to 5
seconds (2.5 on average) to notice that it has work to do. These
values may need to be tweaked. I have not bothered with making the
sleep time adaptive, because it's probably too late to do that.

This latest revision also covers the checkpointer. The code for that
is far simpler than that for the WAL Writer, so it doesn't
particularly feel like I'm pushing my luck by slipping that into
something to be slipped in. I should not have excluded it before,
since it accounts for another 2 wake-ups per second. All told, this
new revision sees Postgres wake-ups stabilise at 0.9 per second. With
the checkpointer code included, we roundly beat MySQL in this area,
which will be a nice advocacy message for 9.2, though I probably
shouldn't be quoted on that until I get the opportunity to go back and
make absolutely sure that I've been fair.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Вложения

Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> This latest revision also covers the checkpointer. The code for that
> is far simpler than that for the WAL Writer, so it doesn't
> particularly feel like I'm pushing my luck by slipping that into
> something to be slipped in.

Well ... maybe, or maybe not, or maybe you are just poking at a sore
spot that was already created by the patch to make a separate
checkpointer process.  What bothers me in looking at this is that the
main loop of the checkpointer includes an AbsorbFsyncRequests() call,
which is now the only wakeup condition that isn't covered by latch
logic or a predictable time delay.  A long sleep period could easily
result in overflow of the fsync request queue, which is not good for
performance.  I'm inclined to think that we'd better add logic to
ForwardFsyncRequest() to set the latch once the queue is, say, more
than half full.

I also notice that the separate-checkpointer patch failed to rename
assorted things like BgWriterCommLock, BgWriterRequest,
BgWriterShmemStruct, which are all 100% inappropriately named now.
And it still contains various obsolete comments referring to itself
as the background writer.  Will see about cleaning that up.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Simon Riggs
Дата:
On 7 May 2012 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> This latest revision also covers the checkpointer. The code for that
>> is far simpler than that for the WAL Writer, so it doesn't
>> particularly feel like I'm pushing my luck by slipping that into
>> something to be slipped in.
>
> Well ... maybe, or maybe not, or maybe you are just poking at a sore
> spot that was already created by the patch to make a separate
> checkpointer process.  What bothers me in looking at this is that the
> main loop of the checkpointer includes an AbsorbFsyncRequests() call,
> which is now the only wakeup condition that isn't covered by latch
> logic or a predictable time delay.  A long sleep period could easily
> result in overflow of the fsync request queue, which is not good for
> performance.  I'm inclined to think that we'd better add logic to
> ForwardFsyncRequest() to set the latch once the queue is, say, more
> than half full.

OK

> I also notice that the separate-checkpointer patch failed to rename
> assorted things like BgWriterCommLock, BgWriterRequest,
> BgWriterShmemStruct, which are all 100% inappropriately named now.
> And it still contains various obsolete comments referring to itself
> as the background writer.  Will see about cleaning that up.

For want of a better name, keeping them the same seemed best.

If you have a suggested name change, I'd be happy to oblige.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 7 May 2012 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I also notice that the separate-checkpointer patch failed to rename
>> assorted things like BgWriterCommLock, BgWriterRequest,
>> BgWriterShmemStruct, which are all 100% inappropriately named now.
>> And it still contains various obsolete comments referring to itself
>> as the background writer. �Will see about cleaning that up.

> For want of a better name, keeping them the same seemed best.

I was just thinking s/BgWriter/Checkpointer/, do you think that's too
long?
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Simon Riggs
Дата:
On 7 May 2012 19:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 7 May 2012 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I also notice that the separate-checkpointer patch failed to rename
>>> assorted things like BgWriterCommLock, BgWriterRequest,
>>> BgWriterShmemStruct, which are all 100% inappropriately named now.
>>> And it still contains various obsolete comments referring to itself
>>> as the background writer.  Will see about cleaning that up.
>
>> For want of a better name, keeping them the same seemed best.
>
> I was just thinking s/BgWriter/Checkpointer/, do you think that's too
> long?

CheckpointerCommLock
CheckpointerShmemStruct
work OK

CheckpointerRequest
sounds a little vague, but can be tweaked

It also leaves the situation that we have a catalog view called
pg_stat_bgwriter that would be accessing "checkpointer" things. That's
really the thorny one that I wasn't sure how to handle. Good example
of why we shouldn't expose internals too much.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> It also leaves the situation that we have a catalog view called
> pg_stat_bgwriter that would be accessing "checkpointer" things. That's
> really the thorny one that I wasn't sure how to handle. Good example
> of why we shouldn't expose internals too much.

Yeah, that's a bit unfortunate but changing it doesn't seem like a good
idea.  The names I intended to change are all internal.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Simon Riggs
Дата:
On 7 May 2012 20:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> It also leaves the situation that we have a catalog view called
>> pg_stat_bgwriter that would be accessing "checkpointer" things. That's
>> really the thorny one that I wasn't sure how to handle. Good example
>> of why we shouldn't expose internals too much.
>
> Yeah, that's a bit unfortunate but changing it doesn't seem like a good
> idea.  The names I intended to change are all internal.

OK, I'll change just the internal names.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Attached patch removes the questionable SetLatch() call, under the
> assumption that it's okay if the WALWriter, having entered hibernation
> due to sustained inactivity (10 seconds) subsequently takes up to 5
> seconds (2.5 on average) to notice that it has work to do. These
> values may need to be tweaked. I have not bothered with making the
> sleep time adaptive, because it's probably too late to do that.

Now that I've actually read the patch, rather than just responding to
your description of it, I find myself entirely unhappy with the proposed
changes in the walwriter's sleep logic.  You have introduced race
conditions (it is NOT okay to reset the latch somewhere below the top of
the loop) and degraded the walwriter's signal response time in normal
non-hibernation state, to solve a problem not in evidence; to wit that
backends spend too much time signaling the walwriter.  Given the
location of the only existing SetLatch call for the purpose, I find that
proposition more than a bit doubtful.  I see little or no value in
trying to keep the walwriter's procLatch set when it's not hibernating,
and I definitely don't think it is worth the risk of introducing bugs
when we're about to start beta.  I'm intending to rip all that out and
go back to the plain "ResetLatch at the top of the loop, WaitLatch at
the bottom" design, with the hibernation logic just controlling the
timeout on the WaitLatch call.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Peter Geoghegan
Дата:
On 8 May 2012 22:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now that I've actually read the patch, rather than just responding to
> your description of it, I find myself entirely unhappy with the proposed
> changes in the walwriter's sleep logic.  You have introduced race
> conditions (it is NOT okay to reset the latch somewhere below the top of
> the loop)

Yes, there is some checking of flags before the potential ResetLatch()
call, which may be acted on. The code there is almost identical to
that of the extant bgwriter code. I was under the impression that this
did not amount to a race, though it's rather late now, and I'm feeling
under the weather, so I have not taken steps to verify that I have it
right. Arguably, you'd want somebody's SetLatch call to be ignored if

> and degraded the walwriter's signal response time in normal
> non-hibernation state, to solve a problem not in evidence; to wit that
> backends spend too much time signaling the walwriter.  Given the
> location of the only existing SetLatch call for the purpose, I find that
> proposition more than a bit doubtful.

I do too. The elaborate logic to reduce that overhead was nothing more
than a vestige of the first version. I apologise for making such a
basic error.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 8 May 2012 22:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Now that I've actually read the patch, rather than just responding to
>> your description of it, I find myself entirely unhappy with the proposed
>> changes in the walwriter's sleep logic. �You have introduced race
>> conditions (it is NOT okay to reset the latch somewhere below the top of
>> the loop)

> Yes, there is some checking of flags before the potential ResetLatch()
> call, which may be acted on. The code there is almost identical to
> that of the extant bgwriter code.

Um, yes, I noticed that shortly after sending my previous message.
I'm pretty unhappy about the current state of the bgwriter loop, too.
I rather wonder whether that coding explains the "postmaster failed to
shut down" errors that we've been seeing lately in the buildfarm.
Not noticing a shutdown signal promptly would go a long way towards
explaining that.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Peter Geoghegan
Дата:
On 9 May 2012 00:21, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Yes, there is some checking of flags before the potential ResetLatch()
> call, which may be acted on. The code there is almost identical to
> that of the extant bgwriter code. I was under the impression that this
> did not amount to a race, though it's rather late now, and I'm feeling
> under the weather, so I have not taken steps to verify that I have it
> right. Arguably, you'd want somebody's SetLatch call to be ignored if

Sent too early. That should be: Arguably, you'd want somebody's
SetLatch call to be ignored under the circumstances that that could
happen in both the bgwriter, and the WALWriter within my recent patch.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
I've applied the walwriter/checkpointer patch, with the mentioned
re-simplification of the logic.  While measuring that, I noticed that
the stats collector was now the biggest repeated-wakeup culprit, so
I took the time to latch-ify it as well.  AFAICS we no longer have
anything that wakes up oftener than once every five seconds when the
system is idle, so life is looking pretty good in powertop land.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
On further reflection I've realized that there's a really unpleasant
consequence of the walwriter change as-committed: it breaks the former
guarantee that async commits would reach disk within at most 3 times
the WalWriterDelay setting.  They will still get written within at most
3 walwriter cycles, but a lone async commit occurring when the writer
is hibernating could see a delay much longer than before.  This seems to
me to be unacceptable.  Probably nobody cares that much about the exact
multiplier of 3, but if the delay could be an order of magnitude or two
more than that, that's going to make users of async commits unhappy.

So what we need here is for XLogSetAsyncXactLSN() to be able to boot the
walwriter out of hibernate mode.  I still don't care in the least for
the original hack of using the state of the procLatch to indicate
whether the walwriter is hibernating, but we can add a separate flag
instead so as to avoid having every trip through XLogSetAsyncXactLSN
do a SetLatch call (which would be bad anyway since it would prevent
the walwriter from sleeping normally).  Since XLogSetAsyncXactLSN
has to take the info_lck anyway, we might as well make this new flag
be protected by info_lck.  The walwriter won't need to change the
flag's state very often, by definition, so that end of it isn't going
to cost anything noticeable.
        regards, tom lane


Re: Latch for the WAL writer - further reducing idle wake-ups.

От
Tom Lane
Дата:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 9 May 2012 00:21, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Yes, there is some checking of flags before the potential ResetLatch()
>> call, which may be acted on. The code there is almost identical to
>> that of the extant bgwriter code. I was under the impression that this
>> did not amount to a race, though it's rather late now, and I'm feeling
>> under the weather, so I have not taken steps to verify that I have it
>> right. Arguably, you'd want somebody's SetLatch call to be ignored if

> Sent too early. That should be: Arguably, you'd want somebody's
> SetLatch call to be ignored under the circumstances that that could
> happen in both the bgwriter, and the WALWriter within my recent patch.

I don't believe that for a moment, and even if it accidentally manages
to not fail in today's code, it's *way* too fragile for my taste.
I believe the bgwriter code needs to be fixed similarly to the way
I changed the walwriter patch, namely that there needs to be a separate
flag (not the latch's isSet state) advertising whether the bgwriter is
currently in hibernation mode.  Now, unlike the walwriter, there isn't
any convenient place to keep such a flag where it can be inspected
inside an already-existing spinlock or LWLock guard.  However, unlike
the walwriter there is not a correctness requirement for the bgwriter
to wake up promptly.  So I think we could just put a bool
"bgwriter_sleeping" in ProcGlobal and accept the possibility of other
processes sometimes seeing stale values.  Thoughts?
        regards, tom lane


bgwriter idle-mode behavior (was Re: Latch for the WAL writer)

От
Tom Lane
Дата:
I wrote:
> I believe the bgwriter code needs to be fixed similarly to the way
> I changed the walwriter patch, namely that there needs to be a separate
> flag (not the latch's isSet state) advertising whether the bgwriter is
> currently in hibernation mode.

After further study of the bgwriter hibernation patch (commit
6d90eaaa89a007e0d365f49d6436f35d2392cfeb), I think that my worries about
race conditions in the use of the bgwriter's latch are really the least
of its problems.  BgBufferSync embodies a rather carefully tuned
feedback control loop, and I think these changes broke it.  In the first
place, that feedback loop is built on the assumption that BgBufferSync
is executed at fixed intervals, which isn't true anymore; and in the
second place, the loop is not driven so much by the rate of buffers
being dirtied as it is by the rate of buffers being allocated.  To be
concrete, if there is a constant but slow rate of buffer consumption,
the bgwriter will eventually lap the strategy scan and then stay there,
resulting in BgBufferSync returning true every time even though actually
the system is doing things.  This results in bgwriter.c deciding it's in
hibernation mode, whereupon we have a scenario where backends will be
signaling it all the time.  The way BgWriterNap is coded, that means
BgBufferSync is not executed at a fixed BgWriterDelay interval, but at
variable intervals from BgWriterDelay up to BGWRITER_HIBERNATE_MS, which
pretty much destroys the predictability of the feedback loop.

My proposal for fixing this is that

(1) BgBufferSync should return true (OK to hibernate) only if it's
lapped the strategy scan *and* recent_alloc is zero, meaning no new
buffers were allocated anywhere since last time.

(2) We should remove the bgwriter wakening calls from MarkBufferDirty
and SetBufferCommitInfoNeedsSave, and instead place one in buffer
allocation.

(3) The bottom-of-loop logic in bgwriter should be along the lines of
rc = WaitLatch(..., BgWriterDelay);if (rc == WL_TIMEOUT && can_hibernate){    set global flag to tell backends to kick
bgwriter       if they allocate a buffer;    WaitLatch(..., BGWRITER_HIBERNATE_MS);    clear global flag;}
 

In comparison to the existing code, this method guarantees immediate
response to any signal (latch-setting event), and it ensures that
if we extend the normal sleep time for the bgwriter, the extra sleep
covers only an interval in which no new buffer allocations happened.
That provision seems to me to justify pretending that that interval
simply didn't exist for the purposes of the feedback control loop,
which allows us to not have to rethink how that loop works.

Comments?
        regards, tom lane


Re: bgwriter idle-mode behavior (was Re: Latch for the WAL writer)

От
Heikki Linnakangas
Дата:
On 10.05.2012 00:34, Tom Lane wrote:
> After further study of the bgwriter hibernation patch (commit
> 6d90eaaa89a007e0d365f49d6436f35d2392cfeb), I think that my worries about
> race conditions in the use of the bgwriter's latch are really the least
> of its problems.  BgBufferSync embodies a rather carefully tuned
> feedback control loop, and I think these changes broke it.  In the first
> place, that feedback loop is built on the assumption that BgBufferSync
> is executed at fixed intervals, which isn't true anymore; and in the
> second place, the loop is not driven so much by the rate of buffers
> being dirtied as it is by the rate of buffers being allocated.  To be
> concrete, if there is a constant but slow rate of buffer consumption,
> the bgwriter will eventually lap the strategy scan and then stay there,
> resulting in BgBufferSync returning true every time even though actually
> the system is doing things.  This results in bgwriter.c deciding it's in
> hibernation mode, whereupon we have a scenario where backends will be
> signaling it all the time.  The way BgWriterNap is coded, that means
> BgBufferSync is not executed at a fixed BgWriterDelay interval, but at
> variable intervals from BgWriterDelay up to BGWRITER_HIBERNATE_MS, which
> pretty much destroys the predictability of the feedback loop.
>
> My proposal for fixing this is that
>
> (1) BgBufferSync should return true (OK to hibernate) only if it's
> lapped the strategy scan *and* recent_alloc is zero, meaning no new
> buffers were allocated anywhere since last time.
>
> (2) We should remove the bgwriter wakening calls from MarkBufferDirty
> and SetBufferCommitInfoNeedsSave, and instead place one in buffer
> allocation.

Hmm, that means that if you don't dirty any pages, bgwriter will wake up 
even though it has no real work to do. It only wakes up to advance its 
scan. I guess that's OK, if the system is busy doing read-only things 
anyway, a few extra wakeups don't matter.

Waking bgwriter in buffer allocation makes sense also because the buffer 
allocation stats will be updated more promptly that way. At the moment, 
if the bgwriter hibernates, the bgwriter only updates the stats every 10 
seconds.

> (3) The bottom-of-loop logic in bgwriter should be along the lines of
>
>     rc = WaitLatch(..., BgWriterDelay);
>     if (rc == WL_TIMEOUT&&  can_hibernate)
>     {
>         set global flag to tell backends to kick bgwriter
>             if they allocate a buffer;
>         WaitLatch(..., BGWRITER_HIBERNATE_MS);
>         clear global flag;
>     }
>
> In comparison to the existing code, this method guarantees immediate
> response to any signal (latch-setting event), and it ensures that
> if we extend the normal sleep time for the bgwriter, the extra sleep
> covers only an interval in which no new buffer allocations happened.
> That provision seems to me to justify pretending that that interval
> simply didn't exist for the purposes of the feedback control loop,
> which allows us to not have to rethink how that loop works.
>
> Comments?

Seems reasonable. Would you like me to write a patch, or are you already 
on it?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: bgwriter idle-mode behavior (was Re: Latch for the WAL writer)

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 10.05.2012 00:34, Tom Lane wrote:
>> After further study of the bgwriter hibernation patch (commit
>> 6d90eaaa89a007e0d365f49d6436f35d2392cfeb), I think that my worries about
>> race conditions in the use of the bgwriter's latch are really the least
>> of its problems.

> Seems reasonable. Would you like me to write a patch, or are you already 
> on it?

Done already, but please review the commit.
        regards, tom lane