Thread: Reducing bgwriter wakeups

Reducing bgwriter wakeups

From
Simon Riggs
Date:
Recent changes for power reduction mean that we now issue a wakeup
call to the bgwriter every time we set a hint bit.

However cheap that is, it's still overkill.

My proposal is that we wake up the bgwriter whenever a backend is
forced to write a dirty buffer, a job the bgwriter should have been
doing.

This significantly reduces the number of wakeup calls and allows the
bgwriter to stay asleep even when very light traffic happens, which is
good because the bgwriter is often the last process to sleep.

Seems useful to have an explicit discussion on this point, especially
in view of recent performance results.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
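
To make the trade-off in the proposal concrete, here is a toy model that counts wakeup calls under the two policies discussed above. It is only a sketch: the event labels `hint_bit` and `backend_write`, the policy names, and the 1-in-100 ratio of forced writes are all assumptions chosen for illustration, not PostgreSQL code or measured data.

```python
def count_wakeups(events, policy):
    """Count wakeup calls issued to the bgwriter for a stream of events.

    events: list of hypothetical labels, "hint_bit" or "backend_write".
    policy: "every_dirty"  -> wake on every event that dirties a buffer
            "forced_write" -> wake only when a backend had to write a
                              dirty buffer itself
    """
    if policy not in ("every_dirty", "forced_write"):
        raise ValueError("unknown policy: %r" % policy)
    wakeups = 0
    for event in events:
        if policy == "every_dirty" or event == "backend_write":
            wakeups += 1
    return wakeups


# Assumed workload: 99 hint-bit updates for each forced backend write.
events = (["hint_bit"] * 99 + ["backend_write"]) * 100

per_dirty = count_wakeups(events, "every_dirty")
per_forced = count_wakeups(events, "forced_write")
print(per_dirty, per_forced)  # 10000 100
```

The model only counts calls. It says nothing about how much dirty data has built up by the time the first forced write happens, which is the concern raised in the replies.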

Re: Reducing bgwriter wakeups

From
Robert Haas
Date:
On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Recent changes for power reduction mean that we now issue a wakeup
> call to the bgwriter every time we set a hint bit.
>
> However cheap that is, it's still overkill.
>
> My proposal is that we wake up the bgwriter whenever a backend is
> forced to write a dirty buffer, a job the bgwriter should have been
> doing.
>
> This significantly reduces the number of wakeup calls and allows the
> bgwriter to stay asleep even when very light traffic happens, which is
> good because the bgwriter is often the last process to sleep.
>
> Seems useful to have an explicit discussion on this point, especially
> in view of recent performance results.

I don't see what this has to do with recent performance results, so
please elaborate.  Off-hand, I don't see any point in getting cheap.
It seems far more important to me that the background writer become
active when needed than that we save some trivial amount of power by
waiting longer before activating it.  If we're concerned about saving
power, then IMHO what we should be worried about is that the wal
writer is still waking up 5x/s.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Reducing bgwriter wakeups

From
Simon Riggs
Date:
On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Recent changes for power reduction mean that we now issue a wakeup
>> call to the bgwriter every time we set a hint bit.
>>
>> However cheap that is, it's still overkill.
>>
>> My proposal is that we wake up the bgwriter whenever a backend is
>> forced to write a dirty buffer, a job the bgwriter should have been
>> doing.
>>
>> This significantly reduces the number of wakeup calls and allows the
>> bgwriter to stay asleep even when very light traffic happens, which is
>> good because the bgwriter is often the last process to sleep.
>>
>> Seems useful to have an explicit discussion on this point, especially
>> in view of recent performance results.
>
> I don't see what this has to do with recent performance results, so
> please elaborate.  Off-hand, I don't see any point in getting cheap.
> It seems far more important to me that the background writer become
> active when needed than that we save some trivial amount of power by
> waiting longer before activating it.

Then you misunderstand, since I am advocating waking it when needed.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Reducing bgwriter wakeups

From
Robert Haas
Date:
On Sun, Feb 19, 2012 at 4:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Recent changes for power reduction mean that we now issue a wakeup
>>> call to the bgwriter every time we set a hint bit.
>>>
>>> However cheap that is, it's still overkill.
>>>
>>> My proposal is that we wake up the bgwriter whenever a backend is
>>> forced to write a dirty buffer, a job the bgwriter should have been
>>> doing.
>>>
>>> This significantly reduces the number of wakeup calls and allows the
>>> bgwriter to stay asleep even when very light traffic happens, which is
>>> good because the bgwriter is often the last process to sleep.
>>>
>>> Seems useful to have an explicit discussion on this point, especially
>>> in view of recent performance results.
>>
>> I don't see what this has to do with recent performance results, so
>> please elaborate.  Off-hand, I don't see any point in getting cheap.
>> It seems far more important to me that the background writer become
>> active when needed than that we save some trivial amount of power by
>> waiting longer before activating it.
>
> Then you misunderstand, since I am advocating waking it when needed.

Well, I guess that depends on when it's actually needed.  You haven't
presented any evidence one way or the other.

I mean, let's suppose that a sudden spike of activity hits a
previously-idle system.  If we wait until all of shared_buffers is
dirty before waking up the background writer, it seems possible that
the background writer is going to have a hard time catching up.  If we
wake it immediately, we don't have that problem.

Also, in general, I think that it's not a good idea to let dirty data
sit in shared_buffers forever.  I'm unhappy about the change this
release cycle to skip checkpoints if we've written less than a full
WAL segment, and this seems like another step in that direction.  It's
exposing us to needless risk of data loss.  In 9.1, if you process a
transaction and, an hour later, the disk where pg_xlog is written
melts into a heap of molten slag, your transaction will be there, even
if you end up having to run pg_resetxlog.  In 9.2, it may well be that
xlog contains the only record of that transaction, and you're hosed.
The more work we do to postpone writing the data until the absolutely
last possible moment, the more likely it is that it won't be on disk
when we need it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Reducing bgwriter wakeups

From
Jeff Janes
Date:
On Sun, Feb 19, 2012 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Also, in general, I think that it's not a good idea to let dirty data
> sit in shared_buffers forever.  I'm unhappy about the change this
> release cycle to skip checkpoints if we've written less than a full
> WAL segment, and this seems like another step in that direction.  It's
> exposing us to needless risk of data loss.  In 9.1, if you process a
> transaction and, an hour later, the disk where pg_xlog is written
> melts into a heap of molten slag, your transaction will be there, even
> if you end up having to run pg_resetxlog.

Would the log really have been archived in 9.1?  I don't think
checkpoint_timeout caused a log switch, just a checkpoint which could
happily be in the same file as the previous checkpoint.

> In 9.2, it may well be that
> xlog contains the only record of that transaction, and you're hosed.
> The more work we do to postpone writing the data until the absolutely
> last possible moment, the more likely it is that it won't be on disk
> when we need it.

Isn't that what archive_timeout is for?

Should archive_timeout default to something like 5 min, rather than 0?

Cheers,

Jeff


Re: Reducing bgwriter wakeups

From
Robert Haas
Date:
On Sun, Feb 19, 2012 at 5:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Would the log really have been archived in 9.1?  I don't think
> checkpoint_timeout caused a log switch, just a checkpoint which could
> happily be in the same file as the previous checkpoint.

The log segment doesn't need to get archived - it's sufficient that
the dirty buffers get written to disk.

>> In 9.2, it may well be that
>> xlog contains the only record of that transaction, and you're hosed.
>> The more work we do to postpone writing the data until the absolutely
>> last possible moment, the more likely it is that it won't be on disk
>> when we need it.
>
> Isn't that what archive_timeout is for?
>
> Should archive_timeout default to something like 5 min, rather than 0?

I dunno.  I think people who are doing replication are probably mostly
using streaming replication these days, in which case archive_timeout
won't matter one way or the other.  But if you're not doing
replication, your only hope of recovering from a trashed pg_xlog is
that PostgreSQL wrote the buffers and (in the case of an OS crash) the
OS wrote them to disk.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Reducing bgwriter wakeups

From
Heikki Linnakangas
Date:
On 20.02.2012 00:18, Robert Haas wrote:
> On Sun, Feb 19, 2012 at 4:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Recent changes for power reduction mean that we now issue a wakeup
>>>> call to the bgwriter every time we set a hint bit.
>>>>
>>>> However cheap that is, it's still overkill.
>>>>
>>>> My proposal is that we wake up the bgwriter whenever a backend is
>>>> forced to write a dirty buffer, a job the bgwriter should have been
>>>> doing.
>>>>
>>>> This significantly reduces the number of wakeup calls and allows the
>>>> bgwriter to stay asleep even when very light traffic happens, which is
>>>> good because the bgwriter is often the last process to sleep.

That seems like swinging the pendulum too much in the other direction, 
as others have noted. A simple thing you could do, however, is to only 
wake up bgwriter every 10 dirtied pages in the backend or something like 
that. That would reduce the wakeups by a factor of 10. Would that be 
useful? It's not actually clear to me what the problem you're trying to 
solve is.
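
The "every 10 dirtied pages" idea can be sketched as a simple counter. This is a minimal illustration, assuming a hypothetical `BatchedWaker` class and treating an incremented integer as a stand-in for signalling the bgwriter. It is not how the backend code is structured.

```python
class BatchedWaker:
    """Issue one wakeup per `batch_size` dirtied pages (hypothetical helper)."""

    def __init__(self, batch_size=10):
        self.batch_size = batch_size
        self.dirtied = 0   # pages dirtied so far
        self.wakeups = 0   # wakeup calls issued so far

    def page_dirtied(self):
        self.dirtied += 1
        if self.dirtied % self.batch_size == 0:
            self.wakeups += 1  # stand-in for signalling the bgwriter


waker = BatchedWaker(batch_size=10)
for _ in range(1000):
    waker.page_dirtied()
print(waker.dirtied, waker.wakeups)  # 1000 100

# A short burst below the batch size issues no wakeup at all.
quiet = BatchedWaker(batch_size=10)
for _ in range(9):
    quiet.page_dirtied()
print(quiet.wakeups)  # 0
```

The second case shows the cost of batching: up to `batch_size - 1` dirtied pages can go unsignalled, so such a scheme would still rely on some periodic wakeup to pick them up.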

>>>> Seems useful to have an explicit discussion on this point, especially
>>>> in view of recent performance results.
>>>
>>> I don't see what this has to do with recent performance results, so
>>> please elaborate.  Off-hand, I don't see any point in getting cheap.
>>> It seems far more important to me that the background writer become
>>> active when needed than that we save some trivial amount of power by
>>> waiting longer before activating it.
>>
>> Then you misunderstand, since I am advocating waking it when needed.
>
> Well, I guess that depends on when it's actually needed.  You haven't
> presented any evidence one way or the other.
>
> I mean, let's suppose that a sudden spike of activity hits a
> previously-idle system.  If we wait until all of shared_buffers is
> dirty before waking up the background writer, it seems possible that
> the background writer is going to have a hard time catching up.  If we
> wake it immediately, we don't have that problem.

Well, as long as the OS has some clean buffers, as it presumably does if 
the system has been idle for a while, bgwriter will catch up very 
quickly by simply dumping a large number of dirty pages to the OS. Also, 
as the code stands, bgwriter still wakes up every 10 seconds even when 
no-one signals it, which makes this much less likely to happen.
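
The behaviour described here, sleeping until signalled but waking on a timer anyway, can be sketched with a timed wait on an event. This is an analogy only: `threading.Event` stands in for the latch, `WAKE_INTERVAL` is shortened from 10 seconds to 0.05 seconds so the example finishes quickly, and `bgwriter_loop` and `wait_for` are hypothetical names.

```python
import threading
import time

WAKE_INTERVAL = 0.05  # seconds; stands in for the 10-second interval


def bgwriter_loop(latch, stop, log):
    """Sleep until signalled or until the interval expires, then record why."""
    while not stop.is_set():
        signalled = latch.wait(WAKE_INTERVAL)
        if stop.is_set():
            break
        if signalled:
            # Clear only after a real signal, so a signal that arrives
            # right after a timeout is not lost.
            latch.clear()
        log.append("signal" if signalled else "timeout")


def wait_for(entry, log, limit=2.0):
    """Poll until `entry` shows up in `log` or `limit` seconds pass."""
    deadline = time.monotonic() + limit
    while entry not in log and time.monotonic() < deadline:
        time.sleep(0.005)


latch, stop, log = threading.Event(), threading.Event(), []
worker = threading.Thread(target=bgwriter_loop, args=(latch, stop, log))
worker.start()

wait_for("timeout", log)  # nobody signals: the loop wakes on the timer
latch.set()               # a backend signals: the loop wakes right away
wait_for("signal", log)

stop.set()
latch.set()               # wake the loop so it can notice the stop flag
worker.join()
```

With a timer like this in place, a missed or suppressed signal delays the writer by at most one interval, which is the point being made about the 10-second wakeup.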

Nevertheless, I also feel that it would be better for bgwriter to be a 
bit more proactive than that.

> Also, in general, I think that it's not a good idea to let dirty data
> sit in shared_buffers forever.  I'm unhappy about the change this
> release cycle to skip checkpoints if we've written less than a full
> WAL segment, and this seems like another step in that direction.  It's
> exposing us to needless risk of data loss.  In 9.1, if you process a
> transaction and, an hour later, the disk where pg_xlog is written
> melts into a heap of molten slag, your transaction will be there, even
> if you end up having to run pg_resetxlog.  In 9.2, it may well be that
> xlog contains the only record of that transaction, and you're hosed.
> The more work we do to postpone writing the data until the absolutely
> last possible moment, the more likely it is that it won't be on disk
> when we need it.

True. (but as noted above, bgwriter still wakes up every 10 seconds so 
this isn't really an issue at the moment)

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com