Обсуждение: BUG #9161: wal_writer_delay is limited to 10s

Поиск
Список
Период
Сортировка

BUG #9161: wal_writer_delay is limited to 10s

От
linuxhippy@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      9161
Logged by:          Clemens Eisserer
Email address:      linuxhippy@gmail.com
PostgreSQL version: 9.2.6
Operating system:   Linux
Description:

It seems wal_writer_delay is limited to 10s without any technical reason.

I am using postgresql in an embedded system and writing every 10s
burns flash at an amazing rate (a 32GB SDHC card survived 4 days with
only a few GB written) - I've now switched from ext4 to nilfs2 which is able
to distribute writes more evenly - but I am still not confident this setup
would last longer than a few weeks/months.

A wal_writer_delay of arround ~1h would be a lot more efficient and solve
this issue.

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Tom Lane
Дата:
linuxhippy@gmail.com writes:
> It seems wal_writer_delay is limited to 10s without any technical reason.

The "technical reason" is that values much larger than that would be a
performance and reliability disaster for typical installations.

> I am using postgresql in an embedded system and writing every 10s
> burns flash at an amazing rate (a 32GB SDHC card survived 4 days with
> only a few GB written) - I've now switched from ext4 to nilfs2 which is able
> to distribute writes more evenly - but I am still not confident this setup
> would last longer than a few weeks/months.

> A wal_writer_delay of arround ~1h would be a lot more efficient and solve
> this issue.

It's not apparent to me what usage pattern that would actually help with.
If the system is idle then it doesn't matter whether the WAL writer wakes
up or not.  But if you're generating WAL, it's going to get written anyway
whenever a transaction commits.  You could forestall that with fsync=off,
perhaps, but if you do that then the WAL writer won't fsync either, so
it shouldn't matter how often it wakes up.  Checkpoints will force WAL
output more often than once per hour by default, too.  So I'm wondering
exactly what combination of other settings you envision using this with,
and what's the workload of the database server.

While it could certainly be argued that the limit of 10s is a bit too
tight, allowing values as large as 1h would be a large-caliber foot gun
for most people.  So I'm very hesitant to raise it that far without a
more convincing argument that useful behavior can be had there.

The bigger picture is that if you allow WAL to not reach disk for 1h,
that's up to 1h worth of data that you lose irretrievably in a crash.
So even if this adjustment allowed you to get that behavior, it's not
very attractive behavior.  If you actually are OK with such little data
security, maybe you should consider other approaches.  Perhaps you could
keep the whole database on a RAM disk (tmpfs) and rsync it to flash every
hour, or some variant of that?

Also, if I understand your problem correctly, it's not so much that
you'd like the WAL writer to not write often as that you'd like it
to not write partial blocks.  We might be willing to accept a patch
that added a setting and some logic to control things that way.
But I'm pretty dubious that just changing the upper limit of
wal_writer_delay is going to have useful results, even in your scenario.

            regards, tom lane

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Clemens Eisserer
Дата:
Hi Tom,

Thanks for taking a look at my bug report.

> But if you're generating WAL, it's going to get written anyway
> whenever a transaction commits.

As far as I understood, this is only the case with
synchronous_commit=on, which is the reason why I turned synchrous
commit off.

> You could forestall that with fsync=off,
> perhaps, but if you do that then the WAL writer won't fsync either, so
> it shouldn't matter how often it wakes up.  Checkpoints will force WAL
> output more often than once per hour by default, too.  So I'm wondering
> exactly what combination of other settings you envision using this with,
> and what's the workload of the database server.

Because the system should be crash-resistent, my goal would be to have
postgresql issue fsyncs only every few minutes instead of every 10s,
by running the wal writer only every now and then (I know there is
only very little data in the WAL anyway).

For now I have a single insert transaction every 10s with
synchronous_commit=off, however because the wal writer weaks up every
10s data are immediatly written to the SD card, dramatically reducing
its lifespan.

Lost data isn't a real concern, however a worst-case scenario would be
when after a crash / power loss the database would be corrupted and
would refuse to initialize.

Regards, Clemens

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Andres Freund
Дата:
On 2014-02-14 23:03:44 +0100, Clemens Eisserer wrote:
> Hi Tom,
>
> Thanks for taking a look at my bug report.
>
> > But if you're generating WAL, it's going to get written anyway
> > whenever a transaction commits.
>
> As far as I understood, this is only the case with
> synchronous_commit=on, which is the reason why I turned synchrous
> commit off.

The WAL writer will be woken up anyway if a transaction commits
(c.f. XLogSetAsyncXactLSN()).

> > You could forestall that with fsync=off,
> > perhaps, but if you do that then the WAL writer won't fsync either, so
> > it shouldn't matter how often it wakes up.  Checkpoints will force WAL
> > output more often than once per hour by default, too.  So I'm wondering
> > exactly what combination of other settings you envision using this with,
> > and what's the workload of the database server.
>
> Because the system should be crash-resistent, my goal would be to have
> postgresql issue fsyncs only every few minutes instead of every 10s,
> by running the wal writer only every now and then (I know there is
> only very little data in the WAL anyway).
>
> For now I have a single insert transaction every 10s with
> synchronous_commit=off, however because the wal writer weaks up every
> 10s data are immediatly written to the SD card, dramatically reducing
> its lifespan.

The WAL writer shouldn't write anything if there's nothing to
write. Maybe you have checkpoint_timeout set to something low and it's
performing checkpoints regularly?

You might also rethink the quality of the media you're using...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Claudio Freire
Дата:
On Fri, Feb 14, 2014 at 7:50 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > You could forestall that with fsync=off,
>> > perhaps, but if you do that then the WAL writer won't fsync either, so
>> > it shouldn't matter how often it wakes up.  Checkpoints will force WAL
>> > output more often than once per hour by default, too.  So I'm wondering
>> > exactly what combination of other settings you envision using this with,
>> > and what's the workload of the database server.
>>
>> Because the system should be crash-resistent, my goal would be to have
>> postgresql issue fsyncs only every few minutes instead of every 10s,
>> by running the wal writer only every now and then (I know there is
>> only very little data in the WAL anyway).
>>
>> For now I have a single insert transaction every 10s with
>> synchronous_commit=off, however because the wal writer weaks up every
>> 10s data are immediatly written to the SD card, dramatically reducing
>> its lifespan.
>
> The WAL writer shouldn't write anything if there's nothing to
> write. Maybe you have checkpoint_timeout set to something low and it's
> performing checkpoints regularly?
>
> You might also rethink the quality of the media you're using...


Indeed, from my experience, SD cards are terribly unreliable because
they have terrible write leveling (if at all). You're better off using
a USB stick if you have a USB port. It may cost you a bit more power,
but it will both last a lot more, and perform better.

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Clemens Eisserer
Дата:
Hi,

> The WAL writer shouldn't write anything if there's nothing to
> write. Maybe you have checkpoint_timeout set to something low and it's
> performing checkpoints regularly?

However, there is something to write - I perform a single insert
operation every 10s.
So what I would like to achieve is to have those inserts every 10s,
but a sync only every few minutes or so - to let the data written by
postgresql stay in the block cache.

> You might also rethink the quality of the media you're using...
> Indeed, from my experience, SD cards are terribly unreliable because
> they have terrible write leveling (if at all).

I already switched to a patched nilfs2 version, which makes even SD
cards happy with have horrible wear leveling - and I switched to
better cards.

However this is just my pet project, and all I ask is why a parameter
is artifically clipped ... it's not like I request some magical new
feature.

Regards, Clemens

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Andres Freund
Дата:
On 2014-02-15 09:59:25 +0100, Clemens Eisserer wrote:
> However, there is something to write - I perform a single insert
> operation every 10s.
>
> However this is just my pet project, and all I ask is why a parameter
> is artifically clipped ... it's not like I request some magical new
> feature.

Again. Changing that parameter to a higher value would change *nothing*,
because a commit *does* wake up the wal writer *independently* of the
parameter.
So it is a new feature you're requesting.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Jeff Janes
Дата:
On Sun, Feb 9, 2014 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> linuxhippy@gmail.com writes:
> > It seems wal_writer_delay is limited to 10s without any technical reason.
>
> The "technical reason" is that values much larger than that would be a
> performance and reliability disaster for typical installations.
>

How so?  I think most installations run with synchronous_commit on.

We seem to be arguing both that making it larger would do absolutely
nothing, and that making it larger would do something very bad.


While it could certainly be argued that the limit of 10s is a bit too
> tight, allowing values as large as 1h would be a large-caliber foot gun
> for most people.  So I'm very hesitant to raise it that far without a
> more convincing argument that useful behavior can be had there.
>

We let them turn fsync off.  How much bigger of a foot gun could we
possibly hand them?

If we make people turn fsync off, rather than letting them
use wal_writer_delay to do the very thing that it is intended to do, I
don't see that as doing them any favors.


>
> The bigger picture is that if you allow WAL to not reach disk for 1h,
> that's up to 1h worth of data that you lose irretrievably in a crash.
> So even if this adjustment allowed you to get that behavior, it's not
> very attractive behavior.  If you actually are OK with such little data
> security, maybe you should consider other approaches.  Perhaps you could
> keep the whole database on a RAM disk (tmpfs) and rsync it to flash every
> hour, or some variant of that?
>

And hope that the rsync didn't leave it in an inconsistent state because it
fired up just when a change was starting to get written.  Ick.  I think
that having knobs that do what they say and say what they do seems much
safer than forcing people to jump through dangerous hoops.

Cheers,

Jeff

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> We seem to be arguing both that making it larger would do absolutely
> nothing, and that making it larger would do something very bad.

The question to me is not so much whether it would do anything bad,
as whether it would do anything good.  I think it's entirely unproven
that merely raising this limit would solve anything for Clemens'
use-case; it seems likely that additional behavioral changes would
be needed, and we've not seen what's under that rock.

What it definitely would do, however, is encourage people to raise
the setting, perhaps to larger values than are good for them.  The
system is currently tuned on the assumption that wal_writer_delay
is a fractional second; we have no data on how it performs with
significantly larger settings.

Also, a closer look shows that the wal writer doesn't write anything
merely because it's woken up.  Unless I'm misreading XLogBackgroundFlush,
it will write only (1) any completed full pages of WAL, or (2) data up
to an async commit.  As for (1), writing out completed pages (which
are not going to get overwritten) doesn't seem like it should be a problem
for a flash drive.  If it is a problem, maybe recompiling with a bigger
XLOG page size is indicated.  As for (2), while I grant Clemens' claim
that he's willing to tolerate loss of async commits, it's a bit hard
to credit that he's willing to lose three hours worth of async commits.
Which is what he'd be risking with wal_writer_delay set to 1h.

There's also the existing "hibernate" mode, which has unclear interactions
with a desire to put off writes for such long intervals.

Basically, this proposal needs a lot more evidence than has been
provided, IMO.

            regards, tom lane

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Jeff Janes
Дата:
On Fri, Feb 14, 2014 at 2:50 PM, Andres Freund <andres@2ndquadrant.com>wrote:

> On 2014-02-14 23:03:44 +0100, Clemens Eisserer wrote:
> > Hi Tom,
> >
> > Thanks for taking a look at my bug report.
> >
> > > But if you're generating WAL, it's going to get written anyway
> > > whenever a transaction commits.
> >
> > As far as I understood, this is only the case with
> > synchronous_commit=on, which is the reason why I turned synchrous
> > commit off.
>
> The WAL writer will be woken up anyway if a transaction commits
> (c.f. XLogSetAsyncXactLSN()).
>


That only happens if an 8 KB boundary has been crossed.  Depending on how
big the records are that he is inserting, you can fit a lot of them in 8
KB.   If the records are small, or the table us unlogged so only commit
messages get logged, then you could go many multiples of 10s without
triggering a wake up due to crossing an 8 KB boundary.

So if we did change the upper limit (or if Clemens adds 2 zeros to guc.c
and compiles his own) he should be able to get quite a bit of mileage out
of it.

Why do we awaken it for every 8KB boundary? I thought there was code to
wake it when wal_buffers was 1/3 full, but I don't see that code anymore.

Cheers,

Jeff

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Andres Freund
Дата:
On 2014-02-17 13:48:34 -0800, Jeff Janes wrote:
> On Fri, Feb 14, 2014 at 2:50 PM, Andres Freund <andres@2ndquadrant.com>wrote:
>
> > On 2014-02-14 23:03:44 +0100, Clemens Eisserer wrote:
> > > Hi Tom,
> > >
> > > Thanks for taking a look at my bug report.
> > >
> > > > But if you're generating WAL, it's going to get written anyway
> > > > whenever a transaction commits.
> > >
> > > As far as I understood, this is only the case with
> > > synchronous_commit=on, which is the reason why I turned synchrous
> > > commit off.
> >
> > The WAL writer will be woken up anyway if a transaction commits
> > (c.f. XLogSetAsyncXactLSN()).
> >

> That only happens if an 8 KB boundary has been crossed.  Depending on how
> big the records are that he is inserting, you can fit a lot of them in 8
> KB.   If the records are small, or the table us unlogged so only commit
> messages get logged, then you could go many multiples of 10s without
> triggering a wake up due to crossing an 8 KB boundary.

Unless I miss something we wake it up if *either* a 8kb boundary has
been crossed *or* if the async lsn hasn't been flushed yet.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Jeff Janes
Дата:
On Mon, Feb 17, 2014 at 1:53 PM, Andres Freund <andres@2ndquadrant.com>wrote:

> On 2014-02-17 13:48:34 -0800, Jeff Janes wrote:
> > On Fri, Feb 14, 2014 at 2:50 PM, Andres Freund <andres@2ndquadrant.com
> >wrote:
> >
> > > On 2014-02-14 23:03:44 +0100, Clemens Eisserer wrote:
> > > > Hi Tom,
> > > >
> > > > Thanks for taking a look at my bug report.
> > > >
> > > > > But if you're generating WAL, it's going to get written anyway
> > > > > whenever a transaction commits.
> > > >
> > > > As far as I understood, this is only the case with
> > > > synchronous_commit=on, which is the reason why I turned synchrous
> > > > commit off.
> > >
> > > The WAL writer will be woken up anyway if a transaction commits
> > > (c.f. XLogSetAsyncXactLSN()).
> > >
>
> > That only happens if an 8 KB boundary has been crossed.  Depending on how
> > big the records are that he is inserting, you can fit a lot of them in 8
> > KB.   If the records are small, or the table us unlogged so only commit
> > messages get logged, then you could go many multiples of 10s without
> > triggering a wake up due to crossing an 8 KB boundary.
>
> Unless I miss something we wake it up if *either* a 8kb boundary has
> been crossed *or* if the async lsn hasn't been flushed yet.
>

I think it is either a 8kb boundary crossed, or the wal_writer is in "deep"
sleep (i.e. 25 times the ordinary sleep).

I don't see anything there which refers to async lsn in deciding whether to
wake.

Anyway, stracing the wal_writer shows that with a slow steady drip of small
transactions, it is definitely the wal_writer_delay which is in control,
not the latch wake up.

Also, referring to Tom's email, it does issues an fdatasync every time it
wakes up, because of this:

/* if we have already flushed that far, consider async commit records */

For this purpose, async commit could mean either commits down with
synchronous_commit=off, or commits of transactions which touched only
unlogged tables.  So it writes the same 8kb block as last time (but with
new data at the logical end of the block) and flushes it.

 Cheers,

Jeff

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Clemens Eisserer
Дата:
Hi Jeff,

>> > That only happens if an 8 KB boundary has been crossed.  Depending on
>> > how
>> > big the records are that he is inserting, you can fit a lot of them in 8
>> > KB.   If the records are small, or the table us unlogged so only commit
>> > messages get logged, then you could go many multiples of 10s without
>> > triggering a wake up due to crossing an 8 KB boundary.

Thanks a lot for the in-depth information.
Is the 8KB boundary configureable at compile time?

> /* if we have already flushed that far, consider async commit records */
>
> For this purpose, async commit could mean either commits down with
> synchronous_commit=off, or commits of transactions which touched only
> unlogged tables.  So it writes the same 8kb block as last time (but with new
> data at the logical end of the block) and flushes it.

Ah, that explains why my experiments using unlogged tables also
weren't sucessful in avoiding the frequent writes.

Regards, Clemens

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Tom Lane
Дата:
Clemens Eisserer <linuxhippy@gmail.com> writes:
> Thanks a lot for the in-depth information.
> Is the 8KB boundary configureable at compile time?

There is a configure option for XLOG_BLCKSZ, but it won't go beyond 64K
which is probably not enough to fix your problem.  (I'm not sure offhand
if there are technical reasons for 64K in particular as the limit, but
in any case I suspect making it really large would create performance
issues of its own.)

I think what you'd really want is to tweak the walwriter logic so that
it avoids writing individual blocks and instead waits for wal_buffers
to be, say, half full before writing.

            regards, tom lane

Re: BUG #9161: wal_writer_delay is limited to 10s

От
Andres Freund
Дата:
On 2014-02-17 14:24:09 -0800, Jeff Janes wrote:
> On Mon, Feb 17, 2014 at 1:53 PM, Andres Freund <andres@2ndquadrant.com>wrote:
>
> > On 2014-02-17 13:48:34 -0800, Jeff Janes wrote:
> > > That only happens if an 8 KB boundary has been crossed.  Depending on how
> > > big the records are that he is inserting, you can fit a lot of them in 8
> > > KB.   If the records are small, or the table us unlogged so only commit
> > > messages get logged, then you could go many multiples of 10s without
> > > triggering a wake up due to crossing an 8 KB boundary.
> >
> > Unless I miss something we wake it up if *either* a 8kb boundary has
> > been crossed *or* if the async lsn hasn't been flushed yet.
> >
>
> I think it is either a 8kb boundary crossed, or the wal_writer is in "deep"
> sleep (i.e. 25 times the ordinary sleep).

Ick. Yes, I forgot that sleeping was defined that way (which is a bad
idea for the majority of cases imo).

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services