Thread: asynchronous commit risk window is overly optimistic

asynchronous commit risk window is overly optimistic

From
Jeff Janes
Date:
https://www.postgresql.org/docs/current/wal-async-commit.html:

"If the database crashes during the risk window between an asynchronous commit and the writing of the transaction's WAL records, then changes made during that transaction will be lost. The duration of the risk window is limited because a background process (the “WAL writer”) flushes unwritten WAL records to disk every wal_writer_delay milliseconds. The actual maximum duration of the risk window is three times wal_writer_delay because the WAL writer is designed to favor writing whole pages at a time during busy periods."

I think the phrase "actual maximum duration" here is far too reassuring. There is no guarantee that the kernel will wake the WAL writer three times in a row at the times it requested, or even within any other smallish multiple of that time. Even if the WAL writer does repeatedly wake on schedule and request an fsync, that fsync itself can take a very large multiple of wal_writer_delay milliseconds before it takes effect.

If your server experiences a sudden power failure during normal operations with uncongested IO, then it is very likely that anything asynchronously committed more than three wal_writer_delay intervals (plus two disk rotations) ago has made it to disk. But if it crashes for some reason other than a sudden power failure, it is less likely to be on disk. A stricken server can go wobbly for a long time before finally falling over.
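
To make the arithmetic concrete, here is a rough sketch of the difference between the nominal figure and what a struggling server might actually see. The function names are hypothetical, the 200 ms value is the documented default for wal_writer_delay, and the wakeup lags and fsync time are made-up numbers for illustration only, not measurements.

```python
from typing import Sequence


def nominal_risk_window_ms(wal_writer_delay_ms: int = 200, cycles: int = 3) -> int:
    """Time until a flush is *initiated* if every wakeup happens on schedule.

    200 ms is the default wal_writer_delay; three cycles is the multiplier
    quoted in the documentation under discussion.
    """
    return cycles * wal_writer_delay_ms


def stressed_risk_window_ms(
    wal_writer_delay_ms: int,
    wakeup_lag_ms: Sequence[int],
    fsync_ms: int,
) -> int:
    """Illustrative model only: each WAL writer cycle is delayed by some
    scheduler lag, and the final fsync takes fsync_ms to reach disk."""
    cycles_total = sum(wal_writer_delay_ms + lag for lag in wakeup_lag_ms)
    return cycles_total + fsync_ms


# Healthy server: wakeups on time, fsync effectively instant.
print(nominal_risk_window_ms())                          # 600
print(stressed_risk_window_ms(200, [0, 0, 0], 0))        # 600

# Stricken server: hypothetical late wakeups and a slow fsync.
print(stressed_risk_window_ms(200, [50, 300, 1000], 2500))  # 4450
```

With on-time wakeups the two figures agree at 600 ms, but nothing bounds the lag or fsync terms, which is why "maximum" overstates what is promised.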

Maybe it should be replaced with something less confident, like "Under normal conditions, the flush will be initiated within three times wal_writer_delay because the WAL writer is designed to favor writing whole pages at a time during busy periods."

Although the whole "because" clause seems to be more inside baseball than is warranted here.

Cheers,

Jeff

Re: asynchronous commit risk window is overly optimistic

From
Bruce Momjian
Date:
On Wed, Mar 20, 2019 at 02:50:21PM -0400, Jeff Janes wrote:
> https://www.postgresql.org/docs/current/wal-async-commit.html:
> 
> "If the database crashes during the risk window between an asynchronous commit
> and the writing of the transaction's WAL records, then changes made during that
> transaction will be lost. The duration of the risk window is limited because a
> background process (the “WAL writer”) flushes unwritten WAL records to disk
> every wal_writer_delay milliseconds. The actual maximum duration of the risk
> window is three times wal_writer_delay because the WAL writer is designed to
> favor writing whole pages at a time during busy periods."
> 
> I think the phrase "actual maximum duration" here is far too reassuring. There
> is no guarantee that the kernel will wake the WAL writer three times in a row
> at the times it requested, or even within any other smallish multiple of that
> time. Even if the WAL writer does repeatedly wake on schedule and request an
> fsync, that fsync itself can take a very large multiple of wal_writer_delay
> milliseconds before it takes effect.
> 
> If your server experiences a sudden power failure during normal operations with
> uncongested IO, then it is very likely that anything asynchronously committed
> more than three wal_writer_delay intervals (plus two disk rotations) ago has
> made it to disk. But if it crashes for some reason other than a sudden power
> failure, it is less likely to be on disk. A stricken server can go wobbly for
> a long time before finally falling over.
> 
> Maybe it should be replaced with something less confident, like "Under normal
> conditions, the flush will be initiated within three times wal_writer_delay
> because the WAL writer is designed to favor writing whole pages at a time
> during busy periods."
> 
> Although the whole "because" clause seems to be more inside baseball than is
> warranted here.

I think we can go with:

    "Under normal conditions, the flush will be initiated within
    roughly three times wal_writer_delay".

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +