Re: [RFC] What should we do for reliable WAL archiving?

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: [RFC] What should we do for reliable WAL archiving?
Дата
Msg-id CAMkU=1zr5mcqZYP0eLXRMN1Le66pRNH-hjpzCHZi8racgW9Y-A@mail.gmail.com
обсуждение исходный текст
Ответ на [RFC] What should we do for reliable WAL archiving?  ("MauMau" <maumau307@gmail.com>)
Ответы Re: [RFC] What should we do for reliable WAL archiving?  ("MauMau" <maumau307@gmail.com>)
Re: [RFC] What should we do for reliable WAL archiving?  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
On Sun, Mar 16, 2014 at 3:23 AM, MauMau <maumau307@gmail.com> wrote:
Hello,

The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on Windows) as an example for archive_command.  However, cp/copy does not sync the copied data to disk.  As a result, the completed WAL segments would be lost in the following sequence:

1. A WAL segment fills up.

2. The archiver process archives the just filled WAL segment using archive_command.  That is, cp/copy reads the WAL segment file from pg_xlog/ and writes to the archive area.  At this point, the WAL file is not persisted to the archive area yet, because cp/copy doesn't sync the writes.

3. The checkpoint processing removes the WAL segment file from pg_xlog/.

Note that it takes two checkpoints for this to happen, at least as currently coded.

Also, if the system crashed badly enough to need media recovery, rather than just automatic crash recovery, some lost transactions are expected.  Although this could silently break your PITR chain, of a crash happened and automatic recover used the copy in pg_xlog (which of course was synced) , while copy in the archive was not synced.


4. The OS crashes.  The filled WAL segment doesn't exist anywhere any more.

Considering the "reliable" image of PostgreSQL and widespread use in enterprise systems, I think something should be done.  Could you give me your opinions on the right direction?  Although the doc certainly escapes by saying "(This is an example, not a recommendation, and might not work on all platforms.)", it seems from pgsql-xxx MLs that many people are following this example.

I use this as an example, kind of, but what I am copying to is a network mount, so any attempts to fsync it there would probably need unavailable hooks into the remote file system.

Do people really just copy the files from one directory of local storage to another directory of local storage?  I don't see the point of that.  But it seems like this is an area where there are hundreds of use cases, and often one doesn't see the point of other people's, making it hard to come up with good examples.

 

* Improve the example in the documentation.
But what command can we use to reliably sync just one file?

* Provide some command, say pg_copy, which copies a file synchronously by using fsync(), and describes in the doc something like "for simple use cases, you can use pg_copy as the standard reliable copy command."

The recommendation is to refuse to overwrite an existing file of the same name, and exit with failure.  Which essentially brings archiving to a halt, because it keeps trying but it will keep failing.  If we make a custom version, one thing it should do is determine if the existing archived file is just a truncated version of the attempting-to-be archived file, and if so overwrite it.  Because if the first archival command fails with a network glitch, it can leave behind a partial file.
 

Cheers,

Jeff

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: ALTER TABLE lock strength reduction patch is unsafe Reply-To:
Следующее
От: Noah Misch
Дата:
Сообщение: Re: ALTER TABLE lock strength reduction patch is unsafe Reply-To: