Re: pg_dump, pg_dumpall and data durability

From Michael Paquier
Subject Re: pg_dump, pg_dumpall and data durability
Msg-id CAB7nPqRT8YWXxb-MNG6+fxVYhcjNKjMK_jVqADJvcuvVaSNs=Q@mail.gmail.com
In reply to pg_dump, pg_dumpall and data durability  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Wed, Nov 9, 2016 at 8:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> First question: Do we even want this?  Generally, when a program
>> writes to a file, we rely on the operating system to decide when that
>> data should be written back to disk.  We have to override that
>> distinction for things internal to PostgreSQL because we need certain
>> bits of data to reach the disk in a certain order, but it's unclear to
>> me how far outside the core database system we want to extend that.
>> Are we going to have psql fsync() data it writes to a file when \o is
>> used, for example?  That would seem to me to be beyond insane, because
>> we have no idea whether the user actually needs that file to be
>> durable.  It is a better bet that a pg_dump command's output needs
>> durability, of course, but I feel that we shouldn't just go disabling
>> the filesystem cache one program at a time without some guiding
>> principle.
>
> FWIW, I find the premise pretty dubious.  People don't normally expect
> programs to fsync their standard output, and the argument that pg_dump's
> output is always critical data doesn't withstand inspection.  Also,
> I don't understand what pg_dump should do if it fails to fsync.  There
> are too many cases where that would fail (eg, output piped to a program)
> for it to be treated as an error condition.  But if it isn't reported as
> an error, then how much durability guarantee are we really adding?

If the output is piped to a program, there is no way to guarantee that
the data will be flushed, and the user is responsible for that. We
cannot control all the use cases. The same applies, for example, to
pg_basebackup when the data is sent to stdout. IMO, the line to draw is
that tools aimed at taking physical and logical backups should make a
better effort in the cases where they can. That's cheap insurance.
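
To make the "best effort where possible" idea concrete, here is a minimal
sketch in Python. It is not what pg_dump does or would do; the function names
fsync_if_possible and write_durably are hypothetical, and it assumes a POSIX
system where a directory can be opened read-only and fsync'd. It flushes only
when the target is a regular file and quietly skips pipes and terminals:

```python
import os
import stat


def fsync_if_possible(fd: int) -> bool:
    """Flush fd to stable storage when it refers to a regular file.

    Returns True if fsync() was issued, False when the target is a pipe,
    terminal, or socket, where the flush is left to whoever consumes the data.
    (Hypothetical helper, for illustration only.)
    """
    mode = os.fstat(fd).st_mode
    if not stat.S_ISREG(mode):
        return False  # e.g. output piped to a program: the user's responsibility
    os.fsync(fd)
    return True


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then fsync the file and its parent directory."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # push Python's userspace buffer down to the kernel first
        fsync_if_possible(f.fileno())
    # Sync the parent directory so the new directory entry survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this shape, a failed fsync on a regular file still raises OSError, while
the pipe case is simply reported as "not flushed" instead of being an error.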

Based on this past thread, it seems to me that Magnus, Andres and Jim
Nasby are of the opinion that making things durable is useful:
https://www.postgresql.org/message-id/20160327233033.GD20662@awork2.anarazel.de
And so am I.

> I think this might be better addressed by adding something to backup.sgml
> pointing out that you'd better fsync or sync your backups before assuming
> that they can't be lost.

Perhaps. That would be better than nothing at least, but it won't
cover the cases where the tools themselves could help a bit.
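
For reference, the do-it-yourself approach that such a documentation note
would point users at could look like the sketch below. It assumes the backup
is a directory tree (for example a directory-format dump) on a POSIX system;
fsync_path and fsync_tree are hypothetical names, not an existing tool:

```python
import os


def fsync_path(path: str) -> None:
    """Open path read-only and fsync it (files and directories on POSIX)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root: str) -> int:
    """fsync every regular file and directory under root.

    Returns the number of paths synced. Symbolic links are skipped.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                fsync_path(full)
                count += 1
        # Sync the directory itself so its entries are durable too.
        fsync_path(dirpath)
        count += 1
    return count
```

Every user has to remember to run something like this after each backup,
which is the part the tools could take care of.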
-- 
Michael