Re: Duplicate history file?

From Stephen Frost
Subject Re: Duplicate history file?
Date
Msg-id CAOuzzgqVbZKagRLsg+rciNRVDmoryDL1=Sm+LoapYwNyuFy0-w@mail.gmail.com
In reply to Re: Duplicate history file?  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
Greetings,

On Tue, Jun 15, 2021 at 23:21 Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Jun 15, 2021 at 11:00:57PM -0400, Stephen Frost wrote:
>
> As I suggested previously- this is similar to the hooks that we provide. We
> don’t extensively document them because if you’re writing an extension
> which uses a hook, you’re going to be (or should be..) reading the code too.

I disagree, hooks allow developers to implement some new or additional
behavior which by definition can't be documented.  And they also rely on the
C API, which by definition allows you to do anything with the server.  There
are also of course some requirements, but they're quite obvious (like a
planner_hook should return a valid plan and such).

The archive command is technically invoked using the shell, but the interpretation of the exit code, for example, is discussed only in the C code, and it’s far from the only consideration that someone developing an archive command needs to understand.
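
To make the exit-status point concrete, here is a minimal sketch of an archive command, loosely based on the example in the docs, with durability added.  The paths are my assumptions, sync(1) with file operands is GNU coreutils specific, and a real tool has many more failure modes to handle:

    #!/bin/sh
    # Sketch only.  $1 = %p (path to the WAL segment), $2 = %f (file name).
    ARCHIVE=/mnt/server/archivedir

    # Never silently overwrite a segment that is already archived.
    test ! -f "$ARCHIVE/$2" || exit 1

    # Copy under a temporary name, flush to stable storage, then rename,
    # so a torn copy is never visible under the final name.
    cp "$1" "$ARCHIVE/$2.tmp" &&
    sync "$ARCHIVE/$2.tmp" &&
    mv "$ARCHIVE/$2.tmp" "$ARCHIVE/$2" &&
    sync "$ARCHIVE"

The server treats a zero exit status, and only a zero exit status, as “this segment is safely archived and may be recycled”; anything else means the segment is kept and the command retried later, which is exactly why returning 0 after a partial copy is so dangerous.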

On the other hand the archive_command is there to do only one clear thing:
safely back up a WAL file.  And I don't think that what makes that backup "safe"
is open to discussion.  Sure, you can choose to ignore some of it if you think
you can afford to, but it doesn't change the fact that it's still a
requirement which should be documented.

The notion that an archive command can be distanced from backups is really not reasonable in my opinion. 

> Consider that, really, an archive command should refuse to allow archiving
> of WAL on a timeline which doesn’t have a corresponding history file in the
> archive for that timeline (excluding timeline 1).

Yes, that's a clear requirement that should be documented.

Is it a clear requirement that pgbackrest and every other organization that has developed an archive command has missed? Are you able to point to a tool which has such a check today?
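
For concreteness, a hypothetical sketch of what such a check could look like, sitting in front of the actual copy step (the paths and the suffix-based filtering are my assumptions):

    #!/bin/sh
    # Sketch only.  $1 = %f (name of the file to archive).
    ARCHIVE=/mnt/server/archivedir
    wal="$1"

    case "$wal" in
        *.history|*.backup|*.partial)
            ;;  # not a plain segment, no timeline check needed here
        *)
            tli=$(printf '%.8s' "$wal")   # first 8 hex digits = timeline ID
            if [ "$tli" != "00000001" ] && [ ! -f "$ARCHIVE/$tli.history" ]; then
                echo "refusing $wal: $tli.history is not in the archive" >&2
                exit 1
            fi
            ;;
    esac
    # ... actual archiving of "$wal" goes here ...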

This is not a trivial problem, any more than PG’s use of fsync is trivial.  By that logic, we clearly should have understood how Linux and fsync work decades ago, made sure to always crash on any fsync failure, and never believed that a later fsync would return a failure if an earlier one did and the problem didn’t resolve itself.

> Also, a backup tool
> should compare the result of pg_start_backup to what’s in the control file,
> using a fresh read, after start backup returns to make sure that the
> storage is sane and not, say, cache’ing pages independently (such as might
> happen with a separate NFS mount..).  Oh, and if a replica is involved, a
> check should be done to see if the replica has changed timelines and an
> appropriate message thrown if that happens complaining that the backup was
> aborted due to the promotion of the replica…

I agree, but unless I'm missing something it's unrelated to what an
archive_command should be in charge of?

I’m certainly not moved by this argument, as it seems to be willfully missing the point.  Further, if we are going to claim that we must document archive_command to such a level, then surely we need to also document all the requirements for pg_start_backup and pg_stop_backup, so this strikes me as entirely relevant.
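
To illustrate just one of those requirements, the promotion check described above might look something like this hypothetical sketch, assuming a server new enough to have pg_control_checkpoint() and hand-waving the connection details:

    #!/bin/sh
    # Sketch only: detect a promotion that happens mid-backup by
    # comparing the control file's timeline before and after.
    tli_start=$(psql -AXtc "SELECT timeline_id FROM pg_control_checkpoint()")

    # ... pg_start_backup(), copy of the data directory, pg_stop_backup() ...

    tli_stop=$(psql -AXtc "SELECT timeline_id FROM pg_control_checkpoint()")
    if [ "$tli_start" != "$tli_stop" ]; then
        echo "backup aborted: timeline changed ($tli_start -> $tli_stop), promotion?" >&2
        exit 1
    fi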

Thanks,

Stephen
