Re: Return codes for archive and restore commands

From Stephen Frost
Subject Re: Return codes for archive and restore commands
Msg-id 20181129023957.GU3415@tamriel.snowman.net
In response to Re: Return codes for archive and restore commands  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Return codes for archive and restore commands  (Michael Paquier <michael@paquier.xyz>)
Re: Return codes for archive and restore commands  (Oleg Bartunov <obartunov@postgrespro.ru>)
List pgsql-docs
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Wed, Nov 28, 2018 at 11:00:31AM +0000, PG Doc comments form wrote:
> > For the archive command:
> > <=128 There are no errors in the PostgreSQL log (messages with severity
> > equal to or higher than ERROR). First, 3 messages of type LOG about the
> > failure, then a WARNING about this and a pause for 1 minute, then it repeats.
> > >=129 FATAL error in the PostgreSQL log. A message about stopping the archive
> > process, but not the database. Repeated after roughly 16 seconds.
>
> This code has been around for some time, and comes from this commit:
> commit: 3ad0728c817bf8abd2c76bd11d856967509b307c
> author: Tom Lane <tgl@sss.pgh.pa.us>
> date: Tue, 21 Nov 2006 20:59:53 +0000
> committer: Tom Lane <tgl@sss.pgh.pa.us>
> date: Tue, 21 Nov 2006 20:59:53 +0000
> On systems that have setsid(2) (which should be just about everything except
> Windows), arrange for each postmaster child process to be its own process
> group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
> process group not only the direct child process.  This provides saner behavior
> for archive and recovery scripts; in particular, it's possible to shut down a
> warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
> of SIGQUIT to the startup subprocess will result in killing the waiting
> recovery_command.  Also, this makes Query Cancel and statement_timeout apply
> to scripts being run from backends via system().  (There is no support in the
> core backend for that, but it's widely done using untrusted PLs.)  Per gripe
> from Stephen Harris and subsequent discussion.
>
> The relevant part is pgarch_archiveXlog() in pgarch.c, and this comment
> is the most relevant:
> * Per the Single Unix Spec, shells report exit status > 128 when a
> * called command died on a signal.
>
> > In this case PostgreSQL tries to conform to the rules for return codes of a
> > Unix shell. A Unix shell returns 126 in the case of "command not executable",
> > 127 in the case of "command not found", and 128 + the signal number if the
> > application is interrupted by an uncaught signal.
>
> If you were to rewrite those paragraphs or make them more precise, how
> would you actually shape your suggestions?  I personally quite like the
> current formulations, but I am rather used to it to be honest.
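
For reference, the exit-status conventions quoted above can be sketched
roughly as follows. This is only an illustration in Python, not the logic
in pgarch.c: the function names classify_exit_status and failure_severity
are hypothetical, and the LOG/FATAL split simply mirrors the behavior
reported in the quoted comment (<=128 logged and retried, >=129 FATAL).

```python
import signal


def classify_exit_status(status: int) -> str:
    """Describe a shell-style exit status (0-255) using the usual conventions."""
    if status == 0:
        return "success"
    if status == 126:
        return "command not executable"
    if status == 127:
        return "command not found"
    if status > 128:
        # Shells report 128 + signal number when the command died on a signal.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "command failed"


def failure_severity(status: int) -> str:
    """Assumed mapping for a nonzero status, per the report in this thread."""
    return "FATAL" if status > 128 else "LOG"
```

With these definitions, a status of 143 (128 + SIGTERM) would be described
as a death by signal and treated as FATAL, while 1 or 127 would be logged
and the command retried.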

This is another example, at least imv, of why we really need to move
away from archive_command as an interface for doing WAL archiving.

Having discussed this quite a bit lately with David Steele and Magnus,
it's pretty clear that we need to completely rip out how this works
today and rewrite it based around an extension model where a background
worker can start up and essentially take the place of the archiver
process, with flexibility to jump forward through the WAL stream,
communicate clearly with other processes, handle failure to do so
gracefully based on the specific cases, etc.

We could then possibly write an extension to be included that mimics
what archive_command does today, but imv we should immediately consider
it deprecated and encourage people to move off of it.

Thanks!

Stephen
