Re: unable to fail over to warm standby server

From Heikki Linnakangas
Subject Re: unable to fail over to warm standby server
Date
Msg-id 4B615DA2.3040306@enterprisedb.com
In reply to unable to fail over to warm standby server  (Mason Hale <mason@onespot.com>)
Responses Re: unable to fail over to warm standby server  (Mason Hale <mason@onespot.com>)
List pgsql-bugs
Mason Hale wrote:
>  ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not
> permittedtrigger file found
>
>  ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not permitted
>
> This file was not looked at until after the attempt to recover was
> aborted. Clearly the permissions on /tmp/pgsql.trigger.5432 were a
> problem, but we don't see how that would explain the error messages,
> which seem to indicate that data on the standby server was corrupted.

Yes, that permission problem seems to be the root cause of the troubles.
If pg_standby fails to remove the trigger file, it exit()s with whatever
return code the unlink() call returned:

>         /*
>          * If trigger file found, we *must* delete it. Here's why: When
>          * recovery completes, we will be asked again for the same file from
>          * the archive using pg_standby so must remove trigger file so we can
>          * reload file again and come up correctly.
>          */
>         rc = unlink(triggerPath);
>         if (rc != 0)
>         {
>             fprintf(stderr, "\n ERROR: could not remove \"%s\": %s", triggerPath, strerror(errno));
>             fflush(stderr);
>             exit(rc);
>         }

unlink() returns -1 on error, so pg_standby calls exit(-1). -1 is out of
the 0-255 range of valid exit codes, so only the low 8 bits survive and
the process exits with 255. That shows up as the mysterious 65280
(255 << 8) code you saw in the logs. The server treats that as a fatal
error, and dies.
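
To see where a number like 65280 can come from, here is a minimal sketch that is not taken from pg_standby. The helper name raw_wait_status is made up for illustration. It forks a child that exits with the given code and returns the raw wait status the parent sees, which is the same kind of value system() hands back.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with `code` and return
 * the raw wait status seen by the parent, or -1 if fork() or waitpid()
 * fails.
 */
static int
raw_wait_status(int code)
{
    int     status;
    pid_t   pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);        /* child: only the low 8 bits of code survive */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;          /* raw status, not yet decoded by WEXITSTATUS() */
}
```

On Linux, and as far as I know on other common Unix systems, raw_wait_status(-1) comes back as 65280, and WEXITSTATUS() of that value is 255. The exact bit layout of the raw status is platform-defined, so portable code should only inspect it through the WIFEXITED() and WEXITSTATUS() macros.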

That seems like a bug in pg_standby, but I'm not sure what it should do
if the unlink() fails. It could exit with some other exit code, so that
the server wouldn't die, but the lingering trigger file could cause
problems, as the comment explains. If it should indeed cause FATAL, it
should do so in a more robust way than the exit(rc) call above.
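
One possible shape for "a more robust way", purely as a sketch: report the failure with an explicitly chosen exit status instead of passing unlink()'s -1 along. The function name remove_trigger_file and the value 200 are assumptions made up for this example, and whether the server should treat that status as fatal is exactly the open question above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical status: an explicit value in 1..255, picked for illustration. */
#define EXIT_TRIGGER_REMOVE_FAILED 200

/*
 * Try to remove the trigger file. Returns 0 on success, or an explicit
 * exit status on failure, rather than unlink()'s raw -1.
 */
static int
remove_trigger_file(const char *triggerPath)
{
    if (unlink(triggerPath) != 0)
    {
        fprintf(stderr, "ERROR: could not remove \"%s\": %s\n",
                triggerPath, strerror(errno));
        fflush(stderr);
        return EXIT_TRIGGER_REMOVE_FAILED;
    }
    return 0;
}
```

A caller would then do rc = remove_trigger_file(triggerPath) and call exit(rc) only when rc is nonzero, so the status the server sees is always a deliberate choice.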

BTW, this changed in PostgreSQL 8.4; pg_standby no longer tries to
delete the trigger file (so that problematic block of code is gone), but
there's a new recovery_end_command option in recovery.conf instead, where
you're supposed to put 'rm <triggerfile>'. I think in that
configuration, the standby would've started up, even though removal of
the trigger file would've still failed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
