Re: Debugging buildfarm pg_upgrade check failures

From Andrew Dunstan
Subject Re: Debugging buildfarm pg_upgrade check failures
Date
Msg-id 55B3F7C4.8000205@dunslane.net
In response to Debugging buildfarm pg_upgrade check failures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 07/25/2015 10:59 AM, Tom Lane wrote:
> Now that we've restored proper logging of "make check", I looked into
> today's failure report from axolotl:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-07-24%2020%3A29%3A18
>
> What evidently happened there is that "pg_ctl start" gave up waiting for
> the postmaster to start too soon.  The postmaster log appears to contain
>
> LOG:  database system was shut down at 2015-07-24 16:45:40 EDT
> FATAL:  the database system is starting up
> LOG:  MultiXact member wraparound protections are now enabled
> LOG:  database system is ready to accept connections
>
> which indicates that it did successfully come up, but not till after one
> "PQping" probe from pg_ctl, which was rejected with "still starting up".
> Meanwhile we've got this log output from pg_ctl:
>
> waiting for server to start........ stopped waiting
> pg_ctl: could not start server
> Examine the log output.
>
> Counting the dots indicates that pg_ctl gave up after precisely 5 seconds.
> Now, looking at the logic in pg_ctl's test_postmaster_connection(), the
> only explanation that seems to fit the observed output is that the stat()
> on the postmaster pidfile (at line 650 in HEAD) failed.  It's not clear
> why though, since the postmaster was clearly still alive at this point,
> and we must have been able to read the pidfile earlier to construct a
> connection string, else there would have been no PQping attempt.
>
> Maybe the stat failed for some unexpected resource-exhaustion kind of
> reason?
>
> It seems plausible to me that we should change pg_ctl to only consider
> stat() failure to be a reason to give up waiting if errno is ENOENT,
> not anything else.  At a minimum, I'd like to modify it to print the
> errno if it's anything else, so that we can confirm or deny this theory
> next time we see this buildfarm failure.
>
> Comments?
>
>             

Certainly let's look at the errno.
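
For illustration, here is a minimal sketch of the check described above: treat a failed stat() on the pidfile as "postmaster gone" only when errno is ENOENT, and otherwise report the errno and keep waiting. The helper name pidfile_is_gone() and the message wording are assumptions made for this example; this is not the actual code in pg_ctl's test_postmaster_connection().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not the real pg_ctl code): returns true only if
 * the pidfile has definitely disappeared, i.e. stat() failed with ENOENT.
 */
static bool
pidfile_is_gone(const char *pidfile_path)
{
    struct stat statbuf;

    if (stat(pidfile_path, &statbuf) == 0)
        return false;           /* pidfile still exists, keep waiting */

    if (errno == ENOENT)
        return true;            /* pidfile removed: stop waiting */

    /* Any other failure: print the errno so it can be diagnosed later. */
    fprintf(stderr, "could not stat file \"%s\": %s\n",
            pidfile_path, strerror(errno));
    return false;               /* do not treat this as a reason to give up */
}
```

With something like this in the wait loop, an unexpected stat() failure would show up in the pg_ctl output instead of silently ending the wait.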

cheers

andrew


