Debugging buildfarm pg_upgrade check failures

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Debugging buildfarm pg_upgrade check failures
Дата
Msg-id 8658.1437836367@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Debugging buildfarm pg_upgrade check failures  (Andrew Dunstan <andrew@dunslane.net>)
Re: Debugging buildfarm pg_upgrade check failures  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
Now that we've restored proper logging of "make check", I looked into
today's failure report from axolotl:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-07-24%2020%3A29%3A18

What evidently happened there is that "pg_ctl start" gave up waiting for
the postmaster to start too soon.  The postmaster log appears to contain

LOG:  database system was shut down at 2015-07-24 16:45:40 EDT
FATAL:  the database system is starting up
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections

which indicates that it did successfully come up, but not till after one
"PQping" probe from pg_ctl, which was rejected with "still starting up".
Meanwhile we've got this log output from pg_ctl:

waiting for server to start........ stopped waiting
pg_ctl: could not start server
Examine the log output.

Counting the dots indicates that pg_ctl gave up after precisely 5 seconds.
Now, looking at the logic in pg_ctl's test_postmaster_connection(), the
only explanation that seems to fit the observed output is that the stat()
on the postmaster pidfile (at line 650 in HEAD) failed.  It's not clear
why though, since the postmaster was clearly still alive at this point,
and we must have been able to read the pidfile earlier to construct a
connection string, else there would have been no PQping attempt.

Maybe the stat failed for some unexpected resource-exhaustion kind of
reason?

It seems plausible to me that we should change pg_ctl to only consider
stat() failure to be a reason to give up waiting if errno is ENOENT,
not anything else.  At a minimum, I'd like to modify it to print the
errno if it's anything else, so that we can confirm or deny this theory
next time we see this buildfarm failure.

Comments?
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: pgbench - allow backslash-continuations in custom scripts
Следующее
От: Marc Mamin
Дата:
Сообщение: Re: pg_dump -Fd and compression level