Re: Idea for improving buildfarm robustness

From Tom Lane
Subject Re: Idea for improving buildfarm robustness
Date
Msg-id 1555.1443556067@sss.pgh.pa.us
In response to Re: Idea for improving buildfarm robustness  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Idea for improving buildfarm robustness  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Idea for improving buildfarm robustness  (Joe Conway <mail@joeconway.com>)
List pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
> On 09/29/2015 11:48 AM, Tom Lane wrote:
>> But today I thought of another way: suppose that we teach the postmaster
>> to commit hara-kiri if the $PGDATA directory goes away.  Since the
>> buildfarm script definitely does remove all the temporary data directories
>> it creates, this ought to get the job done.

> This would also be useful for production.  I can't count the number of
> times I've accidentally blown away a replica's PGDATA without shutting
> the postmaster down first, and then had to do a bunch of kill -9.

> In general, having the postmaster survive deletion of PGDATA is
> suboptimal.  In rare cases of having it survive installation of a new
> PGDATA (via PITR restore, for example), I've even seen the zombie
> postmaster corrupt the data files.

Side comment on that: if you'd actually removed $PGDATA, I can't see how
that would happen.  The postmaster and children would have open CWD
handles to the now-disconnected-from-anything-else directory inode,
which would not enable them to reach files created under the new directory
inode.  (They don't ever use absolute paths, only relative, or at least
that's the way it's supposed to work.)
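
The point about CWD handles can be illustrated with a small sketch. It is not PostgreSQL code: the directory name "pgdata" and the file name "marker" are made up for the demo, and it assumes a Unix-like system that allows removing a directory that is some process's current directory (Linux does). A process stays in the old directory, the directory is removed and recreated at the same path, and a relative lookup then fails to see a file created in the new one.

```python
import os
import shutil
import tempfile


def stale_cwd_sees_new_file() -> bool:
    """Return True if a process whose CWD was removed can see, through a
    relative path, a file created in a new directory at the same path."""
    saved_cwd = os.getcwd()
    parent = tempfile.mkdtemp()
    pgdata = os.path.join(parent, "pgdata")  # hypothetical stand-in for $PGDATA
    os.mkdir(pgdata)
    try:
        os.chdir(pgdata)   # work relative to the data directory
        os.rmdir(pgdata)   # remove the directory itself; our CWD still refers to the old inode
        os.mkdir(pgdata)   # a new directory (new inode) appears at the same path
        with open(os.path.join(pgdata, "marker"), "w") as f:
            f.write("file in the new directory\n")
        # The relative lookup is resolved against the old, unlinked directory.
        return os.path.exists("marker")
    finally:
        os.chdir(saved_cwd)
        shutil.rmtree(parent, ignore_errors=True)
```

On such a system the function is expected to return False, which is why a postmaster that uses only relative paths should not be able to touch files under a freshly created $PGDATA.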

However ... if you'd simply deleted everything *under* $PGDATA but not
that directory itself, then this type of failure mode is 100% plausible.
And that's not an unreasonable thing to do, especially if you've set
things up so that $PGDATA's parent is not a writable directory.

Testing accessibility of "global/pg_control" would be enough to catch this
case, but only if we do it before you create a new one.  So that seems
like an argument for making the test relatively often.  The once-a-minute
option is sounding better and better.
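
A rough sketch of why the timing matters, again outside PostgreSQL: the helper names below are invented for the illustration, and the "cluster" is just an empty file at global/pg_control. The probe notices the wipe only if it runs between the deletion and the creation of the new file.

```python
import os
import shutil
import tempfile


def pg_control_accessible(pgdata: str) -> bool:
    """Probe: is global/pg_control under this data directory readable?"""
    return os.access(os.path.join(pgdata, "global", "pg_control"), os.R_OK)


def make_fake_cluster(pgdata: str) -> None:
    """Create a placeholder global/pg_control (hypothetical stand-in for initdb or a restore)."""
    os.makedirs(os.path.join(pgdata, "global"), exist_ok=True)
    with open(os.path.join(pgdata, "global", "pg_control"), "w") as f:
        f.write("placeholder\n")


def probe_timeline() -> list:
    """Run the probe at three moments: normal operation, after the contents
    are wiped, and after a new cluster has been put in place."""
    pgdata = tempfile.mkdtemp()
    results = []
    try:
        make_fake_cluster(pgdata)
        results.append(pg_control_accessible(pgdata))   # normal operation

        # Delete everything *under* the directory but keep the directory itself.
        for name in os.listdir(pgdata):
            shutil.rmtree(os.path.join(pgdata, name))
        results.append(pg_control_accessible(pgdata))   # probe in the window: problem detected

        make_fake_cluster(pgdata)
        results.append(pg_control_accessible(pgdata))   # probe too late: looks healthy again
    finally:
        shutil.rmtree(pgdata, ignore_errors=True)
    return results
```

The middle probe is the only one that reports a problem, so a check that runs rarely can miss the window entirely.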

We could possibly add additional checks, like trying to verify that
pg_control has the same inode number it used to.  But I'm afraid that
would add portability issues and false-positive hazards that would
outweigh the value.
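
For concreteness, the inode comparison could look roughly like the sketch below. It only shows the idea and says nothing about whether it is portable enough to be worth doing; the function names are invented for the example.

```python
import os
import shutil
import tempfile


def file_identity(path: str) -> tuple:
    """Return (device, inode) for the file, as reported by stat()."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def identity_check_demo() -> tuple:
    """Remember pg_control's identity, then replace the file and compare.

    Returns (unchanged_while_untouched, replacement_detected)."""
    pgdata = tempfile.mkdtemp()
    control = os.path.join(pgdata, "pg_control")
    try:
        with open(control, "w") as f:
            f.write("old cluster\n")
        remembered = file_identity(control)          # what would be recorded at startup
        unchanged = file_identity(control) == remembered

        # Write the replacement while the old file still exists, so the two
        # files cannot share an inode number, then move it into place.
        replacement = control + ".new"
        with open(replacement, "w") as f:
            f.write("new cluster\n")
        os.replace(replacement, control)
        detected = file_identity(control) != remembered
        return unchanged, detected
    finally:
        shutil.rmtree(pgdata, ignore_errors=True)
```

The demo creates the replacement before the old file goes away on purpose: with a plain delete-then-recreate, some filesystems may hand the same inode number back, and the comparison would then see nothing.
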
        regards, tom lane


