Re: emergency outage requiring database restart

From Tom Lane
Subject Re: emergency outage requiring database restart
Msg-id 17649.1478008605@sss.pgh.pa.us
In response to Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: emergency outage requiring database restart
Re: emergency outage requiring database restart
List pgsql-hackers
Merlin Moncure <mmoncure@gmail.com> writes:
> On Mon, Oct 31, 2016 at 10:32 AM, Oskari Saarenmaa <os@ohmu.fi> wrote:
>> Your production system's postgres backends probably have a lot more open
>> files associated with them than the simple test case does.  Since Postgres
>> likes to keep files open as long as possible and only closes them when you
>> need to free up fds to open new files, it's possible that your production
>> backends have almost all allowed fds used when you execute your pl/sh
>> function.
>>
>> If that's the case, the sqsh process that's executed may not have enough fds
>> to do what it wanted to do and because of busted error handling could end up
>> writing to fds that were opened by Postgres and point to $PGDATA files.

> Does that apply?  the mechanics are a sqsh function that basically does:
> cat foo.sql  | sqsh <args>
> pipe redirection opens a new process, right?

Yeah, but I doubt that either level of the shell would attempt to close
inherited file handles.
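
To make the inheritance point concrete, here is a minimal sketch (Python on a POSIX system, not the actual pl/sh or sqsh code path). A parent opens a file, spawns a child without closing descriptors, and the child writes to the bare descriptor number it was handed. The file name and the one-line child program are hypothetical stand-ins.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a careless child program: it writes to whatever
# descriptor number it is given, without having opened anything itself.
CHILD_CODE = "import os, sys; os.write(int(sys.argv[1]), b'scribbled by child')"


def run_child_with_fd(inheritable):
    """Return (child exit status, file contents) after the child runs."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pretend_pgdata_file")  # illustrative name
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # Python marks new descriptors close-on-exec by default; clearing
            # that flag mimics a descriptor opened without O_CLOEXEC.
            os.set_inheritable(fd, inheritable)
            result = subprocess.run(
                [sys.executable, "-c", CHILD_CODE, str(fd)],
                close_fds=False,  # leave inherited descriptors alone
                stderr=subprocess.DEVNULL,
            )
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return result.returncode, f.read()
```

Calling `run_child_with_fd(True)` leaves the child's bytes in the parent's file, while `run_child_with_fd(False)` leaves the file empty because the descriptor is closed at exec time. Neither the shell nor the program it runs has to do anything unusual for the first case to happen.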

The real problem with Oskari's theory is that it requires not merely
busted, but positively brain-dead error handling in the shell and/or
sqsh, i.e., ignoring open() failures altogether.  That seems kind of
unlikely.  Still, I suspect he might be onto something --- there must
be some reason you can reproduce the issue in production and not in
your test bed, and number-of-open-files is as good a theory as I've
heard.
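
One rough way to compare production against the test bed on this point is to count open descriptors per process. The sketch below assumes Linux, where /proc/<pid>/fd lists them; the function names are made up for illustration, and reading another user's process generally needs matching privileges.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count descriptors listed under /proc/<pid>/fd (Linux-specific)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom():
    """Descriptors this process could still open before its soft limit."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft_limit - count_open_fds()
```

Running `count_open_fds(pid)` against a long-lived production backend and against a fresh test backend would show whether the two are anywhere near each other.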

Maybe the issue is not with open() failures, but with the resulting
FD numbers being much larger than sqsh is expecting.  It would be
weird to try to store an FD in something narrower than int, but
I could see a use of select() being unprepared for large FDs.
Still, it's hard to translate that idea into scribbling on the
wrong file...
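
For reference, the select() limit is easy to observe. POSIX leaves FD_SET() undefined for descriptor numbers at or above FD_SETSIZE, so a C program that skips the range check can write outside its fd_set. Python's select wrapper performs that check and raises ValueError instead, which the sketch below uses to probe the limit. This only shows that the boundary exists; it says nothing about what sqsh actually does, and the helper names are illustrative.

```python
import select


def select_accepts(fd):
    """True if select() is willing to put this descriptor number in a set."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        return False  # number does not fit in an fd_set
    except OSError:
        return True   # number fits, but nothing is open there (EBADF)
    return True


def first_rejected_fd(search_limit=1 << 20):
    """Smallest descriptor number select() refuses, or None if none found."""
    for fd in range(search_limit):
        if not select_accepts(fd):
            return fd
    return None
```

A process that inherits descriptors numbered above `first_rejected_fd()` and then opens more of its own will get numbers that a select()-based loop cannot represent safely.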
        regards, tom lane


