Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds

From: Tom Lane
Subject: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
Msg-id: 1273950.1707436232@sss.pgh.pa.us
In reply to: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds (Andres Freund <andres@anarazel.de>)
Replies: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs
Andres Freund <andres@anarazel.de> writes:
> I think we ought to understand *why* we are getting the "Too many open
> files". The AcquireExternalFD() in CreateWaitEventSet() should prevent
> that.

Actually, I think the AcquireExternalFD() in CreateWaitEventSet() is
*causing* that and needs to be removed.

What is happening in Alexander's new example is that we are doing
AcquireExternalFD() for each postgres_fdw connection
(cf. libpqsrv_connect in libpq-be-fe-helpers.h), and the example
is tuned to bring that exactly up to the limit of what
AcquireExternalFD() allows.  Then the next WaitLatchOrSocket call
fails, because it does

    WaitEventSet *set = CreateWaitEventSet(CurrentResourceOwner, 3);

Then when pgfdw_abort_cleanup tries to clean up the connections'
state, it needs to do WaitLatchOrSocket again, and that fails again,
and we PANIC because we're already in abort state.
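The sequence above can be replayed with a toy counter. This is a standalone C model, not PostgreSQL code: the model_* names and the limit of 4 are made up for illustration. model_acquire_external_fd() plays the role of AcquireExternalFD() (it refuses once the budget is spent), each connection holds one such FD, and each wait creates and frees one wait-event-set FD the way WaitLatchOrSocket does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap standing in for the limit AcquireExternalFD() enforces. */
#define MODEL_EXTERNAL_FD_LIMIT 4

static int model_external_fds = 0;  /* FDs currently counted as "external" */

/* Soft acquisition: refuses once the budget is used up. */
static bool model_acquire_external_fd(void)
{
    if (model_external_fds < MODEL_EXTERNAL_FD_LIMIT)
    {
        model_external_fds++;
        return true;
    }
    return false;
}

static void model_release_external_fd(void)
{
    assert(model_external_fds > 0);
    model_external_fds--;
}

/* One WaitLatchOrSocket-style wait: create a set (one FD), wait, free it. */
static bool model_wait_latch_or_socket(void)
{
    if (!model_acquire_external_fd())
        return false;           /* surfaces as "epoll_create1 failed" */
    /* ... the actual wait would happen here ... */
    model_release_external_fd();
    return true;
}

/*
 * Open `connections` connections (each holding one external FD), then
 * wait once during normal execution and once more during abort cleanup.
 * Returns how many of those two waits failed.
 */
static int model_replay(int connections)
{
    int failures = 0;

    model_external_fds = 0;
    for (int i = 0; i < connections; i++)
        if (!model_acquire_external_fd())
            break;              /* the connection attempt itself would fail */

    if (!model_wait_latch_or_socket())
        failures++;             /* first ERROR: the transaction starts aborting */
    if (!model_wait_latch_or_socket())
        failures++;             /* failing again inside abort cleanup: PANIC */
    return failures;
}
```

With one slot left, the transient set fits and both waits succeed; with the budget used up exactly, the normal wait and the cleanup wait both fail, which is the double failure that ends in PANIC.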

Since WaitLatchOrSocket is going to free this WaitEventSet before it
returns, it's not apparent to me why we need to count it as a
long-lived FD: we could just as well assume that it can slide in under
the NUM_RESERVED_FDS limit.  Or perhaps use ReserveExternalFD instead
of AcquireExternalFD.  We'd need some API extension to tell latch.c to
do that, but that doesn't seem hard.  (Unless we could consider that
all WaitEventSets should use ReserveExternalFD?  Not sure I want to
argue for that though.)
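One shape that API extension could take, again as a toy model with hypothetical names rather than a patch against latch.c: the caller passes a lifetime hint, and a short-lived set takes a hard reservation (the role ReserveExternalFD() plays in fd.c, where the request is not refused) instead of a soft acquisition that can fail.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EXTERNAL_FD_LIMIT 4   /* hypothetical soft cap */

static int model_external_fds = 0;

/* Soft acquisition (AcquireExternalFD-like): may refuse. */
static bool model_acquire_external_fd(void)
{
    if (model_external_fds >= MODEL_EXTERNAL_FD_LIMIT)
        return false;
    model_external_fds++;
    return true;
}

/* Hard reservation (ReserveExternalFD-like): always counted, never refused. */
static void model_reserve_external_fd(void)
{
    model_external_fds++;
}

static void model_release_external_fd(void)
{
    assert(model_external_fds > 0);
    model_external_fds--;
}

/* Hypothetical lifetime hint: the "API extension" for latch.c. */
typedef enum
{
    MODEL_WES_LONG_LIVED,
    MODEL_WES_SHORT_LIVED
} ModelWesLifetime;

static bool model_create_wait_event_set(ModelWesLifetime lifetime)
{
    if (lifetime == MODEL_WES_SHORT_LIVED)
    {
        model_reserve_external_fd();
        return true;
    }
    return model_acquire_external_fd();
}

/* WaitLatchOrSocket-style wait that frees its set before returning. */
static bool model_wait_latch_or_socket(void)
{
    if (!model_create_wait_event_set(MODEL_WES_SHORT_LIVED))
        return false;
    /* ... wait ... */
    model_release_external_fd();
    return true;
}
```

The short-lived path briefly pushes the count one past the soft cap, which is the sense in which it slides in under the reserved headroom; long-lived sets keep the existing refusal.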

I guess a third possibility is that WaitLatchOrSocket could just
permanently hang onto the WaitEventSet once it's got one.
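The caching variant can be modelled the same way. This is a sketch with hypothetical names, not latch.c: the set is created on first use and then kept for the life of the process, so later waits never touch the FD budget.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EXTERNAL_FD_LIMIT 4   /* hypothetical soft cap */

static int  model_external_fds = 0;
static bool model_have_cached_set = false;

static bool model_acquire_external_fd(void)
{
    if (model_external_fds >= MODEL_EXTERNAL_FD_LIMIT)
        return false;
    model_external_fds++;
    return true;
}

static void model_reset(void)
{
    model_external_fds = 0;
    model_have_cached_set = false;
}

/* Wait using a set created on first use and never freed afterwards. */
static bool model_wait_latch_or_socket_cached(void)
{
    if (!model_have_cached_set)
    {
        if (!model_acquire_external_fd())
            return false;
        model_have_cached_set = true;  /* deliberately never released */
    }
    /* A real version would also have to re-aim the set at this call's
     * socket; the model ignores that. */
    return true;
}
```

The model shows the trade-off: once the set exists, exhausting the budget no longer matters, but one FD stays charged permanently, and a backend whose first wait happens at the limit still fails.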

> One annoying bit is that AcquireExternalFD() failing emits the same error as
> if epoll_create1() itself had failed, including the same errno.

It's the former: the error comes from AcquireExternalFD() refusing, not
from epoll_create1().  I tend to agree now that maybe using the same
error text wasn't too smart.
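A sketch of keeping the two failures distinguishable. The enum, the function, and especially the wording of the budget message are hypothetical; the point is only that the FD-accounting refusal should not read like an epoll_create1() failure carrying the same errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: which step of wait-set creation failed. */
typedef enum
{
    MODEL_FAIL_FD_BUDGET,    /* our own FD accounting refused */
    MODEL_FAIL_EPOLL_CREATE  /* the kernel call itself failed */
} ModelFailSource;

/* Format a message that names the real cause (wording is illustrative). */
static void model_format_error(ModelFailSource src, int err,
                               char *buf, size_t len)
{
    if (src == MODEL_FAIL_FD_BUDGET)
        snprintf(buf, len,
                 "could not create wait event set: "
                 "too many external file descriptors in use");
    else
        snprintf(buf, len, "epoll_create1 failed: %s", strerror(err));
}
```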

            regards, tom lane


