On Mon, Nov 27, 2023 at 12:05 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Fri, Nov 24, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> > Now that the leakage eliminated by 50c67c201/481d7d1c0 we still can observe
> > the assert-triggering half of the bug with something like that:
>
> Will look into this.

I finally had time to look into this.

IIUC, the assertion failure was caused by an error-during-error-recovery
loop: the "epoll_create1 failed: Too many open files" error is raised in
WaitLatchOrSocket, called from pgfdw_get_cleanup_result, which itself
runs during abort cleanup. I think a simple way to avoid such a loop is
to modify the PG_CATCH block in pgfdw_get_cleanup_result so that it
ignores the caught error instead of re-throwing it, and restores
InterruptHoldoffCount and the memory context, as in the attached patch.
In the patch I also modified the callers of pgfdw_get_cleanup_result to
issue a warning when the error is ignored. I might be missing
something, though.
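
To illustrate the idea outside the tree, here is a minimal standalone
sketch of the control flow. It is not the attached patch and does not use
the real PostgreSQL API: setjmp/longjmp stand in for PG_TRY/PG_CATCH, and
every name in it (error_handler, holdoff_count, wait_for_result,
get_cleanup_result, abort_cleanup) is hypothetical. It only models the
hold-off counter; restoring the memory context is left out.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the backend's error machinery. */
static jmp_buf *error_handler = NULL;   /* innermost active "try" block */
static int holdoff_count = 0;           /* models InterruptHoldoffCount */
static int warnings_issued = 0;         /* counts warnings from callers */

/*
 * Models a wait that can fail partway through, as with
 * "epoll_create1 failed: Too many open files".
 */
static void
wait_for_result(bool fail)
{
    holdoff_count++;                    /* interrupts held off while waiting */
    if (fail)
    {
        assert(error_handler != NULL);
        longjmp(*error_handler, 1);     /* "throw": the counter stays bumped */
    }
    holdoff_count--;
}

/*
 * Returns true on failure. The "catch" branch swallows the error rather
 * than re-throwing it, and restores the saved hold-off count.
 */
static bool
get_cleanup_result(bool fail)
{
    jmp_buf     local;
    jmp_buf    *saved_handler = error_handler;
    int         saved_holdoff = holdoff_count;
    bool        failed = false;

    error_handler = &local;
    if (setjmp(local) == 0)
        wait_for_result(fail);
    else
    {
        /* Error caught: restore state and report failure, no re-throw. */
        holdoff_count = saved_holdoff;
        failed = true;
    }
    error_handler = saved_handler;
    return failed;
}

/* Caller running during abort cleanup: warn and carry on. */
static bool
abort_cleanup(bool fail)
{
    if (get_cleanup_result(fail))
    {
        fprintf(stderr, "WARNING: ignoring error during abort cleanup\n");
        warnings_issued++;
        return false;
    }
    return true;
}
```

Calling abort_cleanup(true) in this sketch returns false after printing
the warning, and holdoff_count is back at its saved value, so the failure
never propagates out of the cleanup path as a second error.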

Best regards,
Etsuro Fujita