Discussion: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds

BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17828
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 15.2
Operating system:   Ubuntu 22.04
Description:

The following script:
CREATE EXTENSION postgres_fdw;
DO $d$
            BEGIN
                EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER
postgres_fdw
                    OPTIONS (dbname '$$||current_database()||$$',
                             port '$$||current_setting('port')||$$',
                             async_capable 'true'
                    )$$;
            END;
$d$;

CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

CREATE TABLE t(a int) PARTITION BY RANGE (a);
CREATE TABLE pt1 (a int);
INSERT INTO pt1 SELECT 1;
CREATE FOREIGN TABLE fpt1 PARTITION OF t FOR VALUES FROM ( 0) TO (10) SERVER
loopback OPTIONS (table_name 'pt1');
CREATE FOREIGN TABLE fpt2 PARTITION OF t FOR VALUES FROM (10) TO (20) SERVER
loopback OPTIONS (table_name 'non_existent_table');

CREATE TABLE nt(a int);
SELECT 'INSERT INTO nt SELECT * FROM t' FROM generate_series(1, 400) 
\gexec

(assuming "ulimit -n" = 1024 and max_safe_fds / 3 = 329)
produces errors and an assertion failure:
...
CONTEXT:  remote SQL command: SELECT a FROM public.non_existent_table
ERROR:  relation "public.non_existent_table" does not exist
CONTEXT:  remote SQL command: SELECT a FROM public.non_existent_table
ERROR:  relation "public.non_existent_table" does not exist
CONTEXT:  remote SQL command: SELECT a FROM public.non_existent_table
ERROR:  epoll_create1 failed: Too many open files
ERROR:  epoll_create1 failed: Too many open files
WARNING:  AbortTransaction while in ABORT state
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
...

Core was generated by `postgres: law regression [local] INSERT
                        '.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/3229796' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=139748276750272) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=139748276750272) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=139748276750272) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=139748276750272, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x00007f19adb0e476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x00007f19adaf47f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055f6436330b5 in ExceptionalCondition (conditionName=0x55f643c7b780
<str> "TransactionIdIsValid(proc->xid)", 
    fileName=0x55f643c7e48c "procarray.c", lineNumber=683) at assert.c:66
#6  0x000055f642ae083c in ProcArrayEndTransaction (proc=0x7f19a9c1e160,
latestXid=1060) at procarray.c:683
#7  0x000055f641180763 in AbortTransaction () at xact.c:2841
#8  0x000055f64118710f in AbortCurrentTransaction () at xact.c:3346
#9  0x000055f642c0f27a in PostgresMain (dbname=0x62900001b358 "regression",
username=0x6250000020f8 "law")
    at postgres.c:4299
#10 0x000055f642765b37 in BackendRun (port=0x614000000240) at
postmaster.c:4461
#11 0x000055f64275cd90 in BackendStartup (port=0x614000000240) at
postmaster.c:4189
#12 0x000055f642756d01 in ServerLoop () at postmaster.c:1779
#13 0x000055f642753da3 in PostmasterMain (argc=3, argv=0x603000000670) at
postmaster.c:1463
#14 0x000055f641f5a0fa in main (argc=3, argv=0x603000000670) at main.c:200

Here pgfdw_get_result() calls WaitLatchOrSocket() ->
CreateWaitEventSet(), which fails due to a lack of file descriptors;
as a result, AbortCurrentTransaction() is called, which calls
CallXactCallbacks() -> pgfdw_xact_callback() ->
pgfdw_abort_cleanup() -> pgfdw_cancel_query() ->
pgfdw_get_cleanup_result() -> WaitLatchOrSocket() ->
CreateWaitEventSet(), which fails again, so AbortCurrentTransaction() is
called again...


PG Bug reporting form <noreply@postgresql.org> writes:
> The following script:
> [ leaks a file descriptor per error ]

Yeah, at least on platforms where WaitEventSets own kernel file
descriptors.  I don't think it's postgres_fdw's fault though,
but that of ExecAppendAsyncEventWait, which is ignoring the
possibility of failing partway through.  It looks like it'd be
sufficient to add a PG_CATCH or PG_FINALLY block there to make
sure the WaitEventSet is disposed of properly --- fortunately,
it doesn't need to have any longer lifespan than that one
function.

            regards, tom lane
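
A rough sketch of that shape (illustrative only, not the committed fix; the
CreateWaitEventSet signature and the exact events added differ across
branches) could look like this:

/*
 * Sketch: release ExecAppendAsyncEventWait's WaitEventSet even if an error
 * is thrown partway through (illustrative, not the actual patch).
 */
set = CreateWaitEventSet(CurrentMemoryContext, nevents);

PG_TRY();
{
	AddWaitEventToSet(set, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET, NULL, NULL);

	/*
	 * ... ExecAsyncConfigureWait() adds one socket event per async-capable
	 * subplan, then WaitEventSetWait() and ExecAsyncNotify() process the
	 * results; any of these may elog(ERROR) ...
	 */
}
PG_FINALLY();
{
	/* Dispose of the set -- and the kernel FD it owns on epoll/kqueue builds. */
	FreeWaitEventSet(set);
}
PG_END_TRY();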



Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds

From: Alexander Lakhin
Date:
Hello,

09.03.2023 19:57, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> The following script:
>> [ leaks a file descriptor per error ]
> Yeah, at least on platforms where WaitEventSets own kernel file
> descriptors.  I don't think it's postgres_fdw's fault though,
> but that of ExecAppendAsyncEventWait, which is ignoring the
> possibility of failing partway through.  It looks like it'd be
> sufficient to add a PG_CATCH or PG_FINALLY block there to make
> sure the WaitEventSet is disposed of properly --- fortunately,
> it doesn't need to have any longer lifespan than that one
> function.

Now that the leakage has been eliminated by 50c67c201/481d7d1c0, we can still
observe the assert-triggering half of the bug with something like this:
echo "
CREATE EXTENSION postgres_fdw;
SELECT id INTO t FROM generate_series(1,100000) id;
" | psql

ns=322 # assuming "ulimit -n" = 1024 and max_safe_fds / 3 = 329
for ((i=1;i<=$ns;i++)); do
echo "
DO \$d\$
   BEGIN
     EXECUTE \$\$CREATE SERVER loopback_$i FOREIGN DATA WRAPPER postgres_fdw
       OPTIONS (
         dbname '\$\$||current_database()||\$\$',
         port '\$\$||current_setting('port')||\$\$'
       )\$\$;
   END;
\$d\$;

CREATE USER MAPPING FOR CURRENT_USER SERVER loopback_$i;
CREATE FOREIGN TABLE ft_$i (id int) SERVER loopback_$i OPTIONS (table_name 't');
"
done | psql >psql-1.log

(echo "BEGIN;";
for ((i=1;i<=$ns;i++)); do echo "DECLARE cursor_$i CURSOR FOR SELECT * FROM ft_$i;"; done
echo "SELECT * INTO nt FROM ft_1 UNION SELECT * FROM ft_2;"
) | psql >psql-2.log

That leads to:
2023-11-24 03:03:50.281 UTC [1053419] ERROR:  epoll_create1 failed: Too many open files
2023-11-24 03:03:50.281 UTC [1053419] STATEMENT:  SELECT * INTO nt FROM ft_1 UNION SELECT * FROM ft_2;
2023-11-24 03:03:50.282 UTC [1053419] ERROR:  epoll_create1 failed: Too many open files
2023-11-24 03:03:50.282 UTC [1053419] WARNING:  AbortTransaction while in ABORT state
TRAP: failed Assert("TransactionIdIsValid(proc->xid)"), File: "procarray.c", Line: 677, PID: 1053419

Call chains for the errors are the same as in the initial report [1].

[1] https://www.postgresql.org/message-id/17828-122da8cba23236be%40postgresql.org

Best regards,
Alexander



On Fri, Nov 24, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> 09.03.2023 19:57, Tom Lane wrote:
> > PG Bug reporting form <noreply@postgresql.org> writes:
> >> The following script:
> >> [ leaks a file descriptor per error ]
> > Yeah, at least on platforms where WaitEventSets own kernel file
> > descriptors.  I don't think it's postgres_fdw's fault though,
> > but that of ExecAppendAsyncEventWait, which is ignoring the
> > possibility of failing partway through.  It looks like it'd be
> > sufficient to add a PG_CATCH or PG_FINALLY block there to make
> > sure the WaitEventSet is disposed of properly --- fortunately,
> > it doesn't need to have any longer lifespan than that one
> > function.
>
> Now that the leakage has been eliminated by 50c67c201/481d7d1c0, we can still
> observe the assert-triggering half of the bug with something like this:

Will look into this.  Thanks for the report!

My apologies for my very late response to this thread.  I overlooked
it completely.  :(

Best regards,
Etsuro Fujita



On Mon, Nov 27, 2023 at 12:05 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Fri, Nov 24, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> > Now that the leakage has been eliminated by 50c67c201/481d7d1c0, we can still
> > observe the assert-triggering half of the bug with something like this:
>
> Will look into this.

I finally had time to look into this.

IIUC, the assertion failure was caused by an error-during-error-recovery
loop triggered by the "epoll_create1 failed: Too many open files" error
raised in WaitLatchOrSocket called from pgfdw_get_cleanup_result, which
is called during abort cleanup.  I think a simple fix to avoid such a
loop is to modify the PG_CATCH block in pgfdw_get_cleanup_result so that
it just ignores the passed error instead of re-throwing it, and restores
InterruptHoldoffCount and the memory context, like the attached.  In the
patch I also modified callers of pgfdw_get_cleanup_result to issue a
warning when ignoring the error.  I might be missing something, though.

Best regards,
Etsuro Fujita

Attachments:
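
The attachment is not reproduced here; a rough sketch of the approach
described above -- ignoring the error in the PG_CATCH block rather than
re-throwing it -- might look like the following, where the variable names and
the exact state restored are assumptions, not the actual patch:

/* Sketch of the ignore-the-error approach (not the attached patch). */
uint32		save_holdoff = InterruptHoldoffCount;	/* captured before PG_TRY */
MemoryContext oldcontext = CurrentMemoryContext;
bool		failed = false;

PG_TRY();
{
	/* ... WaitLatchOrSocket() / PQconsumeInput() / PQgetResult() loop ... */
}
PG_CATCH();
{
	/*
	 * Swallow the error instead of re-throwing: restore the state the error
	 * exit changed and let the caller emit a WARNING and give up on this
	 * connection.
	 */
	MemoryContextSwitchTo(oldcontext);
	FlushErrorState();
	InterruptHoldoffCount = save_holdoff;
	failed = true;
}
PG_END_TRY();
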
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> IIUC, the assertion failure was caused by an error-during-error-recovery
> loop triggered by the "epoll_create1 failed: Too many open files" error
> raised in WaitLatchOrSocket called from pgfdw_get_cleanup_result, which
> is called during abort cleanup.  I think a simple fix to avoid such a
> loop is to modify the PG_CATCH block in pgfdw_get_cleanup_result so that
> it just ignores the passed error instead of re-throwing it, and restores
> InterruptHoldoffCount and the memory context, like the attached.  In the
> patch I also modified callers of pgfdw_get_cleanup_result to issue a
> warning when ignoring the error.  I might be missing something, though.

I do not think ignoring the passed error is *ever* acceptable.
You have no idea what the error condition is or whether your
hack is sufficient to recover from it.

            regards, tom lane



Hi,

On 2024-02-08 12:04:24 -0500, Tom Lane wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > IIUC, the assertion failure was caused by an error-during-error-recovery
> > loop triggered by the "epoll_create1 failed: Too many open files" error
> > raised in WaitLatchOrSocket called from pgfdw_get_cleanup_result, which
> > is called during abort cleanup.  I think a simple fix to avoid such a
> > loop is to modify the PG_CATCH block in pgfdw_get_cleanup_result so that
> > it just ignores the passed error instead of re-throwing it, and restores
> > InterruptHoldoffCount and the memory context, like the attached.  In the
> > patch I also modified callers of pgfdw_get_cleanup_result to issue a
> > warning when ignoring the error.  I might be missing something, though.
> 
> I do not think ignoring the passed error is *ever* acceptable.
> You have no idea what the error condition is or whether your
> hack is sufficient to recover from it.

+1


I think we ought to understand *why* we are getting the "Too many open
files". The AcquireExternalFD() in CreateWaitEventSet() should prevent
that.

One annoying bit is that AcquireExternalFD() failing emits the same error as
epoll_create1() itself failing, including the same errno. So we don't know
if the problem is that there are too many connections and because of that we
are running out of "file descriptor slots", or whether something was using up
file descriptors outside of our system, or ...

I also wonder if postgres_fdw should strive to use a longer lived wait event
set. For efficiency, if nothing else? That'd avoid the need to create one
during error handling.

Greetings,

Andres Freund
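
For reference, the epoll branch of CreateWaitEventSet() in latch.c looks
roughly like this (paraphrased, not verbatim), which is why the two failure
modes are indistinguishable from the message and errno alone:

/* Paraphrase of the epoll case in CreateWaitEventSet(), latch.c */
if (!AcquireExternalFD())
{
	/* AcquireExternalFD() has set errno = EMFILE */
	elog(ERROR, "epoll_create1 failed: %m");
}
set->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
if (set->epoll_fd < 0)
{
	ReleaseExternalFD();
	elog(ERROR, "epoll_create1 failed: %m");
}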



Andres Freund <andres@anarazel.de> writes:
> I also wonder if postgres_fdw should strive to use a longer lived wait event
> set. For efficiency, if nothing else? That'd avoid the need to create one
> during error handling.

Seems like a plausible idea.  Having to acquire resources to clean up
an error is an anti-pattern for sure.

            regards, tom lane



Andres Freund <andres@anarazel.de> writes:
> I think we ought to understand *why* we are getting the "Too many open
> files". The AcquireExternalFD() in CreateWaitEventSet() should prevent
> that.

Actually, I think the AcquireExternalFD() in CreateWaitEventSet() is
*causing* that and needs to be removed.

What is happening in Alexander's new example is that we are doing
AcquireExternalFD() for each postgres_fdw connection
(cf. libpqsrv_connect in libpq-be-fe-helpers.h), and the example
is tuned to bring that exactly up to the limit of what
AcquireExternalFD() allows.  Then the next WaitLatchOrSocket call
fails, because it does

    WaitEventSet *set = CreateWaitEventSet(CurrentResourceOwner, 3);

Then when pgfdw_abort_cleanup tries to clean up the connections'
state, it needs to do WaitLatchOrSocket again, and that fails again,
and we PANIC because we're already in abort state.

Since WaitLatchOrSocket is going to free this WaitEventSet before it
returns, it's not apparent to me why we need to count it as a
long-lived FD: we could just as well assume that it can slide in under
the NUM_RESERVED_FDS limit.  Or perhaps use ReserveExternalFD instead
of AcquireExternalFD.  We'd need some API extension to tell latch.c to
do that, but that doesn't seem hard.  (Unless we could consider that
all WaitEventSets should use ReserveExternalFD?  Not sure I want to
argue for that though.)

I guess a third possibility is that WaitLatchOrSocket could just
permanently hang onto the WaitEventSet once it's got one.

> One annoying bit is that AcquireExternalFD() failing emits the same error as
> epoll_create1() itself failing, including the same errno.

It's the former.  I tend to agree now that maybe using the same error
text wasn't too smart.

            regards, tom lane
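
One possible shape for such an API extension, sketched with an invented
"transient" flag (no such flag exists in latch.c today): callers creating a
short-lived WaitEventSet would ask latch.c to use ReserveExternalFD(), which
makes room by closing LRU-cached virtual FDs and cannot fail, instead of
AcquireExternalFD():

/* Hypothetical sketch -- the "transient" flag is invented for illustration. */
if (transient)
	ReserveExternalFD();		/* cannot fail; just adjusts the FD accounting */
else if (!AcquireExternalFD())
{
	/* AcquireExternalFD() has set errno = EMFILE */
	elog(ERROR, "epoll_create1 failed: %m");
}

FreeWaitEventSet() would presumably keep calling ReleaseExternalFD() either
way, since both paths bump the same external-FD count.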



Hi,

On 2024-02-08 18:50:32 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think we ought to understand *why* we are getting the "Too many open
> > files". The AcquireExternalFD() in CreateWaitEventSet() should prevent
> > that.
>
> Actually, I think the AcquireExternalFD() in CreateWaitEventSet() is
> *causing* that and needs to be removed.
>
> What is happening in Alexander's new example is that we are doing
> AcquireExternalFD() for each postgres_fdw connection
> (cf. libpqsrv_connect in libpq-be-fe-helpers.h), and the example
> is tuned to bring that exactly up to the limit of what
> AcquireExternalFD() allows.  Then the next WaitLatchOrSocket call
> fails, because it does
>
>     WaitEventSet *set = CreateWaitEventSet(CurrentResourceOwner, 3);
>
> Then when pgfdw_abort_cleanup tries to clean up the connections'
> state, it needs to do WaitLatchOrSocket again, and that fails again,
> and we PANIC because we're already in abort state.
>
> Since WaitLatchOrSocket is going to free this WaitEventSet before it
> returns, it's not apparent to me why we need to count it as a
> long-lived FD: we could just as well assume that it can slide in under
> the NUM_RESERVED_FDS limit.

Well, the AcquireExternalFD() is in more general code that's also used for
long-lived WaitEventSets - for those it's the right thing to count it as a
long-lived FD.


> Or perhaps use ReserveExternalFD instead of AcquireExternalFD.  We'd need
> some API extension to tell latch.c to do that, but that doesn't seem hard.
> (Unless we could consider that all WaitEventSets should use
> ReserveExternalFD?  Not sure I want to argue for that though.)

Yea, I don't think we want that.


> I guess a third possibility is that WaitLatchOrSocket could just
> permanently hang onto the WaitEventSet once it's got one.

Right now we don't support changing the socket FD associated with the WES, so
that'd not easily work. We've been talking about adding support for that for a
while though.


I might be missing something here, but leaving the concrete crash aside, why
is it ok for pgfdw_get_cleanup_result() etc to block during abort processing?
If I read the code right, we can end up waiting for up to 2x30s for each
connection, and there can be many connections.  The code also has a bunch of
memory allocations, so if we are in abort processing after an out-of-memory
error, we can easily cause failures again as well.

Greetings,

Andres Freund



On Fri, Feb 9, 2024 at 11:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I also wonder if postgres_fdw should strive to use a longer lived wait event
> > set. For efficiency, if nothing else? That'd avoid the need to create one
> > during error handling.
>
> Seems like a plausible idea.  Having to acquire resources to clean up
> an error is an anti-pattern for sure.

FWIW, WRT the long-lived WES idea, I have had patches exploring
several different ways to do that (as I already did successfully in
other areas), and I will write about that again later, but I concluded
that we needed to fix this Windows nonsense first:
https://commitfest.postgresql.org/46/3523/  (That got withdrawn for
being too slow, but I'll resubmit it soon enough...)



Andres Freund <andres@anarazel.de> writes:
> I might be missing something here, but leaving the concrete crash aside, why
> is it ok for pgfdw_get_cleanup_result() etc to block during abort processing?

It's not pretty, for sure.  I thought briefly about postponing the
cleanup until we next try to use the connection, but I fear the
semantic side-effects of that would be catastrophic.  We can't leave
the remote's query sitting open long after the local transaction has
been canceled --- that risks undetected deadlocks, at the least.
I think all we can do is try to reduce the risk of failure during
transaction cleanup.

            regards, tom lane



Hi,

On 2024-02-08 19:20:35 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I might be missing something here, but leaving the concrete crash aside, why
> > is it ok for pgfdw_get_cleanup_result() etc to block during abort processing?
>
> It's not pretty, for sure.  I thought briefly about postponing the
> cleanup until we next try to use the connection, but I fear the
> semantic side-effects of that would be catastrophic.  We can't leave
> the remote's query sitting open long after the local transaction has
> been canceled --- that risks undetected deadlocks, at the least.
> I think all we can do is try to reduce the risk of failure during
> transaction cleanup.

I agree that we can't just delay cleanup till, potentially, much later, but I
don't think that means that we have to wait 30s for each connection,
one-by-one.


One way we could fix the issue at hand would be to make postgres_fdw reserve
one FD for all connections, and release it before the WaitLatchOrSocket() and
reacquire it after. That way we can make sure that there's an FD available.

OTOH, as waiting for connections one-by-one isn't good, perhaps we should just
rewrite the code to create one WES for all connections and wait in parallel,
processing cancels/aborts as they complete.  That'd make the abort
less slow and it'd make the reserve-one-fd-for-postgres-fdw approach a bit
less ugly.  But unfortunately that's a bit big for a bugfix...

Greetings,

Andres Freund
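
A rough sketch of that reserve-one-FD idea (function names and placement are
assumptions, not a concrete patch): postgres_fdw would hold one spare
external-FD slot while it has connections open and give it back just around
each wait, so that CreateWaitEventSet() can succeed even when every
connection's FD slot is accounted for:

/* Sketch only: a single spare external-FD slot shared by all connections. */
static bool have_spare_fd = false;

static void
pgfdw_release_spare_fd(void)
{
	if (have_spare_fd)
	{
		ReleaseExternalFD();
		have_spare_fd = false;
	}
}

static void
pgfdw_reacquire_spare_fd(void)
{
	if (!have_spare_fd && AcquireExternalFD())
		have_spare_fd = true;
}

/* ... around each wait (error-path reacquisition omitted for brevity): */
pgfdw_release_spare_fd();
rc = WaitLatchOrSocket(MyLatch,
					   WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT |
					   WL_EXIT_ON_PM_DEATH,
					   PQsocket(conn), cur_timeout, PG_WAIT_EXTENSION);
pgfdw_reacquire_spare_fd();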



Andres Freund <andres@anarazel.de> writes:
> I agree that we can't just delay cleanup till, potentially, much later, but I
> don't think that means that we have to wait 30s for each connection,
> one-by-one.

While that behavior is theoretically possible, I've heard no field
complaints about it actually happening.  I don't think we should
move heaven and earth to prevent it, barring some evidence of it
being a real-world issue.

ISTM we do need to do something about this AcquireExternalFD failure
though, because that's just self-inflicted damage.  (And I think it's
likely new since we invented WaitEventSets, though perhaps I'm wrong
about that.)

            regards, tom lane



Hi,

On Fri, Feb 9, 2024 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ISTM we do need to do something about this AcquireExternalFD failure
> though, because that's just self-inflicted damage.  (And I think it's
> likely new since we invented WaitEventSets, though perhaps I'm wrong
> about that.)

+1; but my concern is that even if we fix that failure, it is still
possible that WaitLatchOrSocket in pgfdw_get_cleanup_result throws an
error, causing the same error-during-error-recovery loop.  (For
example, epoll_create1() can still fail in CreateWaitEventSet called from
WaitLatchOrSocket even after reserving an external FD, leading to an
error.)

Best regards,
Etsuro Fujita