Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Дата
Msg-id CALj2ACVFfsmAMyYVidXWfu4URST2xwO20NSYrHUQwxG070poxA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Ответы Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Список pgsql-hackers
On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2020/12/11 19:16, Bharath Rupireddy wrote:
> > On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>> +                       /* We only look for active and open remote connections. */
> >>> +                       if (entry->invalidated || !entry->conn)
> >>> +                               continue;
> >>>
> >>> We should return even invalidated entry because it has still cached connection?
> >>>
> >>
> >> I checked this point earlier, for invalidated connections, the tuple
> >> returned from the cache is also invalid and the following error will
> >> be thrown. So, we can not get the server name for that user mapping.
> >> Cache entries too would have been invalidated after the connection is
> >> marked as invalid in pgfdw_inval_callback().
> >>
> >> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
> >> if (!HeapTupleIsValid(umaptup))
> >>        elog(ERROR, "cache lookup failed for user mapping with OID %u",
> >> entry->key);
> >>
> >
> > I further checked on returning invalidated connections in the output
> > of the function. Actually, the reason I'm seeing a null tuple from sys
> > cache (and hence the error "cache lookup failed for user mapping with
> > OID xxxx") for an invalidated connection is that the user mapping
> > (with OID entry->key that exists in the cache) is getting dropped, so
> > the sys cache returns null tuple. The use case is as follows:
> >
> > 1) Create a server, role, and user mapping of the role with the server
> > 2) Run a foreign table query, so that the connection related to the
> > server gets cached
> > 3) Issue DROP OWNED BY for the role created, since the user mapping is
> > dependent on that role, it gets dropped from the catalogue table and
> > an invalidation message will be pending to clear the sys cache
> > associated with that user mapping.
> > 4) Now, if I do select * from postgres_fdw_get_connections() or for
> > that matter any query, at the beginning the txn
> > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> > gets called and marks the cached entry as invalidated. Remember the
> > reason for this invalidation message is that the user mapping with the
> > OID entry->key is dropped from 3). Later in
> > postgres_fdw_get_connections(), when we search the sys cache with
> > entry->key for that invalidated connection, since the user mapping is
> > dropped from the system, null tuple is returned.
>
> Thanks for the analysis! This means that the cached connection invalidated by drop of server or user mapping will not
beclosed even by the subsequent access to the foreign server and will remain until the backend exits. Right?
 

It will be first marked as invalidated via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(),
and on the next use of that connection invalidated connections are
disconnected and reconnected.

    if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
    {
        elog(DEBUG3, "closing connection %p for option changes to take effect",
             entry->conn);
        disconnect_pg_server(entry);
    }

> If so, this seems like a connection-leak bug, at least for me.... Thought?
>

It's not a leak. The comment before pgfdw_inval_callback() [1]
explains why we can not immediately close/disconnect the connections
in pgfdw_inval_callback() after marking them as invalidated.

Here is the scenario how in the midst of a txn we get invalidation
messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
happens):

1) select from foreign table with server1, usermapping1 in session1
2) begin a top txn in session1, run a few foreign queries that open up
sub txns internally. meanwhile alter/drop server1/usermapping1 in
session2, then at each start of sub txn also we get to process the
invalidation messages via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
So, if we disconnect right after marking invalidated in
pgfdw_inval_callback, that's a problem since we are in a sub txn under
a top txn.

I don't think we can do something here and disconnect the connections
right after the invalidation happens. Thoughts?


[1]
/*
 * Connection invalidation callback function
 *
 * After a change to a pg_foreign_server or pg_user_mapping catalog entry,
 * mark connections depending on that entry as needing to be remade.
 * We can't immediately destroy them, since they might be in the midst of
 * a transaction, but we'll remake them at the next opportunity.
 *
 * Although most cache invalidation callbacks blow away all the related stuff
 * regardless of the given hashvalue, connections are expensive enough that
 * it's worth trying to avoid that.
 *
 * NB: We could avoid unnecessary disconnection more strictly by examining
 * individual option values, but it seems too much effort for the gain.
 */
static void
pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Proposed patch for key managment
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_basebackup test coverage