Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Дата
Msg-id 3595a221-d845-c97c-96fe-5d4f739edaad@oss.nttdata.com
обсуждение исходный текст
Ответ на Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers

On 2020/12/12 3:01, Bharath Rupireddy wrote:
> On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2020/12/11 19:16, Bharath Rupireddy wrote:
>>> On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
>>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>>> +                       /* We only look for active and open remote connections. */
>>>>> +                       if (entry->invalidated || !entry->conn)
>>>>> +                               continue;
>>>>>
>>>>> We should return even invalidated entry because it has still cached connection?
>>>>>
>>>>
>>>> I checked this point earlier, for invalidated connections, the tuple
>>>> returned from the cache is also invalid and the following error will
>>>> be thrown. So, we can not get the server name for that user mapping.
>>>> Cache entries too would have been invalidated after the connection is
>>>> marked as invalid in pgfdw_inval_callback().
>>>>
>>>> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
>>>> if (!HeapTupleIsValid(umaptup))
>>>>         elog(ERROR, "cache lookup failed for user mapping with OID %u",
>>>> entry->key);
>>>>
>>>
>>> I further checked on returning invalidated connections in the output
>>> of the function. Actually, the reason I'm seeing a null tuple from sys
>>> cache (and hence the error "cache lookup failed for user mapping with
>>> OID xxxx") for an invalidated connection is that the user mapping
>>> (with OID entry->key that exists in the cache) is getting dropped, so
>>> the sys cache returns null tuple. The use case is as follows:
>>>
>>> 1) Create a server, role, and user mapping of the role with the server
>>> 2) Run a foreign table query, so that the connection related to the
>>> server gets cached
>>> 3) Issue DROP OWNED BY for the role created, since the user mapping is
>>> dependent on that role, it gets dropped from the catalogue table and
>>> an invalidation message will be pending to clear the sys cache
>>> associated with that user mapping.
>>> 4) Now, if I do select * from postgres_fdw_get_connections() or for
>>> that matter any query, at the beginning the txn
>>> AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
>>> gets called and marks the cached entry as invalidated. Remember the
>>> reason for this invalidation message is that the user mapping with the
>>> OID entry->key is dropped from 3). Later in
>>> postgres_fdw_get_connections(), when we search the sys cache with
>>> entry->key for that invalidated connection, since the user mapping is
>>> dropped from the system, null tuple is returned.
>>
>> Thanks for the analysis! This means that the cached connection invalidated by drop of server or user mapping will
notbe closed even by the subsequent access to the foreign server and will remain until the backend exits. Right?
 
> 
> It will be first marked as invalidated via
> AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(),
> and on the next use of that connection invalidated connections are
> disconnected and reconnected.

I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in GetConnection()
cannotfind the cached connection entry invalidated by that drop. Because "user->umid" used as hash key is changed. So I
wasthinking that that invalidated connection will not be closed nor reconnected.
 


> 
>      if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
>      {
>          elog(DEBUG3, "closing connection %p for option changes to take effect",
>               entry->conn);
>          disconnect_pg_server(entry);
>      }
> 
>> If so, this seems like a connection-leak bug, at least for me.... Thought?
>>
> 
> It's not a leak. The comment before pgfdw_inval_callback() [1]
> explains why we can not immediately close/disconnect the connections
> in pgfdw_inval_callback() after marking them as invalidated.

*If* invalidated connection cannot be close immediately even in the case of drop of server or user mapping, we can
deferit to the subsequent call to GetConnection(). That is, GetConnection() closes not only the target invalidated
connectionbut also the other all invalidated connections. Of course, invalidated connections will remain until
subsequentGetConnection() is called, though.
 


> 
> Here is the scenario how in the midst of a txn we get invalidation
> messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> happens):
> 
> 1) select from foreign table with server1, usermapping1 in session1
> 2) begin a top txn in session1, run a few foreign queries that open up
> sub txns internally. meanwhile alter/drop server1/usermapping1 in
> session2, then at each start of sub txn also we get to process the
> invalidation messages via
> AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
> So, if we disconnect right after marking invalidated in
> pgfdw_inval_callback, that's a problem since we are in a sub txn under
> a top txn.

Maybe. But what is the actual problem here?

OTOH, if cached connection should not be close in the middle of transaction, postgres_fdw_disconnect() also should be
disallowedto be executed during transaction?
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Rethinking plpgsql's assignment implementation
Следующее
От: "Bossart, Nathan"
Дата:
Сообщение: Re: please update ps display for recovery checkpoint