Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Поиск

Список

Период

Сортировка

От	Fujii Masao
Тема	Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Дата	14 декабря 2020 г. 07:08:48
Msg-id	2a0d7305-1cf6-db59-7fc6-241c55c8fc8b@oss.nttdata.com обсуждение исходный текст
Ответ на	Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы	Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения


On 2020/12/12 15:05, Bharath Rupireddy wrote:
> On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>
wrote:
>  > I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in GetConnection()
cannotfind the cached connection entry invalidated by that drop. Because "user->umid" used as hash key is changed. So I
wasthinking that that invalidated connection will not be closed nor reconnected.
 
>  >
> 
> You are right in saying that the connection leaks.
> 
> Use case 1:
> 1) Run foreign query in session1 with server1, user mapping1
> 2) Drop user mapping1 in another session2, invalidation message gets logged which will have to be processed by other
sessions
> 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
pgfdw_inval_callback().Whatever may be the type of foreign query (select, update, explain, delete, insert, analyze
etc.),upon next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails(with ERROR:  user mapping not
foundfor "XXXX") since the user mapping1 has been dropped in session2 and the query will also fail before reaching
GetConnection()where the connections associated with invalidated entries would have got disconnected.
 
> 
> So, the connection associated with invalidated entry would remain until the local session exits which is a problem to
solve.
> 
> Use case 2:
> 1) Run foreign query in session1 with server1, user mapping1
> 2) Try to drop foreign server1, then we would not be allowed to do so because of dependency. If we use CASCADE, then
thedependent user mapping1 and foreign tables get dropped too [1].
 
> 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
pgfdw_inval_callback(),it fails because there is no foreign table and user mapping1.
 
> 
> But, note that the connection remains open in session1, which is again a problem to solve.
> 
> To solve the above connection leak problem, it looks like the right place to close all the invalid connections is
pgfdw_xact_callback(),once registered, which gets called at the end of every txn in the current session(by then all the
subtxns also would have been finished). Note that if there are too many invalidated entries, then one of the following
txnhas to bear running this extra code, but that's okay than having leaked connections. Thoughts? If okay, I can code a
separatepatch.
 

Thanks for further analysis! Sounds good. Also +1 for making it as separate patch. Maybe only this patch needs to be
back-patched.


> static void
> pgfdw_xact_callback(XactEvent event, void *arg)
> {
>      HASH_SEQ_STATUS scan;
>      ConnCacheEntry *entry;
> *     /* HERE WE CAN LOOK FOR ALL INVALIDATED ENTRIES AND DISCONNECT THEM*/*

This may cause the connection to be closed before sending COMMIT TRANSACTION command to the foreign server, i.e., the
connectionis closed in the middle of the transaction. So as you explained before, we should avoid that? If this my
understandingis right, probably the connection should be closed after COMMIT TRANSACTION command is sent to the foreign
server.What about changing the following code in pgfdw_xact_callback() so that it closes the connection even when it's
markedas invalidated?
 

        if (PQstatus(entry->conn) != CONNECTION_OK ||
            PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
            entry->changing_xact_state)
        {
            elog(DEBUG3, "discarding connection %p", entry->conn);
            disconnect_pg_server(entry);
        }


>      /* Quick exit if no connections were touched in this transaction. */
>      if (!xact_got_connection)
>          return;
> 
> And we can also extend postgres_fdw_disconnect() something like.
> 
> postgres_fdw_disconnect(bool invalid_only) --> default for invalid_only false. disconnects all connections. when
invalid_onlyis set to true then disconnects only invalid connections.
 
> postgres_fdw_disconnect('server_name') --> disconnections connections associated with the specified foreign server
> 
> Having said this, I'm not in favour of invalid_only flag, because if we choose to change the code in
pgfdw_xact_callbackto solve connection leak problem, we may not need this invalid_only flag at all, because at the end
oftxn (even for the txns in which the queries fail with error, pgfdw_xact_callback gets called), all the existing
invalidconnections get disconnected. Thoughts?
 

+1 not to have invalid_only flag. On the other hand, I think that postgres_fdw_get_connections() should return all the
cachedconnections including invalidated ones. Otherwise, the number of connections observed via
postgres_fdw_get_connections()may be different from the number of connections actually established, and which would be
confusingto users. BTW, even after fixing the connection-leak issue, postgres_fdw_get_connections() may see invalidated
cachedconnections when it's called during the transaction.
 


> 
> [1]
> postgres=# drop server loopback1 ;
> ERROR:  cannot drop server loopback1 because other objects depend on it
> DETAIL:  user mapping for bharath on server loopback1 depends on server loopback1
> foreign table f1 depends on server loopback1
> HINT:  Use DROP ... CASCADE to drop the dependent objects too.
> 
> postgres=# drop server loopback1 CASCADE ;
> NOTICE:  drop cascades to 2 other objects
> DETAIL:  drop cascades to user mapping for bharath on server loopback1
> drop cascades to foreign table f1
> DROP SERVER
> 
>  > >      if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
>  > >      {
>  > >          elog(DEBUG3, "closing connection %p for option changes to take effect",
>  > >               entry->conn);
>  > >          disconnect_pg_server(entry);
>  > >      }
>  > >
>  > >> If so, this seems like a connection-leak bug, at least for me.... Thought?
>  > >>
>  > >
>  > > It's not a leak. The comment before pgfdw_inval_callback() [1]
>  > > explains why we can not immediately close/disconnect the connections
>  > > in pgfdw_inval_callback() after marking them as invalidated.
>  >
>  > *If* invalidated connection cannot be close immediately even in the case of drop of server or user mapping, we can
deferit to the subsequent call to GetConnection(). That is, GetConnection() closes not only the target invalidated
connectionbut also the other all invalidated connections. Of course, invalidated connections will remain until
subsequentGetConnection() is called, though.
 
>  >
> 
> I think my detailed response to the above comment clarifies this.
> 
>  > > Here is the scenario how in the midst of a txn we get invalidation
>  > > messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
>  > > happens):
>  > >
>  > > 1) select from foreign table with server1, usermapping1 in session1
>  > > 2) begin a top txn in session1, run a few foreign queries that open up
>  > > sub txns internally. meanwhile alter/drop server1/usermapping1 in
>  > > session2, then at each start of sub txn also we get to process the
>  > > invalidation messages via
>  > > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
>  > > So, if we disconnect right after marking invalidated in
>  > > pgfdw_inval_callback, that's a problem since we are in a sub txn under
>  > > a top txn.
>  >
>  > Maybe. But what is the actual problem here?
>  >
>  > OTOH, if cached connection should not be close in the middle of transaction, postgres_fdw_disconnect() also should
bedisallowed to be executed during transaction?
 
> 
> +1. Yeah that makes sense. We can avoid closing the connection if (entry->xact_depth > 0). I will modify it in
disconnect_cached_connections().
> 
>  > >> Could you tell me why ConnectionHash needs to be destroyed?
>  > >
>  > > Say, in a session there are hundreds of different foreign server
>  > > connections made and if users want to disconnect all of them with the
>  > > new function and don't want any further foreign connections in that
>  > > session, they can do it. But then why keep the cache just lying around
>  > > and holding those many entries? Instead we can destroy the cache and
>  > > if necessary it will be allocated later on next foreign server
>  > > connections.
>  > >
>  > > IMHO, it is better to destroy the cache in case of disconnect all,
>  > > hoping to save memory, thinking that (next time if required) the cache
>  > > allocation doesn't take much time. Thoughts?
>  >
>  > Ok, but why is ConnectionHash destroyed only when "all" is true? Even when "all" is false, for example, the
followingquery can disconnect all the cached connections. Even in this case, i.e., whenever there are no cached
connections,ConnectionHash should be destroyed?
 
>  >
>  >      SELECT postgres_fdw_disconnect(srvname) FROM pg_foreign_server ;
> 
> +1. I can check all the cache entries to see if there are any active connections, in the same loop where I try to
findthe cache entry for the given foreign server,  if none exists, then I destroy the cache. Thoughts?
 

+1

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Jammie
Дата: 14 декабря 2020 г., 06:59:54
Сообщение: Movement of restart_lsn position movement of logical replication slots is very slow

Следующее

От: "tsunakawa.takay@fujitsu.com"
Дата: 14 декабря 2020 г., 07:22:50
Сообщение: RE: [Patch] Optimize dropping of relation buffers using dlist

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Предыдущее

Следующее