Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Дата
Msg-id 6da38393-6ae5-4d87-2690-11c932123403@oss.nttdata.com
обсуждение исходный текст
Ответ на Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers

On 2020/12/14 14:36, Bharath Rupireddy wrote:
> On Mon, Dec 14, 2020 at 9:38 AM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>
wrote:
>  > On 2020/12/12 15:05, Bharath Rupireddy wrote:
>  > > On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>
<mailto:masao.fujii@oss.nttdata.com<mailto:masao.fujii@oss.nttdata.com>>> wrote:
 
>  > >  > I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in
GetConnection()cannot find the cached connection entry invalidated by that drop. Because "user->umid" used as hash key
ischanged. So I was thinking that that invalidated connection will not be closed nor reconnected.
 
>  > >  >
>  > >
>  > > You are right in saying that the connection leaks.
>  > >
>  > > Use case 1:
>  > > 1) Run foreign query in session1 with server1, user mapping1
>  > > 2) Drop user mapping1 in another session2, invalidation message gets logged which will have to be processed by
othersessions
 
>  > > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
pgfdw_inval_callback().Whatever may be the type of foreign query (select, update, explain, delete, insert, analyze
etc.),upon next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails(with ERROR:  user mapping not
foundfor "XXXX") since the user mapping1 has been dropped in session2 and the query will also fail before reaching
GetConnection()where the connections associated with invalidated entries would have got disconnected.
 
>  > >
>  > > So, the connection associated with invalidated entry would remain until the local session exits which is a
problemto solve.
 
>  > >
>  > > Use case 2:
>  > > 1) Run foreign query in session1 with server1, user mapping1
>  > > 2) Try to drop foreign server1, then we would not be allowed to do so because of dependency. If we use CASCADE,
thenthe dependent user mapping1 and foreign tables get dropped too [1].
 
>  > > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
pgfdw_inval_callback(),it fails because there is no foreign table and user mapping1.
 
>  > >
>  > > But, note that the connection remains open in session1, which is again a problem to solve.
>  > >
>  > > To solve the above connection leak problem, it looks like the right place to close all the invalid connections
ispgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session(by then all
thesub txns also would have been finished). Note that if there are too many invalidated entries, then one of the
followingtxn has to bear running this extra code, but that's okay than having leaked connections. Thoughts? If okay, I
cancode a separate patch.
 
>  >
>  > Thanks for further analysis! Sounds good. Also +1 for making it as separate patch. Maybe only this patch needs to
beback-patched.
 
> 
> Thanks. Yeah once agreed on the fix, +1 to back patch. Shall I start a separate thread for connection leak issue and
patch,so that others might have different thoughts??
 

Yes, of course!


> 
>  > > static void
>  > > pgfdw_xact_callback(XactEvent event, void *arg)
>  > > {
>  > >      HASH_SEQ_STATUS scan;
>  > >      ConnCacheEntry *entry;
>  > > *     /* HERE WE CAN LOOK FOR ALL INVALIDATED ENTRIES AND DISCONNECT THEM*/*
>  >
>  > This may cause the connection to be closed before sending COMMIT TRANSACTION command to the foreign server, i.e.,
theconnection is closed in the middle of the transaction. So as you explained before, we should avoid that? If this my
understandingis right, probably the connection should be closed after COMMIT TRANSACTION command is sent to the foreign
server.What about changing the following code in pgfdw_xact_callback() so that it closes the connection even when it's
markedas invalidated?
 
> 
> You are right! I'm posting what I have in my mind for fixing this connection leak problem.
> 
> /* tracks whether any work is needed in callback functions */
> static bool xact_got_connection = false;
> /* tracks whether there exists at least one invalid connection in the connection cache */
> *static bool invalid_connections_exist = false;*
> 
> static void
> pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
> {
> 
>          /* hashvalue == 0 means a cache reset, must clear all state */
>          if (hashvalue == 0 ||
>              (cacheid == FOREIGNSERVEROID &&
>               entry->server_hashvalue == hashvalue) ||
>              (cacheid == USERMAPPINGOID &&
>               entry->mapping_hashvalue == hashvalue))
>          {
>              entry->invalidated = true;
> *            invalid_connections_exist = true;*
>      }
> 
> static void
> pgfdw_xact_callback(XactEvent event, void *arg)
> {
>      HASH_SEQ_STATUS scan;
>      ConnCacheEntry *entry;
> 
>      /* Quick exit if no connections were touched in this transaction or there are no invalid connections in the
cache.*/
 
>      if (!xact_got_connection *&& !invalid_connections_exist)*
>          return;
> 
>          /*
>           * If the connection isn't in a good idle state, discard it to
>           * recover. Next GetConnection will open a new connection.
>           */
>          if (PQstatus(entry->conn) != CONNECTION_OK ||
>              PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>              entry->changing_xact_state ||
> *            entry->invalidated)*
>          {
>              elog(DEBUG3, "discarding connection %p", entry->conn);
>              disconnect_pg_server(entry);
>          }
> 
> /*
> * Regardless of the event type, we can now mark ourselves as out of the
> * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
> * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
> */
> xact_got_connection = false;
> 
> /* We are done with closing all the invalidated connections so reset. */
> *invalid_connections_exist = false;*
> }
> 
>  > > And we can also extend postgres_fdw_disconnect() something like.
>  > >
>  > > postgres_fdw_disconnect(bool invalid_only) --> default for invalid_only false. disconnects all connections. when
invalid_onlyis set to true then disconnects only invalid connections.
 
>  > > postgres_fdw_disconnect('server_name') --> disconnections connections associated with the specified foreign
server
>  > >
>  > > Having said this, I'm not in favour of invalid_only flag, because if we choose to change the code in
pgfdw_xact_callbackto solve connection leak problem, we may not need this invalid_only flag at all, because at the end
oftxn (even for the txns in which the queries fail with error, pgfdw_xact_callback gets called), all the existing
invalidconnections get disconnected. Thoughts?
 
>  >
>  > +1 not to have invalid_only flag. On the other hand, I think that postgres_fdw_get_connections() should return all
thecached connections including invalidated ones. Otherwise, the number of connections observed via
postgres_fdw_get_connections()may be different from the number of connections actually established, and which would be
confusingto users.
 
>  >
> 
> If postgres_fdw_get_connections() has to return invalidated connections, I have few things mentioned in [1] to be
clarified.Thoughts? Please have a look at the below comment before we decide to show up the invalid entries or not.
 
> 
> [1] - https://www.postgresql.org/message-id/CALj2ACUv%3DArQXs0U9PM3YXKCeSzJ1KxRokDY0g_0aGy--kDScA%40mail.gmail.com
<https://www.postgresql.org/message-id/CALj2ACUv%3DArQXs0U9PM3YXKCeSzJ1KxRokDY0g_0aGy--kDScA%40mail.gmail.com>

I was thinking to display the records having the columns for server name and boolean flag indicating whether it's
invalidatedor not. But I'm not sure if this is the best design for now. Probably we should revisit this after
determininghow to fix the connection-leak issue.
 


> 
>  > BTW, even after fixing the connection-leak issue, postgres_fdw_get_connections() may see invalidated cached
connectionswhen it's called during the transaction.
 
> 
> We will not output if the invalidated entry has no active connection[2], so if we fix the connection leak issue with
theabove discussed fix i.e closing all the invalidated connections at the end of next xact, there are less chances that
wewill output invalidated entries in the postgres_fdw_get_connections() output. Only case we may show up invalidated
connections(whichhave active connections entry->conn) in the postgres_fdw_get_connections() output is as follows:
 
> 
> 1) say we have few cached active connections exists in session 1
> 2) drop the user mapping (in another session) associated with any of the cached connections to make that entry
invalid
> 3) run select * from postgres_fdw_get_connections(); in session 1.  At the start of the xact, the invalidation
messagegets processed and the corresponding entry gets marked as invalid. If we allow invalid connections (that have
entry->conn)to show up in the output, then we show them in the result of the query. At the end of xact, we close these
invalidconnections, in this case, user might think that he still have invalid connections active.
 

What about the case where the transaction started at the above 1) at session 1, and postgres_fdw_get_connections() in
theabove 3) is called within that transaction at session 1? In this case, postgres_fdw_get_connections() can return
eveninvalidated connections?
 


> 
> If the query ran in 3) is not postgres_fdw_get_connections() and something else, then postgres_fdw_get_connections()
willnever get to show invalid connections as they would have closed the connections.
 
> 
> IMO, better not choose the invalid connections to show up in the postgres_fdw_get_connections() output, if we fix the
connectionleak issue with the above discussed fix i.e closing all the invalidated connections at the end of next xact
 
> 
> [2]
> +Datum
> +postgres_fdw_get_connections(PG_FUNCTION_ARGS)
> +{
> + ArrayBuildState *astate = NULL;
> +
> + if (ConnectionHash)
> + {
> + HASH_SEQ_STATUS scan;
> + ConnCacheEntry *entry;
> +
> + hash_seq_init(&scan, ConnectionHash);
> + while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
> + {
> + Form_pg_user_mapping umap;
> + HeapTuple umaptup;
> + Form_pg_foreign_server fsrv;
> + HeapTuple fsrvtup;
> +
> + /* We only look for active and open remote connections. */
> + if (!entry->conn)
> + continue;
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
> 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amul Sul
Дата:
Сообщение: Re: [Patch] ALTER SYSTEM READ ONLY
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Parallel Inserts in CREATE TABLE AS