RE: Forget close an open relation in ReorderBufferProcessTXN()

From osumi.takamichi@fujitsu.com
Subject RE: Forget close an open relation in ReorderBufferProcessTXN()
Date
Msg-id OSBPR01MB48886572CB48DC7AE7EDCE21ED2C9@OSBPR01MB4888.jpnprd01.prod.outlook.com
In reply to Re: Forget close an open relation in ReorderBufferProcessTXN()  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Forget close an open relation in ReorderBufferProcessTXN()  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Monday, May 17, 2021 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, May 14, 2021 at 2:20 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > On Thursday, May 13, 2021 7:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I don't think we can reproduce it with core plugins as they don't
> > > lock user catalog tables.
> > OK. My current understanding about how the deadlock happens is below.
> >
> > 1. TRUNCATE command is performed on user_catalog_table.
> > 2. TRUNCATE command locks the table and index with ACCESS EXCLUSIVE LOCK.
> > 3. TRUNCATE waits for the subscriber's synchronization
> >         when synchronous_standby_names is set.
> > 4. Here, the walsender stops, *if* it tries to acquire a lock on the user_catalog_table
> >         because the table where it wants to see is locked by the TRUNCATE already.
> >
> > If this is right,
> >
> 
> Yeah, the above steps are correct, so if we take a lock on user_catalog_table
> when walsender is processing the WAL, it would lead to a problem.
> 
> > we need to go back to a little bit higher level discussion, since
> > whether we should hack any plugin to simulate the deadlock caused by
> > user_catalog_table reference with locking depends on the assumption if
> > the plugin takes a lock on the user_catalog_table or not.
> > In other words, if the plugin does read only access to that type of
> > table with no lock (by RelationIdGetRelation for example ?), the
> > deadlock concern disappears and we don't need to add anything to plugin
> > sides, IIUC.
> >
> 
> True, if the plugin doesn't acquire any lock on user_catalog_table, then it is
> fine but we don't prohibit plugins to acquire locks on user_catalog_tables.
> This is similar to system catalogs, the plugins and decoding code do acquire
> lock on those.
Thanks for sharing this. I'll take into account
that a plugin can take a lock on a user_catalog_table.


> > Here, we haven't gotten any response about whether output plugin takes
> > (should take) the lock on the user_catalog_table. But, I would like to
> > make a consensus about this point before the implementation.
> >
> > By the way, Amit-san already mentioned the main reason of this is that
> > we allow decoding of TRUNCATE operation for user_catalog_table in
> > synchronous mode.
> > The choices are provided by Amit-san already in the past email in [1].
> > (1) disallow decoding of TRUNCATE operation for user_catalog_tables
> > (2) disallow decoding of any operation for user_catalog_tables like
> > system catalog tables
> >
> > Yet, I'm not sure if either option solves the deadlock concern completely.
> > If application takes an ACCESS EXCLUSIVE lock by LOCK command (not by
> > TRUNCATE !) on the user_catalog_table in a transaction, and if the
> > plugin tries to take a lock on it, I think the deadlock happens. Of
> > course, having a consensus that the plugin takes no lock at all would
> > remove this concern, though.
> >
> 
> This is true for system catalogs as well. See the similar report [1]
> 
> > Like this, I'd like to discuss those two items in question together at first.
> > * the plugin should take a lock on user_catalog_table or not
> > * the range of decoding related to user_catalog_table
> >
> > To me, taking no lock on the user_catalog_table from plugin is fine
> >
> 
> We allow taking locks on system catalogs, so why prohibit
> user_catalog_tables? However, I agree that if we want plugins to acquire the
> lock on user_catalog_tables then we should either prohibit decoding of such
> relations or do something else to avoid deadlock hazards.
OK.

Although we have not yet settled the scope of logical decoding for user_catalog_table
(whether we should exclude only the TRUNCATE command or all operations on that type of table),
I'm worried that disallowing logical decoding of user_catalog_table still leaves
the deadlock in place, because disabling decoding by itself does not affect the
lock taken by the TRUNCATE command. What I have in mind is the example below.

(1) The plugin (e.g. pgoutput) is designed to take a lock on user_catalog_table.
(2) Logical replication is set up in synchronous mode.
(3) The TRUNCATE command takes an ACCESS EXCLUSIVE lock on the user_catalog_table.
(4) This time, we skip decoding of the TRUNCATE.
(5) The plugin tries to take a lock on the truncated table,
    but it can't because of the lock held by the TRUNCATE command.
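
The scenario above can be sketched as a tiny wait-for graph. This is only a toy
model in Python, not PostgreSQL code: the function names build_wait_for and
has_cycle and the node names "backend" and "walsender" are made up for
illustration. It assumes the backend running TRUNCATE keeps its lock while it
waits for the synchronous confirmation, as in step 3 of the quoted list earlier.

```python
def build_wait_for(plugin_locks_table, synchronous):
    """Return a wait-for graph: waiter -> set of processes it waits on.

    Hypothetical model of the scenario; the two flags are assumptions.
    """
    edges = {"backend": set(), "walsender": set()}
    if synchronous:
        # The TRUNCATE backend holds its table lock and waits until the
        # walsender has sent the change and the subscriber has confirmed it.
        edges["backend"].add("walsender")
    if plugin_locks_table:
        # The plugin running inside the walsender asks for a lock on the
        # table that the TRUNCATE backend still holds.
        edges["walsender"].add("backend")
    return edges


def has_cycle(edges):
    """Depth-first search that reports whether the graph contains a cycle."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in edges)


if __name__ == "__main__":
    # Note that skipping the TRUNCATE decoding is not an input to the model:
    # the cycle depends only on the lock request and on synchronous mode.
    print(has_cycle(build_wait_for(plugin_locks_table=True, synchronous=True)))
    print(has_cycle(build_wait_for(plugin_locks_table=False, synchronous=True)))
```

In this model the cycle disappears only when the plugin takes no lock or when
replication is not synchronous, which matches the concern that disabling the
decoding alone does not help.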

I am not sure whether the plugin takes the lock in truncate_cb
or somewhere else not directly related to decoding of the user_catalog_table itself,
so I might be wrong. However, in this case,
the solution would be not to disable decoding of user_catalog_table
but to prohibit the TRUNCATE command on user_catalog_table in synchronous mode.
If this is true, I need to extend an output plugin to simulate the deadlock first,
and then remove it by fixing the TRUNCATE side. Thoughts?


Best Regards,
    Takamichi Osumi

