Re: logical decoding and replication of sequences, take 2

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: logical decoding and replication of sequences, take 2
Дата
Msg-id d8251e3b-49d6-2f78-ce04-cdfee7dc4f1b@enterprisedb.com
обсуждение исходный текст
Ответ на Re: logical decoding and replication of sequences, take 2  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список pgsql-hackers
On 7/5/23 16:51, Ashutosh Bapat wrote:
> 0005, 0006 and 0007 are all related to the initial sequence sync. [3]
> resulted in 0007 and I think we need it. That leaves 0005 and 0006 to
> be reviewed in this response.
> 
> I followed the discussion starting [1] till [2]. The second one
> mentions the interlock mechanism which has been implemented in 0005
> and 0006. While I don't have an objection to allowing LOCKing a
> sequence using the LOCK command, I am not sure whether it will
> actually work or is even needed.
> 
> The problem described in [1] seems to be the same as the problem
> described in [2]. In both cases we see the sequence moving backwards
> during CATCHUP. At the end of catchup the sequence is in the right
> state in both the cases. [2] actually deems this behaviour OK. I also
> agree that the behaviour is ok. I am confused whether we have solved
> anything using interlocking and it's really needed.
> 
> I see that the idea of using an LSN to decide whether or not to apply
> a change to sequence started in [4]. In [5] Tomas proposed to use page
> LSN. Looking at [6], it actually seems like a good idea. In [7] Tomas
> agreed that LSN won't be sufficient. But I don't understand why. There
> are three LSNs in the picture - restart LSN of sync slot,
> confirmed_flush LSN of sync slot and page LSN of the sequence page
> from where we read the initial state of the sequence. I think they can
> be used with the following rules:
> 1. The publisher will not send any changes with LSN less than
> confirmed_flush so we are good there.
> 2. Any non-transactional changes that happened between confirmed_flush
> and page LSN should be discarded while syncing. They are already
> visible to SELECT.
> 3. Any transactional changes with commit LSN between confirmed_flush
> and page LSN should be discarded while syncing. They are already
> visible to SELECT.
> 4. A DDL acquires a lock on sequence. Thus no other change to that
> sequence can have an LSN between the LSN of the change made by DDL and
> the commit LSN of that transaction. Only DDL changes to sequence are
> transactional. Hence any transactional changes with commit LSN beyond
> page LSN would not have been seen by the SELECT otherwise SELECT would
> see the page LSN committed by that transaction. so they need to be
> applied while syncing.
> 5. Any non-transactional changes beyond page LSN should be applied.
> They are not seen by SELECT.
> 
> Am I missing something?
> 

Hmmm, I think you're onto something and the interlock may not be
actually necessary ...

IIRC there were two examples of the non-MVCC sequence behavior, leading
me to add the interlock.


1) going "backwards" during catchup

Sequences are not MVCC, and if there are increments between the slot
creation and the SELECT, the sequence will go backwards. But it will
ultimately end with the correct value. The LSN checks were an attempt to
prevent this.

I don't recall why I concluded this would not be sufficient (there's no
link for [7] in your message), but maybe it was related to the sequence
increments not being WAL-logged and thus not guaranteed to update the
page LSN, or something like that.

But if we agree we only guarantee consistency at the end of the catchup,
this does not matter - it's OK to go backwards as long as the sequence
ends with the correct value.


2) missing an increment because of ALTER SEQUENCE

My concern here was that we might have a transaction that does ALTER
SEQUENCE before the tablesync slot gets created, and the SELECT still
sees the old sequence state because we start decoding after the ALTER.

But now that I think about it again, this probably can't happen, because
the slot won't be created until the ALTER commits. So we shouldn't miss
anything.

I suspect I got confused by some other bug in the patch at that time,
leading me to a faulty conclusion.


I'll try removing the interlock, and make sure it actually works OK.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: psql: Add role's membership options to the \du+ command
Следующее
От: Pavel Luzanov
Дата:
Сообщение: Re: psql: Add role's membership options to the \du+ command