RE: Newly created replication slot may be invalidated by checkpoint
От | Hayato Kuroda (Fujitsu) |
---|---|
Тема | RE: Newly created replication slot may be invalidated by checkpoint |
Дата | |
Msg-id | OSCPR01MB14966C191ECEC11D051148D6DF5E0A@OSCPR01MB14966.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | RE: Newly created replication slot may be invalidated by checkpoint ("Vitaly Davydov" <v.davydov@postgrespro.ru>) |
Список | pgsql-hackers |
Dear Vitaly, > > Per my understanding, this happened because there is a lag that restart_lsn of > > the slot is set, and it is protected by the system. Your idea is to ensure the > > restart_lsn is protected by the system before obtaining on-memory LSN, right? > > Not sure what you mean by on-memory LSN, but, the issue happens because we > have > a lag between restart_lsn assignment and update of > XLogCtl->replicationSlotsMinLSN which is used to protect the WAL. Sorry I should say "before obtaining replicationSlotMinLSN". > Yes, I > propose > to ensure that the protection happens when we assign restart_lsn. It seems to be > wrong that we invalidate slots by its restart_lsn but protect the wal for > slots using XLogCtl->replicationSlotsMinLSN. Seems valid. There is another corner case that another restart_lsn can be set in-between, but they have larger LSN than RedoRecPtr, right? > Below I tried to write some summary and propose the patch which fixes the > problem. Sorry but it is too long to understand properly for me :-(. > There is one subtle thing. Once, the operation of restart_lsn assignment is not > an atomic, the following scenario may happen theoretically: > 1. Read GetRedoRecPtr() in the backend (ReplicationSlotReserveWal) > 2. Assign a new redo LSN in the checkpointer > 3. Call ReplicationSlotsComputeRequiredLSN() in the checkpointer > 3. Assign the old redo LSN to restart_lsn > > In this scenario, the restart_lsn will point to a previous redo LSN and it will > be not protected by the new redo LSN. This scenario is unlikely, but it can > happen theoretically. I have no ideas how to deal with it, except of assigning > restart_lsn under XLogCtl->info_lck lock to avoid concurrent modification of > XLogCtl->RecoRecPtr until it is assigned to restart_lsn of a creating slot. Oh, your point is there is another race condition, right? Do you have the reproducer for it? Best regards, Hayato Kuroda FUJITSU LIMITED
В списке pgsql-hackers по дате отправления: