Re: Race condition in InvalidateObsoleteReplicationSlots()
От | Álvaro Herrera |
---|---|
Тема | Re: Race condition in InvalidateObsoleteReplicationSlots() |
Дата | |
Msg-id | 202106102202.mjw4huiix7lo@alvherre.pgsql обсуждение исходный текст |
Ответ на | Re: Race condition in InvalidateObsoleteReplicationSlots() (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: Race condition in InvalidateObsoleteReplicationSlots()
|
Список | pgsql-hackers |
Here's a version that I feel is committable (0001). There was an issue when returning from the inner loop, if in a previous iteration we had released the lock. In that case we need to return with the lock not held, so that the caller can acquire it again, but weren't. This seems pretty hard to hit in practice (I suppose somebody needs to destroy the slot just as checkpointer killed the walsender, but before checkpointer marks it as its own) ... but if it did happen, I think checkpointer would block with no recourse. Also added some comments and slightly restructured the code. There are plenty of conflicts in pg13, but it's all easy to handle. I wrote a test (0002) to cover the case of signalling a walsender, which is currently not covered (we only deal with the case of a standby that's not running). There are some sharp edges in this code -- I had to make it use background_psql() to send a CHECKPOINT, which hangs, because I previously send a SIGSTOP to the walreceiver. Maybe there's a better way to achieve a walreceiver that remains connected but doesn't consume input from the primary, but I don't know what it is. Anyway, the code becomes covered with this. I would like to at least see it in master, to gather some reactions from buildfarm. -- Álvaro Herrera Valdivia, Chile <Schwern> It does it in a really, really complicated way <crab> why does it need to be complicated? <Schwern> Because it's MakeMaker.
Вложения
В списке pgsql-hackers по дате отправления: