Orphaned records in pg_replication_origin_status after subscription drop
| From | Michael Paquier |
|---|---|
| Subject | Orphaned records in pg_replication_origin_status after subscription drop |
| Date | |
| Msg-id | aUTekQTg4OYnw-Co@paquier.xyz |
| Responses | Re: Orphaned records in pg_replication_origin_status after subscription drop |
| List | pgsql-hackers |
Hi all,
(Added Amit K. in CC.)

Some colleagues have reported that it is possible to end up with orphaned records in pg_replication_origin_status, as an effect of table synchronization workers missing some cleanup actions around replorigin_drop_by_name() when the subscription that spawned these workers (which themselves create origins through replorigin_create(), called in LogicalRepSyncTableStart()) is dropped. Please find attached a small script, courtesy of Tenglong Gu and Daisuke Higuchi, who have been digging into all that. This problem exists since ce0fdbfe9722, which added the creation of a replication origin for table synchronization workers, and affects 14 up to HEAD.

One issue with these origin states is that they are sticky: a restart of the node that held the subscription keeps them around, because they get flushed to the replorigin_checkpoint file. If my understanding is right, this also means that the origin slots are still tracked as in use, implying that replorigin_session_setup() would not be able to reuse them.

So, in summary (please correct me if necessary), DropSubscription() attempts a round of cleanup for all the replication origins with a loop over ReplicationOriginNameForLogicalRep(), and while in this case we do attempt to drop the origins created by the workers that end up tracked as orphaned, the syscache lookup of REPLORIGIDENT fails within the DROP SUBSCRIPTION because replorigin_by_name() cannot find an entry: the replication origin is already gone, but its state has persisted in memory.

Then, why would the replication origin already be gone while its state in memory has not been cleaned up yet? Well, the problematic test case shows that the subscription is dropped while the spawned tablesync workers do the initial table copy, and the replication origin is created in the same transaction as the COPY. DROP SUBSCRIPTION tells the tablesync workers to stop, they stop with the COPY failing, and the origin is not seen as something that exists by the session that drops the subscription. The replication state in memory goes out of sync. So it looks to me like we are missing a code path where the replication origin is dropped but we never reset the replication state in memory, leading to bloat of the available origin slots.

worker.c does things differently: a small transaction is used for the origin creation, hence we would never really see the in-memory replication state going out of sync with the replorigin catalog. One solution would be for tablesync.c to do the same as worker.c: we could use a short transaction that sets the memory state and creates the replication origin, then move to a second transaction for the COPY (a rough sketch of that shape is below). A second solution, slightly more complicated, is to create some specific logic to reset the progress state of the origin that's been created in the transaction that runs the initial COPY, which I guess should take the shape of a transaction abort callback.

All these symptoms point out that it is not a good idea, IMO, to expect a replication origin to exist when dropping a subscription if that origin has been created in the context of a transaction that can take a very long time to run, aka the initial tablesync COPY, so using a short transaction feels like a good thing to do here.

I have to admit that this code has become much more complicated over the last couple of years, hence I may not have the full context of how these cleanup actions should be achieved.
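To illustrate the first option, here is a minimal sketch of how the origin creation in LogicalRepSyncTableStart() could be split away from the COPY transaction, mirroring what worker.c does. This is not a polished patch: the surrounding tablesync logic is elided, and it assumes the two-argument replorigin_session_setup() from recent branches.

```c
	char		originname[NAMEDATALEN];
	RepOriginId originid;

	ReplicationOriginNameForLogicalRep(MySubscription->oid,
									   MyLogicalRepWorker->relid,
									   originname, sizeof(originname));

	/*
	 * First transaction: create the origin and set up the in-memory
	 * session state, committing immediately so that a later abort of the
	 * COPY cannot roll back the catalog entry while leaving the
	 * shared-memory state behind.
	 */
	StartTransactionCommand();
	originid = replorigin_by_name(originname, true);
	if (!OidIsValid(originid))
		originid = replorigin_create(originname);
	replorigin_session_setup(originid, 0);
	replorigin_session_origin = originid;
	CommitTransactionCommand();

	/* Second transaction: the potentially long-running initial COPY. */
	StartTransactionCommand();
	/* ... existing tablesync logic: open the relation, copy_table(), etc. ... */
	CommitTransactionCommand();
```

With that shape, an abort of the COPY transaction would no longer take the origin's catalog entry down with it, so DropSubscription()'s replorigin_drop_by_name() should find and clean up the origin as usual.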
These cleanup actions clearly should happen, though, but I am not completely sure which solution would be better than the other. There may be an entirely different solution as well.

Note that it would not be complicated to create a regression test based on an injection point: making a tablesync worker wait during its COPY while issuing a DROP SUBSCRIPTION would be enough.

So, thoughts or comments?
--
Michael
Attachments