pg_createsubscriber: Fix incorrect handling of cleanup flags
From | Nisha Moond |
---|---|
Subject | pg_createsubscriber: Fix incorrect handling of cleanup flags |
Date | |
Msg-id | CABdArM5V9QKK1PkLY9dpgAcZa3kUp84-wPqPovxvdLOri4=69w@mail.gmail.com |
Responses | Re: pg_createsubscriber: Fix incorrect handling of cleanup flags |
List | pgsql-hackers |
Hi Hackers,

In pg_createsubscriber, the flags 'made_publication' and 'made_replslot' track whether the tool itself created a publication or replication slot on the primary (publisher) node. If a failure happens during the process, these flags help the tool decide whether it should clean up those objects using cleanup_objects_atexit().

However, there are cases where these flags are wrongly set to false due to failures on the subscriber side, which causes the tool to skip cleanup of some objects on the primary even when it should not.

Example: made_publication
In drop_publication(), if dropping a publication on the subscriber fails (either a replicated publication or an existing one being removed with --remove=publications), the made_publication flag is wrongly set to false. The process continues without exiting, but if a later step fails, cleanup_objects_atexit() will see made_publication = false and skip dropping the publication on the primary, even though it was created earlier by the tool. This leaves an orphaned publication on the primary.

Example: made_replslot
A similar issue exists for replication slots. In drop_replication_slot(), if dropping a physical replication slot on the primary, or a failover-synced slot on the subscriber, fails, the made_replslot flag is set to false. Again, if the process fails later, cleanup_objects_atexit() will incorrectly skip dropping the logical replication slot created earlier on the primary, leaving it behind.

Solution:
The fix ensures that failures in dropping subscriber-side or non-internal objects do not reset made_publication or made_replslot. These flags should only be reset if dropping the internally created objects on the primary fails. That way, cleanup_objects_atexit() can still correctly clean up what the tool created if something else goes wrong later in the process.

Attached is the patch implementing the proposed solution. Reviews and feedback are most welcome.

--
Thanks,
Nisha
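
The flag handling proposed in the message can be sketched roughly as follows. This is a minimal standalone illustration, not the attached patch and not the actual pg_createsubscriber source: the type and function names (SketchState, SketchDropTarget, sketch_on_drop_failure, sketch_cleanup_at_exit) are hypothetical, and the real drop_publication(), drop_replication_slot() and cleanup_objects_atexit() talk to the servers rather than just toggling and counting flags. Only the flag names made_publication and made_replslot come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the tool keeps about the primary. */
typedef struct SketchState
{
    bool made_publication;  /* tool created a publication on the primary */
    bool made_replslot;     /* tool created a logical slot on the primary */
} SketchState;

/* Hypothetical classification of the object whose drop just failed. */
typedef enum SketchDropTarget
{
    SKETCH_PRIMARY_INTERNAL,  /* object the tool itself created on the primary */
    SKETCH_OTHER              /* subscriber-side or pre-existing object */
} SketchDropTarget;

/*
 * Called when a drop attempt fails.  The flag is reset only when the failed
 * drop targeted the tool-created object on the primary, so that exit-time
 * cleanup does not retry it.  A failure on any other object leaves the flag
 * untouched, which is the behaviour the message proposes.
 */
void
sketch_on_drop_failure(bool *made_flag, SketchDropTarget target)
{
    assert(made_flag != NULL);

    if (target == SKETCH_PRIMARY_INTERNAL)
        *made_flag = false;
}

/*
 * Simplified exit-time cleanup: returns how many primary-side objects
 * it would still try to drop, based purely on the flags.
 */
int
sketch_cleanup_at_exit(const SketchState *st)
{
    int attempts = 0;

    assert(st != NULL);

    if (st->made_publication)
        attempts++;             /* would drop the publication on the primary */
    if (st->made_replslot)
        attempts++;             /* would drop the logical slot on the primary */

    return attempts;
}
```

With this shape, a failed drop of a subscriber-side publication leaves made_publication set, so a later failure still triggers cleanup of the publication on the primary; only a failed drop of the tool-created primary object clears the flag.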