Re: Single transaction in the tablesync worker?
От | Amit Kapila |
---|---|
Тема | Re: Single transaction in the tablesync worker? |
Дата | |
Msg-id | CAA4eK1LQ4vQx44FE=mhmqhQa4Kz_CH8TnoTh8Q0q71kY-qhX=g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Single transaction in the tablesync worker? (Peter Smith <smithpb2250@gmail.com>) |
Ответы |
RE: Single transaction in the tablesync worker?
Re: Single transaction in the tablesync worker? |
Список | pgsql-hackers |
On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > ... > > > Thanks. I have fixed one of the issues reported by me earlier [1] > > wherein the tablesync worker can repeatedly fail if after dropping the > > slot there is an error while updating the SYNCDONE state in the > > database. I have moved the drop of the slot just before commit of the > > transaction where we are marking the state as SYNCDONE. Additionally, > > I have removed unnecessary includes in tablesync.c, updated the docs > > for Alter Subscription, and updated the comments at various places in > > the patch. I have also updated the commit message this time. > > > > Below are my feedback comments for V17 (nothing functional) > > ~~ > > 1. > V27 Commit message: > For the initial table data synchronization in logical replication, we use > a single transaction to copy the entire table and then synchronizes the > position in the stream with the main apply worker. > > Typo: > "synchronizes" -> "synchronize" > Fixed and added a note about Alter Sub .. Refresh .. command can't be executed in the transaction block. > ~~ > > 2. > @@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable > class="parameter">name</replaceable> RENAME TO < > (Currently, all subscription owners must be superusers, so the owner checks > will be bypassed in practice. But this might change in the future.) > </para> > + > + <para> > + When refreshing a publication we remove the relations that are no longer > + part of the publication and we also remove the tablesync slots if there are > + any. It is necessary to remove tablesync slots so that the resources > + allocated for the subscription on the remote host are released. If due to > + network breakdown or some other error, we are not able to remove the slots, > + we give WARNING and the user needs to manually remove such slots later as > + otherwise, they will continue to reserve WAL and might eventually cause > + the disk to fill up. See also <xref > linkend="logical-replication-subscription-slot"/>. > + </para> > > I think the content is good, but the 1st-person wording seemed strange. > e.g. > "we are not able to remove the slots, we give WARNING and the user needs..." > Maybe it should be like: > "... PostgreSQL is unable to remove the slots, so a WARNING is > reported. The user needs... " > Changed as per suggestion with a minor tweak. > ~~ > > 3. > @@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub, > bool copy_data) > ... > + * XXX If there is a network break down while dropping the > > "network break down" -> "network breakdown" > > ~~ > > 4. > @@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) > (errmsg("could not connect to the publisher: %s", err))); > ... > + * XXX We could also instead try to drop the slot, last time we failed > + * but for that, we might need to clean up the copy state as it might > + * be in the middle of fetching the rows. Also, if there is a network > + * break down then it wouldn't have succeeded so trying it next time > + * seems like a better bet. > > "network break down" -> "network breakdown" > Changed as per suggestion. > ~~ > > 5. > @@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int > cacheid, uint32 hashvalue) > ... > + > + /* > + * Cleanup the tablesync slot. > + * > + * This has to be done after updating the state because otherwise if > + * there is an error while doing the database operations we won't be > + * able to rollback dropped slot. > + */ > + ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid, > + MyLogicalRepWorker->relid, > + syncslotname); > + > + ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */); > + > > Should this comment also describe why the missing_ok is false for this case? > Yeah that makes sense, so added a comment. Additionally, I have changed the errorcode in RemoveSubscriptionRel, moved the setup of origin before copy_table in LogicalRepSyncTableStart to avoid doing copy again due to an error in setting up origin. I have made a few comment changes as well. -- With Regards, Amit Kapila.
Вложения
В списке pgsql-hackers по дате отправления: