Re: Single transaction in the tablesync worker?
От | Amit Kapila |
---|---|
Тема | Re: Single transaction in the tablesync worker? |
Дата | |
Msg-id | CAA4eK1JUzeDFZgLLe8O2tTeTGNoUhCfeD3Yv+YLxMDT5L-C8zg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Single transaction in the tablesync worker? (Peter Smith <smithpb2250@gmail.com>) |
Ответы |
Re: Single transaction in the tablesync worker?
(Peter Smith <smithpb2250@gmail.com>)
|
Список | pgsql-hackers |
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Few other comments: > > ================== > > Thanks for your feedback. > > > 1. > > * FIXME 3 - Crashed tablesync workers may also have remaining slots > > because I don't think > > + * such workers are even iterated by this loop, and nobody else is > > removing them. > > + */ > > + if (slotname) > > + { > > > > The above FIXME is not clear to me. Actually, the crashed workers > > should restart, finish their work, and drop the slots. So not sure > > what exactly this FIXME refers to? > > Yes, normally if the tablesync can complete it should behave like that. > But I think there are other scenarios where it may be unable to > clean-up after itself. For example: > > i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert > handled by tablesync can give a PK violation which also will crash > again and again for each re-launched/replacement tablesync worker. > This can be reproduced in the debugger. If the DropSubscription > doesn't clean-up the tablesync's slot then nobody will. > > ii) Also DROP SUBSCRIPTION code has locking (see code commit) "to > ensure that the launcher doesn't restart new worker during dropping > the subscription". > Yeah, I have also read that comment but do you know how it is preventing relaunch? How does the subscription lock help? > So executing DROP SUBSCRIPTION will prevent a newly > crashed tablesync from re-launching, so it won’t be able to take care > of its own slot. If the DropSubscription doesn't clean-up that > tablesync's slot then nobody will. > > > > > 2. > > DropSubscription() > > { > > .. > > ReplicationSlotDropAtPubNode( > > + NULL, > > + conninfo, /* use conninfo to make a new connection. */ > > + subname, > > + syncslotname); > > .. > > } > > > > With the above call, it will form a connection with the publisher and > > drop the required slots. I think we need to save the connection info > > so that we don't need to connect/disconnect for each slot to be > > dropped. Later in this function, we again connect and drop the apply > > worker slot. I think we should connect just once drop the apply and > > table sync slots if any. > > OK. IIUC this is a suggestion for more efficient connection usage, > rather than actual bug right? > Yes, it is for effective connection usage. > I have added this suggestion to my TODO > list. > > > > > 3. > > ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char > > *conninfo, char *subname, char *slotname) > > { > > .. > > + PG_TRY(); > > .. > > + PG_CATCH(); > > + { > > + /* NOP. Just gobble any ERROR. */ > > + } > > + PG_END_TRY(); > > > > Why are we suppressing the error instead of handling it the error in > > the same way as we do while dropping the apply worker slot in > > DropSubscription? > > This function is common - it is also called from the tablesync > finish_sync_worker. But in the finish_sync_worker case I wanted to > avoid throwing an ERROR which would cause the tablesync to crash and > relaunch (and crash/relaunch/repeat...) when all it was trying to do > in the first place was just cleanup and exit the process. Perhaps the > error suppression should be conditional depending where this function > is called from? > Yeah, that could be one way and if you follow my previous suggestion this function might change a bit more. -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления:
Следующее
От: Amit KapilaДата:
Сообщение: Re: [Patch] Optimize dropping of relation buffers using dlist