Re: Improve the connection failure error messages

From Peter Smith
Subject Re: Improve the connection failure error messages
Msg-id CAHut+Ps_KpQHstxJNR6Hb6G_=cLdoCR5kbysZmvnYk5_vOZWgw@mail.gmail.com
In reply to Re: Improve the connection failure error messages  (Nisha Moond <nisha.moond412@gmail.com>)
List pgsql-hackers
On Wed, Jan 17, 2024 at 7:15 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> >
> > ~~
> >
> > BTW, while experimenting with the bad connection ALTER I also tried
> > setting 'disable_on_error' like below:
> >
> > ALTER SUBSCRIPTION sub4 SET (disable_on_error);
> > ALTER SUBSCRIPTION sub4 CONNECTION 'port = -1';
> >
> > ...but here the subscription did not become DISABLED on the next
> > connection error iteration, as I expected it would. It remains enabled
> > and just continues to loop relaunch/ERROR indefinitely, same as before.
> >
> > That looks like it may be a bug. Thoughts?
> >
> Normally, if an already running apply worker hits an error inside
> "LogicalRepApplyLoop()", the error is caught and the subscription is
> disabled when 'disable_on_error' is set -
>
> start_apply(XLogRecPtr origin_startpos)
> {
>     PG_TRY();
>     {
>         LogicalRepApplyLoop(origin_startpos);
>     }
>     PG_CATCH();
>     {
>         if (MySubscription->disableonerr)
>             DisableSubscriptionAndExit();
>         ...
>
> What is happening in this case is that the control reaches the function -
> run_apply_worker() -> start_apply() -> LogicalRepApplyLoop ->
> maybe_reread_subscription()
> ...
> /*
>  * Exit if any parameter that affects the remote connection was changed.
>  * The launcher will start a new worker but note that the parallel apply
>  * worker won't restart if the streaming option's value is changed from
>  * 'parallel' to any other value or the server decides not to stream the
>  * in-progress transaction.
>  */
> if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
>     ...
>
> and it sees a change in the parameter and calls apply_worker_exit().
> This will exit the current process, without throwing an exception to
> the caller and the postmaster will try to restart the apply worker.
> The new apply worker, before reaching the start_apply() [where we
> handle exception], will hit the code to establish the connection to
> the publisher -
>
> ApplyWorkerMain() -> run_apply_worker() -
> ...
> LogRepWorkerWalRcvConn = walrcv_connect(MySubscription->conninfo,
>                                         true /* replication */ ,
>                                         true,
>                                         must_use_password,
>                                         MySubscription->name, &err);
>
> if (LogRepWorkerWalRcvConn == NULL)
>   ereport(ERROR,
>   (errcode(ERRCODE_CONNECTION_FAILURE),
>    errmsg("could not connect to the publisher: %s", err)));
> ...
> and due to the bad connection string in the subscription, it will error out.
> [28680] ERROR:  could not connect to the publisher: invalid port number: "-1"
> [3196] LOG:  background worker "logical replication apply worker" (PID
> 28680) exited with exit code 1
>
> Now, the postmaster keeps trying to restart the apply worker and it
> will keep failing until the connection string is corrected or the
> subscription is disabled manually.
>
> I think this is a bug that needs to be handled in run_apply_worker()
> when disable_on_error is set.
> IMO, this bug-fix discussion deserves a separate thread. Thoughts?

Hi Nisha,

Thanks for your analysis -- it is the same as my understanding.
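
To make the control flow above concrete, here is a minimal standalone
sketch in plain C. It is only a toy model, not PostgreSQL code: the
Subscription struct, try_connect(), launch_worker_once(),
count_launches() and the honor_flag_on_connect switch are all
hypothetical names invented for illustration. It assumes the connection
attempt happens before the guarded apply loop is entered, as described
in the analysis, and the "honor the flag" branch is just one possible
shape of a fix, not an actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a subscription; not a PostgreSQL struct. */
typedef struct Subscription
{
    bool        enabled;
    bool        disable_on_error;
    const char *conninfo;
} Subscription;

/* Toy connection check: only the bad conninfo from this thread fails. */
static bool
try_connect(const Subscription *sub)
{
    return strcmp(sub->conninfo, "port = -1") != 0;
}

/*
 * Models one launch of the apply worker. The connection attempt comes
 * first, outside the part of the code that looks at disable_on_error.
 * When honor_flag_on_connect is false this mirrors the reported
 * behavior: the worker just exits with an error. When it is true, the
 * (assumed) fix disables the subscription before exiting.
 */
static bool
launch_worker_once(Subscription *sub, bool honor_flag_on_connect)
{
    if (!try_connect(sub))
    {
        if (honor_flag_on_connect && sub->disable_on_error)
            sub->enabled = false;
        return false;           /* worker exits with an error */
    }
    return true;                /* connected; apply loop would run here */
}

/*
 * Models the relaunch cycle: keep starting workers while the
 * subscription is enabled, up to max_launches. Returns the number of
 * launches that were attempted.
 */
static int
count_launches(Subscription *sub, bool honor_flag_on_connect, int max_launches)
{
    int launches = 0;

    assert(max_launches >= 0);
    while (sub->enabled && launches < max_launches)
    {
        launches++;
        if (launch_worker_once(sub, honor_flag_on_connect))
            break;
    }
    return launches;
}
```

With honor_flag_on_connect set to false, count_launches() uses up every
allowed launch and the subscription stays enabled, which is the
relaunch/ERROR loop seen above. With it set to true, a single launch
disables the subscription and the loop stops.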

As suggested, I have created a new thread for any further discussion
related to this 'disable_on_error' topic [1].

======
[1]
https://www.postgresql.org/message-id/flat/CAHut%2BPuEsekA3e7ThwzWr%2BUs4x%3DLzkF7DSrED1UsZTUqNrhCUQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


