Re: Sync Rep and shutdown Re: Sync Rep v19

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Sync Rep and shutdown Re: Sync Rep v19
Дата	17 марта 2011 г. 13:34:03
Msg-id	AANLkTi=+0mNL289VGDdT4dKCYWd572toA3-6KyrptXVx@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Sync Rep and shutdown Re: Sync Rep v19 (Fujii Masao <masao.fujii@gmail.com>)
Ответы	Re: Sync Rep and shutdown Re: Sync Rep v19
Список	pgsql-hackers

Дерево обсуждения

On Thu, Mar 17, 2011 at 2:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> This occurs to me; we should ensure that, in shutdown case, walwriter
> should exit after all the backends have gone out? I'm not sure if it's worth
> thinking of the case, but what if synchronous_standby_names is unset
> and config file is reloaded after smart shutdown is requested? In this
> case, the reload cannot wake up the waiting backends since walwriter
> has already exited. This behavior looks a bit inconsistent.

I agree we need to fix smart shutdown.  I think that's going to have
to be a separate patch, however; it's got more problems than this
patch can fix without expanding into a monster.

> In the patch, in order to read the latest value, you take a light-weight lock.
> But I wonder why taking a lock can ensure that the value is up-to-date.

A lock acquisition acts as a memory sequence point.

> + * WAL writer calls this as needed to update the shared sync_standbys_needed
>
> Typo: s/sync_standbys_needed/sync_standbys_defined

Fixed.

> + * we exit normally, or SYNC_REP_MUST_DISCONNECT in abnormal cases.
>
> Typo: the reference to SYNC_REP_MUST_DISCONNECT is not required.

Fixed.

> +                * So in this case we issue a NOTICE (which some clients may
>
> Typo: s/NOTICE/WARNING

Fixed.

> +               if (ProcDiePending)
> +               {
> +                       ereport(WARNING,
> +                                       (errcode(ERRCODE_ADMIN_SHUTDOWN),
> +                                        errmsg("canceling the wait for replication and terminating
> connection due to administrator command"),
> +                                        errdetail("The transaction has already been committed locally
> but might have not been replicated to the standby.")));
> +                       whereToSendOutput = DestNone;
> +                       LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
> +                       MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
> +                       SHMQueueDelete(&(MyProc->syncRepLinks));
>
> SHMQueueIsDetached() should be checked before calling SHMQueueDelete().
> Walsender can already delete the backend from the queue before reaching here.

Fixed.  But come to think of it, doesn't this mean
SyncRepCleanupAtProcExit() needs to repeat the test after acquiring
the lock?

> +               if (QueryCancelPending)
> +               {
> +                       QueryCancelPending = false;
> +                       ereport(WARNING,
> +                                       (errmsg("canceling wait for synchronous replication due to user request"),
> +                                        errdetail("The transaction has committed locally, but may not
> have replicated to the standby.")));
> +                       LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
> +                       MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
> +                       SHMQueueDelete(&(MyProc->syncRepLinks));
> +                       LWLockRelease(SyncRepLock);
>
> Same as above.

Fixed.

> +               if (!PostmasterIsAlive(true))
> +               {
> +                       whereToSendOutput = DestNone;
> +                       proc_exit(1);
>
> proc_exit() should not be called at that point because it leads PANIC.

Fixed, although I'm not sure it matters.

> I think that it's better to check ProcDiePending, QueryCancelPending
> and PostmasterIsAlive *before* waiting on the latch, not after. Because
> those events can occur before reaching there, and it's not worth waiting
> for 60 seconds to detect them.

Not necessary.  Inspired by one of your earlier patches, I made die()
and StatementCancelHandler() set the latch.  We could still wait up to
60 s to detect postmaster death, but that's a very rare situation and
it's not worth slowing down the common case for it, especially since
there is a race condition no matter what.

Thanks for the review!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

sync-rep-wait-fixes-v2.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Cédric Villemain
Дата: 17 марта 2011 г., 13:26:56
Сообщение: Re: really lazy vacuums?

Следующее

От: Robert Haas
Дата: 17 марта 2011 г., 13:40:52
Сообщение: Re: Sync Rep and shutdown Re: Sync Rep v19

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Sync Rep and shutdown Re: Sync Rep v19

Вложения

Предыдущее

Следующее