Обсуждение: Resetting crash time of background worker

Поиск
Список
Период
Сортировка

Resetting crash time of background worker

От
Amit Khandekar
Дата:
When the postmaster recovers from a backend or worker crash, it resets bg worker's crash time (rw->rw_crashed_at) so that the bgworker will immediately restart (ResetBackgroundWorkerCrashTimes).

But resetting rw->rw_crashed_at to 0 means that we have lost the information that the bgworker had actuallly crashed. So later when postmaster tries to find any workers that should start (maybe_start_bgworker), it treats this worker as a new worker, as against treating it as one that had crashed and is to be restarted. So for this bgworker, it does not consider BGW_NEVER_RESTART :

if (rw->rw_crashed_at != 0) { if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART) { ForgetBackgroundWorker(&iter); continue; } .... ....
That means, it will not remove the worker, and it will be restarted. Now if the worker again crashes, postmaster would keep on repeating the crash and restart cycle for the whole system.

From what I understand, BGW_NEVER_RESTART applies even to a crashed server. But let me know if I am missing anything.

I think we either have to retain the knowledge that the worker has crashed using some new field, or else, we should reset the crash time only if it is not flagged BGW_NEVER_RESTART.


-Amit Khandekar

Re: Resetting crash time of background worker

От
Robert Haas
Дата:
On Tue, Mar 17, 2015 at 1:33 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> When the postmaster recovers from a backend or worker crash, it resets bg
> worker's crash time (rw->rw_crashed_at) so that the bgworker will
> immediately restart (ResetBackgroundWorkerCrashTimes).
>
> But resetting rw->rw_crashed_at to 0 means that we have lost the information
> that the bgworker had actuallly crashed. So later when postmaster tries to
> find any workers that should start (maybe_start_bgworker), it treats this
> worker as a new worker, as against treating it as one that had crashed and
> is to be restarted. So for this bgworker, it does not consider
> BGW_NEVER_RESTART :
>
> if (rw->rw_crashed_at != 0) { if (rw->rw_worker.bgw_restart_time ==
> BGW_NEVER_RESTART) { ForgetBackgroundWorker(&iter); continue; } .... ....
> That means, it will not remove the worker, and it will be restarted. Now if
> the worker again crashes, postmaster would keep on repeating the crash and
> restart cycle for the whole system.
>
> From what I understand, BGW_NEVER_RESTART applies even to a crashed server.
> But let me know if I am missing anything.
>
> I think we either have to retain the knowledge that the worker has crashed
> using some new field, or else, we should reset the crash time only if it is
> not flagged BGW_NEVER_RESTART.

I think you're right, and I think we should do the second of those.
Thanks for tracking this down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Resetting crash time of background worker

От
Amit Khandekar
Дата:


On 17 March 2015 at 19:12, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 17, 2015 at 1:33 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> I think we either have to retain the knowledge that the worker has crashed
> using some new field, or else, we should reset the crash time only if it is
> not flagged BGW_NEVER_RESTART.

I think you're right, and I think we should do the second of those.
Thanks for tracking this down.

Thanks. Attached a patch accordingly. Put this into the June 2015 commitfest. 
Вложения

Re: Resetting crash time of background worker

От
Robert Haas
Дата:
On Sun, Mar 22, 2015 at 10:55 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 17 March 2015 at 19:12, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Mar 17, 2015 at 1:33 AM, Amit Khandekar <amitdkhan.pg@gmail.com>
>> wrote:
>> > I think we either have to retain the knowledge that the worker has
>> > crashed
>> > using some new field, or else, we should reset the crash time only if it
>> > is
>> > not flagged BGW_NEVER_RESTART.
>>
>> I think you're right, and I think we should do the second of those.
>> Thanks for tracking this down.
>
> Thanks. Attached a patch accordingly. Put this into the June 2015
> commitfest.

Committed and back-patched to 9.4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company