Re: [HACKERS] Instability in select_parallel regression test

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [HACKERS] Instability in select_parallel regression test
Дата
Msg-id CAA4eK1JdQ=dsfYpGZCF0a1zgTebSujYgHguAmv2deGGVkirn3w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Instability in select_parallel regression test  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Instability in select_parallel regression test  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Sun, Feb 19, 2017 at 5:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 19, 2017 at 2:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Such a change can be made, but as I pointed out in the part you didn't
>> quote, there are reasons to wonder whether that will be a constructive
>> change in real life even if it's better for the regression tests.
>> Optimizing PostgreSQL for the use case of running regression tests in
>> the buildfarm at the expense of other use cases wouldn't be very
>> smart.  Maybe such a change is better in real-world applications too,
>> but that deserves at least a little bit of thought and substantive
>> discussion.
>
> Rewind.  Wait a minute.  Looking at this code again, it looks like
> we're supposed to ALREADY BE DOING THIS.
>
> DestroyParallelContext() calls WaitForParallelWorkersToExit() which
> calls WaitForBackgroundWorkerShutdown() for each worker.  That
> function returns only when the postmaster dies (which causes an error
> with that specific complaint) or when GetBackgroundWorkerPid() sets
> the status to BGWH_STOPPED. GetBackgroundWorkerPid() only returns
> BGWH_STOPPED when either (a) handle->generation != slot->generation
> (meaning that the slot got reused, and therefore must have been freed)
> or when (b) slot->pid == 0.  The pid only gets set to 0 in
> BackgroundWorkerStateChange() when slot->terminate is set, or in
> ReportBackgroundWorkerPID() when it's called from
> CleanupBackgroundWorker.  So this function should not be returning
> until after all workers have actually exited.
>

Yeah, I have also noticed this point and was thinking of the way to
close this gap.

> However, it looks like there's a race condition here, because the slot
> doesn't get freed up at the same time that the PID gets set to 0.
> That actually happens later, when the postmaster calls
> maybe_start_bgworker() or DetermineSleepTime() and one of those
> functions calls ForgetBackgroundWorker(). We could tighten this up by
> changing CleanupBackgroundWorker() to also call
> ForgetBackgroundWorker() immediately after calling
> ReportBackgroundWorker() if rw->rw_terminate ||
> rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART.  If we do that
> BEFORE sending the notification to the starting process, that closes
> this hole.  Almost.
>

To close the remaining gap, don't you think we can check slot->in_use
flag when generation number for handle and slot are same.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] SCRAM authentication, take three
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys