Re: strange parallel query behavior after OOM crashes

From: Tomas Vondra
Subject: Re: strange parallel query behavior after OOM crashes
Date:
Msg-id: 24a143e0-c9bb-8f4f-1472-9d6faae4c92e@2ndquadrant.com
In response to: Re: strange parallel query behavior after OOM crashes  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Responses: Re: strange parallel query behavior after OOM crashes
List: pgsql-hackers

On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> On 04/04/2017 06:52 PM, Robert Haas wrote:
>>>
>>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmhaas@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>>>>> <kuntalghosh.2007@gmail.com> wrote:
>>>>>>
>>>>>> 2. the server restarts automatically and initializes
>>>>>> BackgroundWorkerData->parallel_register_count and
>>>>>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>>>>>> After that, it calls ForgetBackgroundWorker, which increments
>>>>>> parallel_terminate_count.
>>>>>
>>>>>
>>>>> Hmm.  So this seems like the root of the problem.  Presumably those
>>>>> things need to be reset AFTER forgetting any background workers from
>>>>> before the crash.
>>>>>
>>>> IMHO, the fix would be not to increase the terminated parallel worker
>>>> count whenever ForgetBackgroundWorker is called due to a bgworker
>>>> crash. I've attached a patch for the same. PFA.
>>>
>>>
>>> While I'm not opposed to that approach, I don't think this is a good
>>> way to implement it.  If you want to pass an explicit flag to
>>> ForgetBackgroundWorker telling it whether or not it should perform
>>> the increment, fine.  But with what you've got here, you're
>>> essentially relying on "spooky action at a distance".  It would be
>>> easy for future code changes to break this, not realizing that
>>> somebody's got a hard dependency on 0 having a specific meaning.
>>>
>>
>> I'm probably missing something, but I don't quite understand how these
>> values actually survive the crash. I mean, what I observed is OOM followed
>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
>> some reason?
> AFAICU, during crash recovery, we wait for all non-syslogger children
> to exit, then reset shmem (call BackgroundWorkerShmemInit) and perform
> StartupDataBase. While starting the startup process, we check whether any
> bgworker is scheduled for a restart. If a bgworker has crashed and is not
> meant to be restarted (a parallel worker in our case), we call
> ForgetBackgroundWorker() to discard it.
> 

OK, so essentially we reset the counters, getting
    parallel_register_count = 0
    parallel_terminate_count = 0

and then ForgetBackgroundWorker() runs and increments the 
terminate_count, breaking the logic?

And you're using (parallel_register_count > 0) to detect whether we're 
still in the init phase or not. Hmmm.
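
To make that concrete, here's a minimal standalone sketch of the sequence 
as I understand it, with plain local counters standing in for the 
BackgroundWorkerData fields (so not the actual bgworker.c code):

    #include <stdio.h>

    int
    main(void)
    {
        /* shmem re-init after the crash resets both counters */
        unsigned int parallel_register_count = 0;
        unsigned int parallel_terminate_count = 0;

        /*
         * ForgetBackgroundWorker() then runs for a parallel worker left
         * over from before the crash and bumps the terminate counter.
         */
        parallel_terminate_count++;

        /*
         * The "active parallel workers" computation wraps around and now
         * reports ~4 billion workers instead of 0.
         */
        printf("active = %u\n",
               parallel_register_count - parallel_terminate_count);

        return 0;
    }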

>>
>> In any case, the comment right before BackgroundWorkerArray says this:
>>
>>   * These counters can of course overflow, but it's not important here
>>   * since the subtraction will still give the right number.
>>
>> which means that this assert
>>
>> +       Assert(BackgroundWorkerData->parallel_register_count >=
>> +               BackgroundWorkerData->parallel_terminate_count);
>>
>> is outright broken, just like any other attempts to rely on simple
>> comparisons of these two counters, no?
>>
> IIUC, the assert statement ensures that the register count should always
> be greater than or equal to the terminate count.
> Whereas, the comment refers to the fact that the register count and
> terminate count indicate the total number of parallel workers spawned
> and terminated respectively since the server was started. We're not
> resetting the counts at every session, hence they can overflow. But
> their subtraction should give you the number of active
> parallel workers at any instant.
> So, I'm not able to see how the assert is broken w.r.t. the aforesaid
> comment. Am I missing something here?
> 

The comment says that the counters are allowed to overflow, i.e. after a 
long uptime you might get these values
    parallel_register_count = UINT_MAX + 1 = 0
    parallel_terminate_count = UINT_MAX

which is fine, because C handles the overflow during the subtraction, and so
    parallel_register_count - parallel_terminate_count = 1

But the assert is not doing subtraction, it's comparing the values 
directly:
    Assert(parallel_register_count >= parallel_terminate_count);

and the (perfectly valid) values trivially violate this comparison.
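
Here's a minimal standalone sketch of that wraparound case (again with 
plain unsigned counters rather than the real shared-memory struct), 
showing the comparison failing while the subtraction still comes out right:

    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* register count has wrapped past UINT_MAX, terminate count has not */
        unsigned int parallel_register_count = UINT_MAX + 1u;    /* == 0 */
        unsigned int parallel_terminate_count = UINT_MAX;

        /* the subtraction wraps as well, so the active count is still 1 */
        printf("active = %u\n",
               parallel_register_count - parallel_terminate_count);

        /* but this comparison is false for these perfectly valid values */
        assert(parallel_register_count >= parallel_terminate_count);

        return 0;
    }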

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


