Re: strange parallel query behavior after OOM crashes

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: strange parallel query behavior after OOM crashes
Дата
Msg-id eaf07984-078f-1918-1304-237934b9fe20@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: strange parallel query behavior after OOM crashes  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Список pgsql-hackers

On 04/05/2017 01:09 PM, Kuntal Ghosh wrote:
> On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>>>
>>>> The comment says that the counters are allowed to overflow, i.e. after a
>>>> long uptime you might get these values
>>>>
>>>>       parallel_register_count = UINT_MAX + 1 = 1
>>>>       parallel_terminate_count = UINT_MAX
>>>>
>>>> which is fine, because the C handles the overflow during subtraction and
>>>> so
>>>>
>>>>       parallel_register_count - parallel_terminate_count = 1
>>>>
>>>> But the assert is not doing subtraction, it's comparing the values
>>>> directly:
>>>>
>>>>       Assert(parallel_register_count >= parallel_terminate_count);
>>>>
>>>> and the (perfectly valid) values trivially violate this comparison.
>>>>
>>> Thanks for the explanation. So, we can't use the above assert
>>> statement. Even the following assert statement will not be helpful:
>>> Assert(parallel_register_count - parallel_terminate_count >= 0);
>>> Because, it'll fail to track the scenario when parallel_register_count
>>> is not overflowed, still less than parallel_terminate_count. :(
>>>
>>
>> Actually, that assert would work, because C does handle overflows on uint
>> values during subtraction just fine. That is,
>>
>>      (UINT_MAX+x) - UINT_MAX = x
>>
>> Also, the comment about overflows before BackgroundWorkerArray claims this
>> is the case.
>>
> Agreed on the overflowed case. But, my concern is when an overflow has
> not yet happened:
> 
> Suppose,
> uint parallel_register_count = 1; /* Didn't overflow* /
> uint parallel_terminate_count = 2; /* Didn't overflow */
> 
> Assert(parallel_register_count - parallel_terminate_count >= 0);
> We want the assert statement to fail here, but I think it won't since
> -1 has a valid representation in unsigned int and it is greater than
> 0, no?

I don't follow. How exactly do you get into this situation, assuming you 
actually enforce the (register_count > terminate_count) invariant 
consistently? In the modulo arithmetic of course.

But now that I'm thinking about it, the subtraction actually happens in 
unsigned ints, so the result will be unsigned int too, i.e. always >=0. 
Whether it makes sense depends on the invariant being true.

But I think this would work:

Assert(parallel_register_count - parallel_terminate_count <= 
max_parallel_workers);

If the difference gets 'negative' and wraps around, it'll be close to 
UINT_MAX.

But man, this unsigned int arithmetic makes my brain hurt ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: increasing the default WAL segment size
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Functions Immutable but not parallel safe?