Обсуждение: condition variable cleanup and subtransactions

Поиск
Список
Период
Сортировка

condition variable cleanup and subtransactions

От
Andres Freund
Дата:
Hi,

On 2017-12-21 02:42:25 -0800, Andres Freund wrote:
> Trying to debug this I found another issue. I'd placed a sleep(10) in
> ExecParallelHashCloseBatchAccessors() and then ctrl-c'ed the server for
> some reason. Segfault time:
> 
> #0  0x000055bfbac42539 in tas (lock=0x7fcd82ae14ac <error: Cannot access memory at address 0x7fcd82ae14ac>) at
/home/andres/src/postgresql/src/include/storage/s_lock.h:228
> #1  0x000055bfbac42b4d in ConditionVariableCancelSleep () at
/home/andres/src/postgresql/src/backend/storage/lmgr/condition_variable.c:173
> #2  0x000055bfba8e24ae in AbortTransaction () at /home/andres/src/postgresql/src/backend/access/transam/xact.c:2478
> #3  0x000055bfba8e4a2a in AbortOutOfAnyTransaction () at
/home/andres/src/postgresql/src/backend/access/transam/xact.c:4387

> So, afaics no workers had yet attached, the leader accepted the cancel
> interrupt, the dsm segments were destroyed, and as part of cleanup
> cv_sleep_target was supposed to be reset, which fails, because it's
> memory has since been freed.   Looking at how that can happen.

Oh. This seems to be a condition variable bug independent of PHJ. The
problem is that the DSM segment etc all get cleaned up in
*subtransaction* abort.

Afaict it's a bug that AbortTransaction() does
ConditionVariableCancelSleep() but AbortSubTransaction() does not,
despite the latter releasing dsm segments via
ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS).

Adding that seems to fix the crash.

This seems like something we need to backpatch.

Greetings,

Andres Freund


Re: condition variable cleanup and subtransactions

От
Thomas Munro
Дата:
On Fri, Dec 22, 2017 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
> Oh. This seems to be a condition variable bug independent of PHJ. The
> problem is that the DSM segment etc all get cleaned up in
> *subtransaction* abort.
>
> Afaict it's a bug that AbortTransaction() does
> ConditionVariableCancelSleep() but AbortSubTransaction() does not,
> despite the latter releasing dsm segments via
> ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS).
>
> Adding that seems to fix the crash.
>
> This seems like something we need to backpatch.

Agreed.  That affects any user of condition variables inside DSM
segments, including the released Parallel Index Scan and Parallel
Bitmap Heap Scan code.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: condition variable cleanup and subtransactions

От
Thomas Munro
Дата:
On Fri, Dec 22, 2017 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
> Oh. This seems to be a condition variable bug independent of PHJ. The
> problem is that the DSM segment etc all get cleaned up in
> *subtransaction* abort.
>
> Afaict it's a bug that AbortTransaction() does
> ConditionVariableCancelSleep() but AbortSubTransaction() does not,
> despite the latter releasing dsm segments via
> ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS).
>
> Adding that seems to fix the crash.
>
> This seems like something we need to backpatch.

Agreed.  That affects any user of condition variables inside DSM
segments, including the released Parallel Index Scan and Parallel
Bitmap Heap Scan code.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: condition variable cleanup and subtransactions

От
Robert Haas
Дата:
On Thu, Dec 21, 2017 at 6:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Dec 22, 2017 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
>> Oh. This seems to be a condition variable bug independent of PHJ. The
>> problem is that the DSM segment etc all get cleaned up in
>> *subtransaction* abort.
>>
>> Afaict it's a bug that AbortTransaction() does
>> ConditionVariableCancelSleep() but AbortSubTransaction() does not,
>> despite the latter releasing dsm segments via
>> ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS).
>>
>> Adding that seems to fix the crash.
>>
>> This seems like something we need to backpatch.
>
> Agreed.  That affects any user of condition variables inside DSM
> segments, including the released Parallel Index Scan and Parallel
> Bitmap Heap Scan code.

Fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: condition variable cleanup and subtransactions

От
Robert Haas
Дата:
On Thu, Dec 21, 2017 at 6:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Dec 22, 2017 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
>> Oh. This seems to be a condition variable bug independent of PHJ. The
>> problem is that the DSM segment etc all get cleaned up in
>> *subtransaction* abort.
>>
>> Afaict it's a bug that AbortTransaction() does
>> ConditionVariableCancelSleep() but AbortSubTransaction() does not,
>> despite the latter releasing dsm segments via
>> ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS).
>>
>> Adding that seems to fix the crash.
>>
>> This seems like something we need to backpatch.
>
> Agreed.  That affects any user of condition variables inside DSM
> segments, including the released Parallel Index Scan and Parallel
> Bitmap Heap Scan code.

Fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company