Обсуждение: Error during hash index scans can cause postgres halt!

Поиск
Список
Период
Сортировка

Error during hash index scans can cause postgres halt!

От
"ykhuang"
Дата:
recurred through deadlock.

client1:
 create table test(a int);
 create index id on test using hash(a);
 insert into test values(1);
 insert into test values(2);
 set enable_seqscan=off;
 begin;
 update test set a=a+1 where a=1;

client2:
 set enable_seqscan=off;
 begin;
 update test set a=a+1 where a=2;

client1:
  update test set a=a+1 where a=2;

client2:
 update test set a=a+1 where a=1;

Re: Error during hash index scans can cause postgres halt!

От
"Heikki Linnakangas"
Дата:
ykhuang wrote:
> recurred through deadlock.
>
> client1:
>  create table test(a int);
>  create index id on test using hash(a);
>  insert into test values(1);
>  insert into test values(2);
>  set enable_seqscan=off;
>  begin;
>  update test set a=a+1 where a=1;
>
> client2:
>  set enable_seqscan=off;
>  begin;
>  update test set a=a+1 where a=2;
>
> client1:
>   update test set a=a+1 where a=2;
>
> client2:
>  update test set a=a+1 where a=1;

I can reproduce this, with --enable-cassert. It crashes when aborting
the transaction, in ReleaseResources_hash. The HashScanList items are
allocated in ExecutorState memory context, but that context has already
been deleted by the time we get to ReleaseResources_hash.

Not sure how to fix this. There's this piece of code in AtAbort_Portals:

>         /*
>          * Any resources belonging to the portal will be released in the
>          * upcoming transaction-wide cleanup; they will be gone before we run
>          * PortalDrop.
>          */
>         portal->resowner = NULL;
>
>         /*
>          * Although we can't delete the portal data structure proper, we can
>          * release any memory in subsidiary contexts, such as executor state.
>          * The cleanup hook was the last thing that might have needed data
>          * there.
>          */
>         MemoryContextDeleteChildren(PortalGetHeapMemory(portal));

MemoryContextDeleteChildren is where we free the HashScanList item we
later try to access. It seems like a simple fix would be to call
ResourceOwnerRelease before that, instead of relying on the
transaction-wide cleanup.

Another idea would be to allocate the HashScanList items in a
longer-lived memory context. The IndexScanDesc struct pointed to by the
HashScanList would still be in ExecutorState context, but that's all
right because we don't need to access it in ReleaseResources_hash.


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Error during hash index scans can cause postgres halt!

От
Tom Lane
Дата:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> I can reproduce this, with --enable-cassert. It crashes when aborting
> the transaction, in ReleaseResources_hash. The HashScanList items are
> allocated in ExecutorState memory context, but that context has already
> been deleted by the time we get to ReleaseResources_hash.

Ouch.  So this has been broken (by me, I think :-() since 8.0.  Tells
you something about how many people use hash indexes :-(

> Another idea would be to allocate the HashScanList items in a
> longer-lived memory context. The IndexScanDesc struct pointed to by the
> HashScanList would still be in ExecutorState context, but that's all
> right because we don't need to access it in ReleaseResources_hash.

That seems like a winner to me.  We can rely on the resource owner
mechanism to free up individual HashScanList items, so there's no real
need to keep them in a short-lived context.  I'm inclined to just drop
them into TopMemoryContext.  We could make a hash-specific context but
I'm not convinced it's worth the extra code to do it.

            regards, tom lane

Re: Error during hash index scans can cause postgres halt!

От
"Heikki Linnakangas"
Дата:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> I can reproduce this, with --enable-cassert. It crashes when aborting
>> the transaction, in ReleaseResources_hash. The HashScanList items are
>> allocated in ExecutorState memory context, but that context has already
>> been deleted by the time we get to ReleaseResources_hash.
>
> Ouch.  So this has been broken (by me, I think :-() since 8.0.  Tells
> you something about how many people use hash indexes :-(

Yeah. Also, this is very hard to trigger without --enable-cassert (or
just CLOBBER_FREED_MEMORY). It's extremely unlikely that something new
is allocated on the piece of memory that was used by an HashScanList
item, during AbortTransaction processing.

ykhuang, were you running an assertion-enabled build as well?

>> Another idea would be to allocate the HashScanList items in a
>> longer-lived memory context. The IndexScanDesc struct pointed to by the
>> HashScanList would still be in ExecutorState context, but that's all
>> right because we don't need to access it in ReleaseResources_hash.
>
> That seems like a winner to me.  We can rely on the resource owner
> mechanism to free up individual HashScanList items, so there's no real
> need to keep them in a short-lived context.  I'm inclined to just drop
> them into TopMemoryContext.  We could make a hash-specific context but
> I'm not convinced it's worth the extra code to do it.

Want me to hack up a patch, or you going to just commit that yourself?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Error during hash index scans can cause postgres halt!

От
Tom Lane
Дата:
I wrote:
> Ouch.  So this has been broken (by me, I think :-() since 8.0.  Tells
> you something about how many people use hash indexes :-(

Actually it's not that bad --- this particular test case doesn't crash
in 8.0 or 8.1.  I'm guessing we rearranged the order of transaction
abort processing in 8.2 in a way that exposed the problem.  I patched
8.0 and 8.1 anyway, since it seems likely that some other more
complicated case might still show the problem there (eg something with
subtransactions).

            regards, tom lane

Re: Error during hash index scans can cause postgres halt!

От
Tom Lane
Дата:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> Ouch.  So this has been broken (by me, I think :-() since 8.0.  Tells
>> you something about how many people use hash indexes :-(

> Yeah. Also, this is very hard to trigger without --enable-cassert (or
> just CLOBBER_FREED_MEMORY). It's extremely unlikely that something new
> is allocated on the piece of memory that was used by an HashScanList
> item, during AbortTransaction processing.

> ykhuang, were you running an assertion-enabled build as well?

The pfree() would touch the context control data, not just the
HashScanList struct itself.  Also on some platforms it's possible that
malloc would've returned the freed context to the system, resulting in
a segfault because we touch unmapped memory.  But I agree that it'd
be far more probable to see the problem with --enable-cassert.

> Want me to hack up a patch, or you going to just commit that yourself?

Already done, I didn't see your message till just now.

            regards, tom lane