Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c

From Thomas Munro
Subject Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c
Date
Msg-id CAEepm=2RN2AEksa2v=5cPQhO1awM-J5=CMK4GBKKvDb=vN9wPg@mail.gmail.com
In response to Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Sun, Jul 23, 2017 at 8:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Meanwhile, it's still pretty unclear what happened yesterday on
> culicidae.

That failure is indeed baffling.  The only code paths that insert
(HASH_ENTER[_NULL]) into PredicateLockTargetHash are:

1.  CreatePredicateLock().  It would be a bug if that ever tried to
insert a { 0, 0, 0, 0 } tag, and in any case it holds
SerializablePredicateLockListLock in LW_SHARED.

2.  TransferPredicateLocksToNewTarget(), which removes and restores
the scratch entry and also explicitly inserts a transferred entry.  It
asserts that it holds SerializablePredicateLockListLock and is called
only by PredicateLockPageSplit() which acquires it in LW_EXCLUSIVE.

3.  DropAllPredicateLocksFromTable(), which removes and restores the
scratch entry and also explicitly inserts a transferred entry.
Acquires SerializablePredicateLockListLock in LW_EXCLUSIVE.
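
To make the "removes and restores the scratch entry" pattern in items 2
and 3 concrete, here is a minimal toy sketch.  It is not the predicate.c
code: Table, table_insert(), table_remove() and transfer() are
hypothetical stand-ins for the shared hash table and its callers, and the
fixed capacity of 4 is an assumption chosen only to keep the example
small.  The idea it illustrates is that the scratch entry holds a slot in
reserve, so a transfer can still insert when the table is otherwise full,
and that restoring the scratch entry expects to create a fresh entry.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_CAPACITY 4        /* toy size, an assumption for this sketch */

/* Hypothetical 4-field tag, loosely modelled on the { 0, 0, 0, 0 } scratch tag. */
typedef struct Tag { unsigned f1, f2, f3, f4; } Tag;

typedef struct Table
{
    Tag  slots[TABLE_CAPACITY];
    bool used[TABLE_CAPACITY];
} Table;

typedef enum { INSERTED, ALREADY_PRESENT, TABLE_FULL } InsertResult;

static const Tag SCRATCH = {0, 0, 0, 0};

static bool tag_eq(Tag a, Tag b)
{
    return a.f1 == b.f1 && a.f2 == b.f2 && a.f3 == b.f3 && a.f4 == b.f4;
}

/* Insert tag unless it is already there; report a full table to the caller. */
InsertResult table_insert(Table *t, Tag tag)
{
    int free_slot = -1;

    for (int i = 0; i < TABLE_CAPACITY; i++)
    {
        if (t->used[i])
        {
            if (tag_eq(t->slots[i], tag))
                return ALREADY_PRESENT;
        }
        else if (free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return TABLE_FULL;
    t->slots[free_slot] = tag;
    t->used[free_slot] = true;
    return INSERTED;
}

/* Remove tag; returns false if it was not present. */
bool table_remove(Table *t, Tag tag)
{
    for (int i = 0; i < TABLE_CAPACITY; i++)
    {
        if (t->used[i] && tag_eq(t->slots[i], tag))
        {
            t->used[i] = false;
            return true;
        }
    }
    return false;
}

/* Move a lock target from oldtag to newtag, borrowing the reserved slot. */
bool transfer(Table *t, Tag oldtag, Tag newtag)
{
    if (!table_remove(t, SCRATCH))
        return false;                       /* reservation was missing */

    InsertResult r = table_insert(t, newtag);   /* can use the freed slot */
    table_remove(t, oldtag);                    /* gives a slot back */

    /*
     * Restoring must create a new entry.  ALREADY_PRESENT here means that
     * something inserted a scratch-looking tag in between.
     */
    return r != TABLE_FULL && table_insert(t, SCRATCH) == INSERTED;
}
```

In this toy model, a transfer on a completely full table succeeds because
the scratch slot is borrowed for the duration, while a transfer whose new
tag happens to equal { 0, 0, 0, 0 } makes the restore step find an
existing entry and report failure.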

I wondered if DropAllPredicateLocksFromTable() had itself inserted a
tag that accidentally looks like the scratch tag in between removing
and restoring, perhaps because the relation passed in had a bogus 0 DB
OID etc, but it constructs a tag with
SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag, dbId, heapId) which
sets locktag_field3 to InvalidBlockNumber (0xFFFFFFFF, i.e. -1 as an
unsigned value), not 0, so that can't explain it.

I wondered if a concurrent PredicateLockPageSplit() called
TransferPredicateLocksToNewTarget() using a newtargettag built from a
Relation that somehow carried a bogus DB OID 0 and rel OID 0, together
with newblkno 0, but that doesn't help because
SerializablePredicateLockListLock is acquired in LW_EXCLUSIVE so it
can't run concurrently.
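
The two tag hypotheses above can be checked with a small standalone
sketch.  It does not include the PostgreSQL headers: TargetTag,
make_relation_tag(), make_page_tag() and tag_equals_scratch() are
hypothetical stand-ins, and the field layout (DB OID, rel OID, block
number, offset number) is an assumption based on the description in this
message.  It shows that a relation-level tag with all-zero OIDs still
differs from the scratch tag in its third field, whereas a page-level tag
with all-zero inputs would be indistinguishable from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL typedefs and constants. */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
#define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)
#define InvalidOffsetNumber 0

/* Hypothetical 4-field tag; the real struct lives in the PostgreSQL sources. */
typedef struct TargetTag
{
    uint32_t field1;    /* database OID */
    uint32_t field2;    /* relation OID */
    uint32_t field3;    /* block number */
    uint32_t field4;    /* offset number */
} TargetTag;

/* Relation-level tag: block number is set to InvalidBlockNumber. */
TargetTag make_relation_tag(Oid db, Oid rel)
{
    TargetTag tag = {db, rel, InvalidBlockNumber, InvalidOffsetNumber};
    return tag;
}

/* Page-level tag: block number comes from the caller. */
TargetTag make_page_tag(Oid db, Oid rel, BlockNumber blk)
{
    TargetTag tag = {db, rel, blk, InvalidOffsetNumber};
    return tag;
}

/* True if the tag is bit-for-bit the { 0, 0, 0, 0 } scratch tag. */
bool tag_equals_scratch(const TargetTag *tag)
{
    return tag->field1 == 0 && tag->field2 == 0 &&
           tag->field3 == 0 && tag->field4 == 0;
}
```

With these definitions, make_relation_tag(0, 0) never compares equal to
the scratch tag, which matches the reasoning about
DropAllPredicateLocksFromTable(); make_page_tag(0, 0, 0) does compare
equal, which is why the page-split path can only be ruled out by the
LW_EXCLUSIVE locking argument and not by the tag contents.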

It looks a bit like either something at a lower level would have to be
broken (GCC 6.3, released 6 months ago, maybe interacts badly with some
clever memory model-dependent code of ours?) or something would have to
be trashing memory.

Here's the set of tests that ran concurrently with select_into, whose
backtrace we see ("DROP SCHEMA selinto_schema CASCADE;"):

parallel group (20 tests):  select_distinct_on delete select_having
random btree_index select_distinct namespace update case hash_index
select_implicit subselect select_into arrays prepared_xacts
transactions portals aggregates join union

Of those I see that prepared_xacts, portals and transactions
explicitly use SERIALIZABLE (which may or may not be important).  I
wonder if the thing to do here is to run select_into (or maybe just its
setup and tear-down, "DROP SCHEMA ...") concurrently with those others
in tight loops and burn some CPU.

-- 
Thomas Munro
http://www.enterprisedb.com


