Re: BUG #17949: Adding an index introduces serialisation anomalies.

Поиск
Список
Период
Сортировка
От Dmitry Dolgov
Тема Re: BUG #17949: Adding an index introduces serialisation anomalies.
Дата
Msg-id 20230624135910.zufuollwp57otawo@ddolgov.remote.csb
обсуждение исходный текст
Ответ на Re: BUG #17949: Adding an index introduces serialisation anomalies.  (Dmitry Dolgov <9erthalion6@gmail.com>)
Ответы Re: BUG #17949: Adding an index introduces serialisation anomalies.  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-bugs
> On Fri, Jun 23, 2023 at 04:05:42PM +0200, Dmitry Dolgov wrote:
> > I spent some more time trying to grok this today.  FTR it reproduces
> > faster without the extra tuple that repro I posted inserts after
> > TRUNCATE (the point of that was to find out whether it was an
> > empty-to-non-empty transition).  I still don't know what's wrong but I
> > am beginning to suspect the "fast" code.  It seems as though, under
> > high concurrency, we sometimes don't scan a recently inserted
> > (invisible to our snapshot, but needed for SSI checks) tuple, but I
> > don't yet know why.
>
> Yep, it's definitely something in the "fast" path. Testing the same, but
> with an index having (fastupdate=off) works just fine for me.

I've managed to resolve it, or at least reduce the chances for the issue to
appear, via semi-randomly adding more CheckForSerializableConflictIn /
PredicateLock around the new sublist that has to be created in
ginHeapTupleFastInsert. I haven't seen the reproducer failing with this
changeset after running it multiple times for a couple of minutes, where on the
main branch, with the two fixes from Thomas included, it was failing within a
couple of seconds.

    diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
    @@ -198,6 +199,7 @@ makeSublist(Relation index, IndexTuple *tuples, int32 ntuples,
            /*
             * Write last page
             */
    +       CheckForSerializableConflictIn(index, NULL, BufferGetBlockNumber(curBuffer));
            res->tail = BufferGetBlockNumber(curBuffer);
            res->tailFreeSize = writeListPage(index, curBuffer,

    @@ -273,6 +275,11 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
                            separateList = true;
                            LockBuffer(metabuffer, GIN_UNLOCK);
                    }
    +               else
    +               {
    +                       CheckForSerializableConflictIn(index, NULL, metadata->head);
    +               }
            }

    diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
    @@ -140,7 +140,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
             * Predicate lock entry leaf page, following pages will be locked by
             * moveRightIfItNeeded()
             */
    -       PredicateLockPage(btree->index, stack->buffer, snapshot);
    +       PredicateLockPage(btree->index, BufferGetBlockNumber(stack->buffer), snapshot);

            for (;;)
            {
    @@ -1925,6 +1926,8 @@ gingetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
            /*
             * Set up the scan keys, and check for unsatisfiable query.
             */
    +       PredicateLockRelation(scan->indexRelation, scan->xs_snapshot);
            ginFreeScanKeys(so);            /* there should be no keys yet, but just to be

Now the last PredicateLockRelation does look rather weird. But without it, or
with this locking happening in scanPendingInsert (instead of locking the meta
page), or without other changes in ginfast.c, the reproducer is still failing.
This of course makes it not a proper solution by any mean, but hopefully it
will help to understand the problem a bit better.



В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Server closed the connection unexpectedly (memory leak)