Обсуждение: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

Поиск
Список
Период
Сортировка

Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Andres Freund
Дата:
Hi,

While looking at https://www.postgresql.org/message-id/20190430070552.jzqgcy4ihalx7nur%40alap3.anarazel.de
I noticed that

/*
 * ReindexIndex
 *        Recreate a specific index.
 */
void
ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
{
    Oid            indOid;
    Oid            heapOid = InvalidOid;
    Relation    irel;
    char        persistence;

    /*
     * Find and lock index, and check permissions on table; use callback to
     * obtain lock on table first, to avoid deadlock hazard.  The lock level
     * used here must match the index lock obtained in reindex_index().
     */
    indOid = RangeVarGetRelidExtended(indexRelation,
                                      concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
                                      0,
                                      RangeVarCallbackForReindexIndex,
                                      (void *) &heapOid);

doesn't pass concurrent-ness to RangeVarCallbackForReindexIndex(). Which
then goes on to lock the table

static void
RangeVarCallbackForReindexIndex(const RangeVar *relation,
                                Oid relId, Oid oldRelId, void *arg)

        if (OidIsValid(*heapOid))
            LockRelationOid(*heapOid, ShareLock);

without knowing that it should use ShareUpdateExclusive. Which
e.g. ReindexTable knows:

    /* The lock level used here should match reindex_relation(). */
    heapOid = RangeVarGetRelidExtended(relation,
                                       concurrent ? ShareUpdateExclusiveLock : ShareLock,
                                       0,
                                       RangeVarCallbackOwnsTable, NULL);

so there's a lock upgrade hazard.

Creating a table
CREATE TABLE blarg(id serial primary key);
and then using pgbench to reindex it:
REINDEX INDEX CONCURRENTLY blarg_pkey;

indeed proves that there's a problem:

2019-04-30 08:12:58.679 PDT [30844][7/925] ERROR:  40P01: deadlock detected
2019-04-30 08:12:58.679 PDT [30844][7/925] DETAIL:  Process 30844 waits for ShareUpdateExclusiveLock on relation 50661
ofdatabase 13408; blocked by process 30848.
 
    Process 30848 waits for ShareUpdateExclusiveLock on relation 50667 of database 13408; blocked by process 30844.
    Process 30844: REINDEX INDEX CONCURRENTLY blarg_pkey;
    Process 30848: REINDEX INDEX CONCURRENTLY blarg_pkey;
2019-04-30 08:12:58.679 PDT [30844][7/925] HINT:  See server log for query details.
2019-04-30 08:12:58.679 PDT [30844][7/925] LOCATION:  DeadLockReport, deadlock.c:1140
2019-04-30 08:12:58.679 PDT [30844][7/925] STATEMENT:  REINDEX INDEX CONCURRENTLY blarg_pkey;

I assume the fix woudl be to pass a struct {LOCKMODE lockmode; Oid
heapOid;} to RangeVarCallbackForReindexIndex().

Greetings,

Andres Freund



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Peter Eisentraut
Дата:
On 2019-04-30 17:17, Andres Freund wrote:
>     indOid = RangeVarGetRelidExtended(indexRelation,
>                                       concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
>                                       0,
>                                       RangeVarCallbackForReindexIndex,
>                                       (void *) &heapOid);
> 
> doesn't pass concurrent-ness to RangeVarCallbackForReindexIndex(). Which
> then goes on to lock the table
> 
> static void
> RangeVarCallbackForReindexIndex(const RangeVar *relation,
>                                 Oid relId, Oid oldRelId, void *arg)
> 
>         if (OidIsValid(*heapOid))
>             LockRelationOid(*heapOid, ShareLock);
> 
> without knowing that it should use ShareUpdateExclusive. Which
> e.g. ReindexTable knows:
> 
>     /* The lock level used here should match reindex_relation(). */
>     heapOid = RangeVarGetRelidExtended(relation,
>                                        concurrent ? ShareUpdateExclusiveLock : ShareLock,
>                                        0,
>                                        RangeVarCallbackOwnsTable, NULL);
> 
> so there's a lock upgrade hazard.

Confirmed.

What seems weird to me is that the existing callback argument heapOid
isn't used at all.  It seems to have been like that since the original
commit of the callback infrastructure.  Therefore also, this code

    if (relId != oldRelId && OidIsValid(oldRelId))
    {
        /* lock level here should match reindex_index() heap lock */
        UnlockRelationOid(*heapOid, ShareLock);

in RangeVarCallbackForReindexIndex() can't ever do anything useful.

Patch to remove the unused code attached; but needs some checking for
this dubious conditional block.

Thoughts?


-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Andres Freund
Дата:
On 2019-05-02 10:44:44 +0200, Peter Eisentraut wrote:
> On 2019-04-30 17:17, Andres Freund wrote:
> >     indOid = RangeVarGetRelidExtended(indexRelation,
> >                                       concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
> >                                       0,
> >                                       RangeVarCallbackForReindexIndex,
> >                                       (void *) &heapOid);
> > 
> > doesn't pass concurrent-ness to RangeVarCallbackForReindexIndex(). Which
> > then goes on to lock the table
> > 
> > static void
> > RangeVarCallbackForReindexIndex(const RangeVar *relation,
> >                                 Oid relId, Oid oldRelId, void *arg)
> > 
> >         if (OidIsValid(*heapOid))
> >             LockRelationOid(*heapOid, ShareLock);
> > 
> > without knowing that it should use ShareUpdateExclusive. Which
> > e.g. ReindexTable knows:
> > 
> >     /* The lock level used here should match reindex_relation(). */
> >     heapOid = RangeVarGetRelidExtended(relation,
> >                                        concurrent ? ShareUpdateExclusiveLock : ShareLock,
> >                                        0,
> >                                        RangeVarCallbackOwnsTable, NULL);
> > 
> > so there's a lock upgrade hazard.
> 
> Confirmed.
> 
> What seems weird to me is that the existing callback argument heapOid
> isn't used at all.  It seems to have been like that since the original
> commit of the callback infrastructure.  Therefore also, this code

Hm? But that's a different callback from the one used from
reindex_index()?  reindex_relation() uses the
RangeVarCallbackOwnsTable() callback and passes in NULL as the argument,
whereas reindex_index() passses in RangeVarCallbackForReindexIndex() and
passes in &heapOid?

And RangeVarCallbackForReindexIndex() pretty clearly sets it *heapOid:

         * Lock level here should match reindex_index() heap lock. If the OID
         * isn't valid, it means the index as concurrently dropped, which is
         * not a problem for us; just return normally.
         */
        *heapOid = IndexGetRelation(relId, true);


> Patch to remove the unused code attached; but needs some checking for
> this dubious conditional block.
> 
> Thoughts?

I might miss something here, and it's actually unused. But if so the fix
would be to make it being used, because it's actually
important. Otherwise ReindexIndex() becomes racy or has even more
deadlock hazards.

Greetings,

Andres Freund



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Peter Eisentraut
Дата:
On 2019-05-02 16:33, Andres Freund wrote:
> And RangeVarCallbackForReindexIndex() pretty clearly sets it *heapOid:
> 
>          * Lock level here should match reindex_index() heap lock. If the OID
>          * isn't valid, it means the index as concurrently dropped, which is
>          * not a problem for us; just return normally.
>          */
>         *heapOid = IndexGetRelation(relId, true);

It sets it but uses it only internally.  There is no code path that
passes in a non-zero heapOid, and there is no code path that does
anything with the heapOid passed back out.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Andres Freund
Дата:
Hi,

On 2019-05-02 22:39:15 +0200, Peter Eisentraut wrote:
> On 2019-05-02 16:33, Andres Freund wrote:
> > And RangeVarCallbackForReindexIndex() pretty clearly sets it *heapOid:
> > 
> >          * Lock level here should match reindex_index() heap lock. If the OID
> >          * isn't valid, it means the index as concurrently dropped, which is
> >          * not a problem for us; just return normally.
> >          */
> >         *heapOid = IndexGetRelation(relId, true);
> 
> It sets it but uses it only internally.  There is no code path that
> passes in a non-zero heapOid, and there is no code path that does
> anything with the heapOid passed back out.

RangeVarGetRelidExtended() can call the callback multiple times, if
there are any concurrent schema changes. That's why it's unlocking the
previously locked heap oid.

Greetings,

Andres Freund



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Peter Eisentraut
Дата:
On 2019-05-02 22:42, Andres Freund wrote:
> RangeVarGetRelidExtended() can call the callback multiple times, if
> there are any concurrent schema changes. That's why it's unlocking the
> previously locked heap oid.

Ah that explains it then.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Peter Eisentraut
Дата:
On 2019-05-02 10:44, Peter Eisentraut wrote:
>> so there's a lock upgrade hazard.
> Confirmed.

Here is a patch along the lines of your sketch.  I cleaned up the
variable naming a bit too.

REINDEX CONCURRENTLY is still deadlock prone because of
WaitForOlderSnapshots(), so this doesn't actually fix your test case,
but that seems unrelated to this particular issue.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Andres Freund
Дата:
Hi,

On 2019-05-03 09:37:07 +0200, Peter Eisentraut wrote:
> REINDEX CONCURRENTLY is still deadlock prone because of
> WaitForOlderSnapshots(), so this doesn't actually fix your test case,
> but that seems unrelated to this particular issue.

Right.

I've not tested the change, but it looks reasonable to me. The change
of moving the logic the reset of *heapOid to the unlock perhaps is
debatable, but I think it's OK.

Greetings,

Andres Freund



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Michael Paquier
Дата:
On Fri, May 03, 2019 at 08:23:21AM -0700, Andres Freund wrote:
> I've not tested the change, but it looks reasonable to me. The change
> of moving the logic the reset of *heapOid to the unlock perhaps is
> debatable, but I think it's OK.

I have not checked the patch in details yet, but it strikes me that
we should have an isolation test case which does the following:
- Take a lock on the table created, without committing yet the
transaction where the lock is taken.
- Run two REINDEX CONCURRENTLY in two other sessions.
- Commit the first transaction.
The result should be no deadlocks happening in the two sessions
running the reindex.  I can see the deadlock easily with three psql
sessions, running manually the queries.
--
Michael

Вложения

Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Michael Paquier
Дата:
On Sat, May 04, 2019 at 09:59:20PM +0900, Michael Paquier wrote:
> The result should be no deadlocks happening in the two sessions
> running the reindex.  I can see the deadlock easily with three psql
> sessions, running manually the queries.

+        * If the OID isn't valid, it means the index as concurrently dropped,
+        * which is not a problem for us; just return normally.
Typo here s/as/is/.

I have looked closer at the patch and the change proposed looks good
to me.

Now, what do we do about the potential deadlock issues in
WaitForOlderSnapshots?  The attached is an isolation test able to
reproduce the deadlock within WaitForOlderSnapshots() with two
parallel REINDEX CONCURRENTLY.  I'd like to think that the best way to
do that would be to track in vacuumFlags the backends running a
REINDEX and just exclude them from GetCurrentVirtualXIDs() because
we don't actually care about missing index entries in this case like
VACUUM.  But it looks also to me that is issue is broader and goes
down to utility commands which can take a lock on a table which cannot
be run in transaction blocks, hence code paths used by CREATE INDEX
CONCURRENTLY and DROP INDEX CONCURRENTLY could also cause a similar
deadlock, no?
--
Michael

Вложения

Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Michael Paquier
Дата:
On Tue, May 07, 2019 at 12:07:56PM +0900, Michael Paquier wrote:
> Now, what do we do about the potential deadlock issues in
> WaitForOlderSnapshots?  The attached is an isolation test able to
> reproduce the deadlock within WaitForOlderSnapshots() with two
> parallel REINDEX CONCURRENTLY.  I'd like to think that the best way to
> do that would be to track in vacuumFlags the backends running a
> REINDEX and just exclude them from GetCurrentVirtualXIDs() because
> we don't actually care about missing index entries in this case like
> VACUUM.  But it looks also to me that is issue is broader and goes
> down to utility commands which can take a lock on a table which cannot
> be run in transaction blocks, hence code paths used by CREATE INDEX
> CONCURRENTLY and DROP INDEX CONCURRENTLY could also cause a similar
> deadlock, no?

More to the point, one can just do that without REINDEX:
- session 1:
create table aa (a int);
begin;
lock aa in row exclusive mode;
- session 2:
create index concurrently aai on aa(a); --blocks
- session 3:
create index concurrently aai2 on aa(a); --blocks
- session 1:
commit;

Then session 2 deadlocks while session 3 finishes correctly.  I don't
know if this is a class of problems we'd want to address for v12, but
if we do then CIC (and DROP INDEX CONCURRENTLY?) could benefit from
it.
--
Michael

Вложения

Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Andres Freund
Дата:
Hi,

On 2019-05-07 12:25:43 +0900, Michael Paquier wrote:
> On Tue, May 07, 2019 at 12:07:56PM +0900, Michael Paquier wrote:
> > Now, what do we do about the potential deadlock issues in
> > WaitForOlderSnapshots?  The attached is an isolation test able to
> > reproduce the deadlock within WaitForOlderSnapshots() with two
> > parallel REINDEX CONCURRENTLY.  I'd like to think that the best way to
> > do that would be to track in vacuumFlags the backends running a
> > REINDEX and just exclude them from GetCurrentVirtualXIDs() because
> > we don't actually care about missing index entries in this case like
> > VACUUM.  But it looks also to me that is issue is broader and goes
> > down to utility commands which can take a lock on a table which cannot
> > be run in transaction blocks, hence code paths used by CREATE INDEX
> > CONCURRENTLY and DROP INDEX CONCURRENTLY could also cause a similar
> > deadlock, no?
> 
> More to the point, one can just do that without REINDEX:
> - session 1:
> create table aa (a int);
> begin;
> lock aa in row exclusive mode;
> - session 2:
> create index concurrently aai on aa(a); --blocks
> - session 3:
> create index concurrently aai2 on aa(a); --blocks
> - session 1:
> commit;
> 
> Then session 2 deadlocks while session 3 finishes correctly.  I don't
> know if this is a class of problems we'd want to address for v12, but
> if we do then CIC (and DROP INDEX CONCURRENTLY?) could benefit from
> it.

This seems like a pre-existing issue to me. We probably should improve
that, but I don't think it has to be tied to 12.

Greetings,

Andres Freund



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-07 12:25:43 +0900, Michael Paquier wrote:
>> Then session 2 deadlocks while session 3 finishes correctly.  I don't
>> know if this is a class of problems we'd want to address for v12, but
>> if we do then CIC (and DROP INDEX CONCURRENTLY?) could benefit from
>> it.

> This seems like a pre-existing issue to me. We probably should improve
> that, but I don't think it has to be tied to 12.

Yeah.  CREATE INDEX CONCURRENTLY has always had a deadlock hazard,
so it's hardly surprising that REINDEX CONCURRENTLY does too.
I don't think that fixing that is in-scope for v12, even if we had
an idea how to do it, which we don't.

We do need to fix the wrong-lock-level problem of course, but
that seems straightforward.

            regards, tom lane



Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Michael Paquier
Дата:
On Tue, May 07, 2019 at 06:45:36PM -0400, Tom Lane wrote:
> Yeah.  CREATE INDEX CONCURRENTLY has always had a deadlock hazard,
> so it's hardly surprising that REINDEX CONCURRENTLY does too.
> I don't think that fixing that is in-scope for v12, even if we had
> an idea how to do it, which we don't.

The most straight-forward approach I can think of would be to
determine if non-transactional commands taking a lock on a table can
be safely skipped or not when checking for older snapshots than the
minimum where the index is marked as valid.  That's quite complex to
target v12, so I agree to keep it out of the stability work.

> We do need to fix the wrong-lock-level problem of course, but
> that seems straightforward.

Sure.
--
Michael

Вложения

Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?

От
Peter Eisentraut
Дата:
On 2019-05-07 05:07, Michael Paquier wrote:
> On Sat, May 04, 2019 at 09:59:20PM +0900, Michael Paquier wrote:
>> The result should be no deadlocks happening in the two sessions
>> running the reindex.  I can see the deadlock easily with three psql
>> sessions, running manually the queries.
> 
> +        * If the OID isn't valid, it means the index as concurrently dropped,
> +        * which is not a problem for us; just return normally.
> Typo here s/as/is/.
> 
> I have looked closer at the patch and the change proposed looks good
> to me.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services