Thread: REINDEX INDEX results in a crash for an index of pg_class since 9.6

REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Michael Paquier
Date:
Hi all,

Fujii-san has sent me a report offline about REINDEX.  And since 9.6,
trying to REINDEX directly an index of pg_class fails lamentably on an
assertion failure (mbsync failure if bypassing the assert):
#2  0x000055a9c5bfcc2c in ExceptionalCondition
 (conditionName=0x55a9c5ca9750
 "!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))",
     errorType=0x55a9c5ca969f "FailedAssertion",
 fileName=0x55a9c5ca9680 "indexam.c", lineNumber=204) at assert.c:54
#3  0x000055a9c5686dcd in index_insert (indexRelation=0x7f58402fe5d8,
 values=0x7fff450c3270, isnull=0x7fff450c3250,
 heap_t_ctid=0x55a9c7e2c05c,
     heapRelation=0x7f584031eb68, checkUnique=UNIQUE_CHECK_YES,
 indexInfo=0x55a9c7e30520) at indexam.c:204
#4  0x000055a9c5714a12 in CatalogIndexInsert
 (indstate=0x55a9c7e30408, heapTuple=0x55a9c7e2c058) at indexing.c:140
#5  0x000055a9c5714b1d in CatalogTupleUpdate (heapRel=0x7f584031eb68,
 otid=0x55a9c7e2c05c, tup=0x55a9c7e2c058) at indexing.c:215
#6  0x000055a9c5beda8a in RelationSetNewRelfilenode
 (relation=0x7f58402fe5d8, persistence=112 'p') at relcache.c:3531

Doing a REINDEX TABLE directly on pg_class proves to work correctly,
and CONCURRENTLY is not supported for catalog tables.

Bisecting my way through it, the first commit causing the breakage is
that:
commit: 01e386a325549b7755739f31308de4be8eea110d
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Wed, 23 Dec 2015 20:09:01 -0500
Avoid VACUUM FULL altogether in initdb.

Commit ed7b3b3811c5836a purported to remove initdb's use of VACUUM FULL,
as had been agreed to in a pghackers discussion back in Dec 2014.
But it missed this one ...

The reason why this does not work is that CatalogIndexInsert() tries
to do an index_insert directly on the index being worked on.  And the
reason why this works at table level is that we have tweaks in
reindex_relation() to enforce the list of valid indexes in the
relation cache with RelationSetIndexList().  It seems to me that the
logic in reindex_index() is wrong from the start, and that all the
index list handling done in reindex_relation() should just be in
reindex_index() so that REINDEX INDEX gets the right call.
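
For reference, the pg_class-specific dance in reindex_relation() looks
roughly like this (a simplified sketch, not the verbatim index.c code):

    /*
     * While rebuilding pg_class's indexes, keep the relcache's index list
     * restricted to the indexes already rebuilt, so catalog updates made
     * during the rebuild never insert into the index whose storage is
     * being replaced.
     */
    doneIndexes = NIL;
    foreach(indexId, indexIds)
    {
        Oid         indexOid = lfirst_oid(indexId);

        if (is_pg_class)
            RelationSetIndexList(rel, doneIndexes, ClassOidIndexId);

        reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
                      persistence, options);

        CommandCounterIncrement();

        if (is_pg_class)
            doneIndexes = lappend_oid(doneIndexes, indexOid);
    }

    /* restore the complete index list afterwards */
    if (is_pg_class)
        RelationSetIndexList(rel, indexIds, ClassOidIndexId);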

Thoughts?
--
Michael

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Michael Paquier
Date:
On Thu, Apr 18, 2019 at 10:14:30AM +0900, Michael Paquier wrote:
> Doing a REINDEX TABLE directly on pg_class proves to work correctly,
> and CONCURRENTLY is not supported for catalog tables.
>
> Bisecting my way through it, the first commit causing the breakage is
> that:
> commit: 01e386a325549b7755739f31308de4be8eea110d
> author: Tom Lane <tgl@sss.pgh.pa.us>
> date: Wed, 23 Dec 2015 20:09:01 -0500
> Avoid VACUUM FULL altogether in initdb.

This leads to a first, simple solution, which is to issue a
VACUUM FULL on pg_class at the end of make_template0() in initdb.c to
avoid any subsequent problems when issuing a REINDEX on anything
related to pg_class, though it won't fix any existing deployments:
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2042,6 +2042,11 @@ make_template0(FILE *cmdfd)
          * Finally vacuum to clean up dead rows in pg_database
          */
         "VACUUM pg_database;\n\n",
+
+        /*
+         * And rebuild pg_class.
+         */
+        "VACUUM FULL pg_class;\n\n",
         NULL
     };
Now...

> The reason why this does not work is that CatalogIndexInsert() tries
> to do an index_insert directly on the index being worked on.  And the
> reason why this works at table level is that we have tweaks in
> reindex_relation() to enforce the list of valid indexes in the
> relation cache with RelationSetIndexList().  It seems to me that the
> logic in reindex_index() is wrong from the start, and that all the
> index list handling done in reindex_relation() should just be in
> reindex_index() so that REINDEX INDEX gets the right call.

I got to wonder if this dance with the relation cache is actually
necessary, because we could directly tell CatalogIndexInsert() to not
insert a tuple into an index which is being rebuilt, and the index
rebuild would cause an entry to be added to pg_class anyway thanks to
RelationSetNewRelfilenode().  This can obviously only happen for
pg_class indexes.
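
To make that second approach concrete, here is a hypothetical sketch
(not a finished patch) of the skip inside CatalogIndexInsert()'s
per-index loop; the variable names are illustrative:

    for (i = 0; i < numIndexes; i++)
    {
        Relation    index = relationDescs[i];

        /*
         * Proposed: skip any index that REINDEX is currently rebuilding;
         * the rebuild itself recreates the entries from the heap contents.
         */
        if (ReindexIsProcessingIndex(RelationGetRelid(index)))
            continue;

        /* ... FormIndexDatum() and index_insert() as usual ... */
    }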

Any thoughts about both approaches?
--
Michael

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Fujii-san has sent me a report offline about REINDEX.  And since 9.6,
> trying to REINDEX directly an index of pg_class fails lamentably on an
> assertion failure (mbsync failure if bypassing the assert):

So ... I can't reproduce this on HEAD.  Nor the back branches.

regression=# \d pg_class
...
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

regression=# reindex index pg_class_relname_nsp_index;
REINDEX
regression=# reindex index pg_class_oid_index;
REINDEX
regression=# reindex index pg_class_tblspc_relfilenode_index;
REINDEX
regression=# reindex table pg_class;         
REINDEX
regression=# reindex index pg_class_tblspc_relfilenode_index;
REINDEX

Is there some precondition you're not mentioning?

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Alvaro Herrera
Date:
On 2019-Apr-18, Michael Paquier wrote:

> Fujii-san has sent me a report offline about REINDEX.  And since 9.6,
> trying to REINDEX directly an index of pg_class fails lamentably on an
> assertion failure (mbsync failure if bypassing the assert):

Hmm, yeah, I ran into this crash too, more than a year ago, but I don't
recall what came out of the investigation, and my search-fu is failing
me.  I'll have a look at my previous laptop's drive ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Michael Paquier
Date:
On Tue, Apr 23, 2019 at 04:47:19PM -0400, Tom Lane wrote:
> regression=# reindex index pg_class_relname_nsp_index;
> REINDEX
> regression=# reindex index pg_class_oid_index;
> REINDEX
> regression=# reindex index pg_class_tblspc_relfilenode_index;
> REINDEX
> regression=# reindex table pg_class;
> REINDEX
> regression=# reindex index pg_class_tblspc_relfilenode_index;
> REINDEX
>
> Is there some precondition you're not mentioning?

Hm.  In my own init scripts, I create a new database just after
starting the instance.  That seems to help in reproducing the
failure, because each time I create a new database, connect to it and
reindex, I can see the crash.  If I do a reindex of pg_class
first, I don't see a crash if some rebuilds already happened, but if I
directly do a reindex of one of the indexes first, then the failure is
plain.  If I also add some regression tests, say in create_index.sql,
to stress a reindex of pg_class and its indexes, the crash also shows
up.  If I apply my previous patch to make CatalogIndexInsert() not do
an insert on a catalog index being rebuilt, then things turn out
fine.
--
Michael

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Apr 23, 2019 at 04:47:19PM -0400, Tom Lane wrote:
>> Is there some precondition you're not mentioning?

> Hm.  In my own init scripts, I create a new database just after
> starting the instance.

Ah, there we go:

regression=# create database d1;
CREATE DATABASE
regression=# \c d1
You are now connected to database "d1" as user "postgres".
d1=# reindex index pg_class_relname_nsp_index;
psql: server closed the connection unexpectedly

log shows

TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))", File: "indexam.c", Line: 204)

#2  0x00000000008c74ed in ExceptionalCondition (
    conditionName=<value optimized out>, errorType=<value optimized out>,
    fileName=<value optimized out>, lineNumber=<value optimized out>)
    at assert.c:54
#3  0x00000000004e4f8c in index_insert (indexRelation=0x7f80f849a5d8,
    values=0x7ffc4f65b030, isnull=0x7ffc4f65b130, heap_t_ctid=0x2842c0c,
    heapRelation=0x7f80f84bab68, checkUnique=UNIQUE_CHECK_YES,
    indexInfo=0x2843230) at indexam.c:204
#4  0x000000000054c290 in CatalogIndexInsert (indstate=<value optimized out>,
    heapTuple=0x2842c08) at indexing.c:140
#5  0x000000000054c472 in CatalogTupleUpdate (heapRel=0x7f80f84bab68,
    otid=0x2842c0c, tup=0x2842c08) at indexing.c:215
#6  0x00000000008bca77 in RelationSetNewRelfilenode (relation=0x7f80f849a5d8,
    persistence=112 'p') at relcache.c:3531
#7  0x0000000000548b3a in reindex_index (indexId=2663,
    skip_constraint_checks=false, persistence=112 'p', options=0)
    at index.c:3339
#8  0x00000000005ed099 in ReindexIndex (indexRelation=<value optimized out>,
    options=0, concurrent=false) at indexcmds.c:2304
#9  0x00000000007b5925 in standard_ProcessUtility (pstmt=0x281fd70,

> If I apply my previous patch to make CatalogIndexInsert() not do
> an insert on a catalog index being rebuilt, then things turn to be
> fine.

That seems like pretty much of a hack :-(, in particular I'm not
convinced that we'd not end up with a missing index entry afterwards.
Maybe it's the only way, but I think first we need to trace down
exactly why this broke.  I remember we had some special-case code
for reindexing pg_class, maybe something broke that?

It also seems quite odd that it doesn't fail every time; surely it's
not conditional whether we'll try to insert a new pg_class tuple or not?
We need to understand that, too.  Maybe the old code never really
worked in all cases?  It seems clear that the commit you bisected to
just allowed a pre-existing misbehavior to become exposed (more easily).

No time to look closer right now.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
I wrote:
> It also seems quite odd that it doesn't fail every time; surely it's
> not conditional whether we'll try to insert a new pg_class tuple or not?
> We need to understand that, too.

Oh!  One gets you ten it "works" as long as the pg_class update is a
HOT update, so that we don't actually end up touching the indexes.
This explains why the crash is less likely to happen in a database
where one's done some work (and, probably, created some dead space in
pg_class).  On the other hand, it doesn't quite fit the observation
that a VACUUM FULL masked the problem ... wouldn't that have ended up
with densely packed pg_class?  Maybe not, if it rebuilt everything
else after pg_class...
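
For reference, the early exit in CatalogIndexInsert() that makes HOT
updates skip index maintenance entirely (paraphrased) is simply:

    /* HOT update does not require index inserts */
    if (HeapTupleIsHeapOnly(heapTuple))
        return;

so whenever the pg_class update fits on the same page and qualifies as
HOT, the index being reindexed is never touched and the assertion never
fires.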

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Michael Paquier
Date:
On Tue, Apr 23, 2019 at 07:54:52PM -0400, Tom Lane wrote:
> That seems like pretty much of a hack :-(, in particular I'm not
> convinced that we'd not end up with a missing index entry afterwards.
> Maybe it's the only way, but I think first we need to trace down
> exactly why this broke.  I remember we had some special-case code
> for reindexing pg_class, maybe something broke that?

Yes, reindex_relation() has some infrastructure to enforce the list of
indexes in the relation cache for pg_class, which was introduced by
a56a016 as far as I can tell.

> It also seems quite odd that it doesn't fail every time; surely it's
> not conditional whether we'll try to insert a new pg_class tuple or not?
> We need to understand that, too.  Maybe the old code never really
> worked in all cases?  It seems clear that the commit you bisected to
> just allowed a pre-existing misbehavior to become exposed (more easily).
>
> No time to look closer right now.

Yeah, there is a fishy smell underneath which comes from 9.6.  When
testing with 9.5 or older, creating a database does not cause any
crash on a subsequent reindex.  Not sure I'll have the time to look at
that more today; perhaps tomorrow, depending on the odds.
--
Michael

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Shaoqi Bai
Date:
On Wed, Apr 24, 2019 at 7:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Apr 23, 2019 at 04:47:19PM -0400, Tom Lane wrote:
>> Is there some precondition you're not mentioning?

> Hm.  In my own init scripts, I create a new database just after
> starting the instance.

Ah, there we go:

regression=# create database d1;
CREATE DATABASE
regression=# \c d1
You are now connected to database "d1" as user "postgres".
d1=# reindex index pg_class_relname_nsp_index;
psql: server closed the connection unexpectedly

log shows

TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))", File: "indexam.c", Line: 204)

I could reproduce TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))", File: "indexam.c", Line: 204) in the postgres log file.
#2  0x00000000008c74ed in ExceptionalCondition (
    conditionName=<value optimized out>, errorType=<value optimized out>,
    fileName=<value optimized out>, lineNumber=<value optimized out>)
    at assert.c:54
#3  0x00000000004e4f8c in index_insert (indexRelation=0x7f80f849a5d8,
    values=0x7ffc4f65b030, isnull=0x7ffc4f65b130, heap_t_ctid=0x2842c0c,
    heapRelation=0x7f80f84bab68, checkUnique=UNIQUE_CHECK_YES,
    indexInfo=0x2843230) at indexam.c:204
#4  0x000000000054c290 in CatalogIndexInsert (indstate=<value optimized out>,
    heapTuple=0x2842c08) at indexing.c:140
#5  0x000000000054c472 in CatalogTupleUpdate (heapRel=0x7f80f84bab68,
    otid=0x2842c0c, tup=0x2842c08) at indexing.c:215
#6  0x00000000008bca77 in RelationSetNewRelfilenode (relation=0x7f80f849a5d8,
    persistence=112 'p') at relcache.c:3531
#7  0x0000000000548b3a in reindex_index (indexId=2663,
    skip_constraint_checks=false, persistence=112 'p', options=0)
    at index.c:3339
#8  0x00000000005ed099 in ReindexIndex (indexRelation=<value optimized out>,
    options=0, concurrent=false) at indexcmds.c:2304
#9  0x00000000007b5925 in standard_ProcessUtility (pstmt=0x281fd70,
But I could only see this stack trace in lldb -c corefile after typing bt.  Is there a way to also print the stack trace in the postgres log file, and how?

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Michael Paquier
Date:
On Tue, Apr 23, 2019 at 08:03:37PM -0400, Tom Lane wrote:
> Oh!  One gets you ten it "works" as long as the pg_class update is a
> HOT update, so that we don't actually end up touching the indexes.
> This explains why the crash is less likely to happen in a database
> where one's done some work (and, probably, created some dead space in
> pg_class).  On the other hand, it doesn't quite fit the observation
> that a VACUUM FULL masked the problem ... wouldn't that have ended up
> with densely packed pg_class?  Maybe not, if it rebuilt everything
> else after pg_class...

I have been able to spend a bit more time testing and looking at the
root of the problem, and I have found two things:
1) The problem is reproducible with REL9_5_STABLE.
2) Bisecting between the merge base points of REL9_4_STABLE/master and
REL9_5_STABLE/master, I am being pointed to the introduction of
replication origins:
commit: 5aa2350426c4fdb3d04568b65aadac397012bbcb
author: Andres Freund <andres@anarazel.de>
date: Wed, 29 Apr 2015 19:30:53 +0200
Introduce replication progress tracking infrastructure.

In order to see the problem, one also needs to patch initdb.c so that
the final VACUUM FULL on pg_database is replaced by plain VACUUM, as on
9.6~.  The root of the problem is actually surprising, but manually
testing on commit 5aa2350 and its parent 5aa2350~1 shows the
difference, as the issue is easily reproducible here.
--
Michael

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Alvaro Herrera
Date:
On 2019-Apr-25, Michael Paquier wrote:

> 2) Bisecting between the merge base points of REL9_4_STABLE/master and
> REL9_5_STABLE/master, I am being pointed to the introduction of
> replication origins:
> commit: 5aa2350426c4fdb3d04568b65aadac397012bbcb
> author: Andres Freund <andres@anarazel.de>
> date: Wed, 29 Apr 2015 19:30:53 +0200
> Introduce replication progress tracking infrastructure.
> 
> In order to see the problem, one also needs to patch initdb.c so that
> the final VACUUM FULL on pg_database is replaced by plain VACUUM, as on
> 9.6~.  The root of the problem is actually surprising, but manually
> testing on commit 5aa2350 and its parent 5aa2350~1 shows the
> difference, as the issue is easily reproducible here.

Hmm ... I suspect the problem is even older, and that this commit
merely made it visible, as a side effect of changing the catalog
contents (since it creates one more view and does a REVOKE, which
becomes an update on pg_class).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Apr 23, 2019 at 08:03:37PM -0400, Tom Lane wrote:
>> Oh!  One gets you ten it "works" as long as the pg_class update is a
>> HOT update, so that we don't actually end up touching the indexes.

> I have been able to spend a bit more time testing and looking at the
> root of the problem, and I have found two things:
> 1) The problem is reproducible with REL9_5_STABLE.

Actually, as far as I can tell, this has been broken since day 1.
I can reproduce the assertion failure back to 9.1, and I think the
only reason it doesn't happen in older branches is that they lack
the ReindexIsProcessingIndex() check in RELATION_CHECKS :-(.
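
For context, the check in question lives in indexam.c's RELATION_CHECKS
macro, which looks approximately like this (abridged; the TRAP message
quoted earlier in the thread is the expansion of its last line):

    #define RELATION_CHECKS \
    ( \
        AssertMacro(RelationIsValid(indexRelation)), \
        AssertMacro(!ReindexIsProcessingIndex(RelationGetRelid(indexRelation))) \
    )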

What you have to do to get it to crash is to ensure that
RelationSetNewRelfilenode's update of pg_class will be a non-HOT
update.  You can try to set that up with "vacuum full pg_class"
but it turns out that that tends to leave the pg_class entries
for pg_class's indexes in the last page of the relation, which
is usually not totally full, so that a HOT update works and the
bug doesn't manifest.

A recipe like the following breaks every branch, by ensuring that
the page containing pg_class_relname_nsp_index's entry is full:

regression=# vacuum full pg_class;
VACUUM
regression=# do $$ begin
for i in 100 .. 150 loop
execute 'create table dummy'||i||'(f1 int)';
end loop;
end $$;
DO
regression=# reindex index pg_class_relname_nsp_index;
psql: server closed the connection unexpectedly


As for an actual fix, I tried just moving reindex_index's
SetReindexProcessing call from where it is down to after
RelationSetNewRelfilenode, but that isn't sufficient:

regression=# reindex index pg_class_relname_nsp_index;
psql: ERROR:  could not read block 3 in file "base/16384/41119": read only 0 of 8192 bytes

#0  errfinish (dummy=0) at elog.c:411
#1  0x00000000007a9453 in mdread (reln=<value optimized out>,
    forknum=<value optimized out>, blocknum=<value optimized out>,
    buffer=0x7f608e6a7d00 "") at md.c:633
#2  0x000000000077a9af in ReadBuffer_common (smgr=<value optimized out>,
    relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=3, mode=RBM_NORMAL,
    strategy=0x0, hit=0x7fff6a7452ef) at bufmgr.c:896
#3  0x000000000077b67e in ReadBufferExtended (reln=0x7f608db5d670,
    forkNum=MAIN_FORKNUM, blockNum=3, mode=<value optimized out>,
    strategy=<value optimized out>) at bufmgr.c:664
#4  0x00000000004ea95a in _bt_getbuf (rel=0x7f608db5d670,
    blkno=<value optimized out>, access=1) at nbtpage.c:805
#5  0x00000000004eb67a in _bt_getroot (rel=0x7f608db5d670, access=2)
    at nbtpage.c:323
#6  0x00000000004f2237 in _bt_search (rel=0x7f608db5d670, key=0x1d5a0c0,
    bufP=0x7fff6a7456a8, access=2, snapshot=0x0) at nbtsearch.c:99
#7  0x00000000004e8caf in _bt_doinsert (rel=0x7f608db5d670, itup=0x1c85e58,
    checkUnique=UNIQUE_CHECK_YES, heapRel=0x1ccb8d0) at nbtinsert.c:219
#8  0x00000000004efc17 in btinsert (rel=0x7f608db5d670,
    values=<value optimized out>, isnull=<value optimized out>,
    ht_ctid=0x1d12dc4, heapRel=0x1ccb8d0, checkUnique=UNIQUE_CHECK_YES,
    indexInfo=0x1c857f8) at nbtree.c:205
#9  0x000000000054c320 in CatalogIndexInsert (indstate=<value optimized out>,
    heapTuple=0x1d12dc0) at indexing.c:140
#10 0x000000000054c502 in CatalogTupleUpdate (heapRel=0x1ccb8d0,
    otid=0x1d12dc4, tup=0x1d12dc0) at indexing.c:215
#11 0x00000000008bcba7 in RelationSetNewRelfilenode (relation=0x7f608db5d670,
    persistence=112 'p') at relcache.c:3531
#12 0x0000000000548b16 in reindex_index (indexId=2663,
    skip_constraint_checks=false, persistence=112 'p', options=0)
    at index.c:3336
#13 0x00000000005ed129 in ReindexIndex (indexRelation=<value optimized out>,
    options=0, concurrent=false) at indexcmds.c:2304
#14 0x00000000007b5a45 in standard_ProcessUtility (pstmt=0x1c66d70,
    queryString=0x1c65f68 "reindex index pg_class_relname_nsp_index;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x1c66e68, completionTag=0x7fff6a745e40 "") at utility.c:787

The problem here is that RelationSetNewRelfilenode is aggressively
changing the index's relcache entry before it's written out the
updated tuple, so that the tuple update tries to make an index
entry in the new storage which isn't filled yet.  I think we can
fix it by *not* doing that, but leaving it to the relcache inval
during the CommandCounterIncrement call to update the relcache
entry.  However, it looks like that will take some API refactoring,
because the storage-creation functions expect to get the new
relfilenode out of the relcache entry, and they'll have to be
changed to not do it that way.

I'll work on a patch ...

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
I wrote:
> The problem here is that RelationSetNewRelfilenode is aggressively
> changing the index's relcache entry before it's written out the
> updated tuple, so that the tuple update tries to make an index
> entry in the new storage which isn't filled yet.  I think we can
> fix it by *not* doing that, but leaving it to the relcache inval
> during the CommandCounterIncrement call to update the relcache
> entry.  However, it looks like that will take some API refactoring,
> because the storage-creation functions expect to get the new
> relfilenode out of the relcache entry, and they'll have to be
> changed to not do it that way.

So looking at that, it seems like the table_relation_set_new_filenode
API is pretty darn ill-designed.  It assumes that it's passed an
already-entirely-valid relcache entry, but it also supposes that
it can pass back information that needs to go into the relation's
pg_class entry.  One or the other side of that has to give, unless
you want to doom everything to updating pg_class twice.

I'm not really sure what's the point of giving the tableam control
of relfrozenxid+relminmxid at all, and I notice that index_create
for one is just Asserting that constant values are returned.

I think we need to do one or possibly both of these things:

* split table_relation_set_new_filenode into two functions,
one that doesn't take a relcache entry at all and returns
appropriate relfrozenxid+relminmxid for a new rel, and then
one that just creates storage without dealing with the xid
values;

* change table_relation_set_new_filenode so that it is told
the relfilenode etc to use without assuming that it has a
valid relcache entry to work with.
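
For illustration only, the two options might have signatures along these
lines (names invented here, not an actual API):

    /* Option 1: split the callback in two */
    void    table_relation_new_frozenxids(char relkind,
                                          TransactionId *freezeXid,
                                          MultiXactId *minmulti);
    void    table_relation_create_storage(RelFileNode newrnode,
                                          char persistence);

    /*
     * Option 2: keep one callback, but pass everything explicitly so it
     * cannot rely on a fully valid relcache entry.
     */
    void    table_relation_set_new_filenode(RelFileNode newrnode,
                                            char persistence,
                                            TransactionId *freezeXid,
                                            MultiXactId *minmulti);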

Thoughts?

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Andres Freund
Date:
Hi,

On 2019-04-25 12:29:16 -0400, Tom Lane wrote:
> I wrote:
> > The problem here is that RelationSetNewRelfilenode is aggressively
> > changing the index's relcache entry before it's written out the
> > updated tuple, so that the tuple update tries to make an index
> > entry in the new storage which isn't filled yet.  I think we can
> > fix it by *not* doing that, but leaving it to the relcache inval
> > during the CommandCounterIncrement call to update the relcache
> > entry.  However, it looks like that will take some API refactoring,
> > because the storage-creation functions expect to get the new
> > relfilenode out of the relcache entry, and they'll have to be
> > changed to not do it that way.
> 
> So looking at that, it seems like the table_relation_set_new_filenode
> API is pretty darn ill-designed.  It assumes that it's passed an
> already-entirely-valid relcache entry, but it also supposes that
> it can pass back information that needs to go into the relation's
> pg_class entry.  One or the other side of that has to give, unless
> you want to doom everything to updating pg_class twice.

I'm not super happy about it either - but I think that's somewhat of an
outgrowth of how this worked before.  I mean there's two differences:

1) Previously the RelationCreateStorage() was called unconditionally,
now it's

        case RELKIND_INDEX:
        case RELKIND_SEQUENCE:
            RelationCreateStorage(relation->rd_node, persistence);
            RelationOpenSmgr(relation);
            break;

        case RELKIND_RELATION:
        case RELKIND_TOASTVALUE:
        case RELKIND_MATVIEW:
            table_relation_set_new_filenode(relation, persistence,
                                            &freezeXid, &minmulti);
            break;
    }

That seems pretty obviously necessary.


2) Previously AddNewRelationTuple() determined the initial horizon for
table-like things:
    /* Initialize relfrozenxid and relminmxid */
    if (relkind == RELKIND_RELATION ||
        relkind == RELKIND_MATVIEW ||
        relkind == RELKIND_TOASTVALUE)
    {
        /*
         * Initialize to the minimum XID that could put tuples in the table.
         * We know that no xacts older than RecentXmin are still running, so
         * that will do.
         */
        new_rel_reltup->relfrozenxid = RecentXmin;

        /*
         * Similarly, initialize the minimum Multixact to the first value that
         * could possibly be stored in tuples in the table.  Running
         * transactions could reuse values from their local cache, so we are
         * careful to consider all currently running multis.
         *
         * XXX this could be refined further, but is it worth the hassle?
         */
        new_rel_reltup->relminmxid = GetOldestMultiXactId();
    }

and inserted that. Now it's determined beforehand, below heap_create(),
and passed as an argument to AddNewRelationTuple().

and similarly the caller to RelationSetNewRelfilenode() determined the
new horizons, but they also just were written into the relcache entry
and then updated:

11:
    classform->relfrozenxid = freezeXid;
    classform->relminmxid = minmulti;
    classform->relpersistence = persistence;

    CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
master:
    classform->relfrozenxid = freezeXid;
    classform->relminmxid = minmulti;
    classform->relpersistence = persistence;

    CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);


I'm not quite sure why the current situation is any worse?


Perhaps that's because I don't quite understand what you mean with "It
assumes that it's passed an already-entirely-valid relcache entry". What
do you mean by that / where does it assume that?  I guess we could warn
a bit more about the underlying tuple not necessarily existing yet in
the callback's docs, but other than that?  Previously heap.c also was
dealing with a relcache entry without a backing pg_class entry but with
existing storage, no?


> I'm not really sure what's the point of giving the tableam control
> of relfrozenxid+relminmxid at all

Well, because not every AM is going to need those. It'd make very little
sense to e.g. set them for an undo based design like zheap's - there is
no need to freeze ever. The need for each page to be rewritten multiple
times (original write, hint bit sets, freezing for heap) imo is one of the
major reasons people are working on alternative AMs.  That seems to
fundamentally require AMs having control over the relfrozenxid.


> and I notice that index_create for one is just Asserting that constant
> values are returned.

Well, that's not going to call into tableam at all?  Those asserts
previously were in RelationSetNewRelfilenode() itself:

    /* Indexes, sequences must have Invalid frozenxid; other rels must not */
    Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
            relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
           freezeXid == InvalidTransactionId :
           TransactionIdIsNormal(freezeXid));

but given that e.g. not every tableam is going to have those values

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-25 12:29:16 -0400, Tom Lane wrote:
>> So looking at that, it seems like the table_relation_set_new_filenode
>> API is pretty darn ill-designed.  It assumes that it's passed an
>> already-entirely-valid relcache entry, but it also supposes that
>> it can pass back information that needs to go into the relation's
>> pg_class entry.  One or the other side of that has to give, unless
>> you want to doom everything to updating pg_class twice.

> I'm not super happy about it either - but I think that's somewhat of an
> outgrowth of how this worked before.

I'm not saying that the previous code was nice; I'm just saying that
what is there in HEAD needs to be factored differently so that we
can solve this problem in a reasonable way.

> Perhaps that's because I don't quite understand what you mean with "It
> assumes that it's passed an already-entirely-valid relcache entry". What
> do you mean by that / where does it assume that?

Well, I can see heapam_relation_set_new_filenode touching all of these
fields right now:

rel->rd_node
rel->rd_rel->relpersistence (and why is it looking at that rather than
the passed-in persistence???)
rel->rd_rel->relkind
whatever RelationOpenSmgr touches
rel->rd_smgr

As far as I can see, there is no API restriction on what parts of the
relcache entry it may presume are valid.  It *certainly* thinks that
rd_rel is valid, which is rather at odds with the fact that this has
to be called before the pg_class entry exists all (for the creation
case) or has been updated (for the set-new-relfilenode case).  Unless
you want to redefine things so that we create/update the pg_class
entry, put it into rel->rd_rel, call relation_set_new_filenode, and
then update the pg_class entry again with what that function gives back
for the xmin fields.

That's obviously stupid, of course.  But my point is that we need to
restrict what the function can touch or assume valid, if it's going
to be called before the pg_class update happens.  And I'd rather that
we did so by restricting its argument list so that it hasn't even got
access to stuff we don't want it assuming valid.

Also, in order to fix this problem, we cannot change the actual
relcache entry contents until after we've performed the tuple update.
So if we want the tableam code to be getting the relfilenode or
persistence info out of the relcache entry, rather than passing
those as standalone parameters, the call can't happen till after
the tuple update and CCI call.  That's why I was thinking about
splitting it into two functions.  Get the xid values, update the
pg_class tuple, CCI, then do relation_set_new_filenode with the
updated relcache entry would work.
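
Spelled out as a rough ordering sketch (with invented helper names, not
an actual API):

    /* 1. compute new relfrozenxid/relminmxid values, touching nothing else */
    table_relation_new_frozenxids(relkind, &freezeXid, &minmulti);

    /* 2. update the pg_class tuple with the new relfilenode and xid values */
    CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);

    /* 3. make that visible; the relcache inval brings the entry in sync */
    CommandCounterIncrement();

    /* 4. only now create the storage, working from a relcache entry that
     *    reflects the new reality */
    table_relation_set_new_filenode(relation, persistence);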

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Andres Freund
Date:
Hi,

On 2019-04-25 14:50:09 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Perhaps that's because I don't quite understand what you mean with "It
> > assumes that it's passed an already-entirely-valid relcache entry". What
> > do you mean by that / where does it assume that?
>
> Well, I can see heapam_relation_set_new_filenode touching all of these
> fields right now:
>
> rel->rd_node
> rel->rd_rel->relpersistence (and why is it looking at that rather than
> the passed-in persistence???)

Ugh.

> rel->rd_rel->relkind
> whatever RelationOpenSmgr touches
> rel->rd_smgr


> As far as I can see, there is no API restriction on what parts of the
> relcache entry it may presume are valid.  It *certainly* thinks that
> rd_rel is valid, which is rather at odds with the fact that this has
> to be called before the pg_class entry exists at all (for the creation
> case) or has been updated (for the set-new-relfilenode case).

Well, that's just what we did before. And given the way the code is
structured, I am not sure I see a decent alternative that's not a
disproportionate amount of work.  I mean, heap.c's heap_create() and
heap_create_with_catalog() basically work off the Relation after the
RelationBuildLocalRelation() call, a good bit before the underlying
storage is valid.


> But my point is that we need to restrict what the function can touch
> or assume valid, if it's going to be called before the pg_class update
> happens.  And I'd rather that we did so by restricting its argument
> list so that it hasn't even got access to stuff we don't want it
> assuming valid.

OTOH, that'd mean we'd need to separately look up the amhandler, pass in
a lot more arguments etc. ISTM it'd be easier to just declare that only
the fields RelationBuildLocalRelation() sets are to be considered valid.
See the end of the email for a proposal.


> Also, in order to fix this problem, we cannot change the actual
> relcache entry contents until after we've performed the tuple update.
> So if we want the tableam code to be getting the relfilenode or
> persistence info out of the relcache entry, rather than passing
> those as standalone parameters, the call can't happen till after
> the tuple update and CCI call.  That's why I was thinking about
> splitting it into two functions.  Get the xid values, update the
> pg_class tuple, CCI, then do relation_set_new_filenode with the
> updated relcache entry would work.

I think that'd be hard for the initial relation creation. At the moment
we intentionally create the storage for the new relation before
inserting the catalog contents.

Currently the only thing that table_relation_set_new_filenode() accesses
that already is updated is the RelFileNode. I wonder if we shouldn't
change the API so that table_relation_set_new_filenode() will get a
relcache entry *without* any updates passed in, then internally does
GetNewRelFileNode() (if so desired by the AM), and returns the new rnode
via a new out parameter.  So RelationSetNewRelfilenode() would basically
work like this:

    switch (relation->rd_rel->relkind)
    {
        case RELKIND_INDEX:
        case RELKIND_SEQUENCE:
            newrelfilenode = GetNewRelFileNode(...);
            RelationCreateStorage(newrelfilenode, persistence);
            RelationOpenSmgr(relation);
            break;
        case RELKIND_RELATION:
        case RELKIND_TOASTVALUE:
        case RELKIND_MATVIEW:
            table_relation_set_new_filenode(relation, persistence,
                                            &newrnode, &freezeXid, &minmulti);
            break;
    }

    /* Now update the pg_class row. */
    if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
    {
        classform->relpages = 0;    /* it's empty until further notice */
        classform->reltuples = 0;
        classform->relallvisible = 0;
    }
    classform->relfrozenxid = freezeXid;
    classform->relminmxid = minmulti;
    classform->relpersistence = persistence;

    /*
     * If we're dealing with a mapped index, pg_class.relfilenode doesn't
     * change; instead we'll have to send the update to the relation mapper.
     * But we can do so only after doing the catalog update, otherwise the
     * contents of the old data is going to be invalid.
     *
     * XXX: Can this actually validly be reached for a mapped table?
     */
    if (!RelationIsMapped(relation))
        classform->relfilenode = newrelfilenode;

    CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);

    /* now that the catalog is updated, update relmapper if necessary */
    if (RelationIsMapped(relation))
        RelationMapUpdateMap(RelationGetRelid(relation),
                             newrelfilenode,
                             relation->rd_rel->relisshared,
                             true);

    /*
     * Make the pg_class row change visible, as well as the relation map
     * change if any.  This will cause the relcache entry to get updated, too.
     */
    CommandCounterIncrement();

    // XXX: Previously we called RelationInitPhysicalAddr() in this routine
    // but I don't think that should be needed due to the CCI?

and the table AM would do the necessary
            *newrelfilenode = GetNewRelFileNode(...);
            RelationCreateStorage(*newrelfilenode, persistence);
            RelationOpenSmgr(relation);
dance *iff* it wants to do so.

That seems like it'd go towards allowing a fair bit more flexibility for
table AMs (and possibly index AMs in the future).

We'd also have to make the equivalent set of changes to
ATExecSetTableSpace(). But that seems like it'd architecturally be
cleaner anyway. I think we'd move the FlushRelationBuffers(),
GetNewRelFileNode() logic into the callback.  It'd probably make sense
to have ATExecSetTableSpace() first call
table_relation_set_new_filenode() and then call
table_relation_copy_data() with the new relfilenode, but not yet updated
relcache entry.

We don't currently allow that, but as far as I can see the current
coding of ATExecSetTableSpace() also has bad problems with system
catalog updates. It copies the data and *then* does
CatalogTupleUpdate(), but *without* updating the relcache - which just
would cause the update to be lost.


I could come up with a patch for that if you want me to.


Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-25 14:50:09 -0400, Tom Lane wrote:
>> As far as I can see, there is no API restriction on what parts of the
>> relcache entry it may presume are valid.  It *certainly* thinks that
>> rd_rel is valid, which is rather at odds with the fact that this has
>> to be called before the pg_class entry exists at all (for the creation
>> case) or has been updated (for the set-new-relfilenode case).

> Well, that's just what we did before. And given the way the code is
> structured, I am not sure I see a decent alternative that's not a
> disproportionate amount of work.  I mean, heap.c's heap_create() and
> heap_create_with_catalog() basically work off the Relation after the
> RelationBuildLocalRelation() call, a good bit before the underlying
> storage is valid.

You could imagine restructuring that ... but I agree it'd be a lot
of work.

> Currently the only thing that table_relation_set_new_filenode() accesses
> that already is updated is the RelFileNode. I wonder if we shouldn't
> change the API so that table_relation_set_new_filenode() will get a
> relcache entry *without* any updates passed in, then internally does
> GetNewRelFileNode() (if so desired by the AM), and returns the new rnode
> via a new out parameter.

That could work.  The important API spec is then that the relcache entry
reflects the *previous* state of the relation, and is not to be modified
by the tableam call.  After the call, we perform the pg_class update and
do CCI, and the relcache inval seen by the CCI causes the relcache entry
to be brought into sync with the new reality.  So relcache entries change
at CCI boundaries, not in between.

In the creation case, it works more or less the same, with the
understanding that the "previous state" is some possibly-partly-dummy
state set up by RelationBuildLocalRelation.  But we might need to add
a CCI call that wasn't there before; not sure.

> We don't currently allow that, but as far as I can see the current
> coding of ATExecSetTableSpace() also has bad problems with system
> catalog updates. It copies the data and *then* does
> CatalogTupleUpdate(), but *without* updating the relcache - which just
> would cause the update to be lost.

Well, I imagine it's expecting the subsequent CCI to update the relcache
entry, which I think is correct behavior in this worldview.  We're
basically trying to make the relcache state follow transaction/command
boundary semantics.

> I could come up with a patch for that if you want me to.

I'm happy to let you take a whack at it if you want.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Andres Freund
Date:
Hi,

On 2019-04-25 16:02:03 -0400, Tom Lane wrote:
> > Currently the only thing that table_relation_set_new_filenode() accesses
> > that already is updated is the RelFileNode. I wonder if we shouldn't
> > change the API so that table_relation_set_new_filenode() will get a
> > relcache entry *without* any updates passed in, then internally does
> > GetNewRelFileNode() (if so desired by the AM), and returns the new rnode
> > via a new out parameter.
> 
> That could work.  The important API spec is then that the relcache entry
> reflects the *previous* state of the relation, and is not to be modified
> by the tableam call.

Right.

I was wondering if we should just pass in the pg_class tuple as an "out"
argument, instead of pointers to relfilenode/relfrozenxid/relminmxid.


> > We don't currently allow that, but as far as I can see the current
> > coding of ATExecSetTableSpace() also has bad problems with system
> > catalog updates. It copies the data and *then* does
> > CatalogTupleUpdate(), but *without* updating the relcache - which just
> > would cause the update to be lost.
> 
> Well, I imagine it's expecting the subsequent CCI to update the relcache
> entry, which I think is correct behavior in this worldview.  We're
> basically trying to make the relcache state follow transaction/command
> boundary semantics.

My point was that given the current coding the code in
ATExecSetTableSpace() would make changes to the *old* relfilenode, after
having already copied the contents to the new relfilenode. Which means
that if ATExecSetTableSpace() is ever used on pg_class or one of its
indexes, it'd just lose those changes, afaict.


> > I could come up with a patch for that if you want me to.
> 
> I'm happy to let you take a whack at it if you want.

I'll give it a whack (after writing one more email on a separate but
loosely related topic).

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-25 16:02:03 -0400, Tom Lane wrote:
>> That could work.  The important API spec is then that the relcache entry
>> reflects the *previous* state of the relation, and is not to be modified
>> by the tableam call.

> Right.

> I was wondering if we should just pass in the pg_class tuple as an "out"
> argument, instead of pointers to relfilenode/relfrozenxid/relminmxid.

Yeah, possibly.  The whole business with xids is perhaps heapam specific,
so decoupling this function's signature from them would be good.

> My point was that given the current coding the code in
> ATExecSetTableSpace() would make changes to the *old* relfilenode, after
> having already copied the contents to the new relfilenode. Which means
> > that if ATExecSetTableSpace() is ever used on pg_class or one of its
> > indexes, it'd just lose those changes, afaict.

Hmm.

There's another reason why we'd like the relcache contents to only change
at CCI, which is that if we get a relcache invalidation somewhere before
we get to the CCI, relcache.c would proceed to reload it based on the
*current* catalog contents (ie, pre-update, thanks to the magic of MVCC),
so that the entry would revert back to its previous state until we did get
to CCI.  I wonder whether there's any demonstrable bug there.  Though
you'd think the CLOBBER_CACHE_ALWAYS animals would've found it if so.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Andres Freund
Date:
Hi,

On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > My point was that given the current coding the code in
> > ATExecSetTableSpace() would make changes to the *old* relfilenode, after
> > having already copied the contents to the new relfilenode. Which means
> > that if ATExecSetTableSpace() is ever used on pg_class or one of it's
> > indexes, it'd just loose those changes, afaict.
> 
> Hmm.

I think there's no live bug here, because we a) require
allow_system_table_mods to modify system tables, and then b) have
another check

    /*
     * We cannot support moving mapped relations into different tablespaces.
     * (In particular this eliminates all shared catalogs.)
     */
    if (RelationIsMapped(rel))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot move system relation \"%s\"",
                        RelationGetRelationName(rel))));

that triggers even when allow_system_table_mods is off.


> There's another reason why we'd like the relcache contents to only change
> at CCI, which is that if we get a relcache invalidation somewhere before
> we get to the CCI, relcache.c would proceed to reload it based on the
> *current* catalog contents (ie, pre-update, thanks to the magic of MVCC),
> so that the entry would revert back to its previous state until we did get
> to CCI.  I wonder whether there's any demonstrable bug there.  Though
> you'd think the CLOBBER_CACHE_ALWAYS animals would've found it if so.

I think we basically assume that there's nothing triggering relcache
invals here in between updating the relcache entry and doing the actual
catalog modification. That looks like it's currently somewhat OK in the
places we've talked about so far.

This made me look at cluster.c and damn, I'd forgotten about that.
Look at the following code in copy_table_data():

        /*
         * When doing swap by content, any toast pointers written into NewHeap
         * must use the old toast table's OID, because that's where the toast
         * data will eventually be found.  Set this up by setting rd_toastoid.
         * This also tells toast_save_datum() to preserve the toast value
         * OIDs, which we want so as not to invalidate toast pointers in
         * system catalog caches, and to avoid making multiple copies of a
         * single toast value.
         *
         * Note that we must hold NewHeap open until we are done writing data,
         * since the relcache will not guarantee to remember this setting once
         * the relation is closed.  Also, this technique depends on the fact
         * that no one will try to read from the NewHeap until after we've
         * finished writing it and swapping the rels --- otherwise they could
         * follow the toast pointers to the wrong place.  (It would actually
         * work for values copied over from the old toast table, but not for
         * any values that we toast which were previously not toasted.)
         */
        NewHeap->rd_toastoid = OldHeap->rd_rel->reltoastrelid;
    }
    else
        *pSwapToastByContent = false;

which then goes on to do things like a full blown sort or index
scan. Sure, that's for the old relation, but that's so ugly. It works
because RelationClearRelation() copies over rd_toastoid :/.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Andres Freund
Date:
Hi,

On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-25 16:02:03 -0400, Tom Lane wrote:
> >> That could work.  The important API spec is then that the relcache entry
> >> reflects the *previous* state of the relation, and is not to be modified
> >> by the tableam call.
>
> > Right.
>
> > I was wondering if we should just pass in the pg_class tuple as an "out"
> > argument, instead of pointers to relfilenode/relfrozenxid/relminmxid.
>
> Yeah, possibly.  The whole business with xids is perhaps heapam specific,
> so decoupling this function's signature from them would be good.

I've left that out in the attached.  Currently VACUUM FULL / CLUSTER also
needs to handle those, and the callback for transactional rewrites
(table_relation_copy_for_cluster()) also returns those as output
parameters.  I think I can see a way we could clean up the relevant
cluster.c code, but until that's done, I don't see much point in a
different interface (I'll probably write a patch).

The attached patch fixes the problem for me, and passes all existing
tests. It contains a few changes that are not strictly necessary, but
imo clear improvements.

We probably could split the tableam changes and related refactoring from
the fix to make backpatching simpler. I've not done that yet, but I
think we should before committing.

Questions:
- Should we move the CommandCounterIncrement() from
  RelationSetNewRelfilenode() to the callers? That'd allow them to do
  other things to the new relation (e.g. fill it), before making the
  changes visible. Don't think it'd currently help, but it seems like it
  could make code more robust in the future.

- Should we introduce an assertion into CatalogIndexInsert()'s
  HeapTupleIsHeapOnly() path, that asserts that all the relevant indexes
  aren't ReindexIsProcessingIndex()? Otherwise it seems way too easy to
  re-introduce bugs like this one.  Dirty hack for that included.

- Wonder if we shouldn't introduce something akin to
  SetReindexProcessing() for table rewrites (e.g. VACUUM FULL), to
  prevent the related error of inserting/updating a catalog table that's
  currently being rewritten.

Taking this as a WIP, what do you think?

Greetings,

Andres Freund

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Michael Paquier
Date:
On Thu, Apr 25, 2019 at 11:32:21AM -0400, Tom Lane wrote:
> What you have to do to get it to crash is to ensure that
> RelationSetNewRelfilenode's update of pg_class will be a non-HOT
> update.  You can try to set that up with "vacuum full pg_class"
> but it turns out that that tends to leave the pg_class entries
> for pg_class's indexes in the last page of the relation, which
> is usually not totally full, so that a HOT update works and the
> bug doesn't manifest.

Indeed, I can see the difference in the update before and after the
commit.  This could have blown up on basically anything when
bisecting.  Changing the page size would perhaps have given something
else.

> As for an actual fix, I tried just moving reindex_index's
> SetReindexProcessing call from where it is down to after
> RelationSetNewRelfilenode, but that isn't sufficient:
>
> regression=# reindex index pg_class_relname_nsp_index;
> psql: ERROR:  could not read block 3 in file "base/16384/41119":
> read only 0 of 8192 bytes

Yeah, that's one of the first things I tried as well when first
looking at the problem.  Turns out it is not that simple.
--
Michael

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> I was wondering if we should just pass in the pg_class tuple as an "out"
>>> argument, instead of pointers to relfilenode/relfrozenxid/relminmxid.

>> Yeah, possibly.  The whole business with xids is perhaps heapam specific,
>> so decoupling this function's signature from them would be good.

> I've left that out in the attached. Currently VACUUM FULL / CLUSTER also
> needs to handle those, and the callback for transactional rewrite
> (table_relation_copy_for_cluster()), also returns those as output
> parameter.  I think I can see a way how we could clean up the relevant
> cluster.c code, but until that's done, I don't see much point in a
> different interface (I'll probably write apatch .

OK, we can leave that for later.  I suppose there's little hope that
v12's version of the tableam API can be chiseled onto stone tablets yet.

> Questions:
> - Should we move the CommandCounterIncrement() from
>   RelationSetNewRelfilenode() to the callers? That'd allow them to do
>   other things to the new relation (e.g. fill it), before making the
>   changes visible. Don't think it'd currently help, but it seems like it
>   could make code more robust in the future.

No, I don't think so.  The intermediate state where the relcache entry
is inconsistent with the on-disk state is not something we want to
be propagating all over the place.  As for robustness, I'd be more
worried about somebody forgetting the CCI than about possibly being
able to squeeze additional updates into the same CCI cycle.

> - Should we introduce an assertion into CatalogIndexInsert()'s
>   HeapTupleIsHeapOnly() path, that asserts that all the relevant indexes
>   aren't ReindexIsProcessingIndex()? Otherwise it seems way too easy to
>   re-introduce bugs like this one.  Dirty hack for that included.

Good idea, but I think I'd try to keep the code the same in a non-assert
build, that is more like

+#ifndef USE_ASSERT_CHECKING
    /* HOT update does not require index inserts */
    if (HeapTupleIsHeapOnly(heapTuple))
        return;
+#endif

    /* required setup here ... */

+#ifdef USE_ASSERT_CHECKING
+    /* HOT update does not require index inserts, but check we could have */
+    if (HeapTupleIsHeapOnly(heapTuple))
+    {
+        /* checking here */
+        return;
+    }
+#endif

> - Wonder if we shouldn't introduce something akin to
>   SetReindexProcessing() for table rewrites (e.g. VACUUM FULL), to
>   prevent the related error of inserting/updating a catalog table that's
>   currently being rewritten.

Not terribly excited about that, but if you are, maybe a follow-on
patch could do that.

> Taking this as a WIP, what do you think?

Seems generally about right.  One note is that in reindex_index,
the right fix is to push the intermediate code to above the PG_TRY:
there's no reason to start the TRY block any sooner than the
SetReindexProcessing call.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Tom Lane
Date:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Taking this as a WIP, what do you think?

> Seems generally about right.

Andres, are you pushing this forward?  Next week's minor releases
are coming up fast, and we're going to need to adapt the HEAD patch
significantly for the back branches AFAICS.  So there's little time
to spare.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Andres Freund
Date:
Hi,

On 2019-04-29 18:07:07 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> Taking this as a WIP, what do you think?
> 
> > Seems generally about right.
> 
> Andres, are you pushing this forward?  Next week's minor releases
> are coming up fast, and we're going to need to adapt the HEAD patch
> significantly for the back branches AFAICS.  So there's little time
> to spare.

Yea. I'm testing the backbranch'd bits (much simpler) and writing the
commit message atm.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From
Andres Freund
Date:
Hi,

On 2019-04-29 15:09:24 -0700, Andres Freund wrote:
> On 2019-04-29 18:07:07 -0400, Tom Lane wrote:
> > I wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > >> Taking this as a WIP, what do you think?
> > 
> > > Seems generally about right.
> > 
> > Andres, are you pushing this forward?  Next week's minor releases
> > are coming up fast, and we're going to need to adapt the HEAD patch
> > significantly for the back branches AFAICS.  So there's little time
> > to spare.
> 
> Yea. I'm testing the backbranch'd bits (much simpler) and writing the
> commit message atm.

I've pushed the master bits, and the other branches are running
check-world right now and I'll push soon unless something breaks (it's a
bit annoying that <= 9.6 can't run check-world in parallel...).

Turns out, I was confused, and there wasn't much pre-existing breakage
in RelationSetNewRelfilenode() (I guess I must have been thinking of
ATExecSetTableSpace?). That part was broken in d25f519107; I should have
caught this while reviewing and significantly evolving the code in that
commit, mea culpa.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I've pushed the master bits, and the other branches are running
> check-world right now and I'll push soon unless something breaks (it's a
> bit annoying that <= 9.6 can't run check-world in parallel...).

Seems like putting reindexes of pg_class into a test script that runs
in parallel with other DDL wasn't a hot idea.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Andres Freund
Дата:
Hi,

On April 29, 2019 9:37:33 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> I've pushed the master bits, and the other branches are running
>> check-world right now and I'll push soon unless something breaks (it's a
>> bit annoying that <= 9.6 can't run check-world in parallel...).
>
>Seems like putting reindexes of pg_class into a test script that runs
>in parallel with other DDL wasn't a hot idea.

Saw that. Will try to reproduce (and if necessary either run separately or revert). But isn't that somewhat broken?
They're not run in a transaction, so the locking shouldn't be deadlock prone.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On April 29, 2019 9:37:33 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Seems like putting reindexes of pg_class into a test script that runs
>> in parallel with other DDL wasn't a hot idea.

> Saw that. Will try to reproduce (and if necessary either run separately or revert). But isn't that somewhat broken?
> They're not run in a transaction, so the locking shouldn't be deadlock prone.

Hm?  REINDEX INDEX is deadlock-prone by definition, because it starts
by opening/locking the index and then it has to open/lock the index's
table.  Every other operation locks tables before their indexes.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-04-30 00:50:20 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On April 29, 2019 9:37:33 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Seems like putting reindexes of pg_class into a test script that runs
> >> in parallel with other DDL wasn't a hot idea.
> 
> > Saw that. Will try to reproduce (and if necessary either run separately or revert). But isn't that somewhat broken?
> > They're not run in a transaction, so the locking shouldn't be deadlock prone.
> 
> Hm?  REINDEX INDEX is deadlock-prone by definition, because it starts
> by opening/locking the index and then it has to open/lock the index's
> table.  Every other operation locks tables before their indexes.

We claim to have solved that:

/*
 * ReindexIndex
 *        Recreate a specific index.
 */
void
ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)


    /*
     * Find and lock index, and check permissions on table; use callback to
     * obtain lock on table first, to avoid deadlock hazard.  The lock level
     * used here must match the index lock obtained in reindex_index().
     */
    indOid = RangeVarGetRelidExtended(indexRelation,
                                      concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
                                      0,
                                      RangeVarCallbackForReindexIndex,
                                      (void *) &heapOid);

and I don't see an obvious hole in the general implementation. Apart from
the comment, that code exists back to 9.4.

I suspect the problem isn't REINDEX INDEX in general, it's REINDEX INDEX
over catalog tables modified during reindex. The callback acquires a
ShareLock on the index's table, but *also* during the reindex needs
a RowExclusiveLock on pg_class, etc. E.g. in RelationSetNewRelfilenode()
on pg_class, and on pg_index in index_build(). Which means there's a
lock-upgrade hazard (Share to RowExclusive - well, that's more a
side-grade, but still deadlock prone).

I can think of ways to fix that (e.g. if reindex is on pg_class or
index, use SHARE ROW EXCLUSIVE, rather than SHARE), but we'd probably
not want to backpatch that.
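
Purely for illustration (this is not an actual patch), the idea could look something like the following fragment, choosing the table lock level up front so that no later upgrade is needed; the variable names are assumptions:

    LOCKMODE    heap_lockmode;

    /*
     * Hypothetical sketch: take a stronger lock from the start when the
     * target index lives on pg_class or pg_index, instead of upgrading
     * from ShareLock to RowExclusiveLock in the middle of the reindex.
     */
    if (heapOid == RelationRelationId || heapOid == IndexRelationId)
        heap_lockmode = ShareRowExclusiveLock;
    else
        heap_lockmode = ShareLock;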

I'll try to reproduce tomorrow.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-30 00:50:20 -0400, Tom Lane wrote:
>> Hm?  REINDEX INDEX is deadlock-prone by definition, because it starts
>> by opening/locking the index and then it has to open/lock the index's
>> table.  Every other operation locks tables before their indexes.

> We claim to have solved that:

Oh, okay ...

> I suspect the problem isn't REINDEX INDEX in general, it's REINDEX INDEX
> over catalog tables modified during reindex.

So far, every one of the failures in the buildfarm looks like the REINDEX
is deciding that it needs to wait for some other transaction, eg here

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-30%2014%3A43%3A11

the relevant bit of postmaster log is

2019-04-30 14:44:13.478 UTC [16135:450] pg_regress/create_index LOG:  statement: REINDEX TABLE pg_class;
2019-04-30 14:44:14.478 UTC [16137:430] pg_regress/create_view LOG:  process 16137 detected deadlock while waiting for AccessShareLock on relation 2662 of database 16384 after 1000.148 ms
2019-04-30 14:44:14.478 UTC [16137:431] pg_regress/create_view DETAIL:  Process holding the lock: 16135. Wait queue: .
2019-04-30 14:44:14.478 UTC [16137:432] pg_regress/create_view STATEMENT:  DROP SCHEMA temp_view_test CASCADE;
2019-04-30 14:44:14.478 UTC [16137:433] pg_regress/create_view ERROR:  deadlock detected
2019-04-30 14:44:14.478 UTC [16137:434] pg_regress/create_view DETAIL:  Process 16137 waits for AccessShareLock on relation 2662 of database 16384; blocked by process 16135.
    Process 16135 waits for ShareLock on transaction 2875; blocked by process 16137.
    Process 16137: DROP SCHEMA temp_view_test CASCADE;
    Process 16135: REINDEX TABLE pg_class;
2019-04-30 14:44:14.478 UTC [16137:435] pg_regress/create_view HINT:  See server log for query details.
2019-04-30 14:44:14.478 UTC [16137:436] pg_regress/create_view STATEMENT:  DROP SCHEMA temp_view_test CASCADE;

I haven't been able to reproduce this locally yet, but my guess is that
the REINDEX wants to update some row that was already updated by the
concurrent transaction, so it has to wait to see if the latter commits
or not.  And, of course, waiting while holding AccessExclusiveLock on
any index of pg_class is a Bad Idea (TM).  But I can't quite see why
we'd be doing something like that during the reindex ...

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
I wrote:
> I haven't been able to reproduce this locally yet, but my guess is that
> the REINDEX wants to update some row that was already updated by the
> concurrent transaction, so it has to wait to see if the latter commits
> or not.  And, of course, waiting while holding AccessExclusiveLock on
> any index of pg_class is a Bad Idea (TM).  But I can't quite see why
> we'd be doing something like that during the reindex ...

Ah-hah: the secret to making it reproducible is what prion is doing:
-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE

Here's a stack trace from reindex's side:

#0  0x00000033968e9223 in __epoll_wait_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000787cb5 in WaitEventSetWaitBlock (set=0x22d52f0, timeout=-1,
    occurred_events=0x7ffc77117c00, nevents=1,
    wait_event_info=<value optimized out>) at latch.c:1080
#2  WaitEventSetWait (set=0x22d52f0, timeout=-1,
    occurred_events=0x7ffc77117c00, nevents=1,
    wait_event_info=<value optimized out>) at latch.c:1032
#3  0x00000000007886da in WaitLatchOrSocket (latch=0x7f90679077f4,
    wakeEvents=<value optimized out>, sock=-1, timeout=-1,
    wait_event_info=50331652) at latch.c:407
#4  0x000000000079993d in ProcSleep (locallock=<value optimized out>,
    lockMethodTable=<value optimized out>) at proc.c:1290
#5  0x0000000000796ba2 in WaitOnLock (locallock=0x2200600, owner=0x2213470)
    at lock.c:1768
#6  0x0000000000798719 in LockAcquireExtended (locktag=0x7ffc77117f90,
    lockmode=<value optimized out>, sessionLock=<value optimized out>,
    dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#7  0x00000000007939b7 in XactLockTableWait (xid=2874,
    rel=<value optimized out>, ctid=<value optimized out>,
    oper=XLTW_InsertIndexUnique) at lmgr.c:658
#8  0x00000000004d4841 in heapam_index_build_range_scan (
    heapRelation=0x7f905eb3fcd8, indexRelation=0x7f905eb3c5b8,
    indexInfo=0x22d50c0, allow_sync=<value optimized out>, anyvisible=false,
    progress=true, start_blockno=0, numblocks=4294967295,
    callback=0x4f8330 <_bt_build_callback>, callback_state=0x7ffc771184f0,
    scan=0x2446fb0) at heapam_handler.c:1527
#9  0x00000000004f9db0 in table_index_build_scan (heap=0x7f905eb3fcd8,
    index=0x7f905eb3c5b8, indexInfo=0x22d50c0)
    at ../../../../src/include/access/tableam.h:1437
#10 _bt_spools_heapscan (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8,
    indexInfo=0x22d50c0) at nbtsort.c:489
#11 btbuild (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8, indexInfo=0x22d50c0)
    at nbtsort.c:337
#12 0x0000000000547e33 in index_build (heapRelation=0x7f905eb3fcd8,
    indexRelation=0x7f905eb3c5b8, indexInfo=0x22d50c0, isreindex=true,
    parallel=<value optimized out>) at index.c:2724
#13 0x0000000000548b97 in reindex_index (indexId=2662,
    skip_constraint_checks=false, persistence=112 'p', options=0)
    at index.c:3349
#14 0x00000000005490f1 in reindex_relation (relid=<value optimized out>,
    flags=5, options=0) at index.c:3592
#15 0x00000000005ed295 in ReindexTable (relation=0x21e2938, options=0,
    concurrent=<value optimized out>) at indexcmds.c:2422
#16 0x00000000007b5f69 in standard_ProcessUtility (pstmt=0x21e2cf0,
    queryString=0x21e1f18 "REINDEX TABLE pg_class;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x21e2de8, completionTag=0x7ffc77118d80 "") at utility.c:790
#17 0x00000000007b1689 in PortalRunUtility (portal=0x2247c38, pstmt=0x21e2cf0,
    isTopLevel=<value optimized out>, setHoldSnapshot=<value optimized out>,
    dest=0x21e2de8, completionTag=<value optimized out>) at pquery.c:1175
#18 0x00000000007b2611 in PortalRunMulti (portal=0x2247c38, isTopLevel=true,
    setHoldSnapshot=false, dest=0x21e2de8, altdest=0x21e2de8,
    completionTag=0x7ffc77118d80 "") at pquery.c:1328
#19 0x00000000007b2eb0 in PortalRun (portal=0x2247c38,
    count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x21e2de8,
    altdest=0x21e2de8, completionTag=0x7ffc77118d80 "") at pquery.c:796
#20 0x00000000007af2ab in exec_simple_query (
    query_string=0x21e1f18 "REINDEX TABLE pg_class;") at postgres.c:1215

So basically, the problem here lies in trying to re-verify uniqueness
of pg_class's indexes --- there could easily be entries in pg_class that
haven't committed yet.

I don't think there's an easy way to make this not deadlock against
concurrent DDL.  For sure I don't want to disable the uniqueness
checks.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
On 2019-04-30 11:51:10 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-30 00:50:20 -0400, Tom Lane wrote:
> > I suspect the problem isn't REINDEX INDEX in general, it's REINDEX INDEX
> > over catalog tables modified during reindex.
>
> So far, every one of the failures in the buildfarm looks like the REINDEX
> is deciding that it needs to wait for some other transaction, eg here
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-30%2014%3A43%3A11
>
> the relevant bit of postmaster log is
>
> 2019-04-30 14:44:13.478 UTC [16135:450] pg_regress/create_index LOG:  statement: REINDEX TABLE pg_class;
> 2019-04-30 14:44:14.478 UTC [16137:430] pg_regress/create_view LOG:  process 16137 detected deadlock while waiting for AccessShareLock on relation 2662 of database 16384 after 1000.148 ms
> 2019-04-30 14:44:14.478 UTC [16137:431] pg_regress/create_view DETAIL:  Process holding the lock: 16135. Wait queue: .
> 2019-04-30 14:44:14.478 UTC [16137:432] pg_regress/create_view STATEMENT:  DROP SCHEMA temp_view_test CASCADE;
> 2019-04-30 14:44:14.478 UTC [16137:433] pg_regress/create_view ERROR:  deadlock detected
> 2019-04-30 14:44:14.478 UTC [16137:434] pg_regress/create_view DETAIL:  Process 16137 waits for AccessShareLock on relation 2662 of database 16384; blocked by process 16135.
>     Process 16135 waits for ShareLock on transaction 2875; blocked by process 16137.
>     Process 16137: DROP SCHEMA temp_view_test CASCADE;
>     Process 16135: REINDEX TABLE pg_class;
> 2019-04-30 14:44:14.478 UTC [16137:435] pg_regress/create_view HINT:  See server log for query details.
> 2019-04-30 14:44:14.478 UTC [16137:436] pg_regress/create_view STATEMENT:  DROP SCHEMA temp_view_test CASCADE;
>
> I haven't been able to reproduce this locally yet, but my guess is that
> the REINDEX wants to update some row that was already updated by the
> concurrent transaction, so it has to wait to see if the latter commits
> or not.  And, of course, waiting while holding AccessExclusiveLock on
> any index of pg_class is a Bad Idea (TM).  But I can't quite see why
> we'd be doing something like that during the reindex ...

I've reproduced something similar locally by running "REINDEX INDEX
pg_class_oid_index;" via pgbench. Fails over pretty much immediately.

It's the lock-upgrade problem I theorized about
upthread. ReindexIndex(), via RangeVarCallbackForReindexIndex(), takes a
ShareLock on pg_class, and then goes on to upgrade to RowExclusiveLock
in RelationSetNewRelfilenode(). But at that time another session
obviously can already have the ShareLock and would also want to upgrade.

The same problem exists with reindexing indexes on pg_index.

ReindexTable is also affected. It locks the table with ShareLock, but
then subsidiary routines upgrade to RowExclusiveLock. The way to fix it
would be a bit different than for ReindexIndex(), as the locking happens
via RangeVarGetRelidExtended() directly, rather than in the callback.

There's a somewhat related issue in the new REINDEX CONCURRENTLY. See
https://www.postgresql.org/message-id/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de

Attached is a *hacky* prototype patch that fixes the issues for me. This
is *not* meant as an actual fix, just a demonstration.

Turns out it's not even sufficient to take a ShareRowExclusive for
pg_class. That prevents issues of concurrent REINDEX INDEX
pg_class_oid_index blowing up, but if one runs REINDEX INDEX
pg_class_oid_index; and REINDEX TABLE pg_class; (or just the latter)
concurrently it still blows up, albeit taking longer to do so.

The problem is that other codepaths expect to be able to hold an
AccessShareLock on pg_class, and multiple pg_class indexes
(e.g. catcache initialization which is easy to hit with -C, [1]). If we
were to want this concurrency safe, I think it requires an AEL on at
least pg_class for reindex (I think ShareRowExclusiveLock might suffice
for pg_index).

I'm not sure it's worth fixing this. It's crummy and somewhat fragile
that we'd have special locking rules for catalog tables. OTOH,
it really also sucks that a lone REINDEX TABLE pg_class; can deadlock
with another session doing nothing more than establishing a connection.

I guess it's not that common, and can be fixed by users by doing an
explicit BEGIN;LOCK pg_class;REINDEX TABLE pg_class;COMMIT;, but that's
not something anybody will know to do.

Pragmatically I don't think there's a meaningful difference between
holding a ShareLock on pg_class + AEL on one or more indexes, to holding
an AEL on pg_class. Just about every pg_class access is through an
index.

Greetings,

Andres Freund


[1]
#6  0x0000561dac7f9a36 in WaitOnLock (locallock=0x561dae101878, owner=0x561dae112ee8) at
/home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1768
#7  0x0000561dac7f869e in LockAcquireExtended (locktag=0x7ffd7a128650, lockmode=1, sessionLock=false, dontWait=false,
reportMemoryError=true,
    locallockp=0x7ffd7a128648) at /home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1050
#8  0x0000561dac7f5c15 in LockRelationOid (relid=2662, lockmode=1) at
/home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:116
#9  0x0000561dac3a3aa2 in relation_open (relationId=2662, lockmode=1) at
/home/andres/src/postgresql/src/backend/access/common/relation.c:56
#10 0x0000561dac422560 in index_open (relationId=2662, lockmode=1) at
/home/andres/src/postgresql/src/backend/access/index/indexam.c:156
#11 0x0000561dac421bbe in systable_beginscan (heapRelation=0x561dae14af80, indexId=2662, indexOK=true, snapshot=0x561dacd26f80 <CatalogSnapshotData>,
    nkeys=1, key=0x7ffd7a128760) at /home/andres/src/postgresql/src/backend/access/index/genam.c:364
#12 0x0000561dac982362 in ScanPgRelation (targetRelId=2663, indexOK=true, force_non_historic=false)
    at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:360
#13 0x0000561dac983b18 in RelationBuildDesc (targetRelId=2663, insertIt=true) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1058
#14 0x0000561dac985d24 in RelationIdGetRelation (relationId=2663) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2037
#15 0x0000561dac3a3aac in relation_open (relationId=2663, lockmode=1) at
/home/andres/src/postgresql/src/backend/access/common/relation.c:59
#16 0x0000561dac422560 in index_open (relationId=2663, lockmode=1) at
/home/andres/src/postgresql/src/backend/access/index/indexam.c:156
#17 0x0000561dac976116 in InitCatCachePhase2 (cache=0x561dae13e400, touch_index=true) at
/home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1050
#18 0x0000561dac990134 in InitCatalogCachePhase2 () at
/home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1078
#19 0x0000561dac988955 in RelationCacheInitializePhase3 () at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:3960
#20 0x0000561dac9acdac in InitPostgres (in_dbname=0x561dae111320 "postgres", dboid=0, username=0x561dae0dbaf8 "andres", useroid=0, out_dbname=0x0,
    override_allow_connections=false) at /home/andres/src/postgresql/src/backend/utils/init/postinit.c:1034

Вложения

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> It's the lock-upgrade problem I theorized about
> upthread. ReindexIndex(), via RangeVarCallbackForReindexIndex(), takes a
> ShareLock on pg_class, and then goes on to upgrade to RowExclusiveLock
> in RelationSetNewRelfilenode(). But at that time another session
> obviously can already have the ShareLock and would also want to upgrade.

Hmm.  Note that this is totally independent of the deadlock mechanism
I reported in my last message on this thread.

I also wonder whether clobber-cache testing would expose cases
we haven't seen that trace to the additional catalog accesses
caused by cache reloads.

> I'm not sure it's worth fixing this.

I am not sure it's even *possible* to fix all these cases.  Even
if we could, it's out of scope for v12 let alone the back branches.

I think the only practical solution is to remove those reindex tests.
Even if we ran them in a script with no concurrent scripts, there'd
be risk of failures against autovacuum, I'm afraid.  Not often, but
often enough to be annoying.

Possibly we could run them in a TAP test that configures a cluster
with autovac disabled?

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-04-30 14:05:50 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It's the lock-upgrade problem I theorized about
> > upthread. ReindexIndex(), via RangeVarCallbackForReindexIndex(), takes a
> > ShareLock on pg_class, and then goes on to upgrade to RowExclusiveLock
> > in RelationSetNewRelfilenode(). But at that time another session
> > obviously can already have the ShareLock and would also want to upgrade.
> 
> Hmm.  Note that this is totally independent of the deadlock mechanism
> I reported in my last message on this thread.

Yea :(


> > I'm not sure it's worth fixing this.
> 
> I am not sure it's even *possible* to fix all these cases.

I think it's worth fixing the most common ones though.  It sure sucks
that a plain REINDEX TABLE pg_class; isn't safe to run.


> Even if we could, it's out of scope for v12 let alone the back branches.

Unfortunately agreed. It's possible we could come up with a fix to
backpatch after maturing some, but certainly not before the release.


> I think the only practical solution is to remove those reindex tests.
> Even if we ran them in a script with no concurrent scripts, there'd
> be risk of failures against autovacuum, I'm afraid.  Not often, but
> often enough to be annoying.

> Possibly we could run them in a TAP test that configures a cluster
> with autovac disabled?

Hm. Would it be sufficient to instead move them to a non-concurrent
test group, and stick a BEGIN; LOCK pg_class, ....; COMMIT; around it? I
think that ought to make it safe against autovacuum, and theoretically
there shouldn't be any overlapping pg_class/index updates that we'd need
to wait for?

This is a pretty finicky area of the code, with obviously not enough
test coverage.  I'm inclined to remove them from the back branches, and
try to get them working in master?

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-30 14:05:50 -0400, Tom Lane wrote:
>> Possibly we could run them in a TAP test that configures a cluster
>> with autovac disabled?

> Hm. Would it be sufficient to instead move them to a non-concurrent
> test group, and stick a BEGIN; LOCK pg_class, ....; COMMIT; around it?

Doubt it.  Maybe you could get away with it given that autovacuum and
autoanalyze only do non-transactional updates to pg_class, but that
seems like a pretty shaky assumption.

> This is a pretty finicky area of the code, with obviously not enough
> test coverage.  I'm inclined to remove them from the back branches, and
> try to get them working in master?

I think trying to get this "working" is a v13 task now.  We've obviously
never tried to stress the case before, so you're neither fixing a
regression nor fixing a new-in-v12 issue.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-04-30 14:41:00 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-30 14:05:50 -0400, Tom Lane wrote:
> >> Possibly we could run them in a TAP test that configures a cluster
> >> with autovac disabled?
> 
> > Hm. Would it be sufficient to instead move them to a non-concurrent
> > test group, and stick a BEGIN; LOCK pg_class, ....; COMMIT; around it?
> 
> Doubt it.  Maybe you could get away with it given that autovacuum and
> autoanalyze only do non-transactional updates to pg_class, but that
> seems like a pretty shaky assumption.

I was pondering that autovacuum shouldn't play a role because it ought
to never cause a DELETE_IN_PROGRESS, because it shouldn't affect the
OldestXmin horizon. But that reasoning, even if correct, doesn't hold
for analyze, which (much to my chagrin) does hold a full-blown
snapshot.


> > This is a pretty finnicky area of the code, with obviously not enough
> > test coverage.  I'm inclined to remove them from the back branches, and
> > try to get them working in master?
> 
> I think trying to get this "working" is a v13 task now.  We've obviously
> never tried to stress the case before, so you're neither fixing a
> regression nor fixing a new-in-v12 issue.

Well, the tests *do* test that a previously existing all-branches bug
doesn't exist, no (albeit one just triggering an assert)?  I'm not
talking about making this concurrency safe, just about whether it's
possible to somehow keep the tests.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-30 14:41:00 -0400, Tom Lane wrote:
>> I think trying to get this "working" is a v13 task now.  We've obviously
>> never tried to stress the case before, so you're neither fixing a
>> regression nor fixing a new-in-v12 issue.

> Well, the tests *do* test that a previously existing all-branches bug
> doesn't exist, no (albeit one just triggering an assert)?  I'm not
> talking about making this concurrency safe, just about whether it's
> possible to somehow keep the tests.

Well, I told you what I thought was a safe way to run the tests.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-04-30 15:11:43 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-30 14:41:00 -0400, Tom Lane wrote:
> > > On 2019-04-30 12:03:08 -0700, Andres Freund wrote:
> > > > This is a pretty finicky area of the code, with obviously not enough
> > > > test coverage.  I'm inclined to remove them from the back branches, and
> > > > try to get them working in master?
> > >
> > > I think trying to get this "working" is a v13 task now.  We've obviously
> > > never tried to stress the case before, so you're neither fixing a
> > > regression nor fixing a new-in-v12 issue.
> 
> > Well, the tests *do* test that a previously existing all-branches bug
> > doesn't exist, no (albeit one just triggering an assert)?  I'm not
> > talking about making this concurrency safe, just about whether it's
> > possible to somehow keep the tests.
> 
> Well, I told you what I thought was a safe way to run the tests.

Shrug. I was responding to you talking about "neither fixing a
regression nor fixing a new-in-v12 issue", when I explicitly was talking
about tests for the bug this thread is about. Not sure why "Well, I told
you what I thought was a safe way to run the tests." is a helpful answer
in turn.

I'm not wild to go for a separate TAP test. A separate initdb cycle for
a test that takes about 30ms seems a bit over the top.  So I'm
inclined to either try running it in a serial step on the buildfarm
(survived a few dozen cycles with -DRELCACHE_FORCE_RELEASE
-DCATCACHE_FORCE_RELEASE, and a few with -DCLOBBER_CACHE_ALWAYS), or
just remove them altogether until we fix this.  Since you indicated a
preference against the former, I'll remove them in a bit until I hear
otherwise.

I'll add it to my todo list to try to fix the concurrency issues for 13.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I'm not wild to go for a separate TAP test. A separate initdb cycle for
> a test that takes about 30ms seems a bit over the top.

Fair enough.

> So I'm
> inclined to either try running it in a serial step on the buildfarm
> (survived a few dozen cycles with -DRELCACHE_FORCE_RELEASE
> -DCATCACHE_FORCE_RELEASE, and a few with -DCLOBBER_CACHE_ALWAYS), or
> just remove them altogether until we fix this.  Since you indicated a
> preference against the former, I'll remove them in a bit until I hear
> otherwise.

> I'll add it to my todo list to try to fix the concurrency issues for 13.

If you're really expecting to have a go at that during the v13 cycle,
I think we could live without these test cases till then.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Just when you thought it was safe to go back in the water ...

markhor just reported in with results showing that we have worse
problems than deadlock-prone tests in the back branches: 9.4
for example looks like

  --
  -- whole tables
  REINDEX TABLE pg_class; -- mapped, non-shared, critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX TABLE pg_index; -- non-mapped, non-shared, critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX TABLE pg_operator; -- non-mapped, non-shared, critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX TABLE pg_database; -- mapped, shared, critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX TABLE pg_shdescription; -- mapped, shared non-critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  -- Check that individual system indexes can be reindexed. That's a bit
  -- different from the entire-table case because reindex_relation
  -- treats e.g. pg_class special.
  REINDEX INDEX pg_class_oid_index; -- mapped, non-shared, critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX INDEX pg_class_relname_nsp_index; -- mapped, non-shared, non-critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX INDEX pg_index_indexrelid_index; -- non-mapped, non-shared, critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX INDEX pg_index_indrelid_index; -- non-mapped, non-shared, non-critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX INDEX pg_database_oid_index; -- mapped, shared, critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
  REINDEX INDEX pg_shdescription_o_c_index; -- mapped, shared, non-critical
+ ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes

No doubt this is triggered by CLOBBER_CACHE_ALWAYS.

Given this, I'm rethinking my position that we can dispense with these
test cases.  Let's try putting them in a standalone test script, and
see whether that leads to failures or not.  Even if it does, we'd
better keep them until we've got a fully clean bill of health from
the buildfarm.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-04-30 18:42:36 -0400, Tom Lane wrote:
> markhor just reported in with results showing that we have worse
> problems than deadlock-prone tests in the back branches: 9.4
> for example looks like

>   -- whole tables
>   REINDEX TABLE pg_class; -- mapped, non-shared, critical
> + ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes

Ugh.  Also failed on 9.6.


> Given this, I'm rethinking my position that we can dispense with these
> test cases.  Let's try putting them in a standalone test script, and
> see whether that leads to failures or not.  Even if it does, we'd
> better keep them until we've got a fully clean bill of health from
> the buildfarm.

Yea. Seems likely this indicates a proper, distinct, bug :/

I'll move the test into a new "reindex_catalog" test, with a comment
explaining that the failure cases necessitating that are somewhere
between bugs, ugly warts, and hard-to-fix edge cases.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
On 2019-04-30 15:53:07 -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-04-30 18:42:36 -0400, Tom Lane wrote:
> > markhor just reported in with results showing that we have worse
> > problems than deadlock-prone tests in the back branches: 9.4
> > for example looks like
> 
> >   -- whole tables
> >   REINDEX TABLE pg_class; -- mapped, non-shared, critical
> > + ERROR:  could not read block 0 in file "base/16384/27769": read only 0 of 8192 bytes
> 
> Ugh.  Also failed on 9.6.

I see the bug. Turns out we need to figure out another way to solve the
assertion triggered by doing catalog updates within
RelationSetNewRelfilenode() - we can't just move the
SetReindexProcessing() before it.  When CCA is enabled, the
CommandCounterIncrement() near the tail of RelationSetNewRelfilenode()
triggers a rebuild of the catalog entries - but without the
SetReindexProcessing() those scans will try to use the index currently
being rebuilt. Which then predictably fails:

#0  mdread (reln=0x5600aea36498, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7f71037db800 "") at
/home/andres/src/postgresql/src/backend/storage/smgr/md.c:633
#1  0x00005600ae3f656f in smgrread (reln=0x5600aea36498, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7f71037db800 "")
    at /home/andres/src/postgresql/src/backend/storage/smgr/smgr.c:590
#2  0x00005600ae3b4c13 in ReadBuffer_common (smgr=0x5600aea36498, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0,
    hit=0x7fff5bb11cab) at /home/andres/src/postgresql/src/backend/storage/buffer/bufmgr.c:896
#3  0x00005600ae3b44ab in ReadBufferExtended (reln=0x7f7107972540, forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL,
strategy=0x0)
    at /home/andres/src/postgresql/src/backend/storage/buffer/bufmgr.c:664
#4  0x00005600ae3b437f in ReadBuffer (reln=0x7f7107972540, blockNum=0) at
/home/andres/src/postgresql/src/backend/storage/buffer/bufmgr.c:596
#5  0x00005600ae00e0b3 in _bt_getbuf (rel=0x7f7107972540, blkno=0, access=1) at
/home/andres/src/postgresql/src/backend/access/nbtree/nbtpage.c:805
#6  0x00005600ae00dd2a in _bt_heapkeyspace (rel=0x7f7107972540) at
/home/andres/src/postgresql/src/backend/access/nbtree/nbtpage.c:694
#7  0x00005600ae01679c in _bt_first (scan=0x5600aea44440, dir=ForwardScanDirection) at
/home/andres/src/postgresql/src/backend/access/nbtree/nbtsearch.c:1237
#8  0x00005600ae012617 in btgettuple (scan=0x5600aea44440, dir=ForwardScanDirection) at
/home/andres/src/postgresql/src/backend/access/nbtree/nbtree.c:247
#9  0x00005600ae005572 in index_getnext_tid (scan=0x5600aea44440, direction=ForwardScanDirection)
    at /home/andres/src/postgresql/src/backend/access/index/indexam.c:550
#10 0x00005600ae00571e in index_getnext_slot (scan=0x5600aea44440, direction=ForwardScanDirection,
slot=0x5600ae9c6ed0)
    at /home/andres/src/postgresql/src/backend/access/index/indexam.c:642
#11 0x00005600ae003e54 in systable_getnext (sysscan=0x5600aea44080) at
/home/andres/src/postgresql/src/backend/access/index/genam.c:450
#12 0x00005600ae564292 in ScanPgRelation (targetRelId=1259, indexOK=true, force_non_historic=false)
    at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:365
#13 0x00005600ae568203 in RelationReloadNailed (relation=0x5600aea0c4d0) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2292
#14 0x00005600ae568621 in RelationClearRelation (relation=0x5600aea0c4d0, rebuild=true) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2425
#15 0x00005600ae569081 in RelationCacheInvalidate () at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2858
#16 0x00005600ae55b32b in InvalidateSystemCaches () at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:649
#17 0x00005600ae55b408 in AcceptInvalidationMessages () at
/home/andres/src/postgresql/src/backend/utils/cache/inval.c:708
#18 0x00005600ae3d7b22 in LockRelationOid (relid=1259, lockmode=1) at
/home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:136
#19 0x00005600adf85ad2 in relation_open (relationId=1259, lockmode=1) at
/home/andres/src/postgresql/src/backend/access/common/relation.c:56
#20 0x00005600ae040337 in table_open (relationId=1259, lockmode=1) at
/home/andres/src/postgresql/src/backend/access/table/table.c:43
#21 0x00005600ae564215 in ScanPgRelation (targetRelId=2662, indexOK=false, force_non_historic=false)
    at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:348
#22 0x00005600ae567ecf in RelationReloadIndexInfo (relation=0x7f7107972540) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2170
#23 0x00005600ae5681d3 in RelationReloadNailed (relation=0x7f7107972540) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2270
#24 0x00005600ae568621 in RelationClearRelation (relation=0x7f7107972540, rebuild=true) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2425
#25 0x00005600ae568d19 in RelationFlushRelation (relation=0x7f7107972540) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2686
#26 0x00005600ae568e32 in RelationCacheInvalidateEntry (relationId=2662) at
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2738
#27 0x00005600ae55b1af in LocalExecuteInvalidationMessage (msg=0x5600aea262a8) at
/home/andres/src/postgresql/src/backend/utils/cache/inval.c:589
#28 0x00005600ae55af06 in ProcessInvalidationMessages (hdr=0x5600aea26250, func=0x5600ae55b0a8
<LocalExecuteInvalidationMessage>)


I can think of several ways to properly fix this:

1) Remove the CommandCounterIncrement() from
   RelationSetNewRelfilenode(), move it to the callers. That would allow
   for, I think, proper sequencing in reindex_index():

   /*
    * Create a new relfilenode - note that this doesn't make the new
    * relfilenode visible yet, we'd otherwise run into danger of that
    * index (which is empty at this point) being used while processing
    * cache invalidations.
    */
   RelationSetNewRelfilenode(iRel, persistence);

   /*
    * Before making the new relfilenode visible, prevent its use of the
    * to-be-reindexed index while building it.
    */
   SetReindexProcessing(heapId, indexId);

   CommandCounterIncrement();


2) Separate out the state for the assertion triggered by
   SetReindexProcessing from the prohibition of the use of the index for
   searches.

3) Turn on REINDEX_REL_SUPPRESS_INDEX_USE mode when reindexing
   pg_class. But that seems like a bigger hammer than necessary?



Sidenote: It'd be pretty helpful to have an option for the buildfarm etc
to turn md.c type errors like this into PANICs.


> > Given this, I'm rethinking my position that we can dispense with these
> > test cases.  Let's try putting them in a standalone test script, and
> > see whether that leads to failures or not.  Even if it does, we'd
> > better keep them until we've got a fully clean bill of health from
> > the buildfarm.
> 
> Yea. Seems likely this indicates a proper, distinct, bug :/
> 
> I'll move the test into a new "reindex_catalog" test, with a comment
> explaining that the failure cases necessitating that are somewhere
> between bugs, ugly warts, and hard-to-fix edge cases.

Just pushed that.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-30 15:53:07 -0700, Andres Freund wrote:
>> I'll move the test into a new "reindex_catalog" test, with a comment
>> explaining that the failure cases necessitating that are somewhere
>> between bugs, ugly warts, an hard to fix edge cases.

> Just pushed that.

locust is kind of unimpressed:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2019-05-01%2003%3A12%3A13

The relevant bit of log is

2019-05-01 05:24:47.527 CEST [97690:429] pg_regress/create_view LOG:  statement: DROP SCHEMA temp_view_test CASCADE;
2019-05-01 05:24:47.605 CEST [97690:430] pg_regress/create_view LOG:  statement: DROP SCHEMA testviewschm2 CASCADE;
2019-05-01 05:24:47.858 CEST [97694:1] [unknown] LOG:  connection received: host=[local]
2019-05-01 05:24:47.863 CEST [97694:2] [unknown] LOG:  connection authorized: user=pgbuildfarm database=regression
2019-05-01 05:24:47.878 CEST [97694:3] pg_regress/reindex_catalog LOG:  statement: REINDEX TABLE pg_class;
2019-05-01 05:24:48.887 CEST [97694:4] pg_regress/reindex_catalog ERROR:  deadlock detected
2019-05-01 05:24:48.887 CEST [97694:5] pg_regress/reindex_catalog DETAIL:  Process 97694 waits for ShareLock on transaction 2559; blocked by process 97690.
    Process 97690 waits for RowExclusiveLock on relation 1259 of database 16387; blocked by process 97694.
    Process 97694: REINDEX TABLE pg_class;
    Process 97690: DROP SCHEMA testviewschm2 CASCADE;
2019-05-01 05:24:48.887 CEST [97694:6] pg_regress/reindex_catalog HINT:  See server log for query details.
2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  while checking uniqueness of tuple (12,71) in relation "pg_class"
2019-05-01 05:24:48.887 CEST [97694:8] pg_regress/reindex_catalog STATEMENT:  REINDEX TABLE pg_class;
2019-05-01 05:24:48.904 CEST [97690:431] pg_regress/create_view LOG:  disconnection: session time: 0:00:03.748 user=pgbuildfarm database=regression host=[local]

which is mighty confusing at first glance, but I think the explanation is
that what the postmaster is reporting is process 97690's *latest* query,
not what it's currently doing.  What it's really currently doing at the
moment of the deadlock is cleaning out its temporary schema after the
client disconnected.  So this says you were careless about where to insert
the reindex_catalog test in the test schedule: it can't be after anything
that creates any temp objects.  That seems like kind of a problem :-(.
We could put it second, after the tablespace test, but that would mean
that we're reindexing after very little churn has happened in the
catalogs, which doesn't seem like much of a stress test.

Another fairly interesting thing is that this log includes the telltale

2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  while checking uniqueness of tuple (12,71) in relation "pg_class"

Why did I have to dig to find that information in HEAD?  Have we lost
some useful context reporting?  (Note this run is in the v10 branch.)

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I see the bug. Turns out we need to figure out another way to solve the
> assertion triggered by doing catalog updates within
> RelationSetNewRelfilenode() - we can't just move the
> SetReindexProcessing() before it.  When CCA is enabled, the
> CommandCounterIncrement() near the tail of RelationSetNewRelfilenode()
> triggers a rebuild of the catalog entries - but without the
> SetReindexProcessing() those scans will try to use the index currently
> being rebuilt.

Yeah.  I think what this demonstrates is that REINDEX INDEX has to have
RelationSetIndexList logic similar to what REINDEX TABLE has, to control
which indexes get updated when while we're rebuilding an index of
pg_class.  In hindsight that seems glaringly obvious ... I wonder how we
missed that when we built that infrastructure for REINDEX TABLE?

I'm pretty sure that infrastructure is my fault, so I'll take a
whack at fixing this.

Did you figure out why this doesn't also happen in HEAD?

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
I wrote:
> Did you figure out why this doesn't also happen in HEAD?

... actually, HEAD *is* broken with CCA, just differently.
I'm on it.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-05-01 12:20:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I see the bug. Turns out we need to figure out another way to solve the
> > assertion triggered by doing catalog updates within
> > RelationSetNewRelfilenode() - we can't just move the
> > SetReindexProcessing() before it.  When CCA is enabled, the
> > CommandCounterIncrement() near the tail of RelationSetNewRelfilenode()
> > triggers a rebuild of the catalog entries - but without the
> > SetReindexProcessing() those scans will try to use the index currently
> > being rebuilt.
> 
> Yeah.  I think what this demonstrates is that REINDEX INDEX has to have
> RelationSetIndexList logic similar to what REINDEX TABLE has, to control
> which indexes get updated when while we're rebuilding an index of
> pg_class.  In hindsight that seems glaringly obvious ... I wonder how we
> missed that when we built that infrastructure for REINDEX TABLE?

I'm not sure this is the right short-term answer. Why isn't it, for now,
sufficient to do what I suggested with RelationSetNewRelfilenode() not
doing the CommandCounterIncrement(), and reindex_index() then doing the
SetReindexProcessing() before a CommandCounterIncrement()? That's like
~10 line code change, and a few more with comments.

There is the danger that the current and above approach basically relies
on there not being any non-inplace updates during reindex.  But at the
moment the code does take care to use inplace updates
(cf. index_update_stats()).

It's not clear to me whether the approach of using
RelationSetIndexList() in reindex_index() would be meaningfully more
robust against non-inplace updates during reindex either - ISTM we'd
just as well skip the necessary index insertions if we hid the index
being rebuilt. Skipping to-be-rebuilt indexes works for
reindex_relation() because they're going to be rebuilt subsequently (and
thus the missing index rows don't matter) - but it'd not work for
reindexing a single index, because it'll not get the result at a later
stage.


> I'm pretty sure that infrastructure is my fault, so I'll take a
> whack at fixing this.
> 
> Did you figure out why this doesn't also happen in HEAD?

It does for me now, at least when just doing a reindex in isolation (CCA
tests would have taken too long last night). I'm not sure why I wasn't
previously able to trigger it and markhor hasn't run yet on master.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
On 2019-05-01 10:06:03 -0700, Andres Freund wrote:
> I'm not sure this is the right short-term answer. Why isn't it, for now,
> sufficient to do what I suggested with RelationSetNewRelfilenode() not
> doing the CommandCounterIncrement(), and reindex_index() then doing the
> SetReindexProcessing() before a CommandCounterIncrement()? That's like
> ~10 line code change, and a few more with comments.
> 
> There is the danger that the current and above approach basically relies
> on there not being any non-inplace updates during reindex.  But at the
> moment the code does take care to use inplace updates
> (cf. index_update_stats()).
> 
> It's not clear to me whether the approach of using
> RelationSetIndexList() in reindex_index() would be meaningfully more
> robust against non-inplace updates during reindex either - ISTM we'd
> just as well skip the necessary index insertions if we hid the index
> being rebuilt. Skipping to-be-rebuilt indexes works for
> reindex_relation() because they're going to be rebuilt subsequently (and
> thus the missing index rows don't matter) - but it'd not work for
> reindexing a single index, because it'll not get the result at a later
> stage.

FWIW, the dirty-hack version (attached) of the CommandCounterIncrement()
approach fixes the issue for a REINDEX pg_class_oid_index; in isolation
even when using CCA. Started a whole CCA testrun with it, but the
results of that will obviously not be in quickly.

Greetings,

Andres Freund

Вложения

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-01 10:06:03 -0700, Andres Freund wrote:
>> I'm not sure this is the right short-term answer. Why isn't it, for now,
>> sufficient to do what I suggested with RelationSetNewRelfilenode() not
>> doing the CommandCounterIncrement(), and reindex_index() then doing the
>> SetReindexProcessing() before a CommandCounterIncrement()? That's like
>> ~10 line code change, and a few more with comments.

That looks like a hack to me...

The main thing I'm worried about right now is that I realized that
our recovery from errors in this area is completely hosed, cf
https://www.postgresql.org/message-id/4541.1556736252@sss.pgh.pa.us

The problem with CCA is actually kind of convenient for testing that,
since it means you don't have to inject any new fault to get an error
to be thrown while the index relcache entry is in the needing-to-be-
reverted state.  So I'm going to work on fixing the recovery first.
But I suspect that doing this right will require the more complicated
approach anyway.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-05-01 10:21:15 -0700, Andres Freund wrote:
> FWIW, the dirty-hack version (attached) of the CommandCounterIncrement()
> approach fixes the issue for a REINDEX pg_class_oid_index; in isolation
> even when using CCA. Started a whole CCA testrun with it, but the
> results of that will obviously not be in quickly.

Not finished yet, but it got pretty far:

parallel group (5 tests):  create_index_spgist index_including_gist index_including create_view create_index
     create_index                 ... ok       500586 ms
     create_index_spgist          ... ok        86890 ms
     create_view                  ... ok       466512 ms
     index_including              ... ok       150279 ms
     index_including_gist         ... ok       109087 ms
test reindex_catalog              ... ok         2285 ms
parallel group (16 tests):  create_cast roleattributes drop_if_exists create_aggregate vacuum create_am hash_func select create_function_3 constraints typed_table rolenames errors updatable_views triggers inherit

that's where it's at right now:

parallel group (20 tests):  init_privs security_label gin password drop_operator lock gist tablesample spgist


Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2019-05-01 10:06:03 -0700, Andres Freund wrote:
>>> I'm not sure this is the right short-term answer. Why isn't it, for now,
>>> sufficient to do what I suggested with RelationSetNewRelfilenode() not
>>> doing the CommandCounterIncrement(), and reindex_index() then doing the
>>> SetReindexProcessing() before a CommandCounterIncrement()? That's like
>>> ~10 line code change, and a few more with comments.

> That looks like a hack to me...

> The main thing I'm worried about right now is that I realized that
> our recovery from errors in this area is completely hosed, cf
> https://www.postgresql.org/message-id/4541.1556736252@sss.pgh.pa.us

OK, so per the other thread, it seems like the error recovery problem
isn't going to affect this directly.  However, I still don't like this
proposal much; the reason being that it's a rather fundamental change
in the API of RelationSetNewRelfilenode.  This will certainly break
any external callers of that function --- and silently, too.

Admittedly, there might not be any outside callers, but I don't really
like that assumption for something we're going to have to back-patch.

The solution I'm thinking of should have much more localized effects,
basically just in reindex_index and RelationSetNewRelfilenode, which is
why I like it better.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since9.6

От
Andres Freund
Дата:
Hi,

On 2019-05-01 19:41:24 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> On 2019-05-01 10:06:03 -0700, Andres Freund wrote:
> >>> I'm not sure this is the right short-term answer. Why isn't it, for now,
> >>> sufficient to do what I suggested with RelationSetNewRelfilenode() not
> >>> doing the CommandCounterIncrement(), and reindex_index() then doing the
> >>> SetReindexProcessing() before a CommandCounterIncrement()? That's like
> >>> ~10 line code change, and a few more with comments.
> 
> > That looks like a hack to me...
> 
> > The main thing I'm worried about right now is that I realized that
> > our recovery from errors in this area is completely hosed, cf
> > https://www.postgresql.org/message-id/4541.1556736252@sss.pgh.pa.us
> 
> OK, so per the other thread, it seems like the error recovery problem
> isn't going to affect this directly.  However, I still don't like this
> proposal much; the reason being that it's a rather fundamental change
> in the API of RelationSetNewRelfilenode.  This will certainly break
> any external callers of that function --- and silently, too.
> 
> Admittedly, there might not be any outside callers, but I don't really
> like that assumption for something we're going to have to back-patch.

Couldn't we just address that by adding a new
RelationSetNewRelfilenodeInternal() that's then wrapped by
RelationSetNewRelfilenode() which just does
RelationSetNewRelfilenodeInternal();CCI();?

Doesn't have to be ...Internal(), could also be
RelationBeginSetNewRelfilenode() or such.

I'm not sure why you think using CCI() for this purpose is a hack? To me
the ability to have catalog changes only take effect when they're all
done, and the system is ready for them, is one of the core purposes of
the infrastructure?


> The solution I'm thinking of should have much more localized effects,
> basically just in reindex_index and RelationSetNewRelfilenode, which is
> why I like it better.

Well, as I said before, I think hiding the to-be-rebuilt index from the
list of indexes is dangerous too - if somebody added an actual
CatalogUpdate/Insert (rather than inplace_update) anywhere along the
index_build() path, we'd not get an assertion failure anymore, but just
an index without the new entry. And given the fragility with HOT hiding
that a lot of the time, that seems dangerous to me.


Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-01 19:41:24 -0400, Tom Lane wrote:
>> OK, so per the other thread, it seems like the error recovery problem
>> isn't going to affect this directly.  However, I still don't like this
>> proposal much; the reason being that it's a rather fundamental change
>> in the API of RelationSetNewRelfilenode.  This will certainly break
>> any external callers of that function --- and silently, too.

> Couldn't we just address that by adding a new
> RelationSetNewRelfilenodeInternal() that's then wrapped by
> RelationSetNewRelfilenode() which just does
> RelationSetNewRelfilenodeInternal();CCI();?

That's just adding more ugliness ...

>> The solution I'm thinking of should have much more localized effects,
>> basically just in reindex_index and RelationSetNewRelfilenode, which is
>> why I like it better.

> Well, as I said before, I think hiding the to-be-rebuilt index from the
> list of indexes is dangerous too - if somebody added an actual
> CatalogUpdate/Insert (rather than inplace_update) anywhere along the
> index_build() path, we'd not get an assertion failure anymore, but just
> an index without the new entry. And given the fragility with HOT hiding
> that a lot of the time, that seems dangerous to me.

I think that argument is pretty pointless considering that "REINDEX TABLE
pg_class" does it this way, and that code is nearly old enough to vote.
Perhaps there'd be value in rewriting things so that we don't need
RelationSetIndexList at all, but it's not real clear to me what we'd do
instead, and in any case I don't agree with back-patching such a change.
In the near term it seems better to me to make "REINDEX INDEX
some-pg_class-index" handle this problem the same way "REINDEX TABLE
pg_class" has been doing for many years.

Attached is a draft patch for this.  It passes check-world with
xxx_FORCE_RELEASE, and gets through reindexing pg_class with
CLOBBER_CACHE_ALWAYS, but I've not completed a full CCA run.

            regards, tom lane

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ce02410..1af959c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3261,6 +3261,9 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
                 heapRelation;
     Oid            heapId;
     IndexInfo  *indexInfo;
+    bool        is_pg_class;
+    List       *allIndexIds = NIL;
+    List       *otherIndexIds = NIL;
     volatile bool skipped_constraint = false;
     PGRUsage    ru0;

@@ -3331,19 +3334,52 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
     }

     /*
-     * Build a new physical relation for the index. Need to do that before
-     * "officially" starting the reindexing with SetReindexProcessing -
-     * otherwise the necessary pg_class changes cannot be made with
-     * encountering assertions.
+     * RelationSetNewRelfilenode will need to update the index's pg_class row.
+     * If we are reindexing some index of pg_class, that creates a problem;
+     * once we call SetReindexProcessing, the index support will complain if
+     * we try to insert a new index entry.  But we can't do that in the other
+     * order without creating other problems.  We solve this by temporarily
+     * removing the target index from pg_class's index list.  It won't get
+     * updated during RelationSetNewRelfilenode, but that's fine because the
+     * subsequent index build will include the new entry.  (There are more
+     * comments about this in reindex_relation.)
+     *
+     * If we are doing one index for reindex_relation, then we will find that
+     * the index is already not present in the index list.  In that case we
+     * don't have to do anything to the index list here, which we mark by
+     * clearing is_pg_class.
      */
-    RelationSetNewRelfilenode(iRel, persistence);
+    is_pg_class = (RelationGetRelid(heapRelation) == RelationRelationId);
+    if (is_pg_class)
+    {
+        allIndexIds = RelationGetIndexList(heapRelation);
+        if (list_member_oid(allIndexIds, indexId))
+        {
+            otherIndexIds = list_delete_oid(list_copy(allIndexIds), indexId);
+            /* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
+            (void) RelationGetIndexAttrBitmap(heapRelation, INDEX_ATTR_BITMAP_ALL);
+        }
+        else
+            is_pg_class = false;
+    }

-    /* ensure SetReindexProcessing state isn't leaked */
+    /*
+     * Ensure SetReindexProcessing state isn't leaked.  (We don't have to
+     * clean up the RelationSetIndexList state on error, though; transaction
+     * abort knows about fixing that.)
+     */
     PG_TRY();
     {
         /* Suppress use of the target index while rebuilding it */
         SetReindexProcessing(heapId, indexId);

+        /* ... and suppress updating it too, if necessary */
+        if (is_pg_class)
+            RelationSetIndexList(heapRelation, otherIndexIds);
+
+        /* Build a new physical relation for the index */
+        RelationSetNewRelfilenode(iRel, persistence);
+
         /* Initialize the index and rebuild */
         /* Note: we do not need to re-establish pkey setting */
         index_build(heapRelation, iRel, indexInfo, true, true);
@@ -3357,6 +3393,9 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
     PG_END_TRY();
     ResetReindexProcessing();

+    if (is_pg_class)
+        RelationSetIndexList(heapRelation, allIndexIds);
+
     /*
      * If the index is marked invalid/not-ready/dead (ie, it's from a failed
      * CREATE INDEX CONCURRENTLY, or a DROP INDEX CONCURRENTLY failed midway),

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-01 22:01:53 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Well, as I said before, I think hiding the to-be-rebuilt index from the
> > list of indexes is dangerous too - if somebody added an actual
> > CatalogUpdate/Insert (rather than inplace_update) anywhere along the
> > index_build() path, we'd not get an assertion failure anymore, but just
> > an index without the new entry. And given the fragility with HOT hiding
> > that a lot of the time, that seems dangerous to me.
> 
> I think that argument is pretty pointless considering that "REINDEX TABLE
> pg_class" does it this way, and that code is nearly old enough to
> vote.

IMO the reindex_relation() case isn't comparable. By my read the main
purpose there is to prevent inserting into not-yet-rebuilt indexes. The
relevant comment says:
     * ....  If we are processing pg_class itself, we want to make sure
     * that the updates do not try to insert index entries into indexes we
     * have not processed yet.  (When we are trying to recover from corrupted
     * indexes, that could easily cause a crash.)

Note the *not processed yet* bit.  That's *not* comparable logic to
hiding the index that *already* has been rebuilt, in the middle of
reindex_index().  Yes, the way reindex_relation() is currently coded,
the RelationSetIndexList() *also* hides the already rebuilt index, but
that's hard for reindex_relation() to avoid, because it's outside of
reindex_index().


> +     * If we are doing one index for reindex_relation, then we will find that
> +     * the index is already not present in the index list.  In that case we
> +     * don't have to do anything to the index list here, which we mark by
> +     * clearing is_pg_class.
>       */

> -    RelationSetNewRelfilenode(iRel, persistence);
> +    is_pg_class = (RelationGetRelid(heapRelation) == RelationRelationId);
> +    if (is_pg_class)
> +    {
> +        allIndexIds = RelationGetIndexList(heapRelation);
> +        if (list_member_oid(allIndexIds, indexId))
> +        {
> +            otherIndexIds = list_delete_oid(list_copy(allIndexIds), indexId);
> +            /* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
> +            (void) RelationGetIndexAttrBitmap(heapRelation, INDEX_ATTR_BITMAP_ALL);
> +        }
> +        else
> +            is_pg_class = false;
> +    }

That's not pretty either :(

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-01 22:01:53 -0400, Tom Lane wrote:
>> I think that argument is pretty pointless considering that "REINDEX TABLE
>> pg_class" does it this way, and that code is nearly old enough to
>> vote.

> IMO the reindex_relation() case isn't comparable.

IMV it's the exact same case: we need to perform a pg_class update while
one or more of pg_class's indexes shouldn't be touched.  I am kind of
wondering why it didn't seem to be necessary to cover this for REINDEX
INDEX back in 2003, but it clearly is necessary now.

> That's not pretty either :(

So, I don't like your patch, you don't like mine.  Anybody else
want to weigh in?

We do not have the luxury of time to argue about this.  If we commit
something today, we *might* get a useful set of CLOBBER_CACHE_ALWAYS
results for all branches by Sunday.  Those regression tests will have to
come out of the back branches on Sunday, because we are not shipping minor
releases with unstable regression tests, and I've heard no proposal for
avoiding the occasional-deadlock problem.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-01 00:43:34 -0400, Tom Lane wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2019-05-01%2003%3A12%3A13
> 
> The relevant bit of log is
> 
> 2019-05-01 05:24:47.527 CEST [97690:429] pg_regress/create_view LOG:  statement: DROP SCHEMA temp_view_test CASCADE;
> 2019-05-01 05:24:47.605 CEST [97690:430] pg_regress/create_view LOG:  statement: DROP SCHEMA testviewschm2 CASCADE;
> 2019-05-01 05:24:47.858 CEST [97694:1] [unknown] LOG:  connection received: host=[local]
> 2019-05-01 05:24:47.863 CEST [97694:2] [unknown] LOG:  connection authorized: user=pgbuildfarm database=regression
> 2019-05-01 05:24:47.878 CEST [97694:3] pg_regress/reindex_catalog LOG:  statement: REINDEX TABLE pg_class;
> 2019-05-01 05:24:48.887 CEST [97694:4] pg_regress/reindex_catalog ERROR:  deadlock detected
> 2019-05-01 05:24:48.887 CEST [97694:5] pg_regress/reindex_catalog DETAIL:  Process 97694 waits for ShareLock on transaction 2559; blocked by process 97690.
>     Process 97690 waits for RowExclusiveLock on relation 1259 of database 16387; blocked by process 97694.
>     Process 97694: REINDEX TABLE pg_class;
>     Process 97690: DROP SCHEMA testviewschm2 CASCADE;
> 2019-05-01 05:24:48.887 CEST [97694:6] pg_regress/reindex_catalog HINT:  See server log for query details.
> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  while checking uniqueness of tuple (12,71) in relation "pg_class"
> 2019-05-01 05:24:48.887 CEST [97694:8] pg_regress/reindex_catalog STATEMENT:  REINDEX TABLE pg_class;
> 2019-05-01 05:24:48.904 CEST [97690:431] pg_regress/create_view LOG:  disconnection: session time: 0:00:03.748 user=pgbuildfarm database=regression host=[local]
> 
> which is mighty confusing at first glance, but I think the explanation is
> that what the postmaster is reporting is process 97690's *latest* query,
> not what it's currently doing.  What it's really currently doing at the
> moment of the deadlock is cleaning out its temporary schema after the
> client disconnected.  So this says you were careless about where to insert
> the reindex_catalog test in the test schedule: it can't be after anything
> that creates any temp objects.  That seems like kind of a problem :-(.
> We could put it second, after the tablespace test, but that would mean
> that we're reindexing after very little churn has happened in the
> catalogs, which doesn't seem like much of a stress test.

I'm inclined to remove the tests from the backbranches, once we've
committed a fix for the actual REINDEX issue, and most of the farm has
been through a cycle or three. I don't think we'll figure out how to
make them robust in time for next week's release.

I don't think we can really rely on the post-disconnect phase completing
in a particularly deterministic time. I was wondering for a second
whether we could just trigger the cleanup of temp tables in the test
group before the reindex_catalog test with an explicit DISCARD, but
that seems mighty fragile too.


Obviously not something trivially changeable, and never even remotely
backpatchable, but once more I'm questioning the wisdom of all the
early-release logic we have for catalog tables...


> Another fairly interesting thing is that this log includes the telltale
> 
> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  while checking uniqueness of tuple (12,71) in relation "pg_class"
> 
> Why did I have to dig to find that information in HEAD?  Have we lost
> some useful context reporting?  (Note this run is in the v10 branch.)

Hm. There's still code for it. And I found another run on HEAD still
showing it
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2019-05-01%2010%3A45%3A00

+ERROR:  deadlock detected
+DETAIL:  Process 13455 waits for ShareLock on transaction 2986; blocked by process 16881.
+Process 16881 waits for RowExclusiveLock on relation 1259 of database 16384; blocked by process 13455.
+HINT:  See server log for query details.
+CONTEXT:  while checking uniqueness of tuple (39,35) in relation "pg_class"

What made you think it's not present on HEAD?

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-01 00:43:34 -0400, Tom Lane wrote:
>> ...  What it's really currently doing at the
>> moment of the deadlock is cleaning out its temporary schema after the
>> client disconnected.

> I'm inclined to remove the tests from the backbranches, once we've
> committed a fix for the actual REINDEX issue, and most of the farm has
> been through a cycle or three. I don't think we'll figure out how to
> make them robust in time for next week's release.

Yeah, as I just said in my other message, I see no other alternative for
next week's releases.  We can leave the test in place in HEAD a bit
longer, but I don't really want it there for the beta either, unless we
can think of some better plan.

> I don't think we can really rely on the post-disconnect phase completing
> in a particularly deterministic time.

Exactly :-(

>> Another fairly interesting thing is that this log includes the telltale
>> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  while checking uniqueness of tuple (12,71) in relation "pg_class"
>> Why did I have to dig to find that information in HEAD?  Have we lost
>> some useful context reporting?  (Note this run is in the v10 branch.)

FWIW, as best I can reconstruct the sequence of events, I might just
not've looked.  I got an error and just assumed it was the same as what
we'd seen in the buildfarm; but now we realize that there were multiple
ways to get deadlocks, and only some of them would have shown this.
For the moment I'm willing to assume this isn't a real issue.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-02 10:49:00 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-05-01 22:01:53 -0400, Tom Lane wrote:
> >> I think that argument is pretty pointless considering that "REINDEX TABLE
> >> pg_class" does it this way, and that code is nearly old enough to
> >> vote.
>
> > IMO the reindex_relation() case isn't comparable.
>
> IMV it's the exact same case: we need to perform a pg_class update while
> one or more of pg_class's indexes shouldn't be touched.  I am kind of
> wondering why it didn't seem to be necessary to cover this for REINDEX
> INDEX back in 2003, but it clearly is necessary now.
>
> > That's not pretty either :(
>
> So, I don't like your patch, you don't like mine.  Anybody else
> want to weigh in?

Well, I think I can live with your fix. I think it's likely to hide
future bugs, but this is an active bug. And, as you say, we don't have a
lot of time.


ISTM that if we go down this path, we should split (not now, but either
still in v12, or *early* in v13), into separate sets, the indexes that
are a) not to be used for catalog queries and b) allowed to be skipped
for index insertions. It seems pretty likely that somebody will
otherwise soon introduce a heap_update() somewhere into the index build
process, and it'll work just fine in testing due to HOT.


We already have somewhat separate and half complimentary mechanisms
here:
1) When REINDEX_REL_SUPPRESS_INDEX_USE is set (only cluster.c), we mark
   indexes on tables as unused by SetReindexPending(). That prevents them
   from being used for catalog queries. But it disallows new inserts
   into them.

2) When reindex_index() starts processing an index, it marks it as being
   processed. Indexes on this list are not allowed to be inserted into
   (enforced by assertions).  Note that this currently removes the
   specific index from the list set by 1).

   It also marks the heap as being reindexed, which then triggers (as
   the sole effect, afaict) some special-case logic in
   index_update_stats() that avoids the syscache and opts for a direct
   manual catalog scan. I'm a bit confused as to why that's necessary.

3) Just for pg_class, reindex_relation() hard-sets the list of indexes
   to those that have already been rebuilt. That keeps index insertions
   out of the indexes that are only going to be rebuilt later - which is
   necessary because we currently update pg_class in
   RelationSetNewRelfilenode().

Seems odd to resort to RelationSetIndexList(), when we could just mildly
extend the SetReindexPending() logic instead.

I kinda wonder if there's not a third approach hiding somewhere here. We
could just stop updating pg_class in RelationSetNewRelfilenode() when
it's an index on pg_class. The pg_class changes done for mapped indexes
aren't really crucial, and are going to be overwritten later by
index_update_stats().  That'd have the big advantage that we'd afaict
not need the logic of having to allow catalog modifications during the
reindex path at all.
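
A rough sketch of that idea (not a patch; the mapped-index branch would
simply bypass the pg_class update and only talk to the relation mapper):

    /* hypothetical shape of the tail end of RelationSetNewRelfilenode() */
    if (RelationIsMapped(relation))
        RelationMapUpdateMap(RelationGetRelid(relation),
                             newrelfilenode,
                             relation->rd_rel->relisshared,
                             false);
    else
    {
        classform->relfilenode = newrelfilenode;
        CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
    }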


> We do not have the luxury of time to argue about this.  If we commit
> something today, we *might* get a useful set of CLOBBER_CACHE_ALWAYS
> results for all branches by Sunday.

Yea. I think I'll also just trigger a manual CCA run of check-world for
all branches (thank god for old workstations). And CCR for at least a
few crucial bits.


> Those regression tests will have to come out of the back branches on
> Sunday, because we are not shipping minor releases with unstable
> regression tests, and I've heard no proposal for avoiding the
> occasional-deadlock problem.

Yea, I've just proposed the same in a separate thread.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> ISTM that if we go down this path, we should split (not now, but either
> still in v12, or *early* in v13), the sets of indexes that are intended
> to a) not being used for catalog queries b) may be skipped for index
> insertions. It seems pretty likely that somebody will otherwise soon
> introduce an heap_update() somewhere into the index build process, and
> it'll work just fine in testing due to HOT.

Given the assertions you added in CatalogIndexInsert, I'm not sure
why that's a big hazard?

> I kinda wonder if there's not a third approach hiding somewhere here. We
> could just stop updating pg_class in RelationSetNewRelfilenode() in
> pg_class, when it's an index on pg_class.

Hmm ... are all those indexes mapped?  I guess so.  But don't we need
to worry about resetting relfrozenxid?

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > ISTM that if we go down this path, we should split (not now, but either
> > still in v12, or *early* in v13), the sets of indexes that are intended
> > to a) not being used for catalog queries b) may be skipped for index
> > insertions. It seems pretty likely that somebody will otherwise soon
> > introduce an heap_update() somewhere into the index build process, and
> > it'll work just fine in testing due to HOT.
> 
> Given the assertions you added in CatalogIndexInsert, I'm not sure
> why that's a big hazard?

Afaict the new RelationSetIndexList() trickery would prevent that
assertion from being reached, because RelationGetIndexList() will not
see the current index, and therefore CatalogIndexInsert() won't know to
assert it either.  It kinda works today for reindex_relation(), because
we'll "un-hide" the already rebuilt indexes - i.e. we'd not notice the
bug on pg_class' first index, but for later ones it'd trigger.  I guess
you could argue that we'll just have to rely on REINDEX TABLE pg_class
regression tests to make sure REINDEX INDEX pg_class_* ain't broken :/.


> > I kinda wonder if there's not a third approach hiding somewhere here. We
> > could just stop updating pg_class in RelationSetNewRelfilenode() in
> > pg_class, when it's an index on pg_class.
> 
> Hmm ... are all those indexes mapped?  I guess so.

They are:

postgres[13357][1]=# SELECT oid::regclass, relfilenode FROM pg_class WHERE oid IN (SELECT indexrelid FROM pg_index
WHEREindrelid = 'pg_class'::regclass);
 
┌───────────────────────────────────┬─────────────┐
│                oid                │ relfilenode │
├───────────────────────────────────┼─────────────┤
│ pg_class_oid_index                │           0 │
│ pg_class_relname_nsp_index        │           0 │
│ pg_class_tblspc_relfilenode_index │           0 │
└───────────────────────────────────┴─────────────┘
(3 rows)

I guess that doesn't strictly have to be the case for at least some of
them, but it seems unlikely that we'd want to change that.


> But don't we need to worry about resetting relfrozenxid?

Indexes don't have that though? We couldn't do it for pg_class itself,
but that's not a problem here.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
>> But don't we need to worry about resetting relfrozenxid?

> Indexes don't have that though? We couldn't do it for pg_class itself,
> but that's not a problem here.

Hmm.  Again, that seems like the sort of assumption that could bite
us later.  But maybe we could add some assertions that the new values
match the old?  I'll experiment.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
>>> But don't we need to worry about resetting relfrozenxid?

>> Indexes don't have that though? We couldn't do it for pg_class itself,
>> but that's not a problem here.

> Hmm.  Again, that seems like the sort of assumption that could bite
> us later.  But maybe we could add some assertions that the new values
> match the old?  I'll experiment.

Huh, this actually seems to work.  The attached is a quick hack for
testing.  It gets through check-world straight up and with
xxx_FORCE_RELEASE, and I've verified reindexing pg_class works with
CLOBBER_CACHE_ALWAYS, but it'll be a few hours before I have a full CCA
run.

One interesting thing that turns up in check-world is that if wal_level
is minimal, we have to manually force an XID to be assigned, else
reindexing pg_class fails with "cannot commit a transaction that deleted
files but has no xid" :-(.  Perhaps there's some other cleaner place to
do that?

If we go this path, we should remove RelationSetIndexList altogether
(in HEAD), but I've not done so here.  The comments likely need more
work too.

I have to go out and do some errands for the next few hours, so I can't
push this forward any more right now.

            regards, tom lane

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ce02410..508a04e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3330,20 +3330,15 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
         indexInfo->ii_ExclusionStrats = NULL;
     }

-    /*
-     * Build a new physical relation for the index. Need to do that before
-     * "officially" starting the reindexing with SetReindexProcessing -
-     * otherwise the necessary pg_class changes cannot be made with
-     * encountering assertions.
-     */
-    RelationSetNewRelfilenode(iRel, persistence);
-
     /* ensure SetReindexProcessing state isn't leaked */
     PG_TRY();
     {
         /* Suppress use of the target index while rebuilding it */
         SetReindexProcessing(heapId, indexId);

+        /* Build a new physical relation for the index */
+        RelationSetNewRelfilenode(iRel, persistence);
+
         /* Initialize the index and rebuild */
         /* Note: we do not need to re-establish pkey setting */
         index_build(heapRelation, iRel, indexInfo, true, true);
@@ -3491,7 +3486,6 @@ reindex_relation(Oid relid, int flags, int options)
     Relation    rel;
     Oid            toast_relid;
     List       *indexIds;
-    bool        is_pg_class;
     bool        result;
     int            i;

@@ -3527,32 +3521,8 @@ reindex_relation(Oid relid, int flags, int options)
      */
     indexIds = RelationGetIndexList(rel);

-    /*
-     * reindex_index will attempt to update the pg_class rows for the relation
-     * and index.  If we are processing pg_class itself, we want to make sure
-     * that the updates do not try to insert index entries into indexes we
-     * have not processed yet.  (When we are trying to recover from corrupted
-     * indexes, that could easily cause a crash.) We can accomplish this
-     * because CatalogTupleInsert/CatalogTupleUpdate will use the relcache's
-     * index list to know which indexes to update. We just force the index
-     * list to be only the stuff we've processed.
-     *
-     * It is okay to not insert entries into the indexes we have not processed
-     * yet because all of this is transaction-safe.  If we fail partway
-     * through, the updated rows are dead and it doesn't matter whether they
-     * have index entries.  Also, a new pg_class index will be created with a
-     * correct entry for its own pg_class row because we do
-     * RelationSetNewRelfilenode() before we do index_build().
-     */
-    is_pg_class = (RelationGetRelid(rel) == RelationRelationId);
-
-    /* Ensure rd_indexattr is valid; see comments for RelationSetIndexList */
-    if (is_pg_class)
-        (void) RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_ALL);
-
     PG_TRY();
     {
-        List       *doneIndexes;
         ListCell   *indexId;
         char        persistence;

@@ -3580,15 +3550,11 @@ reindex_relation(Oid relid, int flags, int options)
             persistence = rel->rd_rel->relpersistence;

         /* Reindex all the indexes. */
-        doneIndexes = NIL;
         i = 1;
         foreach(indexId, indexIds)
         {
             Oid            indexOid = lfirst_oid(indexId);

-            if (is_pg_class)
-                RelationSetIndexList(rel, doneIndexes);
-
             reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
                           persistence, options);

@@ -3597,9 +3563,6 @@ reindex_relation(Oid relid, int flags, int options)
             /* Index should no longer be in the pending list */
             Assert(!ReindexIsProcessingIndex(indexOid));

-            if (is_pg_class)
-                doneIndexes = lappend_oid(doneIndexes, indexOid);
-
             /* Set index rebuild count */
             pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
                                          i);
@@ -3615,9 +3578,6 @@ reindex_relation(Oid relid, int flags, int options)
     PG_END_TRY();
     ResetReindexPending();

-    if (is_pg_class)
-        RelationSetIndexList(rel, indexIds);
-
     /*
      * Close rel, but continue to hold the lock.
      */
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 90ff8cc..b51a53d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3463,9 +3463,6 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
      */
     RelationDropStorage(relation);

-    /* initialize new relfilenode from old relfilenode */
-    newrnode = relation->rd_node;
-
     /*
      * Create storage for the main fork of the new relfilenode. If it's
      * table-like object, call into table AM to do so, which'll also create
@@ -3479,15 +3476,6 @@ RelationSetNewRelfilenode(Relation relation, char persistence)

     switch (relation->rd_rel->relkind)
     {
-            /* shouldn't be called for these */
-        case RELKIND_VIEW:
-        case RELKIND_COMPOSITE_TYPE:
-        case RELKIND_FOREIGN_TABLE:
-        case RELKIND_PARTITIONED_TABLE:
-        case RELKIND_PARTITIONED_INDEX:
-            elog(ERROR, "should not have storage");
-            break;
-
         case RELKIND_INDEX:
         case RELKIND_SEQUENCE:
             {
@@ -3505,41 +3493,71 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
                                             persistence,
                                             &freezeXid, &minmulti);
             break;
+
+            /* shouldn't be called for anything else */
+        default:
+            elog(ERROR, "should not have storage");
+            break;
     }

     /*
-     * However, if we're dealing with a mapped index, pg_class.relfilenode
-     * doesn't change; instead we have to send the update to the relation
-     * mapper.
+     * If we're dealing with a mapped index, pg_class.relfilenode doesn't
+     * change; instead we have to send the update to the relation mapper.
+     *
+     * For mapped indexes, we don't actually change the pg_class entry at all;
+     * this is essential when reindexing pg_class itself.  That leaves us with
+     * possibly-inaccurate values of relpages etc, but those will be fixed up
+     * later.
      */
     if (RelationIsMapped(relation))
+    {
+        /* Since we're not updating pg_class, these had better not change */
+        Assert(classform->relfrozenxid == freezeXid);
+        Assert(classform->relminmxid == minmulti);
+        Assert(classform->relpersistence == persistence);
+
+        /*
+         * In some code paths it's possible that the tuple update we'd
+         * otherwise do here is the only thing that would assign an XID for
+         * the current transaction.  However, we must have an XID to delete
+         * files, so make sure one is assigned.
+         */
+        (void) GetCurrentTransactionId();
+
+        /* Do the deed */
         RelationMapUpdateMap(RelationGetRelid(relation),
                              newrelfilenode,
                              relation->rd_rel->relisshared,
                              false);
+
+        /* Since we're not updating pg_class, must trigger inval manually */
+        CacheInvalidateRelcache(relation);
+    }
     else
+    {
         classform->relfilenode = newrelfilenode;

-    /* These changes are safe even for a mapped relation */
-    if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
-    {
-        classform->relpages = 0;    /* it's empty until further notice */
-        classform->reltuples = 0;
-        classform->relallvisible = 0;
-    }
-    classform->relfrozenxid = freezeXid;
-    classform->relminmxid = minmulti;
-    classform->relpersistence = persistence;
+        /* These changes are safe even for a mapped relation */
+        if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
+        {
+            classform->relpages = 0;    /* it's empty until further notice */
+            classform->reltuples = 0;
+            classform->relallvisible = 0;
+        }
+        classform->relfrozenxid = freezeXid;
+        classform->relminmxid = minmulti;
+        classform->relpersistence = persistence;

-    CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+        CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+    }

     heap_freetuple(tuple);

     table_close(pg_class, RowExclusiveLock);

     /*
-     * Make the pg_class row change visible, as well as the relation map
-     * change if any.  This will cause the relcache entry to get updated, too.
+     * Make the pg_class row change or relation map change visible.  This will
+     * cause the relcache entry to get updated, too.
      */
     CommandCounterIncrement();


Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-02 12:59:55 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> On 2019-05-02 11:41:28 -0400, Tom Lane wrote:
> >>> But don't we need to worry about resetting relfrozenxid?
> 
> >> Indexes don't have that though? We couldn't do it for pg_class itself,
> >> but that's not a problem here.
> 
> > Hmm.  Again, that seems like the sort of assumption that could bite
> > us later.  But maybe we could add some assertions that the new values
> > match the old?  I'll experiment.

index_create() has
    Assert(relfrozenxid == InvalidTransactionId);
    Assert(relminmxid == InvalidMultiXactId);

I think we should just add the same to reindex_index() (modulo accessing
relcache rather than local vars, of course)?
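
I.e. something along these lines (a sketch only; the exact spelling
depends on how the relcache entry is accessed at that point in
reindex_index()):

    /* hypothetical assertions in reindex_index(), mirroring index_create() */
    Assert(iRel->rd_rel->relfrozenxid == InvalidTransactionId);
    Assert(iRel->rd_rel->relminmxid == InvalidMultiXactId);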


> Huh, this actually seems to work.  The attached is a quick hack for
> testing.  It gets through check-world straight up and with
> xxx_FORCE_RELEASE, and I've verified reindexing pg_class works with
> CLOBBER_CACHE_ALWAYS, but it'll be a few hours before I have a full CCA
> run.

Great.


> One interesting thing that turns up in check-world is that if wal_level
> is minimal, we have to manually force an XID to be assigned, else
> reindexing pg_class fails with "cannot commit a transaction that deleted
> files but has no xid" :-(.  Perhaps there's some other cleaner place to
> do that?

Hm. We could replace that RecordTransactionCommit() with an xid
assignment or such. But that seems at least as fragile. Or we could
expand the logic we have for LogStandbyInvalidations() a few lines below
the elog to also be able to handle files.  IIRC that was introduced to
handle somewhat related issues about being able to run VACUUM
(containing invalidations) without an xid.


> +     * If we're dealing with a mapped index, pg_class.relfilenode doesn't
> +     * change; instead we have to send the update to the relation mapper.
> +     *
> +     * For mapped indexes, we don't actually change the pg_class entry at all;
> +     * this is essential when reindexing pg_class itself.  That leaves us with
> +     * possibly-inaccurate values of relpages etc, but those will be fixed up
> +     * later.
>       */
>      if (RelationIsMapped(relation))
> +    {
> +        /* Since we're not updating pg_class, these had better not change */
> +        Assert(classform->relfrozenxid == freezeXid);
> +        Assert(classform->relminmxid == minmulti);
> +        Assert(classform->relpersistence == persistence);

Hm. Could we additionally assert that we're dealing with an index? The
above checks will trigger for tables right now, but I'm not sure that'll
always be the case.
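
E.g. (just a sketch, assuming a plain relkind check is the right way to
spell "is an index" here):

    /* hypothetical: only mapped indexes should take the no-pg_class-update path */
    Assert(relation->rd_rel->relkind == RELKIND_INDEX);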


> +        /*
> +         * In some code paths it's possible that the tuple update we'd
> +         * otherwise do here is the only thing that would assign an XID for
> +         * the current transaction.  However, we must have an XID to delete
> +         * files, so make sure one is assigned.
> +         */
> +        (void) GetCurrentTransactionId();

Not pretty, but seems tolerable.


> -    /* These changes are safe even for a mapped relation */
> -    if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
> -    {
> -        classform->relpages = 0;    /* it's empty until further notice */
> -        classform->reltuples = 0;
> -        classform->relallvisible = 0;
> -    }
> -    classform->relfrozenxid = freezeXid;
> -    classform->relminmxid = minmulti;
> -    classform->relpersistence = persistence;
> +        /* These changes are safe even for a mapped relation */

You'd probably have noticed that, but this one probably has to go.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-02 12:59:55 -0400, Tom Lane wrote:
>> One interesting thing that turns up in check-world is that if wal_level
>> is minimal, we have to manually force an XID to be assigned, else
>> reindexing pg_class fails with "cannot commit a transaction that deleted
>> files but has no xid" :-(.  Perhaps there's some other cleaner place to
>> do that?

> Hm. We could replace that RecordTransactionCommit() with an xid
> assignment or such. But that seems at least as fragile. Or we could
> expand the logic we have for LogStandbyInvalidations() a few lines below
> the elog to also be able to handle files.  IIRC that was introduced to
> handle somewhat related issues about being able to run VACUUM
> (containing invalidations) without an xid.

Well, that's something we can maybe improve later.  I'm content to leave
the patch as it is for now.

>> if (RelationIsMapped(relation))
>> +    {
>> +        /* Since we're not updating pg_class, these had better not change */
>> +        Assert(classform->relfrozenxid == freezeXid);
>> +        Assert(classform->relminmxid == minmulti);
>> +        Assert(classform->relpersistence == persistence);

> Hm. Could we additionally assert that we're dealing with an index?

Will do.

>> +        /* These changes are safe even for a mapped relation */

> You'd probably have noticed that, but this one probably has to go.

Ah, right.  As I said, I'd not paid much attention to the comments yet.

I just finished a successful run of the core regression tests with CCA.
Given the calendar, I think that's about as much CCA testing as I should
do personally.  I'll make a cleanup pass on this patch and try to get it
pushed within a few hours, if there are not objections.

How do you feel about the other patch to rejigger the order of operations
in CommandCounterIncrement?  I think that's a bug, but it's probably
noncritical for most people.  What I'm leaning towards for that one is
waiting till after the minor releases, then pushing it to all branches.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-02 16:54:11 -0400, Tom Lane wrote:
> I just finished a successful run of the core regression tests with CCA.
> Given the calendar, I think that's about as much CCA testing as I should
> do personally.  I'll make a cleanup pass on this patch and try to get it
> pushed within a few hours, if there are not objections.

Sounds good to me.


> How do you feel about the other patch to rejigger the order of operations
> in CommandCounterIncrement?  I think that's a bug, but it's probably
> noncritical for most people.  What I'm leaning towards for that one is
> waiting till after the minor releases, then pushing it to all branches.

I've not yet had the mental cycles to look more deeply into it. I
thought your explanation of why the current way is wrong made sense, but I
wanted to look a bit more into how it came to be how it is now. I agree
that pushing after the minors would make sense, it's too subtle to go
for it now.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-02 16:54:11 -0400, Tom Lane wrote:
>> How do you feel about the other patch to rejigger the order of operations
>> in CommandCounterIncrement?  I think that's a bug, but it's probably
>> noncritical for most people.  What I'm leaning towards for that one is
>> waiting till after the minor releases, then pushing it to all branches.

> I've not yet have the mental cycles to look more deeply into it. I
> thought your explanation why the current way is wrong made sense, but I
> wanted to look a bit more into how it came to be how it is now.

Well, I wrote that code, and I can say pretty confidently that this
failure mode just didn't occur to me at the time.

> I agree
> that pushing after the minors would make sense, it's too subtle to go
> for it now.

It is subtle, and given that it's been there this long, I don't feel
urgency to fix it Right Now.  I think we're already taking plenty of
risk back-patching the REINDEX patch :-(

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-02 16:54:11 -0400, Tom Lane wrote:
>> I just finished a successful run of the core regression tests with CCA.
>> Given the calendar, I think that's about as much CCA testing as I should
>> do personally.  I'll make a cleanup pass on this patch and try to get it
>> pushed within a few hours, if there are not objections.

> Sounds good to me.

Pushed --- hopefully, we have enough time before Sunday that we can get
reasonably complete buildfarm testing.

I did manually verify that all branches get through "reindex table
pg_class" and "reindex index pg_class_oid_index" under
CLOBBER_CACHE_ALWAYS, as well as a normal-mode check-world.  But CCA
world runs seem like a good idea.

As far as a permanent test scheme goes, I noticed while testing that
src/bin/scripts/t/090_reindexdb.pl and
src/bin/scripts/t/091_reindexdb_all.pl seem to be giving us a good
deal of coverage on this already, although of course they never caught the
problem with non-HOT updates, nor any of the deadlock issues.  Still,
it seems like maybe a core regression test that's been lobotomized enough
to be perfectly parallel-safe might not give us more coverage than can
be had there.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Michael Paquier
Date:
On Thu, May 02, 2019 at 07:18:19PM -0400, Tom Lane wrote:
> I did manually verify that all branches get through "reindex table
> pg_class" and "reindex index pg_class_oid_index" under
> CLOBBER_CACHE_ALWAYS, as well as a normal-mode check-world.  But CCA
> world runs seem like a good idea.

(catching up a bit..)

sidewinder is still pissed off as of HEAD, pointing visibly to f912d7d
as the root cause:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2019-05-03%2021%3A45%3A00

 REINDEX TABLE pg_class; -- mapped, non-shared, critical
+ERROR:  deadlock detected
+DETAIL:  Process 28266 waits for ShareLock on transaction 2988; blocked by process 20650.
+Process 20650 waits for RowExclusiveLock on relation 1259 of database 16387; blocked by process 28266.
+HINT:  See server log for query details.

I don't think that's sane just before the upcoming release..
--
Michael

Attachments

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> sidewinder is still pissed of as of HEAD, pointing visibly to f912d7d
> as the root cause:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2019-05-03%2021%3A45%3A00

Right, the deadlocks are expected when some previous session is slow about
cleaning out its temp schema.  The plan is to leave that in place till
tomorrow to see if any *other* failure modes turn up.  But it has to come
out before we wrap the releases.

I don't think we discussed exactly what "come out" means.  My thought is
to leave the test scripts in place (so they can be invoked manually with
EXTRA_TESTS) but remove them from the schedule files.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-04 11:04:07 -0400, Tom Lane wrote:
> I don't think we discussed exactly what "come out" means.  My thought is
> to leave the test scripts in place (so they can be invoked manually with
> EXTRA_TESTS) but remove them from the schedule files.

Yea, that sounds sensible. I'll do so by tonight if you don't beat me to
it.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-04 11:04:07 -0400, Tom Lane wrote:
>> I don't think we discussed exactly what "come out" means.  My thought is
>> to leave the test scripts in place (so they can be invoked manually with
>> EXTRA_TESTS) but remove them from the schedule files.

> Yea, that sounds sensible. I'll do so by tonight if you don't beat me to
> it.

On this coast, "tonight" is running into "tomorrow" ... you planning
to do that soon?

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On May 5, 2019 8:56:58 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> On 2019-05-04 11:04:07 -0400, Tom Lane wrote:
>>> I don't think we discussed exactly what "come out" means.  My thought is
>>> to leave the test scripts in place (so they can be invoked manually with
>>> EXTRA_TESTS) but remove them from the schedule files.
>
>> Yea, that sounds sensible. I'll do so by tonight if you don't beat me to
>> it.
>
>On this coast, "tonight" is running into "tomorrow" ... you planning
>to do that soon?

I'd planned to finish cooking and eating, and then doing it. Seems like that'd be plenty early?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On May 5, 2019 8:56:58 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On this coast, "tonight" is running into "tomorrow" ... you planning
>> to do that soon?

> I'd planned to finish cooking and eating, and then doing it. Seems like that'd be plenty early?

Sure, dinner can take priority.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
On 2019-05-06 00:00:04 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On May 5, 2019 8:56:58 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> On this coast, "tonight" is running into "tomorrow" ... you planning
> >> to do that soon?
>
> > I'd planned to finish cooking and eating, and then doing it. Seems like that'd be plenty early?
>
> Sure, dinner can take priority.

And pushed.

I've not done so for 12. For one, because there's no imminent release,
and there are plenty of reindexing-related changes in 12. But also because I
have two vague ideas that might allow us to keep the test in the regular
schedule.

1) Is there a way that we could just use the type of logic we use for
   CREATE INDEX CONCURRENTLY to force waiting for previously started
   sessions, before doing the REINDEXing of pg_class et al?

   I think it'd work to just add a CREATE INDEX CONCURRENTLY in a
   transaction, after taking an exclusive lock on pg_class - but I
   suspect that'd be just as deadlock prone, just for different reasons?

2) Couldn't we just add a simple loop in plpgsql that checks that the
   previous session ended? A simple DO loop around SELECT pid FROM
   pg_stat_activity WHERE datname = current_database() AND pid <>
   pg_backend_pid(); doesn't sound like it'd be too complicated?  That
   wouldn't work in older releases, because we e.g. wouldn't see
   autoanalyze anywhere conveniently.

   I'm afraid there'd still be the issue that an autoanalyze could spin
   up concurrently? And we can't just prevent that by locking pg_class,
   because we'd otherwise just have the same deadlock?


I for sure thought I earlier had an idea that'd actually work. But
either I've lost it, or it didn't actually work. But perhaps somebody
else can come up with something based on the above strawman ideas?

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I for sure thought I earlier had an idea that'd actually work. But
> either I've lost it, or it didn't actually work. But perhaps somebody
> else can come up with something based on the above strawman ideas?

Both of those ideas fail if an autovacuum starts up after you're
done looking.  I still think the only way you could make this
reliable enough for the buildfarm is to do it in a TAP test that sets
up a cluster with autovacuum disabled.  Whether it's worth the
cycles to do so is pretty unclear, since that wouldn't be a terribly
real-world test environment.  (I also wonder whether the existing
TAP tests for reindexdb don't provide largely the same coverage.)

My advice is to let it go until we have time to work on getting rid
of the deadlock issues.  If we're successful at that, it might be
possible to re-enable these tests in the regular regression environment.

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-07 10:50:19 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I for sure thought I earlier had an idea that'd actually work. But
> > either I've lost it, or it didn't actually work. But perhaps somebody
> > else can come up with something based on the above strawman ideas?
> 
> Both of those ideas fail if an autovacuum starts up after you're
> done looking.

Well, that's why I had proposed to basically first lock pg_class, and
then wait for other sessions. Which'd be fine, except that it'd also
create deadlock risks :(.


> My advice is to let it go until we have time to work on getting rid
> of the deadlock issues.  If we're successful at that, it might be
> possible to re-enable these tests in the regular regression environment.

Yea, that might be right. I'm planning to leave the tests in until a
bunch of the open REINDEX issues are resolved. Not super likely that
it'd break something, but probably worth anyway?

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Yea, that might be right. I'm planning to leave the tests in until a
> bunch of the open REINDEX issues are resolved. Not super likely that
> it'd break something, but probably worth anyway?

The number of deadlock failures is kind of annoying, so I'd rather remove
the tests from HEAD sooner than later.  What issues around that do you
think remain that these tests would be helpful for?

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-07 12:07:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Yea, that might be right. I'm planning to leave the tests in until a
> > bunch of the open REINDEX issues are resolved. Not super likely that
> > it'd break something, but probably worth anyway?
> 
> The number of deadlock failures is kind of annoying, so I'd rather remove
> the tests from HEAD sooner than later.  What issues around that do you
> think remain that these tests would be helpful for?

I was wondering about
https://postgr.es/m/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
but perhaps it's too unlikely to break anything the tests would detect
though.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-07 12:07:37 -0400, Tom Lane wrote:
>> The number of deadlock failures is kind of annoying, so I'd rather remove
>> the tests from HEAD sooner than later.  What issues around that do you
>> think remain that these tests would be helpful for?

> I was wondering about
> https://postgr.es/m/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
> but perhaps it's too unlikely to break anything the tests would detect
> though.

Since we don't allow REINDEX CONCURRENTLY on system catalogs, I'm not
seeing any particular overlap there ...

            regards, tom lane



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-07 12:14:43 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-05-07 12:07:37 -0400, Tom Lane wrote:
> >> The number of deadlock failures is kind of annoying, so I'd rather remove
> >> the tests from HEAD sooner than later.  What issues around that do you
> >> think remain that these tests would be helpful for?
> 
> > I was wondering about
> > https://postgr.es/m/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
> > but perhaps it's too unlikely to break anything the tests would detect
> > though.
> 
> Since we don't allow REINDEX CONCURRENTLY on system catalogs, I'm not
> seeing any particular overlap there ...

Well, it rejiggers the way table locks are acquired for all REINDEX
INDEX commands, not just in the CONCURRENTLY. But yea, it's probably
easy to catch issues there on user tables.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Andres Freund
Date:
Hi,

On 2019-05-07 09:17:11 -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-05-07 12:14:43 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2019-05-07 12:07:37 -0400, Tom Lane wrote:
> > >> The number of deadlock failures is kind of annoying, so I'd rather remove
> > >> the tests from HEAD sooner than later.  What issues around that do you
> > >> think remain that these tests would be helpful for?
> > 
> > > I was wondering about
> > > https://postgr.es/m/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
> > > but perhaps it's too unlikely to break anything the tests would detect
> > > though.
> > 
> > Since we don't allow REINDEX CONCURRENTLY on system catalogs, I'm not
> > seeing any particular overlap there ...
> 
> Well, it rejiggers the way table locks are acquired for all REINDEX
> INDEX commands, not just in the CONCURRENTLY. But yea, it's probably
> easy to catch issues there on user tables.

Pushed now.

Greetings,

Andres Freund



Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-07 09:17:11 -0700, Andres Freund wrote:
>> Well, it rejiggers the way table locks are acquired for all REINDEX
>> INDEX commands, not just in the CONCURRENTLY. But yea, it's probably
>> easy to catch issues there on user tables.

> Pushed now.

OK.  I marked the open issue as closed.

            regards, tom lane