Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From Andres Freund
Subject Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Date
Msg-id 20190502145045.mdrxilmm5bhq55ti@alap3.anarazel.de
In response to Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

Hi,

On 2019-05-01 00:43:34 -0400, Tom Lane wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2019-05-01%2003%3A12%3A13
> 
> The relevant bit of log is
> 
> 2019-05-01 05:24:47.527 CEST [97690:429] pg_regress/create_view LOG:  statement: DROP SCHEMA temp_view_test CASCADE;
> 2019-05-01 05:24:47.605 CEST [97690:430] pg_regress/create_view LOG:  statement: DROP SCHEMA testviewschm2 CASCADE;
> 2019-05-01 05:24:47.858 CEST [97694:1] [unknown] LOG:  connection received: host=[local]
> 2019-05-01 05:24:47.863 CEST [97694:2] [unknown] LOG:  connection authorized: user=pgbuildfarm database=regression
> 2019-05-01 05:24:47.878 CEST [97694:3] pg_regress/reindex_catalog LOG:  statement: REINDEX TABLE pg_class;
> 2019-05-01 05:24:48.887 CEST [97694:4] pg_regress/reindex_catalog ERROR:  deadlock detected
> 2019-05-01 05:24:48.887 CEST [97694:5] pg_regress/reindex_catalog DETAIL:  Process 97694 waits for ShareLock on transaction 2559; blocked by process 97690.
>     Process 97690 waits for RowExclusiveLock on relation 1259 of database 16387; blocked by process 97694.
>     Process 97694: REINDEX TABLE pg_class;
>     Process 97690: DROP SCHEMA testviewschm2 CASCADE;
> 2019-05-01 05:24:48.887 CEST [97694:6] pg_regress/reindex_catalog HINT:  See server log for query details.
> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  while checking uniqueness of tuple (12,71) in relation "pg_class"
> 2019-05-01 05:24:48.887 CEST [97694:8] pg_regress/reindex_catalog STATEMENT:  REINDEX TABLE pg_class;
> 2019-05-01 05:24:48.904 CEST [97690:431] pg_regress/create_view LOG:  disconnection: session time: 0:00:03.748 user=pgbuildfarm database=regression host=[local]
> 
> which is mighty confusing at first glance, but I think the explanation is
> that what the postmaster is reporting is process 97690's *latest* query,
> not what it's currently doing.  What it's really currently doing at the
> moment of the deadlock is cleaning out its temporary schema after the
> client disconnected.  So this says you were careless about where to insert
> the reindex_catalog test in the test schedule: it can't be after anything
> that creates any temp objects.  That seems like kind of a problem :-(.
> We could put it second, after the tablespace test, but that would mean
> that we're reindexing after very little churn has happened in the
> catalogs, which doesn't seem like much of a stress test.

I'm inclined to remove the tests from the backbranches, once we've
committed a fix for the actual REINDEX issue, and most of the farm has
been through a cycle or three. I don't think we'll figure out how to
make them robust in time for next week's release.

I don't think we can really rely on the post-disconnect phase completing
in a particularly deterministic time. I was wondering for a second
whether we could just trigger the cleanup of temp tables in the test
group before the reindex_catalog test with an explicit DISCARD, but
that seems mighty fragile too.


Obviously not something trivially changeable, and never even remotely
backpatchable, but once more I'm questioning the wisdom of all the
early-release logic we have for catalog tables...


> Another fairly interesting thing is that this log includes the telltale
> 
> 2019-05-01 05:24:48.887 CEST [97694:7] pg_regress/reindex_catalog CONTEXT:  while checking uniqueness of tuple (12,71) in relation "pg_class"
> 
> Why did I have to dig to find that information in HEAD?  Have we lost
> some useful context reporting?  (Note this run is in the v10 branch.)

Hm. There's still code for it. And I found another run on HEAD still
showing it:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2019-05-01%2010%3A45%3A00

+ERROR:  deadlock detected
+DETAIL:  Process 13455 waits for ShareLock on transaction 2986; blocked by process 16881.
+Process 16881 waits for RowExclusiveLock on relation 1259 of database 16384; blocked by process 13455.
+HINT:  See server log for query details.
+CONTEXT:  while checking uniqueness of tuple (39,35) in relation "pg_class"

What made you think it's not present on HEAD?
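
For readers following along, the two-process cycle in the DETAIL lines above can be made explicit with a small sketch. This is only an illustration, not anything PostgreSQL or the buildfarm ships: the helper names `parse_wait_edges` and `find_cycle` are hypothetical, and the regular expression assumes the "Process N waits for ...; blocked by process M." wording seen in the log excerpt.

```python
import re

# Assumption: DETAIL lines follow the wording shown in the excerpt above.
WAIT_RE = re.compile(r"Process (\d+) waits for (.+?); blocked by process (\d+)\.")


def parse_wait_edges(detail_text):
    """Return {waiter_pid: (blocker_pid, lock_description)} from DETAIL text."""
    edges = {}
    for waiter, lock, blocker in WAIT_RE.findall(detail_text):
        edges[int(waiter)] = (int(blocker), lock)
    return edges


def find_cycle(edges, start):
    """Follow blocked-by edges from start; return the pid cycle, or None."""
    path = []
    pid = start
    while pid in edges and pid not in path:
        path.append(pid)
        pid = edges[pid][0]
    if pid in path:
        return path[path.index(pid):]
    return None


# DETAIL text copied from the sidewinder run quoted above.
DETAIL = """\
Process 13455 waits for ShareLock on transaction 2986; blocked by process 16881.
Process 16881 waits for RowExclusiveLock on relation 1259 of database 16384; blocked by process 13455.
"""

edges = parse_wait_edges(DETAIL)
cycle = find_cycle(edges, 13455)
for pid in cycle:
    blocker, lock = edges[pid]
    print(f"{pid} waits for {lock}, held up by {blocker}")
```

Each process waits on something the other holds: one on a transaction (the uniqueness check against pg_class), the other on a lock on relation 1259 (pg_class itself), which is the pattern Tom describes.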

Greetings,

Andres Freund


