Обсуждение: Intermittent BRIN failures on hyrax and lousyjack
Buildfarm member hyrax has shown this failure twice recently: --- /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/expected/brin.out 2020-01-23 11:10:05.730014075 -0500 +++ /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/brin.out 2020-06-30 03:50:23.651196117 -0400 @@ -490,6 +490,7 @@ INSERT INTO brintest_2 VALUES (numrange(0, 2^1000::numeric)); INSERT INTO brintest_2 VALUES ('(-1, 0)'); SELECT brin_desummarize_range('brinidx', 0); +WARNING: leftover placeholder tuple detected in BRIN index "brinidx", deleting brin_desummarize_range ------------------------ This happened on HEAD: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-06-30%2001%3A41%3A50 and v13: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-06-17%2005%3A50%3A46 and lousyjack has also shown it once on HEAD: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-02%2013%3A03%3A04 The warning is from this bit (commit c655899ba): /* * Because of ShareUpdateExclusive lock, this function shouldn't run * concurrently with summarization. Placeholder tuples can only exist as * leftovers from crashed summarization, so if we detect any, we complain * but proceed. */ if (BrinTupleIsPlaceholder(tup)) ereport(WARNING, (errmsg("leftover placeholder tuple detected in BRIN index \"%s\", deleting", RelationGetRelationName(idxrel)))); Now, there was no preceding crash in these tests, so the comment's claim is evidently a lie. But what's going wrong? The postmaster log provides a strong hint: 2020-06-30 03:50:14.603 EDT [3699:48] pg_regress/brin LOG: statement: SELECT brin_desummarize_range('brinidx', 0); 2020-06-30 03:50:15.894 EDT [3699:49] pg_regress/brin LOG: process 3699 still waiting for ShareUpdateExclusiveLock on relation24795 of database 16384 after 1000.094 ms 2020-06-30 03:50:15.894 EDT [3699:50] pg_regress/brin DETAIL: Process holding the lock: 7237. Wait queue: 3699. 2020-06-30 03:50:15.894 EDT [3699:51] pg_regress/brin STATEMENT: SELECT brin_desummarize_range('brinidx', 0); 2020-06-30 03:50:15.895 EDT [7237:1] ERROR: canceling autovacuum task 2020-06-30 03:50:15.895 EDT [7237:2] CONTEXT: while cleaning up index "brinidx" of relation "public.brintest" automatic vacuum of table "regression.public.brintest" 2020-06-30 03:50:15.899 EDT [3699:52] pg_regress/brin LOG: process 3699 acquired ShareUpdateExclusiveLock on relation 24795of database 16384 after 1001.626 ms 2020-06-30 03:50:15.899 EDT [3699:53] pg_regress/brin STATEMENT: SELECT brin_desummarize_range('brinidx', 0); 2020-06-30 03:50:16.018 EDT [3699:54] pg_regress/brin WARNING: leftover placeholder tuple detected in BRIN index "brinidx",deleting 2020-06-30 03:50:16.031 EDT [3699:55] pg_regress/brin LOG: statement: SELECT brin_summarize_range('brinidx', 0); I think the "crash" actually was the forced autovac cancellation we see here. Thus, the fact that both these animals use CLOBBER_CACHE_ALWAYS is not a direct cause of the failure, though it might contribute to getting the timing right for this to happen. So (1) the comment needs to be adjusted to mention that vacuum cancellation is enough to create the situation, and (2) probably the message needs to be downgraded quite a lot, DEBUG2 or even less. Or maybe we could just delete the quoted stanza altogether. regards, tom lane
On 2020-Jun-30, Tom Lane wrote: > SELECT brin_desummarize_range('brinidx', 0); > +WARNING: leftover placeholder tuple detected in BRIN index "brinidx", deleting > I think the "crash" actually was the forced autovac cancellation we see > here. Thus, the fact that both these animals use CLOBBER_CACHE_ALWAYS is > not a direct cause of the failure, though it might contribute to getting > the timing right for this to happen. Oh, interesting. > So (1) the comment needs to be adjusted to mention that vacuum > cancellation is enough to create the situation, and (2) probably the > message needs to be downgraded quite a lot, DEBUG2 or even less. > Or maybe we could just delete the quoted stanza altogether. Yeah, maybe deleting the whole thing is a decent answer. When I wrote that, I was thinking that it was a very unlikely event, but evidently that's not true. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Jun-30, Tom Lane wrote: >> SELECT brin_desummarize_range('brinidx', 0); >> +WARNING: leftover placeholder tuple detected in BRIN index "brinidx", deleting >> So (1) the comment needs to be adjusted to mention that vacuum >> cancellation is enough to create the situation, and (2) probably the >> message needs to be downgraded quite a lot, DEBUG2 or even less. >> Or maybe we could just delete the quoted stanza altogether. > Yeah, maybe deleting the whole thing is a decent answer. When I wrote > that, I was thinking that it was a very unlikely event, but evidently > that's not true. trilobite (also a CCA animal) just reported one of these, too: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2020-07-09%2008%3A03%3A08 Were you going to fix this, or did you expect me to? If the latter, I lean to just deleting the message. regards, tom lane
On 2020-Jul-09, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2020-Jun-30, Tom Lane wrote: > >> SELECT brin_desummarize_range('brinidx', 0); > >> +WARNING: leftover placeholder tuple detected in BRIN index "brinidx", deleting > > >> So (1) the comment needs to be adjusted to mention that vacuum > >> cancellation is enough to create the situation, and (2) probably the > >> message needs to be downgraded quite a lot, DEBUG2 or even less. > >> Or maybe we could just delete the quoted stanza altogether. > > > Yeah, maybe deleting the whole thing is a decent answer. When I wrote > > that, I was thinking that it was a very unlikely event, but evidently > > that's not true. > > trilobite (also a CCA animal) just reported one of these, too: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2020-07-09%2008%3A03%3A08 > > Were you going to fix this, or did you expect me to? I have a moment now, let me have a go at it. I agree with deleting the message. I think I'll keep the comment, slightly reworded: /* * Placeholder tuples only appear during unfinished summarization, and we * hold SUE lock, so this function cannot run concurrently with * that. Any placeholder tuples that exist must be leftovers from a * crashed or aborted summarization; remove them silently. */ -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Jul-09, Alvaro Herrera wrote: > I have a moment now, let me have a go at it. I agree with deleting the > message. I think I'll keep the comment, slightly reworded: > > /* > * Placeholder tuples only appear during unfinished summarization, and we > * hold SUE lock, so this function cannot run concurrently with > * that. Any placeholder tuples that exist must be leftovers from a > * crashed or aborted summarization; remove them silently. > */ Pushed to all branches. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services