Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Дата
Msg-id 20160510211556.rumt74jrhqhsaxqx@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-hackers
On 2016-05-10 13:17:52 -0700, Jeff Janes wrote:
> On Tue, May 10, 2016 at 9:19 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-05-10 08:09:02 -0400, Robert Haas wrote:
> >> On Tue, May 10, 2016 at 3:05 AM, Andres Freund <andres@anarazel.de> wrote:
> >> > The easy way to trigger this problem would be to have an oid wraparound
> >> > - but the WAL shows that that's not the case here.  I've not figured
> >> > that one out entirely (and won't tonight). But I do see WAL records
> >> > like:
> >> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID
4302693
> >> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID
4302693
> 
> Were there any CHECKPOINT_SHUTDOWN records, or any other NEXTOID
> records, between those two records you show?

Yes, check http://www.postgresql.org/message-id/20160510210013.2akn4iee7gl4ycen@alap3.anarazel.de

I think the explanation about how the bug is occuring there makes sense.


> My current test harness updates the scalar count field on every
> iteration, but changes the (probably toasted) text_array field with a
> probability of only 1% each time.  Perhaps making that more likely (by
> changing line 186 of count.pl) would make it easier to trigger the
> bug.  I'll try that in my next iteration of tests.

So my current theory about why the whole thing is kinda hard to
reproduce is that "luck" determines how aggressively the toast table is
vacuumed, and how often it actually succeeds in being vacuumed. You also
need a good bit of bad luck for the hint bits by GetNewOidWithIndex() to
not survive, given that shared_buffers is pretty small *and* checksums
are enabled.

I guess testing with a bigger shared memory and without checksums will
make it easier to hit the bug.

Regards,

Andres



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Perf Benchmarking and regression.