buffer assertion tripping under repeat pgbench load

Поиск
Список
Период
Сортировка
От Greg Smith
Тема buffer assertion tripping under repeat pgbench load
Дата
Msg-id 50D6B48A.1070905@2ndQuadrant.com
обсуждение исходный текст
Ответы Re: buffer assertion tripping under repeat pgbench load  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
I'm testing a checkout from a few days ago and trying to complete a day 
long pgbench stress test, with assertions and debugging on.  I want to 
make sure the base code works as expected before moving on to testing 
checksums.  It's crashing before finishing though.  Here's a sample:

2012-12-20 20:36:11 EST [26613]: LOG:  checkpoint complete: wrote 32 
buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; 
write=3.108 s, sync=0.978 s, total=4.187 s; sync files=7, longest=0.832 
s, average=0.139 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", 
Line: 1703)
2012-12-20 20:36:44 EST [26611]: LOG:  server process (PID 4834) was 
terminated by signal 6: Aborted
2012-12-20 20:36:44 EST [26611]: DETAIL:  Failed process was running: END;

Running the same test set again gave the same crash, so this looks 
repeatable.  It's not trivial to trigger given the average runtime to 
crash I'm seeing.  Second sample:

2012-12-22 21:27:17 EST [6006]: LOG:  checkpoint complete: wrote 34 
buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; 
write=3.310 s, sync=2.906 s, total=6.424 s; sync files=7, longest=2.648 
s, average=0.415 s
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", 
Line: 1703)
2012-12-22 21:27:23 EST [26611]: LOG:  server process (PID 12143) was 
terminated by signal 6: Aborted
2012-12-22 21:27:23 EST [26611]: DETAIL:  Failed process was running: END;

The important part:

TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", 
Line: 1703)

That's the check done by "AtEOXact_Buffers - clean up at end of 
transaction".  It fired while executing the "END;"" statement at the end 
of the standard pgbench test.  The test looks for loose things that 
weren't pinned/unpinned correctly by the transaction.  I'm not sure if 
getting a stack trace at that point will be useful.  The ref count 
mistake could easily have been made by an earlier statement in the 
transaction block, right?

This is on a server where shared_buffers=2GB, checkpoint_segments=64. 
Test runs were automated with pgbench-tools, using the following 
configuration:

MAX_WORKERS="4"
SCRIPT="tpc-b.sql"
SCALES="500 1000"
SETCLIENTS="4 8 16 32 64 96"
SETTIMES=3
RUNTIME=600

All of the crashes so far have been at scale=500, and I haven't seen 
those finish yet.  It failed 7 hours in the first time, then 4 hours in.

For code reference, the last commit in the source code tree I built 
against was:

af275a12dfeecd621ed9899a9382f26a68310263
Thu Dec 20 14:23:31 2012 +0200

Looking at recent activity, the only buffer pin related changes I saw 
was this one:

commit 62656617dbe49cca140f3da588a5e311b3fc35ea
Date:   Mon Dec 3 18:53:31 2012 +0000    Avoid holding vmbuffer pin after VACUUM.

This is my first test like this against 9.3 development though, so the 
cause could be an earlier commit.  I'm just starting with the most 
recent work as the first suspect.  Next I think I'll try autovacuum=off 
and see if the crash goes away.  Other ideas are welcome.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: pgcrypto seeding problem when ssl=on
Следующее
От: Andres Freund
Дата:
Сообщение: Re: buffer assertion tripping under repeat pgbench load