Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process

Поиск
Список
Период
Сортировка
От Keith Fiske
Тема Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
Дата
Msg-id CAG1_KcB-L6AMka5NZ+JU8=0o7T9i-dqW0d90rgCnmoaSswBtaA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-bugs
Not sure if this is the same issue originally reported, but the description
sounded the same. I can get this to happen pretty reliably in 9.4beta1 with
the pgtap test suite for one of my extensions -
https://github.com/omniti-labs/mimeo
Doesn't happen at the same test or with the same table every time (usually
after test08 when large write batches complete), but the error in my
postgres logs is the same every time. I've got shared_buffers set fairly
low (128mb) since this is just my test instance I run in a VM (saw you
mention running out of memory). My tap tests pass without any issues in
9.3, 9.2 and 9.1. So far every time I've run them with 9.4 it's failed at
some point when autovacuum kicked in. If you need anymore info let me know.

From postgresql logs:
TRAP: FailedAssertion("!(CritSectionCount == 0 || (CurrentMemoryContext) ==
ErrorContext || (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c",
Line: 670)
2014-06-06 15:54:29 EDT  [] [23575] [53921aa3.5c17-4] db=,user=,e=00000
LOG:  server process (PID 25555) was terminated by signal 6: Aborted
2014-06-06 15:54:29 EDT  [] [23575] [53921aa3.5c17-5] db=,user=,e=00000
DETAIL:  Failed process was running: autovacuum: VACUUM ANALYZE
mimeo.mimeo_source_logdel_test_source_condition_q
2014-06-06 15:54:29 EDT  [] [23575] [53921aa3.5c17-6] db=,user=,e=00000
LOG:  terminating any other active server processes
2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-38]
db=keith,user=keith,e=57P02 WARNING:  terminating connection because of
crash of another server process
2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-39]
db=keith,user=keith,e=57P02 DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-40]
db=keith,user=keith,e=57P02 HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-41]
db=keith,user=keith,e=57P02 CONTEXT:  SQL statement "SELECT
dblink_exec('mimeo_logdel_refresh_mimeo_dest.logdel_test_dest','WITH a AS
(SELECT col2,col1 FROM mimeo.mimeo_source_logdel_test_source2_q ORDER BY
col2,col1 LIMIT ALL) UPDATE mimeo.mimeo_source_logdel_test_source2_q b SET
processed = true FROM a WHERE  a.col2 = b.col2 AND  a.col1 = b.col1')"
    PL/pgSQL function refresh_logdel(text,integer,boolean,boolean,boolean)
line 171 at EXECUTE statement


Back trace:

#0  0x00007f61fd14f037 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f61fd152698 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x000000000079ea87 in ExceptionalCondition
(conditionName=conditionName@entry=0x9554b0 "!(CritSectionCount == 0 ||
(CurrentMemoryContext) == ErrorContext || (MyAuxProcType ==
CheckpointerProcess))",
    errorType=errorType@entry=0x7d94b0 "FailedAssertion",
fileName=fileName@entry=0x95514e "mcxt.c", lineNumber=lineNumber@entry=670)
at assert.c:54
#3  0x00000000007c15e0 in palloc (size=128) at mcxt.c:670
#4  0x00000000007d8923 in psprintf (fmt=0x975589 "base/%u/%u") at
psprintf.c:60
#5  0x00000000006b4c17 in mdopen (behavior=EXTENSION_FAIL,
forknum=MAIN_FORKNUM, reln=0x1107360) at md.c:562
#6  mdopen (reln=0x1107360, forknum=MAIN_FORKNUM, behavior=EXTENSION_FAIL)
at md.c:552
#7  0x00000000006b5d47 in _mdfd_getseg (reln=reln@entry=0x1107360,
forknum=forknum@entry=MAIN_FORKNUM, blkno=blkno@entry=432, skipFsync=0
'\000', behavior=behavior@entry=EXTENSION_FAIL) at md.c:1707
#8  0x00000000006b66b8 in mdwrite (reln=0x1107360, forknum=MAIN_FORKNUM,
blocknum=432, buffer=0x7f61f4cf75c0 "=", skipFsync=<optimized out>) at
md.c:742
#9  0x000000000068dcad in FlushBuffer (reln=0x1107360, buf=0x7f61f3c9a540)
at bufmgr.c:2002
#10 FlushBuffer (buf=0x7f61f3c9a540, reln=<optimized out>) at bufmgr.c:1917
#11 0x000000000068e621 in BufferAlloc (foundPtr=0x7fffef8daa1e
"\272\001\177\006", strategy=0x0, blockNum=0,
forkNum=VISIBILITYMAP_FORKNUM, relpersistence=112 'p', smgr=0x1105aa8) at
bufmgr.c:687
#12 ReadBuffer_common (smgr=0x1105aa8, relpersistence=<optimized out>,
forkNum=forkNum@entry=VISIBILITYMAP_FORKNUM, blockNum=blockNum@entry=0,
mode=RBM_ZERO_ON_ERROR, strategy=0x0,
    hit=hit@entry=0x7fffef8daacf "") at bufmgr.c:335
#13 0x000000000068ecab in ReadBufferExtended (reln=reln@entry=0x7f61fe3e6e70,
forkNum=forkNum@entry=VISIBILITYMAP_FORKNUM, blockNum=blockNum@entry=0,
mode=mode@entry=RBM_ZERO_ON_ERROR,
    strategy=strategy@entry=0x0) at bufmgr.c:254
#14 0x00000000004a6ef5 in vm_readbuf (rel=rel@entry=0x7f61fe3e6e70,
blkno=blkno@entry=0, extend=extend@entry=0 '\000') at visibilitymap.c:563
#15 0x00000000004a7a94 in visibilitymap_test (rel=rel@entry=0x7f61fe3e6e70,
heapBlk=heapBlk@entry=0, buf=buf@entry=0x7fffef8dbc4c) at
visibilitymap.c:354
#16 0x00000000005b3892 in lazy_vacuum_page (onerel=onerel@entry=0x7f61fe3e6e70,
blkno=blkno@entry=0, buffer=buffer@entry=2144, tupindex=<optimized out>,
tupindex@entry=0,
    vacrelstats=vacrelstats@entry=0x10eeaf8,
vmbuffer=vmbuffer@entry=0x7fffef8dbc4c)
at vacuumlazy.c:1245
#17 0x00000000005b3b4f in lazy_vacuum_heap (onerel=onerel@entry=0x7f61fe3e6e70,
vacrelstats=vacrelstats@entry=0x10eeaf8) at vacuumlazy.c:1152
#18 0x00000000005b42ee in lazy_scan_heap (scan_all=0 '\000', nindexes=2,
Irel=<optimized out>, vacrelstats=0x10eeaf8, onerel=0x7f61fe3e6e70) at
vacuumlazy.c:1080
#19 lazy_vacuum_rel (onerel=onerel@entry=0x7f61fe3e6e70,
vacstmt=vacstmt@entry=0x7fffef8dc360, bstrategy=<optimized out>) at
vacuumlazy.c:239
#20 0x00000000005b1e3a in vacuum_rel (relid=relid@entry=1158010,
vacstmt=vacstmt@entry=0x7fffef8dc360, do_toast=do_toast@entry=0 '\000',
for_wraparound=for_wraparound@entry=0 '\000') at vacuum.c:1202
#21 0x00000000005b2c66 in vacuum (vacstmt=vacstmt@entry=0x7fffef8dc360,
relid=<optimized out>, do_toast=do_toast@entry=0 '\000',
bstrategy=<optimized out>, bstrategy@entry=0x114a380, for_wraparound=0
'\000',
    isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:234
#22 0x00000000006515d3 in autovacuum_do_vac_analyze (bstrategy=0x114a380,
tab=0x114a498) at autovacuum.c:2796
#23 do_autovacuum () at autovacuum.c:2326
#24 0x0000000000651b15 in AutoVacWorkerMain (argv=0x0, argc=0) at
autovacuum.c:1678
#25 0x00000000006521c2 in StartAutoVacWorker () at autovacuum.c:1463
---Type <return> to continue, or q <return> to quit---
#26 0x000000000065f89d in StartAutovacuumWorker () at postmaster.c:5188
#27 sigusr1_handler (postgres_signal_arg=<optimized out>) at
postmaster.c:4842
#28 <signal handler called>
#29 0x00007f61fd20ae03 in select () from /lib/x86_64-linux-gnu/libc.so.6
#30 0x000000000045f985 in ServerLoop () at postmaster.c:1530
#31 0x00000000006608ae in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1085910)
at postmaster.c:1223
#32 0x0000000000460c28 in main (argc=3, argv=0x1085910) at main.c:225



--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com


On Thu, Jun 5, 2014 at 5:34 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

> Hi,
>
> On 2014-06-05 21:42:23 +0300, Heikki Linnakangas wrote:
> > I think we'll need a variant of GetMultiXactIdMembers() that only returns
> > the update XID, avoiding the palloc(). The straight-forward fix would be
> to
> > copy-paste contents of GetMultiXactIdMembers() into
> > MultiXactIdGetUpdateXid(), and instead of returning the members in an
> array,
> > only return the update-xid. But it's a long and complicated function, so
> > copy-pasting is not a good option. I think it needs to be refactored into
> > some kind of a helper function that both MultiXactIdGetUpdateXid() and
> > GetMultiXactIdMembers() could call.
>
> While it certainly would be nice to have such a function I have my doubt
> it's realistic to get it for 9.4 and the backbranches.
>
> I haven't thought particularly much about this, but I don't really see
> why the heap_page_is_all_visible() bit needs to be in a critical
> section? Can't we just do that entire bit after the log_heap_clean()?
> Then the heap_page_is_all_visible() can be done outside a critical
> section.
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process