Thread: Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
"Jim Nasby"
Date:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > What has been happening is periodic random crashes, around 1 a week. I
> > now have a good core for one, as well as an assert:
>
> > TRAP: FailedAssertion("!(shared->page_number[slotno] == pageno &&
> > shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)", File:
> > "slru.c", Line: 308)
>
> > I haven't looked at that code yet, so I have no idea what that actually
> > means. Let me know what info y'all would like to see out of the core.
>
> The whole contents of *shared and the local variables of
> SimpleLruReadPage would be good for starters.

I know how to get to the SimpleLruReadPage frame, but what commands do I need to use after that?

> Also, what PG version is this exactly, again?

8.0.3.


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Tom Lane
Date:
"Jim Nasby" <jnasby@pervasive.com> writes:
>> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> The whole contents of *shared and the local variables of
>> SimpleLruReadPage would be good for starters.

> I know how to get to the SimpleLruReadPage frame, but what commands do I need to use after that?

p *shared
info locals
        regards, tom lane


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
"Jim C. Nasby"
Date:
Here's the full info from 2 different cores:

[root@pg8 coredumps]# cat slru.gdb
f 3
p *shared
p pageno
p slotno
p ok
p xid
quit
[root@pg8 coredumps]# gdb -x slru.gdb /usr/bin/postmaster core.25146 |tail -n 13

warning: svr4_current_sos: Can't read pathname for load map: Input/output error

#3  0x000000000047365f in SimpleLruReadPage (ctl=0x7d9f40, pageno=162932, xid=0) at slru.c:307
307                     Assert(shared->page_number[slotno] == pageno &&
$1 = {ControlLock = SubtransControlLock, page_buffer = {0x2a98298380 "", 0x2a9829a380 "", 0x2a9829c380 "",
    0x2a9829e380 "", 0x2a982a0380 "", 0x2a982a2380 "", 0x2a982a4380 "", 0x2a982a6380 ""}, page_status = {
    SLRU_PAGE_CLEAN, SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN, SLRU_PAGE_DIRTY,
    SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN}, page_number = {162878, 162877, 163050,
    162883, 163270, 162761, 162980, 162797}, page_lru_count = {8, 2, 5, 1, 139, 4, 0, 3}, buffer_locks = {
    24, 25, 26, 27, 28, 29, 30, 31}, latest_page_number = 163270}
$2 = 162932
$3 = 1
$4 = 1 '\001'
$5 = 0
[root@pg8 coredumps]# gdb -x slru.gdb /usr/bin/postmaster core.32555 |tail -n 13

warning: svr4_current_sos: Can't read pathname for load map: Input/output error

#3  0x000000000047365f in SimpleLruReadPage (ctl=0x7d9f40, pageno=164152, xid=0) at slru.c:307
307                     Assert(shared->page_number[slotno] == pageno &&
$1 = {ControlLock = SubtransControlLock, page_buffer = {0x2a98298380 "", 0x2a9829a380 "", 0x2a9829c380 "",
    0x2a9829e380 "", 0x2a982a0380 "", 0x2a982a2380 "", 0x2a982a4380 "", 0x2a982a6380 ""}, page_status = {
    SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN, SLRU_PAGE_DIRTY, SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN,
    SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN}, page_number = {164145, 164146, 164147, 164153, 164148, 164150, 164151,
    164149}, page_lru_count = {0, 1, 2, 106, 5, 7, 8, 6}, buffer_locks = {24, 25, 26, 27, 28, 29, 30, 31},
    latest_page_number = 164153}
$2 = 164152
$3 = 0
$4 = 1 '\001'
$5 = 0
[root@pg8 coredumps]#

Also, here's the trace from a 3rd core:

[root@pg8 coredumps]# gdb /usr/bin/postgres core.13897
GNU gdb Red Hat Linux (6.3.0.0-1.63rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".


warning: core file may not match specified executable file.
Core was generated by `gdb -q -fullname /usr/bin/postmaster core.25146'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003b894688e3 in ?? ()
(gdb) bt
#0  0x0000003b894688e3 in ?? ()
#1  0x00000000004f4f20 in ExecReScanHashJoin ()
#2  0x00000000004b593c in DoCopy (stmt=Variable "stmt" is not available.
) at copy.c:767
#3  0x0000000000447190 in _hash_log2 () at hashutil.c:107
#4  0x0000000000000000 in ?? ()
(gdb)

-rw-------  1 root root   29179904 Oct 28 10:08 core.13897
-rw-------  1 root root 1166159872 Oct 28 07:13 core.25146
-rw-------  1 root root 1167413248 Oct 28 09:05 core.32555
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Tom Lane
Date:
BTW, what's the stack trace in those two core files?
        regards, tom lane


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 03:04:02PM -0400, Tom Lane wrote:
> BTW, what's the stack trace in those two core files?

From 25146:
#0  0x0000003b8942e37d in raise () from /lib64/tls/libc.so.6
#1  0x0000003b8942faae in abort () from /lib64/tls/libc.so.6
#2  0x00000000005d36f8 in ExceptionalCondition (conditionName=0x623a <Address 0x623a out of bounds>,
    errorType=0x623a <Address 0x623a out of bounds>, fileName=0x623a <Address 0x623a out of bounds>, lineNumber=-1)
    at assert.c:51
#3  0x000000000047365f in SimpleLruReadPage (ctl=0x7d9f40, pageno=162932, xid=0) at slru.c:307
#4  0x0000000000473863 in SlruSelectLRUPage (ctl=0x7d9f40, pageno=163131) at slru.c:753
#5  0x0000000000473439 in SimpleLruReadPage (ctl=0x7d9f40, pageno=163131, xid=334094300) at slru.c:254
#6  0x0000000000473eeb in SubTransGetParent (xid=334094300) at subtrans.c:116
#7  0x0000000000473f61 in SubTransGetTopmostTransaction (xid=Variable "xid" is not available.
) at subtrans.c:153
#8  0x00000000005efa38 in HeapTupleSatisfiesSnapshot (tuple=0x2ac2fb04b0, snapshot=0x88ab98, buffer=86685)
    at tqual.c:967
#9  0x0000000000447d7a in heapgettup (relation=0x2add22b960, dir=1, tuple=0x8c4a90, buffer=0x8c4ab0,
    snapshot=0x88ab98, nkeys=0, key=0x0, pages=597) at heapam.c:305
#10 0x0000000000448b53 in heap_getnext (scan=0x8c4a68, direction=Variable "direction" is not available.
) at heapam.c:832
#11 0x00000000004f7f86 in SeqNext (node=Variable "node" is not available.
) at nodeSeqscan.c:102
#12 0x00000000004eec2e in ExecScan (node=0x8c3b68, accessMtd=0x4f7f20 <SeqNext>) at execScan.c:98
#13 0x00000000004e9c9d in ExecProcNode (node=0x8c3b68) at execProcnode.c:303
#14 0x00000000004f2a75 in ExecAgg (node=0x8c3610) at nodeAgg.c:783
#15 0x00000000004e9bea in ExecProcNode (node=0x8c3610) at execProcnode.c:353
#16 0x00000000004e8ccd in ExecutorRun (queryDesc=Variable "queryDesc" is not available.
) at execMain.c:1060
#17 0x000000000056968e in PortalRunSelect (portal=0x8acb38, forward=Variable "forward" is not available.
) at pquery.c:746
#18 0x0000000000569caf in PortalRun (portal=0x8acb38, count=9223372036854775807, dest=0x8bbae0, altdest=0x8bbae0,
    completionTag=0x7fbfffdfd0 "") at pquery.c:561
#19 0x0000000000565f12 in exec_simple_query (
    query_string=0x89e0b8 "SELECT count(*) as cnt FROM queue where machineindex = '32'") at postgres.c:933
#20 0x0000000000567b33 in PostgresMain (argc=4, argv=0x846368, username=0x846328 "iacm") at postgres.c:3007
#21 0x000000000053ac70 in ServerLoop () at postmaster.c:2836
#22 0x000000000053c374 in PostmasterMain (argc=5, argv=0x843500) at postmaster.c:918
#23 0x0000000000507fef in main (argc=5, argv=0x843500) at main.c:268

And 32555:
#0  0x0000003b8942e37d in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x0000003b8942e37d in raise () from /lib64/tls/libc.so.6
#1  0x0000003b8942faae in abort () from /lib64/tls/libc.so.6
#2  0x00000000005d36f8 in ExceptionalCondition (conditionName=0x7f2b <Address 0x7f2b out of bounds>,
    errorType=0x7f2b <Address 0x7f2b out of bounds>, fileName=0x7f2b <Address 0x7f2b out of bounds>, lineNumber=-1)
    at assert.c:51
#3  0x000000000047365f in SimpleLruReadPage (ctl=0x7d9f40, pageno=164152, xid=0) at slru.c:307
#4  0x0000000000473863 in SlruSelectLRUPage (ctl=0x7d9f40, pageno=164037) at slru.c:753
#5  0x0000000000473439 in SimpleLruReadPage (ctl=0x7d9f40, pageno=164037, xid=335949336) at slru.c:254
#6  0x0000000000473eeb in SubTransGetParent (xid=335949336) at subtrans.c:116
#7  0x0000000000473f61 in SubTransGetTopmostTransaction (xid=Variable "xid" is not available.
) at subtrans.c:153
#8  0x00000000005ef963 in HeapTupleSatisfiesSnapshot (tuple=0x2abc81e0b8, snapshot=0x8788d8, buffer=73427)
    at tqual.c:905
#9  0x0000000000448dc6 in heap_release_fetch (relation=0x2add227130, snapshot=0x8788d8, tuple=0x8d2460,
    userbuf=0x8d2480, keep_buf=1 '\001', pgstat_info=0x8d24b8) at heapam.c:979
#10 0x0000000000450c8f in index_getnext (scan=0x8d2418, direction=ForwardScanDirection) at indexam.c:528
#11 0x00000000004f5012 in IndexNext (node=0x8d18c0) at nodeIndexscan.c:316
#12 0x00000000004eec2e in ExecScan (node=0x8d18c0, accessMtd=0x4f4f20 <IndexNext>) at execScan.c:98
#13 0x00000000004e9c8d in ExecProcNode (node=0x8d18c0) at execProcnode.c:307
#14 0x00000000004e8ccd in ExecutorRun (queryDesc=Variable "queryDesc" is not available.
) at execMain.c:1060
#15 0x000000000056968e in PortalRunSelect (portal=0x8add58, forward=Variable "forward" is not available.
) at pquery.c:746
#16 0x0000000000569caf in PortalRun (portal=0x8add58, count=9223372036854775807, dest=0x8e5c48, altdest=0x8e5c48,
    completionTag=0x7fbfffdfd0 "") at pquery.c:561
#17 0x0000000000565f12 in exec_simple_query (
    query_string=0x89f2d8 "select index from daily_reports where accountindex = '3034' and date = '1130040000'")
    at postgres.c:933
#18 0x0000000000567b33 in PostgresMain (argc=4, argv=0x846368, username=0x846328 "iacm") at postgres.c:3007
#19 0x000000000053ac70 in ServerLoop () at postmaster.c:2836
#20 0x000000000053c374 in PostmasterMain (argc=5, argv=0x843500) at postmaster.c:918
#21 0x0000000000507fef in main (argc=5, argv=0x843500) at main.c:268

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
"Jim C. Nasby"
Date:
Here's another core... (pid 805 for reference)

#0  0x0000003b8942e37d in raise () from /lib64/tls/libc.so.6
#0  0x0000003b8942e37d in raise () from /lib64/tls/libc.so.6
#1  0x0000003b8942faae in abort () from /lib64/tls/libc.so.6
#2  0x00000000005d36f8 in ExceptionalCondition (conditionName=0x325 <Address 0x325 out of bounds>,
    errorType=0x325 <Address 0x325 out of bounds>, fileName=0x325 <Address 0x325 out of bounds>, lineNumber=-1)
    at assert.c:51
#3  0x000000000047365f in SimpleLruReadPage (ctl=0x7d9f40, pageno=169039, xid=0) at slru.c:307
#4  0x0000000000473863 in SlruSelectLRUPage (ctl=0x7d9f40, pageno=169162) at slru.c:753
#5  0x0000000000473439 in SimpleLruReadPage (ctl=0x7d9f40, pageno=169162, xid=346445732) at slru.c:254
#6  0x0000000000473eeb in SubTransGetParent (xid=346445732) at subtrans.c:116
#7  0x0000000000473f61 in SubTransGetTopmostTransaction (xid=Variable "xid" is not available.
) at subtrans.c:153
#8  0x00000000005efa38 in HeapTupleSatisfiesSnapshot (tuple=0x2ac4b87dd0, snapshot=0x877908, buffer=90248)
    at tqual.c:967
#9  0x0000000000447d7a in heapgettup (relation=0x2add20fd98, dir=1, tuple=0x8e7210, buffer=0x8e7230,
    snapshot=0x877908, nkeys=0, key=0x0, pages=435) at heapam.c:305
#10 0x0000000000448b53 in heap_getnext (scan=0x8e71e8, direction=Variable "direction" is not available.
) at heapam.c:832
#11 0x00000000004f7f86 in SeqNext (node=Variable "node" is not available.
) at nodeSeqscan.c:102
#12 0x00000000004eec2e in ExecScan (node=0x8b7c38, accessMtd=0x4f7f20 <SeqNext>) at execScan.c:98
#13 0x00000000004e9c9d in ExecProcNode (node=0x8b7c38) at execProcnode.c:303
#14 0x00000000004f7431 in ExecNestLoop (node=0x8b64b0) at nodeNestloop.c:135
#15 0x00000000004e9c4d in ExecProcNode (node=0x8b64b0) at execProcnode.c:326
#16 0x00000000004f89f9 in ExecSort (node=0x8b6398) at nodeSort.c:102
#17 0x00000000004e9c0a in ExecProcNode (node=0x8b6398) at execProcnode.c:345
#18 0x00000000004f9048 in ExecLimit (node=0x8b6150) at nodeLimit.c:87
#19 0x00000000004e9bb4 in ExecProcNode (node=0x8b6150) at execProcnode.c:369
#20 0x00000000004e8ccd in ExecutorRun (queryDesc=Variable "queryDesc" is not available.
) at execMain.c:1060
#21 0x000000000056968e in PortalRunSelect (portal=0x8ad5a8, forward=Variable "forward" is not available.
) at pquery.c:746
#22 0x0000000000569caf in PortalRun (portal=0x8ad5a8, count=9223372036854775807, dest=0x8e1870, altdest=0x8e1870,
    completionTag=0x7fbfffdfd0 "") at pquery.c:561
#23 0x0000000000565f12 in exec_simple_query (query_string=0x89f168 ' ' <repeats 71 times>,
    "SELECT a.index,a.jobtype,a.machineindex,a.pid,a.data,a.status,a.starttime,a.ranby,a.clientindex,a.parentindex,a.output_data,a.per"...)
    at postgres.c:933
#24 0x0000000000567b33 in PostgresMain (argc=4, argv=0x846368, username=0x846328 "iacm") at postgres.c:3007
#25 0x000000000053ac70 in ServerLoop () at postmaster.c:2836
#26 0x000000000053c374 in PostmasterMain (argc=5, argv=0x843500) at postmaster.c:918
#27 0x0000000000507fef in main (argc=5, argv=0x843500) at main.c:268

#3  0x000000000047365f in SimpleLruReadPage (ctl=0x7d9f40, pageno=169039, xid=0) at slru.c:307
307                     Assert(shared->page_number[slotno] == pageno &&

$1 = {ControlLock = SubtransControlLock, page_buffer = {0x2a98298380 "", 0x2a9829a380 "", 0x2a9829c380 "",
    0x2a9829e380 "", 0x2a982a0380 "", 0x2a982a2380 "", 0x2a982a4380 "", 0x2a982a6380 ""}, page_status = {
    SLRU_PAGE_DIRTY, SLRU_PAGE_CLEAN, SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN,
    SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN}, page_number = {169452, 169351, 169163, 169238,
    169236, 169328, 169233, 169239}, page_lru_count = {17108, 4, 1, 3, 5, 0, 6, 2}, buffer_locks = {
    24, 25, 26, 27, 28, 29, 30, 31}, latest_page_number = 169452}
$2 = 169039
$3 = 2
$4 = 1 '\001'
$5 = 0

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-946


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Alvaro Herrera
Date:
Jim C. Nasby wrote:
> Here's another core... (pid 805 for reference)

All of them have in common that the slotno being passed ($3 below) is in
SLRU_PAGE_READ_IN_PROGRESS state ... could it be a problem with lock
reordering?  Maybe somebody is trying to read in a page, and somebody
else steals the buffer from under them.  Not sure how likely that is.

BTW what's the relationship with the other assertion failure (the one in
the subject)?

> #3  0x000000000047365f in SimpleLruReadPage (ctl=0x7d9f40, pageno=169039, xid=0) at slru.c:307
> 307                     Assert(shared->page_number[slotno] == pageno &&
> 
> $1 = {ControlLock = SubtransControlLock, page_buffer = {0x2a98298380 "", 0x2a9829a380 "",
>     0x2a9829c380 "", 0x2a9829e380 "", 0x2a982a0380 "", 0x2a982a2380 "", 0x2a982a4380 "",
>     0x2a982a6380 ""}, page_status = {SLRU_PAGE_DIRTY, SLRU_PAGE_CLEAN,
>     SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN, SLRU_PAGE_READ_IN_PROGRESS,
>     SLRU_PAGE_CLEAN, SLRU_PAGE_CLEAN}, page_number = {169452, 169351, 169163, 169238, 169236,
>     169328, 169233, 169239}, page_lru_count = {17108, 4, 1, 3, 5, 0, 6, 2}, buffer_locks = {
>     24, 25, 26, 27, 28, 29, 30, 31}, latest_page_number = 169452}
> $2 = 169039
> $3 = 2
> $4 = 1 '\001'
> $5 = 0
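
As a rough cross-check of the observation above, the values gdb printed for the three cores can be run through the two halves of the failed Assert separately. The sketch below uses toy stand-in types (struct slru_dump, enum toy_status and the two helper functions are hypothetical names, not taken from slru.h); only the numbers are copied from the dumps posted in this thread. With those values, the status half holds in every dump, while page_number[slotno] differs from pageno.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 8   /* matches the 8-element arrays in the gdb output */

/* Toy stand-ins for the values gdb printed; NOT the real slru.h types. */
enum toy_status { TOY_CLEAN, TOY_DIRTY, TOY_RIP /* read in progress */ };

struct slru_dump {
    int pageno;                              /* $2 in the gdb output */
    int slotno;                              /* $3 in the gdb output */
    int page_number[NUM_SLOTS];
    enum toy_status page_status[NUM_SLOTS];
};

/* First half of the Assert: does the slot hold the page that was asked for? */
bool number_matches(const struct slru_dump *d)
{
    assert(d->slotno >= 0 && d->slotno < NUM_SLOTS);
    return d->page_number[d->slotno] == d->pageno;
}

/* Second half of the Assert: is the slot still marked as being read in? */
bool read_in_progress(const struct slru_dump *d)
{
    assert(d->slotno >= 0 && d->slotno < NUM_SLOTS);
    return d->page_status[d->slotno] == TOY_RIP;
}

/* Values copied from core.25146, core.32555 and the pid 805 core. */
const struct slru_dump dumps[3] = {
    { 162932, 1,
      {162878, 162877, 163050, 162883, 163270, 162761, 162980, 162797},
      {TOY_CLEAN, TOY_RIP, TOY_CLEAN, TOY_CLEAN,
       TOY_DIRTY, TOY_RIP, TOY_RIP, TOY_CLEAN} },
    { 164152, 0,
      {164145, 164146, 164147, 164153, 164148, 164150, 164151, 164149},
      {TOY_RIP, TOY_CLEAN, TOY_CLEAN, TOY_DIRTY,
       TOY_CLEAN, TOY_CLEAN, TOY_CLEAN, TOY_CLEAN} },
    { 169039, 2,
      {169452, 169351, 169163, 169238, 169236, 169328, 169233, 169239},
      {TOY_DIRTY, TOY_CLEAN, TOY_RIP, TOY_CLEAN,
       TOY_CLEAN, TOY_RIP, TOY_CLEAN, TOY_CLEAN} },
};
```

Calling read_in_progress() and number_matches() on each entry of dumps shows which half of the condition was false at the time of the crash; it says nothing about how the slot came to hold a different page.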


-- 
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
"Nadie esta tan esclavizado como el que se cree libre no siendolo" (Goethe)


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> All of them have in common that the slotno being passed ($3 below) is in
> SLRU_PAGE_READ_IN_PROGRESS state ... could it be a problem with lock
> reordering?  Maybe somebody is trying to read in a page, and somebody
> else steals the buffer from under them.  Not sure how likely that is.

It's even more interesting than that: in all three cases,
SlruSelectLRUPage has selected a "least recently used" page that is
still in READ_IN_PROGRESS state (ie, we haven't finished faulting it in)
and is recursively calling SimpleLruReadPage to wait for that condition
to terminate.

Apparently, Jim's setup could desperately do with a larger SLRU arena
for pg_subtrans, because this is supposed to be a never-happen path ---
if you can't finish loading a page before you need its slot for
something else, you are thrashing with a capital T.

I suppose there's a bug in this path, but I'm darned if I can see what
it is.  There are a number of obvious inefficiencies, but those
shouldn't be important given that this isn't supposed to happen much.
But how's it getting to the Assert failure?
        regards, tom lane


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 04:58:56PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > All of them have in common that the slotno being passed ($3 below) is in
> > SLRU_PAGE_READ_IN_PROGRESS state ... could it be a problem with lock
> > reordering?  Maybe somebody is trying to read in a page, and somebody
> > else steals the buffer from under them.  Not sure how likely that is.
> 
> It's even more interesting than that: in all three cases,
> SlruSelectLRUPage has selected a "least recently used" page that is
> still in READ_IN_PROGRESS state (ie, we haven't finished faulting it in)
> and is recursively calling SimpleLruReadPage to wait for that condition
> to terminate.
> 
> Apparently, Jim's setup could desperately do with a larger SLRU arena
> for pg_subtrans, because this is supposed to be a never-happen path ---
> if you can't finish loading a page before you need its slot for
> something else, you are thrashing with a capital T.
> 
> I suppose there's a bug in this path, but I'm darned if I can see what
> it is.  There are a number of obvious inefficiencies, but those
> shouldn't be important given that this isn't supposed to happen much.
> But how's it getting to the Assert failure?

If it helps, this is a ~250G database that's (now) on an 8-way (opteron
I think) machine with 32G. shared_buffers is set to 1G. My client also
has a 4-way machine with 16G, although it seemed to be having some
issues with producing cores that were useful.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Tom Lane
Date:
I wrote:
> I suppose there's a bug in this path, but I'm darned if I can see what
> it is.  There are a number of obvious inefficiencies, but those
> shouldn't be important given that this isn't supposed to happen much.
> But how's it getting to the Assert failure?

While I'm disinclined to change anything until we can explain why it's
crashing, I suspect that the solution may be to avoid the recursive call
of SimpleLruReadPage, as in the attached patch.  Jim, are you interested
in seeing if this patch makes the problem go away for you?

            regards, tom lane
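
The attached patch is not reproduced in this archive, so the following is only a hypothetical sketch of the general idea described above, written for a toy single-process cache rather than slru.c: when the chosen victim slot is still being read in, wait for it and then redo the whole search, instead of calling the read routine recursively. All names (toy_cache, toy_wait_for_io, toy_select_slot) and the "larger counter means less recently used" convention are assumptions made for illustration.

```c
#include <assert.h>

#define TOY_SLOTS 4

enum slot_status { SLOT_EMPTY, SLOT_READ_IN_PROGRESS, SLOT_CLEAN };

/* Toy cache state; a stand-in, not the real SLRU shared structure. */
struct toy_cache {
    int page_number[TOY_SLOTS];
    enum slot_status status[TOY_SLOTS];
    unsigned lru_count[TOY_SLOTS];   /* assumption: larger = less recently used */
};

/* Hypothetical stand-in for "sleep until the I/O on this slot finishes". */
void toy_wait_for_io(struct toy_cache *c, int slot)
{
    assert(slot >= 0 && slot < TOY_SLOTS);
    if (c->status[slot] == SLOT_READ_IN_PROGRESS)
        c->status[slot] = SLOT_CLEAN;
}

/*
 * Return the slot already holding pageno, or a slot that may be reused for it.
 * Iterative: after every wait the search restarts from scratch, so nothing
 * decided before the wait is trusted afterwards.
 */
int toy_select_slot(struct toy_cache *c, int pageno)
{
    for (;;)
    {
        int victim = 0;

        /* 1. Is the page already cached (possibly since the last wait)? */
        for (int i = 0; i < TOY_SLOTS; i++)
            if (c->status[i] != SLOT_EMPTY && c->page_number[i] == pageno)
                return i;

        /* 2. Is there a free slot? */
        for (int i = 0; i < TOY_SLOTS; i++)
            if (c->status[i] == SLOT_EMPTY)
                return i;

        /* 3. Otherwise pick the least recently used slot. */
        for (int i = 1; i < TOY_SLOTS; i++)
            if (c->lru_count[i] > c->lru_count[victim])
                victim = i;

        /* 4. A slot with no read in flight can be reused right away. */
        if (c->status[victim] != SLOT_READ_IN_PROGRESS)
            return victim;

        /* 5. Read still in flight: wait, then loop and search again. */
        toy_wait_for_io(c, victim);
    }
}
```

In this toy model the loop ends as soon as the victim's read has completed; a real shared-memory cache would also have to cope with other backends changing the slots while the lock is released, which the sketch does not attempt to model.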


Attachments

Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 05:45:51PM -0400, Tom Lane wrote:
> I wrote:
> > I suppose there's a bug in this path, but I'm darned if I can see what
> > it is.  There are a number of obvious inefficiencies, but those
> > shouldn't be important given that this isn't supposed to happen much.
> > But how's it getting to the Assert failure?
> 
> While I'm disinclined to change anything until we can explain why it's
> crashing, I suspect that the solution may be to avoid the recursive call
> of SimpleLruReadPage, as in the attached patch.  Jim, are you interested
> in seeing if this patch makes the problem go away for you?

Well, this is a production system... what's the risk with that patch?

BTW, is it typical to see a 10x difference between asserts on and off? My
client has a process that was doing 10-20 records/sec with asserts on
and 90-110 with asserts off.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Fri, Oct 28, 2005 at 05:45:51PM -0400, Tom Lane wrote:
>> Jim, are you interested
>> in seeing if this patch makes the problem go away for you?
> Well, this is a production system... what's the risk with that patch?

Well, it's utterly untested, which means it might crash your system,
which is where you are now, no?

> BTW, is it typical to see a 10x difference between asserts on and off? My
> client has a process that was doing 10-20 records/sec with asserts on
> and 90-110 with asserts off.

Not typical, but I can believe there are some code paths like that.
        regards, tom lane