Обсуждение: pg_filedump strangeness
Hi, I'm chasing an apparent index corruption problem, and I came across something I can't quite explain in pg_filedump. Say I dump a non-leaf btree index page: ******************************************************************* * PostgreSQL File/Block Formatted Dump Utility - Version 8.3.0 * * File: 31141 * Options used: -fi -x -R 6246 * * Dump created on: Tue Apr 6 17:40:28 2010 ******************************************************************* Block 6246 ******************************************************** <Header> -----Block Offset: 0x030cc000 Offsets: Lower 36 (0x0024)Block: Size 8192 Version 4 Upper 8120 (0x1fb8)LSN: logid 1077 recoff 0x45c8b660 Special 8176 (0x1ff0)Items: 3 FreeSpace: 8084TLI: 0x0001 Prune XID: 0x00000000 Flags: 0x0000 ()Length (including item array): 36 0000: 35040000 60b6c845 01000000 2400b81f 5...`..E....$... 0010: f01f0420 00000000 d89f3000 d09f1000 ... ......0.....0020: b89f3000 ..0. <Data> ------ Item 1 -- Length: 24 Offset: 8152 (0x1fd8) Flags: NORMAL Block Id: 6232 linp Index: 1 Size: 24 HasNulls: 32768 Has Varwidths: 0 1fd8: 00005818 01001880 01000000 00000000 ..X............. 1fe8: 80bcc57d 74230100 ...}t#.. Item 2 -- Length: 8 Offset: 8144 (0x1fd0) Flags: NORMAL Block Id: 2756 linp Index: 1 Size: 8 Has Nulls: 0 Has Varwidths:0 1fd0: 0000c40a 01000800 ........ Item 3 -- Length: 24 Offset: 8120 (0x1fb8) Flags: NORMAL Block Id: 6231 linp Index: 1 Size: 24 Has Nulls: 32768 Has Varwidths: 0 1fb8: 00005718 01001880 01000000 00000000 ..W............. 1fc8: 4009cc7f 73230100 @...s#.. <Special Section> -----BTree Index Section: Flags: 0x0000 () Blocks: Previous (6109) Next (6305) Level (1) CycleId (0) 1ff0: dd170000 a1180000 01000000 00000000 ................ *** End of Requested Range Encountered. Last Block Read: 6246 *** Notice how item 2 has size 8, but regular entries have size 24. I know this is related to the high key of this page, but I can't quite figure out why the short entry is 2 not 1. Is item 2 just assumed to be greater than the previous' page high key? Page's 6109 high key is: Item 1 -- Length: 24 Offset: 8152 (0x1fd8) Flags: NORMAL Block Id: 6101 linp Index: 1 Size: 24 Has Nulls: 32768 Has Varwidths: 0 1fd8: 0000d517 01001880 01000000 00000000 ................ 1fe8: 8004f17d 6f230100 ...}o#.. Note that the data values are integer timestamp without time zone in little endian byte order. (The "Has Nulls" bit is somewhat bogus -- it displays 32768 when the 0x8000 bit is on, which is rather surprising. I'd expect it to display 1). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > I'm chasing an apparent index corruption problem, and I came across > something I can't quite explain in pg_filedump. Say I dump a non-leaf > btree index page: I think this is actually OK. Remember that in a non-rightmost page, item 1 is the high key not a data entry. On the other hand, in a non-leaf page, we don't bother to store the key for the first downlink entry, since the associated key is really "minus infinity". Cf nbtree/README: On a non-leaf page, the data items are down-links to child pages withbounding keys. The key in each data item is the *lower*bound forkeys on that child page, so logically the key is to the left of thatdownlink. The high key (if present)is the upper bound for the lastdownlink. The first data item on each such page has no lower bound--- or lower boundof minus infinity, if you prefer. The comparisonroutines must treat it accordingly. The actual key stored in theitemis irrelevant, and need not be stored at all. This arrangementcorresponds to the fact that an L&Y non-leaf page hasone more pointerthan key. So item 2 doesn't have a key in it. The other two items have null keys, which means they need a null bitmap. I don't however understand why there seems to be data as well as a null bitmap in there --- is this perhaps a two-column index? > (The "Has Nulls" bit is somewhat bogus -- it displays 32768 when the > 0x8000 bit is on, which is rather surprising. I'd expect it to display > 1). Yeah, I noticed that too. Made a note to myself to fix it in the next revision of pg_filedump, which I suppose I'd better get on with producing... regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I'm chasing an apparent index corruption problem, and I came across > > something I can't quite explain in pg_filedump. Say I dump a non-leaf > > btree index page: > > I think this is actually OK. Remember that in a non-rightmost page, > item 1 is the high key not a data entry. On the other hand, in a > non-leaf page, we don't bother to store the key for the first downlink > entry, since the associated key is really "minus infinity". Cf > nbtree/README: > > On a non-leaf page, the data items are down-links to child pages with > bounding keys. The key in each data item is the *lower* bound for > keys on that child page, so logically the key is to the left of that > downlink. The high key (if present) is the upper bound for the last > downlink. The first data item on each such page has no lower bound > --- or lower bound of minus infinity, if you prefer. The comparison > routines must treat it accordingly. The actual key stored in the > item is irrelevant, and need not be stored at all. This arrangement > corresponds to the fact that an L&Y non-leaf page has one more pointer > than key. Ahh, I had forgotten that bit completely. Thanks. > So item 2 doesn't have a key in it. The other two items have null > keys, which means they need a null bitmap. Correct. > I don't however understand > why there seems to be data as well as a null bitmap in there --- is > this perhaps a two-column index? Eh, yeah, it's a two column index, so it's OK. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I'm chasing an apparent index corruption problem, and I came across > > something I can't quite explain in pg_filedump. Say I dump a non-leaf > > btree index page: > > I think this is actually OK. Remember that in a non-rightmost page, > item 1 is the high key not a data entry. Oh, BTW, this is not what's corrupted about this index -- I just had trouble following what pg_filedump was reporting. The corruption is more subtle: vacuum cannot find the parent page when trying to mark a page for deletion. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.