Обсуждение: Error "vacuum pg_proc"
Hi, I got this error vacuuming pg_proc: ERROR: _bt_endpoint: leftmost page (20) has not leftmost flag It's postgresql 6.5.3 (Linux). []'s Mateus Cordeiro Inssa --------------------- Linux User: 76186 Kernel: 2.3.34 ICQ (Licq): 15243895 --------------------- mateus@ifnet.com.br mateus@cwb.fnn.net Fri Dec 24 09:55:14 EDT 1999
Mateus Cordeiro Inssa <mateus@ifnet.com.br> writes: > I got this error vacuuming pg_proc: > ERROR: _bt_endpoint: leftmost page (20) has not leftmost flag Hmm, I wonder if this could be yet another manifestation of the problems that btree indexes have with oversized key values. Do you have any procedures with long definitions? "Long" in this context means over about 4K. If you're not sure, tryselect proname from pg_proc where length(prosrc) > 4000; If you do, try breaking them up into smaller procedures. You might have to dump and rebuild the database to get rid of the corruption in pg_proc's index, though. The reason this is an issue is that btree wants to be able to store at least two index tuples per disk page, so it has problems with indexing values over half-a-page-less-overhead. I'm not sure exactly what the critical size is, but it is somewhere around 4000 bytes. And pg_proc has an index on the prosrc field. The prosrc index is actually completely unnecessary, so we've removed it for 7.0. Work is in progress to fix the tuple-size problem as well, but that will probably take longer. regards, tom lane
I wrote: > Mateus Cordeiro Inssa <mateus@ifnet.com.br> writes: >> I got this error vacuuming pg_proc: >> ERROR: _bt_endpoint: leftmost page (20) has not leftmost flag > Hmm, I wonder if this could be yet another manifestation of the problems > that btree indexes have with oversized key values. Do you have any > procedures with long definitions? "Long" in this context means over > about 4K. I spent some more time looking into this, and found out that actually the safe upper limit for btree index entries is not ~ BLCKSZ/2, but ~ BLCKSZ/3. btree needs to be able to insert two items on every page, but it *also* keeps an extra item (the "high key") on non-rightmost pages. So if any item exceeds one-third the available space, you run a risk of failure depending on what page it ends up on and what else is on that same page. It turns out that for an index on a single text column, the maximum safe text length is 2700 bytes. So the correct check for dangerous procedure definitions in 6.5.* is select proname from pg_proc where length(prosrc) > 2700; I have committed a check for dangerously long index entries into both current sources and REL6_5 branch. The patch is attached if you want to add it to your installation right away. Note that this only defends against oversize new entries; if you've already got oversize index entries, you still have trouble waiting to happen... regards, tom lane *** src/backend/access/nbtree/nbtinsert.c.orig Wed Sep 1 13:54:00 1999 --- src/backend/access/nbtree/nbtinsert.c Sun Dec 26 15:44:15 1999 *************** *** 268,273 **** --- 268,285 ---- * consistent */ /* + * Check whether the item can fit on a btree page at all. + * (Eventually, we ought to try to apply TOAST methods if not.) + * We actually need to be able to fit three items on every page, + * so restrict any one item to 1/3 the per-page available space. + * Note that at this point, itemsz doesn't include the ItemId. + */ + if (itemsz > (PageGetPageSize(page)-sizeof(PageHeaderData)-MAXALIGN(sizeof(BTPageOpaqueData)))/3 - sizeof(ItemIdData)) + elog(ERROR, "btree: index item size %d exceeds maximum %d", + itemsz, + (PageGetPageSize(page)-sizeof(PageHeaderData)-MAXALIGN(sizeof(BTPageOpaqueData)))/3 - sizeof(ItemIdData)); + + /* * If we have to insert item on the leftmost page which is the first * page in the chain of duplicatesthen: 1. if scankey == hikey (i.e. * - new duplicate item) then insert it here; 2. if scankey < hikey
> I spent some more time looking into this, and found out that actually > the safe upper limit for btree index entries is not ~ BLCKSZ/2, but > ~ BLCKSZ/3. btree needs to be able to insert two items on every page, > but it *also* keeps an extra item (the "high key") on non-rightmost > pages. So if any item exceeds one-third the available space, you run > a risk of failure depending on what page it ends up on and what else > is on that same page. > > It turns out that for an index on a single text column, the maximum > safe text length is 2700 bytes. So the correct check for dangerous > procedure definitions in 6.5.* is This is another argument to try and get long tuples into 7.0. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> It turns out that for an index on a single text column, the maximum >> safe text length is 2700 bytes. > This is another argument to try and get long tuples into 7.0. I think Jan might have enough on his plate already without trying to TOAST the index code along with the plain-table code. But if he can get it done, great! One thing this does bring up is that the maximum safe tuple length is dependent on the index or table type. The toaster's API had better be designed accordingly... regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> It turns out that for an index on a single text column, the maximum > >> safe text length is 2700 bytes. > > > This is another argument to try and get long tuples into 7.0. > > I think Jan might have enough on his plate already without trying to > TOAST the index code along with the plain-table code. But if he can > get it done, great! > > One thing this does bring up is that the maximum safe tuple length is > dependent on the index or table type. The toaster's API had better > be designed accordingly... In talking to Jan, the index code will make use of the toast entries automatically. He said the heap_insert will do any toasting, and after that the tuple already has any toast pointers, and that gets inserted into the index. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> One thing this does bring up is that the maximum safe tuple length is >> dependent on the index or table type. The toaster's API had better >> be designed accordingly... > In talking to Jan, the index code will make use of the toast entries > automatically. He said the heap_insert will do any toasting, and after > that the tuple already has any toast pointers, and that gets inserted > into the index. If that's his plan, then it's broken by design. Toasting a complete tuple cannot be the basis for toasting index entries related to the tuple, because (a) the index entries will typically use only some of the fields appearing in the tuple; (b) index entries have different length limits than tuples do; (c) indexes might be created after the original table is. Heck, index *types* might be created after the original table is. If index toasting is dependent on toasting of the main table, the only way it can work is to toast every varlena attribute down to a prechosen maximum length that Jan hopes will satisfy every index type, now or hereafter --- no matter whether the column in question has or ever will have an index of any type. And that'll still crash and burn for multicolumn indexes. I thought the plan was to toast indexes independently of the main table, ie, an index would have its own toast-table and its own storage of oversize attributes --- where the *index* decides what is oversize, not the main table. If main tables and indexes point at the same toast-table entries, I think VACUUM will become rather an interesting problem, too... although maybe that could be worked around if VACUUM destroys indexes and rebuilds them from scratch. regards, tom lane
I wrote: > And that'll still crash and burn for multicolumn indexes. Not to mention functional indexes, which typically store values that don't appear in the referenced tuple at all. Basically, indexes have to have their own toasters. There's no other way. regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> One thing this does bring up is that the maximum safe tuple length is > >> dependent on the index or table type. The toaster's API had better > >> be designed accordingly... > > > In talking to Jan, the index code will make use of the toast entries > > automatically. He said the heap_insert will do any toasting, and after > > that the tuple already has any toast pointers, and that gets inserted > > into the index. > > If that's his plan, then it's broken by design. Toasting a complete > tuple cannot be the basis for toasting index entries related to the > tuple, because (a) the index entries will typically use only some of the > fields appearing in the tuple; (b) index entries have different length > limits than tuples do; (c) indexes might be created after the original Yes, I see your point. This could be quite complicated. He is targeting BLKSZ and indexes are BLKSZ/3. He could toast any single attribute of size >= BLKSZ/3, but as you mentioned multi-column indexes would be a problem. I wonder if he could create the TOAST entries and modify the heap tuple during index creation? Wow, this clearly is going to be messy. Maybe he is going to have to have a separate TOAST table for the index. If he got fancy, he could use the heap TOAST table as needed, and have a secondary index TOAST for cases the tuple needs further TOASTS for indexes. It shows we haven't thought through all this yet. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane writes:> Mateus Cordeiro Inssa <mateus@ifnet.com.br> writes:> > I got this error vacuuming pg_proc:> > ERROR: _bt_endpoint: leftmost page (20) has not leftmost flag> > Hmm, I wonder if this could be yet another manifestationof the problems> that btree indexes have with oversized key values. Do you have any> procedures with long definitions? "Long" in this context means over> about 4K. If you're not sure, try> select proname from pg_proc wherelength(prosrc) > 4000; Yes, I have some functions from 3k to 5k. > If you do, try breaking them up into smaller procedures. You might have> to dump and rebuild the database to get rid ofthe corruption in> pg_proc's index, though. Ok. > The prosrc index is actually completely unnecessary, so we've removed> it for 7.0. Work is in progress to fix the tuple-sizeproblem as well,> but that will probably take longer. Oh, I would ask why there was this index. I had problems with it since version 6.4. I'd like to suggest the creation of a new command: ALTER FUNCTION. I use pltcl to program in the server, so, no need for checking the function code. The problems with pg_proc always occurred to me when changing functions: DROP/CREATE. This command would do just an update on prosrc field (that doesn't have index anymore). []'s Mateus Cordeiro Inssa --------------------- Linux User: 76186 Kernel: 2.3.34 ICQ (Licq): 15243895 --------------------- mateus@ifnet.com.br mateus@cwb.fnn.net Mon Dec 27 11:15:41 EDT 1999 .
Tom Lane wrote: > I wrote: > > And that'll still crash and burn for multicolumn indexes. > > Not to mention functional indexes, which typically store values that > don't appear in the referenced tuple at all. > > Basically, indexes have to have their own toasters. There's no other > way. You're right. I think it's best to delay index toasting until we have some experience with normal, main tuple attribute toasting.It'd be nice if the solution had covered huge values to be indexed automatically (what it doesn't any more).But I think most people can live with a database, that cannot index huge values, but is capable to store andretrieve them for now. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
At 07:25 PM 1/3/00 +0100, Jan Wieck wrote: > I think it's best to delay index toasting until we have some > experience with normal, main tuple attribute toasting. It'd be > nice if the solution had covered huge values to be indexed > automatically (what it doesn't any more). But I think most > people can live with a database, that cannot index huge > values, but is capable to store and retrieve them for now. >From my personal POV, I would certainly agree with this. The stuff I'm working on is probably pretty typical, using an integer key to identify rows which contain large chunks of text, photographs, etc. I'm perfectly happy not being able to index the big chunks, and suspect a large percentage of users would feel the same. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.