Обсуждение: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT THE SAME AS HEAP
Greetings, I've got an annoying problem. I'm currently running PostgreSQL-7.3.4 on Linux (x86). This problem started with 7.3.3. I've got a database that is on the larger side (about 3GB dump). I run "vacuumdb -z -a -f" religiously via a cronjob three times a day. All of a sudden last month (after about 3 years) I started getting this warning when vacuumdb was run: INFO: Index pg_largeobject_loid_pn_index: Pages 903; Tuples 323847: Deleted 0. CPU 0.04s/0.07u sec elapsed 0.10 sec. WARNING: Index pg_largeobject_loid_pn_index: NUMBER OF INDEX' TUPLES (323847) IS NOT THE SAME AS HEAP' (323802). Recreate the index. So I put postgresql into standalone mode, recreated the index, and everything was ok for about 2 days, and then the problem would return. I did some Googling and found that this was a potential bug in older versions of postgresql, but was supposedly fixed in 7.3.4 and later versions. So I upgraded to 7.3.4 (using the semi-official RPMs on the postgresql.org ftp servers). Dropped into standalone mode, reindexed, and everything was fine for about the past month. Until this morning when it came back again. The server where this is running isn't having any hardware problems, isn't getting shutdown improperly or anything like that. Its current uptime is 209 days, and postgresql is never shutdown improperly. Now I'd be willing to upgrade further, but I really can't afford unnecessary downtime. So I'd like some guidance/input on which version of postgresql will not have this bug. Or maybe this isn't the bug at all, and there's some other weird problem? Either way, any and all advice is appreciated. Thanks! Lonni
Lonni Friedman <netllama@gmail.com> writes: > All of a sudden last month (after about 3 years) I started getting > this warning when vacuumdb was run: > INFO: Index pg_largeobject_loid_pn_index: Pages 903; Tuples 323847: > Deleted 0. CPU 0.04s/0.07u sec elapsed 0.10 sec. > WARNING: Index pg_largeobject_loid_pn_index: NUMBER OF INDEX' TUPLES > (323847) IS NOT THE SAME AS HEAP' (323802). > Recreate the index. Hmm. Is it always that same index, or have you seen this on other indexes? I'm not aware of any known bug in 7.3.* that could lead to this sort of error. Don't suppose you have any suggestions about how to reproduce the problem? regards, tom lane
Lonni Friedman <netllama@gmail.com> writes: > Unfortunately, i have no clue how to replicate this. It was happening > fairly consistantly before i upgraded from 7.3.3 to 7.3.4 (like nearly > every vacuumdb run). > Then nothing for a month after going to 7.3.4, and now its happening > every vacuumdb run again. Once the complaint starts appearing, I'd expect it to continue until you reindex the index. regards, tom lane
Its _always_ that same index. No others have had this problem. Unfortunately, i have no clue how to replicate this. It was happening fairly consistantly before i upgraded from 7.3.3 to 7.3.4 (like nearly every vacuumdb run). Then nothing for a month after going to 7.3.4, and now its happening every vacuumdb run again. On Wed, 05 May 2004 11:30:33 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Lonni Friedman <netllama@gmail.com> writes: > > All of a sudden last month (after about 3 years) I started getting > > this warning when vacuumdb was run: > > > INFO: Index pg_largeobject_loid_pn_index: Pages 903; Tuples 323847: > > Deleted 0. CPU 0.04s/0.07u sec elapsed 0.10 sec. > > WARNING: Index pg_largeobject_loid_pn_index: NUMBER OF INDEX' TUPLES > > (323847) IS NOT THE SAME AS HEAP' (323802). > > Recreate the index. > > Hmm. Is it always that same index, or have you seen this on other indexes? > > I'm not aware of any known bug in 7.3.* that could lead to this sort of > error. Don't suppose you have any suggestions about how to reproduce > the problem? > > regards, tom lane
Lonni Friedman <netllama@gmail.com> writes: > On Wed, 05 May 2004 12:31:21 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Once the complaint starts appearing, I'd expect it to continue until you >> reindex the index. > That's exactly what happens. It consistantly errors until reindexed. > Any suggestions? thanks. You are seemingly triggering some sort of bug in the backend's large-object code, causing extra index entries to be made ... but I sure haven't the foggiest idea how that would happen. Perhaps you could look at the parts of your application code that work with large objects and see if you are doing anything "off the beaten track" that might suggest a way to trigger the bug reproducibly. I'm sure we could fix it if we could see it happening. regards, tom lane
On Wed, 05 May 2004 12:31:21 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Lonni Friedman <netllama@gmail.com> writes: > > Unfortunately, i have no clue how to replicate this. It was happening > > fairly consistantly before i upgraded from 7.3.3 to 7.3.4 (like nearly > > every vacuumdb run). > > > Then nothing for a month after going to 7.3.4, and now its happening > > every vacuumdb run again. > > Once the complaint starts appearing, I'd expect it to continue until you > reindex the index. That's exactly what happens. It consistantly errors until reindexed. Any suggestions? thanks.
Lonni Friedman <netllama@gmail.com> writes: > hrmmm, i'm not sure what would constitute 'off the beaten track'. Neither am I ... if we knew what you were doing that triggers the bug, we'd already be halfway there :-( regards, tom lane
On Wed, 05 May 2004 13:56:41 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Lonni Friedman <netllama@gmail.com> writes: > > On Wed, 05 May 2004 12:31:21 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Once the complaint starts appearing, I'd expect it to continue until you > >> reindex the index. > > > That's exactly what happens. It consistantly errors until reindexed. > > Any suggestions? thanks. > > You are seemingly triggering some sort of bug in the backend's > large-object code, causing extra index entries to be made ... > but I sure haven't the foggiest idea how that would happen. > > Perhaps you could look at the parts of your application code that work > with large objects and see if you are doing anything "off the beaten > track" that might suggest a way to trigger the bug reproducibly. > I'm sure we could fix it if we could see it happening. hrmmm, i'm not sure what would constitute 'off the beaten track'. is there something specific that i could look for? or what is considered to be normal? I can tell you that the large objects that are getting chucked into the database are generally under 1MB in size (each), as they're mostly M$ documents. There are a few that are between 1MB & 10MB but I don't think there's anything larger than 10MB.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Lonni Friedman wrote: |>>Then nothing for a month after going to 7.3.4, and now its happening |>>every vacuumdb run again. |>Once the complaint starts appearing, I'd expect it to continue until you |>reindex the index. | That's exactly what happens. It consistantly errors until reindexed. | Any suggestions? thanks. A long shot, but we experienced a similar kind of issue, though our dataset was a bit different (not so many large objects). After having reindexed a couple of times with about the same results (working fine for some time, then complaining again), I dropped the index and recreated it. That seemed to fix the issue at our installation, we had no such problems after that. Ran for some months after that before I took the plunge and moved to 7.4.. This might not at all work for you though, and as removing the index will probably kill your performance for a while I cannot promise you a fix with this workaround.. Best Regards - -- Denis -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr2 (Windows XP) iD8DBQFAmqdovsCA6eRGOOARAptYAJ0YF6+tGtyA55hqOf8UaF+q2JnN4wCfdQtV r4nBoDljvFDGlXh3OhNSPcM= =6xLx -----END PGP SIGNATURE-----
Thanks for your reply. I thought (perhaps erroneously) that there wasn't any real difference between dropping an index then recreating it, and just reindexing an index? On Thu, 06 May 2004 23:00:25 +0200, Denis Braekhus <denis@startsiden.no> wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Lonni Friedman wrote: > |>>Then nothing for a month after going to 7.3.4, and now its happening > |>>every vacuumdb run again. > |>Once the complaint starts appearing, I'd expect it to continue until you > |>reindex the index. > | That's exactly what happens. It consistantly errors until reindexed. > | Any suggestions? thanks. > > A long shot, but we experienced a similar kind of issue, though our > dataset was a bit different (not so many large objects). > After having reindexed a couple of times with about the same results > (working fine for some time, then complaining again), I dropped the > index and recreated it. That seemed to fix the issue at our > installation, we had no such problems after that. Ran for some months > after that before I took the plunge and moved to 7.4.. > > This might not at all work for you though, and as removing the index > will probably kill your performance for a while I cannot promise you a > fix with this workaround.. > > Best Regards > - -- > Denis
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Lonni Friedman wrote: | Thanks for your reply. I thought (perhaps erroneously) that there | wasn't any real difference between dropping an index then recreating | it, and just reindexing an index? I am definitely not sure, and I agree it sounds logical that they would produce the same results. However my experience was that dropping and re-creating the index worked. The docs say : "Another approach to dealing with a corrupted user-table index is just to drop and recreate it. This may in fact be preferable if you would like to maintain some semblance of normal operation on the table meanwhile. REINDEX acquires exclusive lock on the table, while CREATE INDEX only locks out writes not reads of the table." Indicating that they should produce the same results, but that they work differently. I am not sure what that implies, but maybe someone else knows ? Regards - -- Denis -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr2 (Windows XP) iD8DBQFAnPzivsCA6eRGOOARAl1OAKC0zcgN409n7ylgyHV61J9/o4LsBgCgqEpJ yT24Y03fQItzhbRlxHyUg8s= =YBoz -----END PGP SIGNATURE-----
Denis Braekhus <denis@startsiden.no> writes: > Indicating that they should produce the same results, but that they work > differently. I am not sure what that implies, but maybe someone else knows ? The only difference the docs are talking about is what kind of lock is held while the rebuild proceeds. A reindex builds a new index file from scratch, and AFAICS should give the same results as dropping/recreating the index --- at least in terms of what's in the file proper. The only theory I can come up with for your experience is that there was some corruption in the system catalog rows describing the index. That would not get fixed by a reindex. However, I haven't the foggiest idea what sort of corruption might allow the index to seem to work (and not, say, crash the reindex itself which is going to use that information...) yet allow problems to appear much later on. Too bad the evidence is gone now. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane wrote: |>Indicating that they should produce the same results, but that they work |>differently. I am not sure what that implies, but maybe someone else knows ? | The only difference the docs are talking about is what kind of lock is | held while the rebuild proceeds. Yes I understood that, but the docs didn't (as you do now) excplicitly explain the different ways they work. I expected it to be as you say, however I was not 100% sure. Thanks for clarifying. | A reindex builds a new index file from scratch, and AFAICS should give | the same results as dropping/recreating the index --- at least in terms | of what's in the file proper. The only theory I can come up with for | your experience is that there was some corruption in the system catalog | rows describing the index. That would not get fixed by a reindex. | However, I haven't the foggiest idea what sort of corruption might | allow the index to seem to work (and not, say, crash the reindex itself | which is going to use that information...) yet allow problems to appear | much later on. Too bad the evidence is gone now. Yes, sorry about not bringing up the issue at the right time, however my main focus at that time was to bring the production system back to normal.. Regards - -- Denis -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr2 (Windows XP) iD8DBQFAnh+gvsCA6eRGOOARAlivAKCl8aIuii8GeSFLetWn+exBVXnptwCeKMUr wjAEgS7gP1LQeS/xZdiC03g= =ZRI6 -----END PGP SIGNATURE-----
On Sat, May 08, 2004 at 12:31:28PM -0400, Tom Lane wrote: > However, I haven't the foggiest idea what sort of corruption might > allow the index to seem to work (and not, say, crash the reindex itself > which is going to use that information...) yet allow problems to appear > much later on. Too bad the evidence is gone now. Dunno if this is any help, but on a 7.2 system I saw a REINDEX which was interrupted leave the index at least partially working. We ended up with an index which seemed fine, but which didn't contain certain rows (so those rows were not visible when the SELECT criterion was the indexed field). This was extremely puzzling, but a DROP INDEX; CREATE INDEX pair solved it. A -- Andrew Sullivan | ajs@crankycanuck.ca The plural of anecdote is not data. --Roger Brinner
Andrew Sullivan <ajs@crankycanuck.ca> writes: > Dunno if this is any help, but on a 7.2 system I saw a REINDEX which > was interrupted leave the index at least partially working. We ended > up with an index which seemed fine, but which didn't contain certain > rows (so those rows were not visible when the SELECT criterion was > the indexed field). This was extremely puzzling, but a DROP INDEX; > CREATE INDEX pair solved it. Hmm, I would expect that behavior for an overwrite-in-place REINDEX, but 7.2 only seems to use overwrite-in-place for critical system catalogs. What were you reindexing exactly? Were you running a standalone backend? regards, tom lane
On Mon, May 10, 2004 at 07:49:42PM -0400, Tom Lane wrote: > > Hmm, I would expect that behavior for an overwrite-in-place REINDEX, > but 7.2 only seems to use overwrite-in-place for critical system > catalogs. What were you reindexing exactly? Were you running a > standalone backend? Not as far as I know (I didn't cause the problem, I only fixed it later, so I'm relying on the report of the person who ticked the problem to understand what happened). It was definitely on a user column, and IIRC it was a two-field unique index. My memory is hazy on it, though -- I didn't report it because it wasn't a system I could leave in that state for debugging. A -- Andrew Sullivan | ajs@crankycanuck.ca