Обсуждение: Broken index?
Hi, I am having a problem with duplicates appearing in a unique index and I have no idea why. The database in question has a high number of inserts and updates to a table (around 5000-10000 per hour) and occasionally duplicates slip through. To provide some background, I have listed the properties of the table and the index in question: player_stats=> \d player Table "player" Attribute | Type | Modifier --------------+--------------+------------------------------------------------- id | integer | not null default nextval('player_serial'::text) name | varchar(50) | not null password | varchar(50) | icqnumber | varchar(20) | emailaddress | varchar(255) | flatname | varchar(50) | Indices: player_flatname_idx, player_flatname_unique_idx, player_id_idx, player_name_key, player_pkey player_stats=> \d player_name_key Index "player_name_key" Attribute | Type -----------+------------- name | varchar(50) unique btree The attribute that appears to sometimes allow duplicates to be inserted is 'name' and only 'player_name_key' operates on this attribute. I can't say how often these duplicates slip through, but it seems to only happen *very* rarely, but is obviously completely mangling my data. Not only that, but VACUUM ANALYZE is taking *days* to complete and I can only assume this is due to these duplicates. The 'player' table contains around 180000 records, so if this fall of in performance when running vacuum can be associated with this, I'd appreciate some feedback. Anyway, I'd just like to know if anyone else has had this problem or if it is a known bug. All the database operations are handled through the JDBC driver and transactions are used to bundle these operations in to sets of 20 (maybe a potential problem here?). I'd *really* appreciate some feedback as it is becoming a very annoying problem. If you'd like to check the site out where it is used, go to http://www.fragmeter.com. Cheers! -- Paul Green Programmer Jippii Midlands 85 London Road, Leicester, England, LE2 0PF. tel: 0116 2230662 fax: 0116 2221305 Please visit www.businessjippii.co.uk to view our latest Internet and Telecommunication Products and Services. We offer it all! Want to order a free dial up account online? Go to www.dialjippii.co.uk. Want a quick and easy way to get a domain name? Order online at www.domainsjippii.co.uk. For the latest ringtones, logos and fun games go to the Jippii portal at www.jippii.co.uk. NOTICE: This e-mail is strictly confidential and is intended solely for the person or organisation to whom it is addressed. It may contain privileged and confidential information and if you are not the intended recipient, you must not copy, distribute or take any action in reliance on it. If you have received this communication in error, please advise us by e-mail and delete the file from your system. Whilst all efforts are made to safeguard Inbound and Outbound e-mails, Webleicester Ltd., its subsidiaries or associates cannot guarantee that attachments are Virus-free or compatible with your system and does not accept any liability in respect of viruses or computer problems experienced. Any views expressed in this message are those of the individual sender, except where specifically stated to be the view of Webleicester Ltd., its subsidiaries or associates.
Paul Green <traktion@webleicester.co.uk> writes: > I am having a problem with duplicates appearing in a unique index and I > have no idea why. Hmm. What PG version is this, on what platform? Have you tried dropping and recreating the index? > I can't say how often these duplicates slip through, but it seems to only > happen *very* rarely, but is obviously completely mangling my data. Not > only that, but VACUUM ANALYZE is taking *days* to complete and I can only > assume this is due to these duplicates. The 'player' table contains around > 180000 records, so if this fall of in performance when running vacuum can > be associated with this, I'd appreciate some feedback. Clearly something broken here; that's not an especially large table... regards, tom lane
Paul Green <traktion@webleicester.co.uk> writes: >> Hmm. What PG version is this, on what platform? > Redhat 7.1, (Linux 2.4.2-2) on a K6-2 300 with 128 mb ram running PG 7.0 > What *is* wierd is that if I do, say, "select * > from player where name='bob'" I get one result where as if I do a "select * > from player where name like '%bob', I get many bobs (about 10) returned. This is entirely consistent with the theory of a broken index. The %bob query does a sequential scan and looks at every row, but the other will do an index lookup and return only the rows it finds via the index. Did you compile 7.0 with locale support? If so, a plausible theory is that you've started the postmaster with different locale settings at different times --- a change in locale would alter the effective sort order, rendering what had been a valid index corrupt. The btree search and update algorithms fail completely if the data in the index is not really in the correct order. Unfortunately it is *very* easy to start the postmaster with inconsistent locale environment, since what you have in your login profile is often different from the environment that system boot scripts run in. 7.1 contains fixes that force PG always to use the initdb-time locale setting, so as to eliminate this class of problems. I'd suggest an upgrade. Meanwhile, you need to take steps to eliminate the duplicate rows from your table. > Also, if I do a "select count(id) from player" I get the > same number as "select distinct count(id) from player" even though they > should clearly be different. Eh? Looks like the same thing to me. Perhaps you are thinking of "select count(distinct id) from player" > Also, I've just tried to do a query to update all rows by "update player > set flatname=lower(name)" and that failed complaining about duplicates, yet > if you specify the id, the row is altered. Crazy huh? Once the index becomes corrupt, the behavior will be awfully unpredictable... regards, tom lane
On 2001.09.07 15:10 Tom Lane wrote: > Paul Green <traktion@webleicester.co.uk> writes: > > I am having a problem with duplicates appearing in a unique index and I > > have no idea why. > > Hmm. What PG version is this, on what platform? Redhat 7.1, (Linux 2.4.2-2) on a K6-2 300 with 128 mb ram running PG 7.0 using jdbc7.1-1.2.jar for databas access. > > Have you tried dropping and recreating the index? I've tried creating a new table and selecting * (create table blah as select * from player) and then creating an index on this, but it compains about duplicates and fails. What *is* wierd is that if I do, say, "select * from player where name='bob'" I get one result where as if I do a "select * from player where name like '%bob', I get many bobs (about 10) returned. Unless psql is removing white space and this is somehow creeping in (although I 'trim()' all values before insertion) I can't understand how this can be so. Also, if I do a "select count(id) from player" I get the same number as "select distinct count(id) from player" even though they should clearly be different. Also, I've just tried to do a query to update all rows by "update player set flatname=lower(name)" and that failed complaining about duplicates, yet if you specify the id, the row is altered. Crazy huh? > > > I can't say how often these duplicates slip through, but it seems to > only > > happen *very* rarely, but is obviously completely mangling my data. Not > > only that, but VACUUM ANALYZE is taking *days* to complete and I can > only > > assume this is due to these duplicates. The 'player' table contains > around > > 180000 records, so if this fall of in performance when running vacuum > can > > be associated with this, I'd appreciate some feedback. > > Clearly something broken here; that's not an especially large table... > > regards, tom lane > Yeah, something is definately going a bit odd. I could recrease the database starting from scratch (importing everything again etc), but if the same situation occurs, I'll just have wasted a lot of time, so if you have any ideas, I'd appreciate the feedback. Thanks in advance, Paul
On 2001.09.07 15:53 Tom Lane wrote: > Paul Green <traktion@webleicester.co.uk> writes: > >> Hmm. What PG version is this, on what platform? > > > Redhat 7.1, (Linux 2.4.2-2) on a K6-2 300 with 128 mb ram running PG > 7.0 > > > What *is* wierd is that if I do, say, "select * > > from player where name='bob'" I get one result where as if I do a > "select * > > from player where name like '%bob', I get many bobs (about 10) > returned. > > This is entirely consistent with the theory of a broken index. The %bob > query does a sequential scan and looks at every row, but the other will > do an index lookup and return only the rows it finds via the index. Ahh yes, that seems to make sense. > > Did you compile 7.0 with locale support? If so, a plausible theory is > that you've started the postmaster with different locale settings at > different times --- a change in locale would alter the effective sort > order, rendering what had been a valid index corrupt. The btree search > and update algorithms fail completely if the data in the index is not > really in the correct order. > > Unfortunately it is *very* easy to start the postmaster with > inconsistent locale environment, since what you have in your login > profile is often different from the environment that system boot scripts > run in. I didn't actually compile it at all - I used the default RPMs provided by Redhat. I'm unsure whether they have compiled in support or not, but I know I always start/stop postgres using the init.d scripts, which always start the database under the user postgres. As far as I'm aware, I've never changed the locale of the system/db either. I just checked using 'locale' to see how the system was set up under the usernames I use to log in to the machine and they are all en_GB anyway. I also checked /etc/sysconfig/i18n and that too was set to en_GB, so I think I'm safe on this front. Any ideas? > > 7.1 contains fixes that force PG always to use the initdb-time locale > setting, so as to eliminate this class of problems. I'd suggest an > upgrade. Meanwhile, you need to take steps to eliminate the duplicate > rows from your table. If you are convinced that the above is not the problem, I will try to delay the upgrade for now. If you think that it may still be a locale setting, then I guess I will have to though :) > > > Also, if I do a "select count(id) from player" I get the > > same number as "select distinct count(id) from player" even though they > > should clearly be different. > > Eh? Looks like the same thing to me. Perhaps you are thinking of > "select count(distinct id) from player" Yes, sorry, that is what I meant. > > > Also, I've just tried to do a query to update all rows by "update > player > > set flatname=lower(name)" and that failed complaining about duplicates, > yet > > if you specify the id, the row is altered. Crazy huh? > > Once the index becomes corrupt, the behavior will be awfully > unpredictable... :) > > regards, tom lane > -- Paul Green Programmer Jippii Midlands 85 London Road, Leicester, England, LE2 0PF. tel: 0116 2230662 fax: 0116 2221305 Please visit www.businessjippii.co.uk to view our latest Internet and Telecommunication Products and Services. We offer it all! Want to order a free dial up account online? Go to www.dialjippii.co.uk. Want a quick and easy way to get a domain name? Order online at www.domainsjippii.co.uk. For the latest ringtones, logos and fun games go to the Jippii portal at www.jippii.co.uk. NOTICE: This e-mail is strictly confidential and is intended solely for the person or organisation to whom it is addressed. It may contain privileged and confidential information and if you are not the intended recipient, you must not copy, distribute or take any action in reliance on it. If you have received this communication in error, please advise us by e-mail and delete the file from your system. Whilst all efforts are made to safeguard Inbound and Outbound e-mails, Webleicester Ltd., its subsidiaries or associates cannot guarantee that attachments are Virus-free or compatible with your system and does not accept any liability in respect of viruses or computer problems experienced. Any views expressed in this message are those of the individual sender, except where specifically stated to be the view of Webleicester Ltd., its subsidiaries or associates.
Paul Green <traktion@webleicester.co.uk> writes: > I didn't actually compile it at all - I used the default RPMs provided by > Redhat. I'm unsure whether they have compiled in support or not, They do... > but I know > I always start/stop postgres using the init.d scripts, which always start > the database under the user postgres. As far as I'm aware, I've never > changed the locale of the system/db either. I just checked using 'locale' > to see how the system was set up under the usernames I use to log in to the > machine and they are all en_GB anyway. I also checked /etc/sysconfig/i18n > and that too was set to en_GB, so I think I'm safe on this front. Nonetheless, it certainly appears that you've got a corrupt index, so I'm guessing that at one time or another the wrong locale environment was used. Even one out-of-order entry in the index will cause long-lasting problems. You could try a direct observation of index ordering, say SELECT name FROM table ORDER BY name; If you run this first with enable_sort turned off and second with enable_indexscan turned off, you should get indexscan and explicit-sort plans respectively (use EXPLAIN to make sure). Dump the outputs into files and compare... > Any ideas? My recommendation is to update to 7.1.3 in any case. regards, tom lane