Discussion: Could not create relation: File exists error
Hi PG Listers,

Postgres newbie here with a DB maintenance problem. I'm running a nightly maintenance script on a PG database with ~50 tables and ~2.8 million rows distributed rather unevenly across those tables. It is updated hourly with 7-10k new rows per hour. Every night I run a bash script that contains sequential psql calls to do various maintenance jobs on the DB. The hourly updating is paused during this maintenance time. The maintenance script:

1. Clusters several of the indexes on the largest 5 tables
2. Deletes data older than 2 weeks from a subset (10) of the tables
3. Reindexes the tables I've deleted info from (10)
4. Vacuum full analyze on the entire DB

This script has been running successfully for several months (70-90 min each night). Out of the blue I'm now getting the following error message after each psql command, except the last vacuum full analyze command:

ERROR: could not create relation 1663/835021/4294967254: File exists

The last set of numbers iterates upwards slowly while the first two sets have not changed over the several nights that this problem has occurred. The script eventually does finish its work, but ~3x slower. Subsequent hourly updates run smoothly, as do normal database queries.

I have tried rebooting the server and also restarting postmaster, with no success. I have also run several of the psql commands individually and have had no problems. Other databases on the same server run similar nightly maintenance routines and have not reported any similar issues. I'm having a hard time finding others who have run into this problem. Any advice is greatly appreciated - Thanks!

Postgres 8.0.8 with PostGIS 1.0.4 on FC 4

Jesse

--
Jesse Cleary
Department of Marine Sciences
UNC Chapel Hill
334 Chapman Hall
(919) 962-4987
jcleary@email.unc.edu
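[For readers reconstructing the setup: the four-step routine above might look roughly like the sketch below. The actual script is not shown in the thread; the database and table names (mydb, obs_points, obs_points_idx) are made-up stand-ins, and each step would be repeated per table.]

```bash
#!/bin/bash
# Nightly maintenance sketch - the hourly loader is paused while this runs.
DB=mydb   # hypothetical database name

# 1. Re-cluster a large table on its clustered index
psql -d "$DB" -c "CLUSTER obs_points_idx ON obs_points;"

# 2. Purge rows older than two weeks from the rolling tables
psql -d "$DB" -c "DELETE FROM obs_points WHERE obs_time < now() - interval '2 weeks';"

# 3. Rebuild the indexes bloated by the deletes
psql -d "$DB" -c "REINDEX TABLE obs_points;"

# 4. Reclaim space and refresh planner statistics across the whole DB
psql -d "$DB" -c "VACUUM FULL ANALYZE;"
```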
Jesse Cleary <jcleary@email.unc.edu> writes:
> This script has been running successfully for several months (70-90 min each night). Out of the blue I'm now getting the following error message after each psql command, except the last vacuum full analyze command:
> ERROR: could not create relation 1663/835021/4294967254: File exists

Hm, it seems suggestive that the last number is only a few counts short of UINT_MAX. Not sure what to make of it though. Do you in fact have files under $PGDATA/base/835021/ with names in that range?

			regards, tom lane
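[The check Tom asks for could be done with something like the following, assuming a standard $PGDATA layout on the server; the exact path depends on the installation.]

```bash
# List relation files in this database's directory whose numeric names
# are close to UINT_MAX (4294967295). 835021 is the database OID from
# the error message; $PGDATA must point at the cluster's data directory.
ls -l "$PGDATA/base/835021/" | awk '$NF + 0 >= 4294000000'
```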
Tom Lane wrote:
> Jesse Cleary <jcleary@email.unc.edu> writes:
>> This script has been running successfully for several months (70-90 min each night). Out of the blue I'm now getting the following error message after each psql command, except the last vacuum full analyze command:
>> ERROR: could not create relation 1663/835021/4294967254: File exists
>
> Hm, it seems suggestive that the last number is only a few counts short
> of UINT_MAX. Not sure what to make of it though. Do you in fact have
> files under $PGDATA/base/835021/ with names in that range?
>
> 			regards, tom lane

Thanks Tom,

Some further digging: I see that the last numbers in the path repeat with every nightly run, so I'm getting the exact same error messages about the same files every night. They range from:

4294840092-94
4294967252-58
4294967264-67

All of these files do exist in $PGDATA/base/835021/, and all have a file size > 0 and a current timestamp coincident with the latest of my hourly update runs, not the maintenance script run.

One exception is notable - $PGDATA/base/835021/4294967264 - this file has a size of 0 and a timestamp that coincides with the date and time range of the nightly run when these errors first appeared. This seems like a good clue, but I'm not sure what I should do next. Thanks...

Jesse
Jesse Cleary <jcleary@email.unc.edu> writes:
> One exception is notable - $PGDATA/base/835021/4294967264 - this file
> has a size of 0 and a timestamp that coincides with the date and time
> range of the nightly run when these errors first appeared. This seems
> like a good clue, but not sure what I should do next? Thanks...

Hm. I guess the next question is whether any of those 429... numbers appear in pg_class.relfilenode of your database?

			regards, tom lane
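[A query along these lines, run in the affected database, would answer Tom's question; the 4294000000 cutoff is just an illustrative lower bound for the suspicious near-UINT_MAX range.]

```sql
-- Relations whose on-disk filename falls in the near-UINT_MAX range
SELECT relname, relkind, relfilenode
FROM pg_class
WHERE relfilenode >= '4294000000'::oid
ORDER BY relfilenode;
```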
Jesse Cleary <jcleary@email.unc.edu> writes:
> This script has been running successfully for several months (70-90 min each night). Out of the blue I'm now getting the following error message after each psql command, except the last vacuum full analyze command:
> ERROR: could not create relation 1663/835021/4294967254: File exists

> Postgres 8.0.8 with

After looking back at the 8.0 code I'm pretty sure I know approximately what is happening, though not the exact details. Somehow, CheckMaxObjectId is firing and forcing the OID counter up to almost-maximum, which constrains the numbers that REINDEX and CLUSTER try to select as file names. And there wasn't any code in 8.0 to recover from a chance filename collision, hence the error.

A fairly likely cause for this is that one of the tables being CLUSTERed has OIDs and there is a row with an almost-maximum OID in there --- when the row is copied across to the newly clustered table, its OID would be shown to CheckMaxObjectId. So every night, the OID counter would have the exact same value just after the CLUSTER step, and subsequent reindexes would always try to pick the same filenames as they did before.

We fixed that whole horrid mess in 8.1, so really the best answer would be to update to 8.1 or 8.2. If you can't do that, are you actually using the OIDs in these tables? If not, "ALTER TABLE SET WITHOUT OIDS" would be a good and quick fix. Failing that, I think you need to find the high-numbered OIDs and get rid of them (just delete and reinsert the rows should work).

			regards, tom lane
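[For each affected table, the quick fix Tom suggests would look roughly like this; obs_points is a made-up stand-in for the real table names.]

```sql
-- Confirm the table really holds near-maximum OIDs before touching it
SELECT max(oid) FROM obs_points;

-- Drop the hidden oid column entirely; existing OID values are lost,
-- so only do this if nothing references them
ALTER TABLE obs_points SET WITHOUT OIDS;
```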
Tom Lane wrote:
> After looking back at the 8.0 code I'm pretty sure I know approximately
> what is happening, though not the exact details. Somehow,
> CheckMaxObjectId is firing and forcing the OID counter up to
> almost-maximum, which constrains the numbers that REINDEX and CLUSTER
> try to select as file names. And there wasn't any code in 8.0 to
> recover from a chance filename collision, hence the error.
>
> A fairly likely cause for this is that one of the tables being
> CLUSTERed has OIDs and there is a row with an almost-maximum OID in
> there --- when the row is copied across to the newly clustered table,
> its OID would be shown to CheckMaxObjectId. So every night, the OID
> counter would have the exact same value just after the CLUSTER step,
> and subsequent reindexes would always try to pick the same filenames
> as they did before.
>
> We fixed that whole horrid mess in 8.1, so really the best answer
> would be to update to 8.1 or 8.2. If you can't do that, are you
> actually using the OIDs in these tables? If not, "ALTER TABLE SET
> WITHOUT OIDS" would be a good and quick fix. Failing that, I think
> you need to find the high-numbered OIDs and get rid of them
> (just delete and reinsert the rows should work).
>
> 			regards, tom lane

Thanks Tom - that seems to be it. OIDs on each table were up in the 4.29496 billion range, and two tables had max OIDs just exactly prior to the error message filename IDs. Updating PG is not an option for now, but I can drop the OID field as you suggest.
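[A catalog query like this would show which tables in the database still carry the legacy oid column; pg_class.relhasoids exists in catalogs of this vintage (it was removed much later, in PostgreSQL 12).]

```sql
-- Ordinary tables in the current database that still have OIDs
SELECT relname
FROM pg_class
WHERE relhasoids AND relkind = 'r'
ORDER BY relname;
```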
We actually ran into another OID max issue with a previous instance of this DB, and so we have a seq field to supply a unique ID in our mapping application instead of using OID. I didn't realize that our OID field was even still around 'til now.

I ran my CLUSTER and REINDEX commands on the tables I dropped the OID from and they ran smoothly. Will keep the list posted if my nightly maintenance acts weird, but I think this will fix it.

Thanks so much for your advice and time solving this - I and many others on the list really appreciate your efforts...

Jesse