Re: pg_xlog disk full error, i need help
From | Janning Vygen |
---|---|
Subject | Re: pg_xlog disk full error, i need help |
Date | |
Msg-id | 200503292041.01234.vygen@gmx.de |
In reply to | Re: pg_xlog disk full error, i need help (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
On Tuesday, 29 March 2005 16:37, Tom Lane wrote:
> Janning Vygen <vygen@gmx.de> writes:
> > On Monday, 28 March 2005 18:06, Tom Lane wrote:
> >> The only way for pg_xlog to bloat vastly beyond what it's supposed to be
> >> (which is to say, about twice your checkpoint_segments setting) is if
> >> checkpoints are somehow blocked from happening. The only mechanism I
> >> know about for that is that in 7.4.* (maybe 7.3.* too) a very large
> >> btree CREATE INDEX or REINDEX operation can block checkpoints until it
> >> completes. Did you have something like that going on?
> >
> > It looks like something/someone wrote so much data in my table that the
> > nightly clustering process just didn't succeed because of disk full
> > failure after writing too many pg_xlog files. The writing of so many
> > pg_xlog files now makes more sense to me when clustering 68 GByte of
> > data.
>
> Yeah, CLUSTER proceeds by rewriting the table and then invoking REINDEX
> on each index, so the checkpoint lockout problem will apply. Tell you
> the truth, the best and perhaps only answer for you is to update to 8.0
> where that problem is solved.
>
> > How do I get the tablename using this filenode? (restarting the database
> > is not an option)
>
> Sure it is. pg_resetxlog will allow you to restart ... possibly you
> will lose some transactions, but if the only thing going on was the
> CLUSTER, nothing of value will be lost.

* I just can't restart it: I gzipped all the pg_xlog files in the crashed
database to free enough space to get my backup running. As my database
server is not on my LAN, I can't download 100 GB of files, and I can't
gunzip all the pg_xlog files again to start it on the same server. I could
delete all files in my pg_xlog directory and then try to start the database
with another compiled instance of PostgreSQL, but as this is a production
database, that's not a good idea.
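[Editor's note: Tom's pg_resetxlog suggestion above would, in the normal case (enough disk space, postmaster stopped), look roughly like the sketch below. The data directory path is taken from the PANIC message later in this mail; adjust it for the actual installation, and read the pg_resetxlog reference page first, since it discards WAL and can lose recent transactions.]

```shell
# Sketch only: recovering a wedged 7.4 cluster with pg_resetxlog.
# Run as the postgres user, with the postmaster definitely stopped.
pg_ctl -D /home/postgres/data stop -m fast   # ensure no postmaster is running
pg_resetxlog /home/postgres/data             # rebuild a minimal pg_xlog
pg_ctl -D /home/postgres/data start          # restart and inspect the data
```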
And I can't move 100 GB (or even just the 60 GB base directory) to another
server, because no server has enough space, nor is transferring 60 GB
cheap. I have no idea how to get it running again.

* But what I really want to know is how to interpret the results of
pg_filedump. I didn't find any documentation besides the README. Most of
the output is easy to understand; some of it is not. For example: what does
"Flags: USED" mean in a data item?

* Anyway: I am still confused about how my table could get this big
overnight. I stop all database activity by shutting down apache in a
nightly cronjob, and then my nightly job runs:

  pg_dump -Fc $DBNAME > $BACKUP_FILE
  psql -c 'SELECT update_tspt_aktuell();' $DBNAME
  psql -c 'CLUSTER;' $DBNAME

a) The dump file is just fine, and the one table is nowhere near 60 GB!
The whole base directory after reinstalling is 1.4 GB.

b) The function is this:

----------------------------
CREATE OR REPLACE FUNCTION update_tspt_aktuell () RETURNS integer
LANGUAGE 'plpgsql' AS '
DECLARE
  var_count integer;
BEGIN
  UPDATE Tippspieltage SET tspt_aktuell = false WHERE tspt_aktuell;

  UPDATE Tippspieltage SET tspt_aktuell = true
  FROM (
    SELECT DISTINCT ON (tspt2sp.tr_kurzname)
      tspt2sp.tr_kurzname,
      tspt2sp.tspt_sort,
      MIN(abs(EXTRACT(epoch FROM date_trunc(''day'', sp.sp_termin)
                                 - CURRENT_DATE))) AS timediff
    FROM Tippspieltage2Spiele AS tspt2sp
      LEFT JOIN Spiele AS sp USING (sp_id)
    GROUP BY tspt2sp.tr_kurzname, tspt2sp.tspt_sort
    ORDER BY tspt2sp.tr_kurzname, timediff ASC, tspt2sp.tspt_sort DESC
  ) AS tspt_akt
  WHERE Tippspieltage.tr_kurzname = tspt_akt.tr_kurzname
    AND Tippspieltage.tspt_sort = tspt_akt.tspt_sort;

  GET DIAGNOSTICS var_count = ROW_COUNT;
  RETURN var_count;
END;
';
----------------------------

My cron mail reports success and the number of modified rows:

 update_tspt_aktuell
---------------------
                5872
(1 row)

So there is no reason I can see for this function to produce 60 GB of
data.

c) After this function, CLUSTER fails.
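[Editor's note: for the filenode question quoted earlier, once a postmaster is running again, a filenode can be mapped back to its table through the pg_class catalog. The sketch below uses a placeholder filenode number; substitute the number from the file name under base/<dboid>/ or from pg_filedump's report.]

```shell
# Sketch: map a filenode back to its relation name. 16384 is a
# hypothetical placeholder; use the actual filenode in question.
psql -c "SELECT relname, relkind FROM pg_class WHERE relfilenode = 16384;" $DBNAME
```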
The clustering fails with:

  PANIC: could not write to file "/home/postgres/data/pg_xlog/xlogtemp.24223": No space left on device

The cron job took 2:15 hours to run and report this failure. I guess the
clustering is somehow broken (maybe because I use many multi-column natural
keys).

* Conclusion: I think I will stop clustering every night until I have
upgraded to 8.0, but it leaves me very unsatisfied not to know the reason
for this kind of db failure (or human error or whatever it is).

> What I would expect to be happening in a CLUSTER is that there would be
> an "old" file plus a "new" file of similar size, for both the table
> itself and each index that's been processed (up to the point of failure,
> where you will have a partially-written new index). After restart with
> this method, you will find only the "old" files listed in pg_class.
> You'll want to manually delete the unreferenced "new" files.

OK, but the clustering should not grow the file from 500 MB to 64 GB
(split across many segment files).

Another strange thing: I looked at one of my database servers, and it has
checkpoint_segments = 30 but 62 files in pg_xlog instead of 2*30+1.
Normally I wouldn't care, but maybe there is something wrong.

I really appreciate your help so far. It would be even nicer if you could
send some more thoughts on my questions, as I am willing to learn.

kind regards,
janning
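[Editor's note: the 2*checkpoint_segments+1 rule of thumb mentioned above works out as follows, assuming the default 16 MB WAL segment size; one extra file beyond the ceiling, as observed, is normal transient behavior.]

```shell
# Rule-of-thumb ceiling on pg_xlog under 7.4: at most
# 2 * checkpoint_segments + 1 segment files, each 16 MB (default build).
checkpoint_segments=30
max_files=$(( 2 * checkpoint_segments + 1 ))
max_mb=$(( max_files * 16 ))
echo "expected at most $max_files segment files, about $max_mb MB"
```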