Обсуждение: Vacuum full - disk space eaten by WAL logfiles

Поиск
Список
Период
Сортировка

Vacuum full - disk space eaten by WAL logfiles

От
"Thomas Madsen"
Дата:
Hi!
 
I ran in to a serious problem with a server running Postgresql 7.2.1. I know that an upgrade would be in its place, but this is not an option at the moment.
 
The problem started with the server running out of space (toast table galore) due to a too low max_fsm_pages number (1,000,000 was too low on this particular server). 
I raised it to 3,000,000 and tried a "VACUUM FULL" on the table. Out popped an error when it reached the toast table index: "Number of tuples (xxx) not same at number in heap (XXX), recreate index". The numbers (XXX) differed by just 2. It then started recreating the index, but stopped short when another error popped up: "insertion of duplicate keys"-failure.
 
I have subsequently attempted to reindex the tables/indexes, but it didn't help a bit. I still got the: "Number of tuples (xxx) not same at number in heap (XXX), recreate index"- error.
 
I then tried to drop the indexes on the table in question, and then run another "VACUUM FULL". The result is this:
SNIP ------------------------------------------------------------------------------------------------------
backend>  vacuum full infected;
DEBUG:  --Relation infected--
DEBUG:  Pages 71795: Changed 0, reaped 16560, Empty 0, New 0; Tup 748830: Vac 0, Keep/VTL 0/0, UnUsed 81433, MinLen 142, MaxLen 2032; Re-using: Free/Avail. Space 13279172/10747696; EndEmpty/Avail. Pages 0/21243.
        CPU 2.05s/0.24u sec elapsed 69.52 sec.
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  Rel infected: Pages: 71795 --> 71795; Tuple(s) moved: 0.
        CPU 1.37s/1.23u sec elapsed 112.46 sec.
DEBUG:  --Relation pg_toast_16581--
DEBUG:  Pages 1517406: Changed 48938, reaped 1134683, Empty 217179, New 0; Tup 1598162: Vac 201754, Keep/VTL 0/0, UnUsed 4053916, MinLen 45, MaxLen 2034; Re-using: Free/Avail. Space 9420709796/9419556500; EndEmpty/Avail. Pages 0/1354671.
        CPU 38.72s/5.26u sec elapsed 850.95 sec.
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
....
...
...
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
FATAL 2:  ZeroFill failed to write /var/lib/pgsql/data//pg_xlog/xlogtemp.25146: No space left on device
DEBUG:  shutting down
FATAL 2:  open of /var/lib/pgsql/data//pg_xlog/00000023000000FE (log file 35, segment 254) failed: No such file or directory
---------------------------------------------------------------------------------------------------------------------/SNIP
 
The last lines keep coming until no more diskspace is left. I searched this forum and got some hints to this problem.
I raised the "checkpoint" default values in postgres.conf to these values, which should have decreased the need for diskspace:
-------------
checkpoint_segments = 12   # in logfile segments (16MB each), min 1
checkpoint_timeout = 3600  # in seconds, range 30-3600
-------------
But the log files keep coming. And my original problem of vacuuming full to reclaim space is unsolved....
 
I then moved pg_xlog to a different partition with 2 gigs of diskspace, and raised wal_files further from 24 to 46. Ran "VACUUM FULL" once more.
Those 2 gigs was also used up pretty quickly...
 
So this is where i am now... desperate! Any suggestions will be greatly appriciated!!!
What can i do to fix this mess??
 
Regards,
Thomas.

Re: Vacuum full - disk space eaten by WAL logfiles

От
Tom Lane
Дата:
"Thomas Madsen" <tm@softcom.dk> writes:
> I raised it to 3,000,000 and tried a "VACUUM FULL" on the table. Out
> popped an error when it reached the toast table index: "Number of tuples
> (xxx) not same at number in heap (XXX), recreate index". The numbers
> (XXX) differed by just 2. It then started recreating the index, but
> stopped short when another error popped up: "insertion of duplicate
> keys"-failure.

VACUUM FULL will not try to "recreate the index", so you misinterpreted
what you were seeing.

> I have subsequently attempted to reindex the tables/indexes, but it
> didn't help a bit. I still got the: "Number of tuples (xxx) not same at
> number in heap (XXX), recreate index"- error.

I have some vague memory that it's difficult to get the system to accept
a REINDEX on a toast table or toast index in older versions.  REINDEXing
the base table will *not* fix it, you need to specify the toast object
by name.  Does it let you do that?

> The last lines keep coming until no more diskspace is left. I searched
> this forum and got some hints to this problem.
> I raised the "checkpoint" default values in postgres.conf to these
> values, which should have decreased the need for diskspace:

It's checkpoints that are responsible for recycling WAL files.
It sounds to me like checkpoints just aren't happening at all, which
seems odd.  Does a checkpoint subprocess appear while this is going on?

It might well be that your best bet is to dump and reload that
particular table.  You don't necessarily have to use pg_dump, you
could do something like
    CREATE TABLE dup AS SELECT * from badtable;
    TRUNCATE badtable;
    INSERT INTO badtable SELECT * FROM dup;
    DROP TABLE dup;
(7.2 had TRUNCATE didn't it?  I forget.)

In any case I would *strongly* recommend an update to 7.2.5, which you
can do in-place without a dump/reload.  We do not make bugfix releases
on a whim --- there are some pretty serious bugs fixed between 7.2.1 and
7.2.5.

            regards, tom lane