Discussion: Cannot vacuum! Stops on first table, pg_type
Hi!

This is really strange. Starting from the beginning, I was running
postmaster with logs redirected to a file (on a different filesystem).
This has been going OK for quite some time. Suddenly, it filled up
quickly, and I had to rm it and restart the server this Friday. Today,
it happened again; the log file area was filled up completely by
postgres' 700 Mbyte log file. I threw it away immediately, so
unfortunately I don't have these logs handy (I think I can find some
on tape, though...umm...). Have set up this machine to use
rotatelogs(8) (from apache) now, like on most of my newer machines :)

Also, when this happened, two postgres processes went amok filling up
the data directory with pg_temp.$$* files. About 25000 files (more or
less empty, if memory serves me) before I realized this had happened
(maybe a minute or two...?) Ran out of file descriptors; it was really
hot for a while :)

I shut down postmaster and removed them, since nothing worked anymore
while they were there (I guess postgres couldn't create more tempfiles
in that dir :)

Now, $PGDATA/pg_log is about 480 Meg. Hardly normal, heh?

# ls -l /usr/local/pgsql/data/pg_log
-rw-------  1 pgsql  pgsql  471556096 Aug 29 02:09 /usr/local/pgsql/data/pg_log

But:

# df -k
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
...
/dev/da0s1h   1986495   120813  1706763     7%    /usr/local/pgsql/data

Used 120 MB, but the file is 480 MB? One of those files with "holes"
in them, I presume?

Anyway, please note that the PGDATA file system was never filled up,
only the log FS.

Long story, sorry, but I thought it best if no details were left
out...

The problem now: After the last incident, I cannot vacuum one database
(the big one)... other DBs are all right, but that doesn't help
me... :-/

vacuum stops on the very first table to vacuum, pg_type:

StartTransactionCommand
query: vacuum;
ProcessUtility: vacuum pg_type;
DEBUG:  --Relation pg_type--
DEBUG:  Pages 6: Changed 0, Reapped 2, Empty 0, New 0; Tup 354: Vac 22, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 105, MaxLen 109; Re-using: Free/Avail. Space 8160/176; EndEmpty/Avail. Pages 0/1. Elapsed 0/0 sec.

Here it can sit for hours (believe me, I've tried!) eating CPU time
(top(1) reports 95-100%).

Killing the vacuuming psql off (kill -TERM) gives:

proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)

It seems the database and its application are working OK, but my guess
is that some indices are probably broken, so things may fail that I
cannot see immediately, and I can't vacuum to fix this...

Can I do anything besides pg_dump; dropdb; createdb; psql < dump.sql?
This server is live, and I'd rather not take it down for more than
seconds, if possible. Also, this should really be recoverable, right?
What about the large pg_log file?

I use this combo:

FreeBSD-3.5 RELEASE
Postgres 6.5.2

--
Palle
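(One way to check the "holes" theory from a shell, using the same path
as the listing above: compare the apparent file size with the blocks
actually allocated. This is a suggested check, not something from the
original report.)

# Apparent file size, which counts any holes in the file:
ls -l /usr/local/pgsql/data/pg_log
# Disk blocks actually allocated, in 1K units (should roughly match df):
du -k /usr/local/pgsql/data/pg_log

If du reports far fewer kilobytes than ls -l suggests, the file really
is sparse and df is reporting the true disk usage.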
check out the REINDEX command:
http://www.postgresql.org/docs/user/sql-reindex.htm
that may/may not help ...
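(For the archives: on a release that has REINDEX, rebuilding the
indexes on the table the vacuum got stuck on would look roughly like
the sketch below. "mydb" is a placeholder database name, and
reindexing a system catalog may need extra precautions on some
releases, so check the documentation for the version in use.)

# Hypothetical sketch -- "mydb" is a placeholder, not the poster's database.
# Rebuild the indexes on the system table that vacuum stops on:
psql -d mydb -c "REINDEX TABLE pg_type;"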
On 29 Aug 2000, Palle Girgensohn wrote:
> Hi!
>
> This is really strange. Starting from the beginning, I was running
> postmaster with logs redirected to a file (on a different
> filesystem). This has been going OK for quite some time. Suddenly, it
> filled up quickly, and I had to rm it and restart the server this
> Friday. Today, it happened again; the log file area was filled up
> completely by postgres' 700 Mbyte log file. I threw it away
> immediately, so unfortunately I don't have these logs handy (I think I
> can find some on tape, though...umm...). Have set up this machine to use
> rotatelogs(8) (from apache) now, like on most of my newer machines :)
>
> Also, when this happened, two postgres processes went amok filling up
> the data directory with pg_temp.$$* files. About 25000 files (more or
> less empty, if memory serves me) before I realized this had happened
> (maybe a minute or two...?) Ran out of file descriptors, it was really
> hot for a while :)
>
> I shut down postmaster and removed them, since nothing worked anymore
> while they were there (I guess postgres couldn't create more tempfiles
> in that dir :)
>
> Now, $PGDATA/pg_log is about 480 Meg. Hardly normal, heh?
>
> # ls -l /usr/local/pgsql/data/pg_log
> -rw------- 1 pgsql pgsql 471556096 Aug 29 02:09 /usr/local/pgsql/data/pg_log
>
> But:
>
> # df -k
> Filesystem 1K-blocks Used Avail Capacity Mounted on
> ...
> /dev/da0s1h 1986495 120813 1706763 7% /usr/local/pgsql/data
>
> Used 120 MB, but the file is 480 MB? One of those files with "holes"
> in them, I presume?
>
> Anyway, please note that the PGDATA file system was never filled up,
> only the log FS.
>
> Long story, sorry, but I thought it best if no details were left
> out...
>
> The problem now: After the last incident, I cannot vacuum one database
> (the big one)... other DBs are all right, but that doesn't help
> me... :-/
>
> vacuum stops on the very first table to vacuum, pg_type:
>
> StartTransactionCommand
> query: vacuum;
> ProcessUtility: vacuum pg_type;
> DEBUG: --Relation pg_type--
> DEBUG: Pages 6: Changed 0, Reapped 2, Empty 0, New 0; Tup 354: Vac 22, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 105, MaxLen 109; Re-using: Free/Avail. Space 8160/176; EndEmpty/Avail. Pages 0/1. Elapsed 0/0 sec.
>
> Here it can sit for hours (believe me, I've tried!) eating CPU time
> (top(1) reports 95-100%)
>
> Killing the vacuuming psql off (kill -TERM) gives:
>
> proc_exit(0) [#0]
> shmem_exit(0) [#0]
> exit(0)
>
> It seems the database and its application are working OK, but my guess
> is that some indices are probably broken, so things may fail that I
> cannot see immediately, and I can't vacuum to fix this...
>
> Can I do anything besides pg_dump; dropdb; createdb; psql < dump.sql?
> This server is live, and I'd rather not take it down for more than
> seconds, if possible. Also, this should really be recoverable, right?
> What about the large pg_log file?
>
> I use this combo:
>
> FreeBSD-3.5 RELEASE
> Postgres 6.5.2
>
> --
> Palle
>
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
(Sorry for double posting, not used to this mail prog yet)

The Hermit Hacker <scrappy@hub.org> writes:
> check out the REINDEX command:
>
> http://www.postgresql.org/docs/user/sql-reindex.htm
>
> that may/may not help ...

reindex is not available in 6.5.2, unfortunately...

BTW, will dump/restore get everything back, even with the gigantic
pg_log?

--
Palle
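(The dump/dropdb/createdb/restore cycle mentioned earlier, spelled out
as a rough sketch; "mydb" stands in for the real database name, and
flags and behaviour vary by release, so treat this as the shape of the
procedure rather than a tested recipe for 6.5.)

# Rough sketch of the dump-and-restore cycle; "mydb" is a placeholder name.
pg_dump mydb > dump.sql    # dump schema and data to an SQL script
dropdb mydb                # drop the old database
createdb mydb              # recreate it empty
psql mydb < dump.sql       # replay the dump into the fresh database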
Hello again,

OK, I dumped && restored (had to fix a couple of duplicates that had
been hidden by indices, not too bad), and updated to 6.5.3 + patches,
so I guess that makes me happy for now. I never did like 6.5.2
anyway... ;-)

--
Palle
On 29 Aug 2000, Palle Girgensohn wrote:

> Hello again,
>
> OK, I dumped && restored (had to fix a couple of duplicates that had
> been hidden by indices, not too bad), and updated to 6.5.3 + patches,
> so I guess that makes me happy for now. I never did like 6.5.2
> anyway... ;-)

would highly recommend moving to v7.x ...
The Hermit Hacker wrote:
>
> On 29 Aug 2000, Palle Girgensohn wrote:
>
> > Hello again,
> >
> > OK, I dumped && restored (had to fix a couple of duplicates that had
> > been hidden by indices, not too bad), and updated to 6.5.3 + patches,
> > so I guess that makes me happy for now. I never did like 6.5.2
> > anyway... ;-)
>
> would highly recommend moving to v7.x ...

mmm... We are not through testing our application just yet. There are
some distinct SQL differences, like select distinct *on*, for example.
But yes, we use pg7 for all development, and I really love it. This is
a "release" machine, though, and it isn't easily "tampered with"...
need thorough tests first... :(

--
Palle
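(A concrete example of the DISTINCT ON difference referred to above,
with made-up table and column names; if memory serves, 7.0 expects the
parenthesized form shown here where 6.5 accepted a bare column name,
so queries written the old way need touching up.)

# Hypothetical example -- "mydb", "orders", "customer_id" and "ordered_at" are made up.
# Latest order per customer, using the 7.x parenthesized DISTINCT ON form:
psql -d mydb -c "SELECT DISTINCT ON (customer_id) customer_id, ordered_at
                 FROM orders
                 ORDER BY customer_id, ordered_at DESC;"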