Discussion: HINT: Perhaps out of disk space?
I'm investigating a problem that happened last night and I would appreciate any recommendations. The logs indicate that the disks were full, but I truly doubt that since we only use about 14GB out of the available 65GB.

I found entries like this in the logs:

    ERROR: could not write block 2354 of temporary file: No space left on device
    HINT: Perhaps out of disk space?
    ....
    ERROR: could not extend relation "parent_table": No space left on device
    HINT: Check free disk space.
    ....
    LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device

According to the logs, the problem went away after a reboot. I wonder if the kernel or the RAID device got confused and postgres was simply echoing what it was told. We run a couple hundred postgres servers and we have not seen this before (except when the disks truly were full).

Everything is in the root filesystem, which has plenty of room.

    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/sda1             67756724  14344392  49970408  23% /
    tmpfs                  1034768         0   1034768   0% /dev/shm

PostgreSQL 7.4.7 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.5 (Debian 1:3.3.5-12)
Debian Sarge with Linux kernel 2.4.27-2-686-smp
Dell PowerEdge 1800
Dell MegaRAID PERC 4/DC RAID Controller, 128MB cache w/BBU
2x SEAGATE Cheetah 10K.7 ST373207LC in RAID 1 (mirroring)

Folks are a little jittery because our customers do very heavy business this month and we don't want frantic support calls when we should be drinking eggnog.

-Mike
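The df output in the post above reports block usage only. On Linux, "No space left on device" (ENOSPC) can also be returned when a filesystem runs out of free inodes, so it may be worth looking at both figures together. The following is a minimal sketch, not a diagnosis of this incident: the helper name space_report is hypothetical, and it only reads what os.statvfs reports for the filesystem holding the given path.

```python
import os


def space_report(path="/"):
    """Return block and inode figures for the filesystem holding `path`.

    Hypothetical helper for illustration; it only wraps os.statvfs.
    """
    st = os.statvfs(path)
    return {
        # f_frsize is the fragment size that f_blocks and f_bavail are counted in
        "total_kb": st.f_blocks * st.f_frsize // 1024,
        # f_bavail is the space available to unprivileged processes
        "avail_kb": st.f_bavail * st.f_frsize // 1024,
        # inode counts; some filesystems report 0 for both
        "inodes_total": st.f_files,
        "inodes_free": st.f_ffree,
    }


if __name__ == "__main__":
    for key, value in space_report("/").items():
        print(f"{key}: {value}")
```

Running this from cron every minute and logging the result would leave a record that survives a reboot, which a one-off df after the fact does not.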
Michael Adler <adler@pobox.com> writes:
> I'm investigating a problem that happened last night and I would
> appreciate any recommendations. The logs indicate that the disks were
> full, but I truly doubt that since we only use about 14GB out of the
> available 65GB.

> I found entries like this in the logs:

> ERROR: could not write block 2354 of temporary file: No space left on device
> HINT: Perhaps out of disk space?
> ....
> ERROR: could not extend relation "parent_table": No space left on device
> HINT: Check free disk space.
> ....
> LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device

> According to the logs, the problem went away after a reboot. I wonder
> if the kernel or the RAID device got confused and postgres was simply
> echoing what it was told. We run a couple hundred postgres servers and
> we have not seen this before (except when the disks truly were full).

I'm inclined to think that a query created a 50GB temporary file ... the postmaster cleans out temp files when restarted, so that would have destroyed the evidence.

regards, tom lane
On Fri, Dec 23, 2005 at 11:36:54AM -0500, Tom Lane wrote:
> Michael Adler <adler@pobox.com> writes:
> > I'm investigating a problem that happened last night and I would
> > appreciate any recommendations. The logs indicate that the disks were
> > full, but I truly doubt that since we only use about 14GB out of the
> > available 65GB.
>
> > I found entries like this in the logs:
>
> > ERROR: could not write block 2354 of temporary file: No space left on device
> > HINT: Perhaps out of disk space?
> > ....
> > ERROR: could not extend relation "parent_table": No space left on device
> > HINT: Check free disk space.
> > ....
> > LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device
>
> > According to the logs, the problem went away after a reboot. I wonder
> > if the kernel or the RAID device got confused and postgres was simply
> > echoing what it was told. We run a couple hundred postgres servers and
> > we have not seen this before (except when the disks truly were full).
>
> I'm inclined to think that a query created a 50GB temporary file ...
> the postmaster cleans out temp files when restarted, so that would
> have destroyed the evidence.

I'm curious about what could have resulted in so much temporary storage for a database that fits entirely in 2.5GB space. I can imagine taking the largest table and joining it against itself many times without a WHERE clause. What else would use a lot of temp storage?

How long would it take to clean out 50GB of temp files? It looks like the postmaster was able to start up instantly after the reboot (ready less than 1 second after "LOG: database system was shut down at...")

I really appreciate any guidance you could offer.

-Mike
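Because a postmaster restart removes temp files, the temporary-file theory can only be checked while the server is still running. The sketch below is one way to capture that evidence next time. It assumes that temp files live in directories named pgsql_tmp somewhere under the data directory; that layout is an assumption about the installation and should be verified on the server in question. The function name temp_file_usage is hypothetical.

```python
import os


def temp_file_usage(data_dir):
    """List (size_in_bytes, path) for files in any `pgsql_tmp` directory.

    Assumption: temp files are kept in directories named `pgsql_tmp`
    below `data_dir`. Results are sorted largest first.
    """
    results = []
    for root, _dirs, files in os.walk(data_dir):
        if os.path.basename(root) != "pgsql_tmp":
            continue
        for name in files:
            full = os.path.join(root, name)
            try:
                results.append((os.path.getsize(full), full))
            except OSError:
                # A temp file can be removed by its backend while we scan.
                continue
    results.sort(reverse=True)
    return results
```

Calling temp_file_usage("/var/lib/postgres/data") periodically (the path is taken from the log lines quoted above) and logging the largest entries would show whether a single query was growing a temp file toward the 50GB that was free.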
On Fri, 23 Dec 2005 13:42:13 -0500, Michael Adler wrote:
> On Fri, Dec 23, 2005 at 11:36:54AM -0500, Tom Lane wrote:
>> Michael Adler <adler@pobox.com> writes:
>> > I'm investigating a problem that happened last night and I would
>> > appreciate any recommendations. The logs indicate that the disks were
>> > full, but I truly doubt that since we only use about 14GB out of the
>> > available 65GB.
>>
>> > I found entries like this in the logs:
>>
>> > ERROR: could not write block 2354 of temporary file: No space left on device
>> > HINT: Perhaps out of disk space?
>> > ....
>> > ERROR: could not extend relation "parent_table": No space left on device
>> > HINT: Check free disk space.
>> > ....
>> > LOG: could not close temporary statistics file "/var/lib/postgres/data/global/pgstat.tmp.1464": No space left on device
>>
>> > According to the logs, the problem went away after a reboot. I wonder
>> > if the kernel or the RAID device got confused and postgres was simply
>> > echoing what it was told. We run a couple hundred postgres servers and
>> > we have not seen this before (except when the disks truly were full).
> I really appreciate any guidance you could offer.
>

Are there any errors about running out of shared memory? I have seen the "No space left on device" error for that on FreeBSD before.