Discussion: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From: "Oliver Elphick"
Date:

I found the answer to this: the partition had filled up, and so the problem
was lack of disk space.
Could we have a more helpful error message?  I was just looking in the
wrong direction because of the contents of the message.
*** postgresql-7.1.1.orig/src/backend/access/transam/xlog.c    Tue May 22 16:45:14 2001
--- postgresql-7.1.1/src/backend/access/transam/xlog.c    Tue May 22 16:48:12 2001
***************
*** 1334,1340 ****
              unlink(tmppath);
              errno = save_errno;
!             elog(STOP, "ZeroFill(%s) failed: %m", tmppath);
          }
      }
--- 1334,1340 ----
              unlink(tmppath);
              errno = save_errno;
!             elog(STOP, "ZeroFill failed to create or write %s: %m", tmppath);
          }
      }
--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47  6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                 ========================================
     "We are troubled on every side, yet not distressed; we
      are perplexed, but not in despair; persecuted, but not
      forsaken; cast down, but not destroyed; Always bearing
      about in the body the dying of the Lord Jesus, that
      the life also of Jesus might be made manifest in our
      body."        II Corinthians 4:8-10
			
		Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From: Bruce Momjian
Date:

Looks safe.  Patch applied.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
"Oliver Elphick" <olly@lfix.co.uk> writes:
> I found the answer to this: the partition had filled up, and so the problem
> was lack of disk space.
> Could we have a more helpful error message?
Indeed.  I don't like your solution, however, since it's just papering
over the real problem, which is the lack of a suitable error code from
write().  Evidently write() isn't setting errno as long as it's able
to write at least some data.  Perhaps we could do
    errno = 0;
    if (write(...) != expectedbytecount)
    {
        int    save_errno = errno;
        unlink(tmp);
        errno = save_errno ? save_errno : ENOSPC;
        elog(...);
    }
Comments?  Is it reasonable to guess that the problem must be ENOSPC
if write doesn't write all the bytes but also doesn't set errno?
Are there any systems that don't define ENOSPC?
            regards, tom lane
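A minimal, self-contained C sketch of the fallback Tom describes, assuming only POSIX `write()`. The name `write_or_guess_enospc` is hypothetical and does not come from the PostgreSQL source. It clears `errno` before the call and substitutes `ENOSPC` only when a short write leaves `errno` untouched:

```c
#include <errno.h>
#include <fcntl.h>      /* O_NONBLOCK, only for the pipe case noted below */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper sketching the errno fallback proposed above.
 * Returns 0 if all `len` bytes were written. Otherwise returns -1 with
 * errno set: to whatever write() reported, or to ENOSPC (a guess, not a
 * diagnosis) when write() came back short without touching errno.
 */
static int write_or_guess_enospc(int fd, const void *buf, size_t len)
{
    ssize_t rc;

    errno = 0;                  /* lets us see whether write() set errno */
    rc = write(fd, buf, len);
    if (rc >= 0 && (size_t) rc == len)
        return 0;               /* complete write */

    if (errno == 0)             /* short count, no error reported */
        errno = ENOSPC;
    return -1;
}
```

As in Tom's snippet, a caller that runs unlink() before reporting still has to save errno across that call. The guess can also be wrong: on Linux, a 1 MiB write to an empty pipe whose write end has O_NONBLOCK set typically comes back short with errno untouched, and this helper would then report ENOSPC for a full pipe rather than a full disk.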
			
On Wed, May 23, 2001 at 12:50:46PM -0400, Tom Lane wrote:
>     errno = 0;
>     if (write(...) != expectedbytecount)
>     {
>         int    save_errno = errno;
>
>         unlink(tmp);
>
>         errno = save_errno ? save_errno : ENOSPC;
>
>         elog(...);
>     }
>
> Comments?  Is it reasonable to guess that the problem must be ENOSPC
> if write doesn't write all the bytes but also doesn't set errno?
No, it could be any number of other things.  The first that comes to
mind is EINTR.  How about something closer to:
totalwritten = 0;
while(totalwritten < expectedbytecount) {
    lastwritten = write(...);
    if(lastwritten == -1) {
        /* errno is guaranteed to be set */
        if(errno == EINTR) {
            continue;
        }
        unlink(tmp);
        elog(...);
        break;
    } else if(lastwritten == 0) {
        /* errno should be 0.  Considering this an error is probably a
           BAD idea. */
        unlink(tmp);
        elog(...);
        break;
    } else {
        /* we got a partial write count.  No problem; try again. */
        totalwritten += lastwritten;
    }
}
Chris
--
chris@mt.sri.com -----------------------------------------------------
Chris Jones                                    SRI International, Inc.
                                                           www.sri.com
			
Chris Jones <chris@mt.sri.com> writes:
> No, it could be any number of other things.  The first that comes to
> mind is EINTR.  How about something closer to:
Writes to disk files don't suffer EINTR as far as I've ever heard
(if they do, there are an awful lot of broken programs out there).
More to the point, a kernel that aborted a write because of an interrupt
*and failed to set errno* would certainly be broken.  The question is
what to assume when we see that the write did not change errno.
            regards, tom lane
			
On Wed, May 23, 2001 at 11:39:15AM -0600, Chris Jones wrote:
> No, it could be any number of other things.  The first that comes to
> mind is EINTR.  How about something closer to:
Hmm.  Actually, write(2) shouldn't return EINTR; it should return a
short write count.  But other errno values include EDQUOT and EFBIG.
So the code I suggested is not very good, either.  Better to just do:
> totalwritten = 0;
> while(totalwritten < expectedbytecount) {
>     lastwritten = write(...);
>     if(lastwritten == -1) {
>         /* errno is guaranteed to be set */
>         unlink(tmp);
>         elog(...);
>         break;
>     } else {
>         /* we got a partial write count.  No problem; try again. */
>         totalwritten += lastwritten;
>     }
> }
Chris
On Wed, May 23, 2001 at 01:47:37PM -0400, Tom Lane wrote:
> Chris Jones <chris@mt.sri.com> writes:
> > No, it could be any number of other things.  The first that comes to
> > mind is EINTR.  How about something closer to:
>
> Writes to disk files don't suffer EINTR as far as I've ever heard
> (if they do, there are an awful lot of broken programs out there).
Yeah, my mistake.
> More to the point, a kernel that aborted a write because of an interrupt
> *and failed to set errno* would certainly be broken.  The question is
> what to assume when we see that the write did not change errno.
If write didn't return -1, it shouldn't have set errno.  A short write
count isn't an error condition.
Chris
Chris Jones <chris@mt.sri.com> writes:
>> /* we got a partial write count.  No problem; try again. */
>> totalwritten += lastwritten;
No.  An infinite loop is NOT an acceptable response to running out of
disk space.  This is a disk file we are writing, not a socket.
            regards, tom lane
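Tom's objection applies to the loop as written: if write() keeps returning 0, it never exits. A hedged C sketch of a variant follows; the name `write_all` is hypothetical. It still retries after a short count, on the assumption that the next call will either make progress or fail with -1 and a real errno, but it treats a zero-byte write as an error instead of spinning:

```c
#include <errno.h>
#include <fcntl.h>      /* O_NONBLOCK, only for the pipe case noted below */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical write_all(): call write() until `len` bytes are out.
 * A short count is retried so the kernel can report the real error
 * (e.g. ENOSPC) on the next call. A call that writes 0 bytes makes no
 * progress, so it is an error; errno becomes ENOSPC (an assumption)
 * unless write() already set something.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t rc;

        errno = 0;
        rc = write(fd, p + done, len - done);
        if (rc < 0)
            return -1;              /* errno was set by write() */
        if (rc == 0) {
            if (errno == 0)
                errno = ENOSPC;     /* guess: no progress means full */
            return -1;
        }
        done += (size_t) rc;        /* partial progress: try again */
    }
    return 0;
}
```

No EINTR retry is included, following Tom's point that disk-file writes do not see it; code that may also write to pipes or sockets would need one. With an empty Linux pipe set to O_NONBLOCK, a 1 MiB request typically gets a short count on the first call and -1 with EAGAIN on the retry, so the caller sees the kernel's own errno rather than a guess.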
			
Chris Jones <chris@mt.sri.com> writes:
> If write didn't return -1, it shouldn't have set errno.  A short write
> count isn't an error condition.
On disk files it certainly is; there's no non-error reason to do that,
and AFAICS no reason for the application to try again.
            regards, tom lane
			
On Wed, May 23, 2001 at 02:18:31PM -0400, Tom Lane wrote:
> No.  An infinite loop is NOT an acceptable response to running out of
> disk space.  This is a disk file we are writing, not a socket.
Ack.  You're right, of course.  Sorry for the noise.
Chris
Ian Lance Taylor <ian@airs.com> writes:
> Probably true, but on Unix you certainly can't assume that write will
> set errno if it does not return -1.
Right.  The code you propose is isomorphic to what I suggested
originally.  The question is which error condition should we assume
if errno has not been set; is disk-full sufficiently likely to be the
cause that we should just say that, or are there plausible alternatives?
            regards, tom lane
			
		Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From: Ian Lance Taylor
Date:

Tom Lane <tgl@sss.pgh.pa.us> writes:
> Chris Jones <chris@mt.sri.com> writes:
> > If write didn't return -1, it shouldn't have set errno.  A short write
> > count isn't an error condition.
>
> On disk files it certainly is; there's no non-error reason to do that,
> and AFAICS no reason for the application to try again.
Probably true, but on Unix you certainly can't assume that write will
set errno if it does not return -1.  On Linux systems, for example,
this does not happen.  As Chris says, POSIX only promises to set errno
if there is an error indication.  The only error indication for write
is a return of -1.
A portable way to check whether errno was set would be to do something
like
    errno = 0;
    if (write(...) != ...)
    {
        if (errno == 0)
            error("unexpected short write--disk full?");
        else
            error("write failed: %s", strerror(errno));
    }
Ian
			
		Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From: "Denis A. Doroshenko"
Date:

On Wed, May 23, 2001 at 02:04:51PM -0400, Tom Lane wrote:
> Chris Jones <chris@mt.sri.com> writes:
> > If write didn't return -1, it shouldn't have set errno.  A short write
> > count isn't an error condition.
>
> On disk files it certainly is; there's no non-error reason to do that,
> and AFAICS no reason for the application to try again.
I've tried to get a partial write under a disk-shortage condition, with no
success.  On OpenBSD, if there is no space, write() seems to write the
whole buffer or fail with -1/errno.  I used the program attached at the
end (oh well, I'm not sure about the forks, but they add more
simultaneity... huh?).  BTW, nowhere I looked did I find whether a write
to a disk file can fail after writing some part of the data.
--
Denis A. Doroshenko  [GPRS/IN/WAP, VAS group engineer] .-.        _|_  |
[Omnitel Ltd., T.Sevcenkos st. 25, Vilnius, Lithuania] | | _ _  _ .| _ |
[Phone: +370 9863486 E-mail: d.doroshenko@omnitel.net] |_|| | || |||(/_|_
---[a.c]------------------------------------------------------------
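#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define SIZ         (12345)
#define CHILDREN    (5)
/* named TEMPLATE rather than FILE, which is already stdio's FILE type */
#define TEMPLATE    "/tmp/garbage.XXXXXXXXXX"
int
main (void)
{
    char *buf;
    char *file;
    int fd, i, j;
    ssize_t rc;
    pid_t pid;
    warnx("[%d] allocating %d bytes of memory", (int)getpid(), SIZ);
    if ( (buf = malloc(SIZ)) == NULL )
        err(1, "malloc()");
    if ( (file = strdup(TEMPLATE)) == NULL )
        err(1, "strdup()");
    if ( (fd = mkstemp(file)) == -1 )
        err(1, "mkstemp()");
    warnx("[%d] created %s", (int)getpid(), file);
    warnx("[%d] forking...", (int)getpid());
    for ( j = 0; j < CHILDREN; j++ ) {
        if ( (pid = fork()) == -1 )
            err(1, "fork()");
        if ( pid == 0 ) {
            /* child: every child shares fd, and thus one file offset */
            warnx("[%d:%d]: filling %s with junk",
                  (int)getppid(), j, file);
            for ( i = 0; ; i++ ) {
                if ( (rc = write(fd, buf, SIZ)) == -1 ) {
                    /* -1 is the only case where errno is meaningful */
                    warn("[%d:%d] write()", (int)getppid(), j);
                    break;
                }
                if ( rc == SIZ ) {
                    (void)fputc(j + '0', stderr);
                    continue;
                }
                /* short write: errno was not set, so warnx(), not warn() */
                warnx("[%d:%d] short write (%ld of %d bytes)",
                      (int)getppid(), j, (long)rc, SIZ);
                if ( rc == 0 )
                    break;  /* no progress; don't loop forever */
            }
            (void)close(fd);
            return (0);
        }
    }
    /* father: reap all children */
    while ( wait(NULL) != -1 )
        ;
    warnx("[%d] destroying %s", (int)getpid(), file);
    (void)close(fd);
    (void)unlink(file);
    return (0);
}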
			
		Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From: Ian Lance Taylor
Date:

Tom Lane <tgl@sss.pgh.pa.us> writes:
> Ian Lance Taylor <ian@airs.com> writes:
> > Probably true, but on Unix you certainly can't assume that write will
> > set errno if it does not return -1.
>
> Right.  The code you propose is isomorphic to what I suggested
> originally.  The question is which error condition should we assume
> if errno has not been set; is disk-full sufficiently likely to be the
> cause that we should just say that, or are there plausible alternatives?

Sufficiently likely?  Dunno.  I can think of some other possibilities.

If the file is on a file system mounted via NFS or any other remote
file system, you might get any number of errors.

If there is a disk error after at least one disk block has been copied
and written, the kernel might return a short count.

If the kernel is severely overloaded, and fails to allocate a buffer
after allocating and writing at least one buffer successfully, it might
return a short count.

If the file is very large, and the write would push it over the maximum
file size, you might get a short count up to the maximum file size.  A
similar case might happen if the file is close to the process resource
limit (RLIMIT_FSIZE).

I assume we can rule out cases like a write from a buffer at the end of
user memory such that some data can be copied into kernel space and then
a segmentation violation occurs; on some systems that could cause a
short count if a full block can be written before the invalid memory is
reached.

Obviously a full disk is the most likely case.  This is particularly
true if the write is for less than a full disk block.  But otherwise I
could believe that at least the disk error case might happen to somebody
someday.

Ian
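The RLIMIT_FSIZE case can be reproduced directly. The sketch below is a demo under stated assumptions, not PostgreSQL code: `write_under_fsize_limit` and `struct write_result` are hypothetical names, and it relies on POSIX `getrlimit()`/`setrlimit()` with `RLIMIT_FSIZE`. It forks so the caller's own limit is untouched, lowers the soft limit in the child, and reports what a single write() returned together with errno:

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the child saw: write()'s return value and errno right after it. */
struct write_result {
    ssize_t rc;     /* -2 means the child could not set up the test */
    int     err;
};

/*
 * Hypothetical demo: in a forked child, lower the soft RLIMIT_FSIZE to
 * `limit` bytes, then issue one write() of `len` zero bytes to `fd`.
 * Returns 0 and fills *out if the child reported back, else -1.
 */
static int write_under_fsize_limit(int fd, rlim_t limit, size_t len,
                                   struct write_result *out)
{
    int p[2];
    pid_t pid;
    ssize_t got;

    if (pipe(p) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                 /* child */
        struct write_result res = { -2, 0 };
        struct rlimit rl;
        char *buf = calloc(len, 1);

        close(p[0]);
        signal(SIGXFSZ, SIG_IGN);   /* EFBIG instead of a fatal signal */
        if (buf != NULL && getrlimit(RLIMIT_FSIZE, &rl) == 0) {
            rl.rlim_cur = limit;    /* must not exceed rl.rlim_max */
            if (setrlimit(RLIMIT_FSIZE, &rl) == 0) {
                errno = 0;
                res.rc = write(fd, buf, len);
                res.err = errno;
            }
        }
        (void) write(p[1], &res, sizeof res);
        _exit(0);
    }

    close(p[1]);                    /* parent */
    got = read(p[0], out, sizeof *out);
    close(p[0]);
    (void) waitpid(pid, NULL, 0);
    return got == (ssize_t) sizeof *out ? 0 : -1;
}
```

On Linux, writing 1000 bytes to an empty file under a 100-byte limit is expected to return a short count of 100 with errno still 0: a short write with no error indication and no full disk, which is exactly the case where assuming ENOSPC would mislead.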
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From: Ian Lance Taylor
Date:

"Denis A. Doroshenko" <d.doroshenko@omnitel.net> writes:
> I've tried to get a partial write under a disk-shortage condition, with no
> success.  On OpenBSD, if there is no space, write() seems to write the
> whole buffer or fail with -1/errno.

Try writing more bytes in a single call to write().  Like, 100000
bytes or something.

You will only get a short return from write() if you write more than
the disk block size.  On modern file systems the disk block size can
get fairly large.

Ian
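To follow this suggestion without hard-coding a byte count, a request size can be derived from fstat(). A minimal sketch: `multi_block_request_size` is a hypothetical helper, using `st_blksize` (the preferred I/O size fstat() reports) as a stand-in for the file system block size is an assumption, and 8192 is an arbitrary fallback:

```c
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: a write() request size spanning `nblocks` blocks,
 * so that one call has a chance to cross a block boundary. st_blksize is
 * only the preferred I/O size; treating it as the file system block size
 * is an assumption. Falls back to 8192 (arbitrary) if fstat() fails.
 */
static size_t multi_block_request_size(int fd, size_t nblocks)
{
    struct stat st;
    size_t blk = 8192;

    if (fstat(fd, &st) == 0 && st.st_blksize > 0)
        blk = (size_t) st.st_blksize;
    return blk * nblocks;
}
```

In a test like Denis's, the fixed SIZ could then be replaced by something like `multi_block_request_size(fd, 16)` computed after the file is created.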
Re: Re: ZeroFill(.../pg_xlog/xlogtemp.20148) failed: No such file or directory
From: "Denis A. Doroshenko"
Date:

On Wed, May 23, 2001 at 02:24:44PM -0700, Ian Lance Taylor wrote:
> "Denis A. Doroshenko" <d.doroshenko@omnitel.net> writes:
>
> Try writing more bytes in a single call to write().  Like, 100000
> bytes or something.
>
> You will only get a short return from write() if you write more than
> the disk block size.  On modern file systems the disk block size can
> get fairly large.

The program I sent had 800K blocks, but believe me, the first version
used 1M writes.  The result was the same...