PG 7.1.2 Crash: cannot read xlog dir

From: kay
Subject: PG 7.1.2 Crash: cannot read xlog dir
Date:
Msg-id: NGBBKFMOILMAGDABPFEGEEADENAA.efesar@nmia.com
Replies: Re: PG 7.1.2 Crash: cannot read xlog dir
List: pgsql-admin
My situation: PG 7.1.2 on Red Hat 7.2, running in a chroot jail on a "VDS"
server at my new ISP. I can't recompile anything and can't upgrade PG
(basically, I'm stuck with 7.1.2).

This issue was previously noted in a thread in late 2002. The actual thread
in which Tom Lane suggests it might be a permissions issue is missing from
the archive, but I found it in Google's cache (two Webcrawler docs:
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%22cannot+read+xlog+dir%22+&btnG=Google+Search
). As to why they aren't on archives.postgresql.org ... ya got me.

I changed permissions to the most permissive setting I know (0777), plus I
own the directory, I own the files, and I own the postmaster process. The
only thing I can think of is that 'readdir' is badly linked or has some
freaky kernel interaction. I have Python, Perl and PHP on the system, and
they all use 'opendir', 'readdir' and 'closedir' just fine on the pg_xlog
directory.

My problem: I've deduced that the 'readdir' call is broken in my PG. I
examined the source code for 7.1 very thoroughly (see MoveOfflineLogs in
http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/access/transam/xlog.c?rev=1.65.2.1&content-type=text/x-cvsweb-markup&only_with_tag=REL7_1_STABLE
). What I've found is that 'opendir' seems to open the directory fine (it
does not return NULL), but when 'readdir' tries to grab a filename,
something bombs with the file system error 'No such file or directory':
'readdir' returns NULL and 'errno' gets set. The strange thing is that it
gets in there ONCE and does ONE file (0000000000000000), and then it won't
do any more, ever again, until I stop the server and run initdb again.
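
A minimal sketch of a standalone check that mirrors the opendir/readdir/errno pattern in MoveOfflineLogs but never unlinks anything. The function name scan_dir and its return convention are assumptions made for illustration; this is not PostgreSQL code, and it only helps if a binary can be built elsewhere against a compatible libc and run inside the jail.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: walk a directory with the same errno handling as
 * MoveOfflineLogs, but read-only.
 *
 * Returns the number of entries seen, or -1 if opendir() fails or
 * readdir() stops with errno set. On failure *saved_errno holds the
 * errno value that was observed.
 */
int
scan_dir(const char *dirpath, int *saved_errno)
{
    DIR           *dir;
    struct dirent *de;
    int            count = 0;

    *saved_errno = 0;

    dir = opendir(dirpath);
    if (dir == NULL)
    {
        *saved_errno = errno;
        return -1;
    }

    /* readdir() returns NULL both at end of directory and on error;
     * only errno tells the two cases apart, so clear it first. */
    errno = 0;
    while ((de = readdir(dir)) != NULL)
    {
        printf("entry: %s\n", de->d_name);
        count++;
        errno = 0;              /* printf may have touched errno */
    }

    if (errno != 0)
    {
        *saved_errno = errno;   /* save before closedir can change it */
        closedir(dir);
        return -1;
    }

    closedir(dir);
    return count;
}
```

Calling scan_dir on the pg_xlog directory from a small main() and printing strerror() of the saved errno on a -1 return would show whether 'readdir' also fails outside the postmaster.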

At this point I know that there's nothing wrong with the XLOG directory or
the files in it, because PG has been writing transactions fine for 7-8 hours
up to this point. It can only be a bad 'readdir' call.

My question: Is there some runtime setting I can use to prevent
MoveOfflineLogs() from ever being called? I would MUCH rather have a couple
of old XLOGs lying around than a fatal crash. Maybe by CHECKPOINTing every
hour or something ... I've tried playing with a bunch of different WAL
settings and ... I can't stop MoveOfflineLogs from being called.

Please keep in mind my hands are tied, and I can't recompile and I can't
upgrade. Even if I could upgrade, I imagine that 'readdir' would still be
broken, and I'd still have this issue.

If anybody can think of a workaround I'd really appreciate it. I've been
racking my brain on this for a week.

Thanks

-Keith


==================

Here's the log.

/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 24626 exited with status 0
XLogFlush: rqst 0/12259528; wrt 0/0; flsh 0/0
XLogFlush: rqst 0/17078212; wrt 0/17078248; flsh 0/17078248
XLogFlush: rqst 0/17078152; wrt 0/17078248; flsh 0/17078248
XLogFlush: rqst 0/0; wrt 0/17078248; flsh 0/17078248
INSERT @ 0/17078248: prev 0/17078212; xprev 0/0; xid 0: XLOG - checkpoint: redo 0/17078248; undo 0/0; sui 28; xid 3495; oid 36603; online
XLogFlush: rqst 0/17078312; wrt 0/17078248; flsh 0/17078248
DEBUG:  MoveOfflineLogs: remove 0000000000000000
FATAL 2:  MoveOfflineLogs: cannot read xlog dir: No such file or directory
DEBUG:  proc_exit(2)
DEBUG:  shmem_exit(2)
DEBUG:  exit(2)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 24736 exited with status 512
Server process (pid 24736) exited with status 512 at Sat May 31 09:57:57 2003
Terminating any active server processes...
Server processes were terminated at Sat May 31 09:57:57 2003
Reinitializing shared memory and semaphores
invoking IpcMemoryCreate(size=1236992)
DEBUG:  database system was interrupted at 2003-05-31 09:57:57 EDT
DEBUG:  CheckPoint record at (0, 17078248)
DEBUG:  Redo record at (0, 17078248); Undo record at (0, 0); Shutdown FALSE
DEBUG:  NextTransactionId: 3495; NextOid: 36603
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  ReadRecord: record with zero len at (0, 17078312)
DEBUG:  redo is not required
INSERT @ 0/17078312: prev 0/17078248; xprev 0/0; xid 0: XLOG - checkpoint: redo 0/17078312; undo 0/0; sui 28; xid 3495; oid 36603; shutdown
XLogFlush: rqst 0/17078376; wrt 0/17078312; flsh 0/17078312
FATAL 2:  MoveOfflineLogs: cannot read xlog dir: No such file or directory
DEBUG:  proc_exit(2)
DEBUG:  shmem_exit(2)
DEBUG:  exit(2)

=========================

Here's the code from 7.1.

static void
MoveOfflineLogs(uint32 log, uint32 seg)
{
        DIR                   *xldir;
        struct dirent *xlde;
        char                lastoff[32];
        char                path[MAXPGPATH];

        Assert(XLOG_archive_dir[0] == 0);        /* ! implemented yet */

        xldir = opendir(XLogDir);
        if (xldir == NULL)
                elog(STOP, "MoveOfflineLogs: cannot open xlog dir: %m");

        sprintf(lastoff, "%08X%08X", log, seg);

        errno = 0;
        while ((xlde = readdir(xldir)) != NULL)
        {
                if (strlen(xlde->d_name) == 16 &&
                        strspn(xlde->d_name, "0123456789ABCDEF") == 16 &&
                        strcmp(xlde->d_name, lastoff) <= 0)
                {
                        elog(LOG, "MoveOfflineLogs: %s %s",
                                 (XLOG_archive_dir[0]) ? "archive" : "remove",
                                 xlde->d_name);
                        sprintf(path, "%s%c%s", XLogDir, SEP_CHAR, xlde->d_name);
                        if (XLOG_archive_dir[0] == 0)
                                unlink(path);
                }
                errno = 0;
        }
        if (errno)
                elog(STOP, "MoveOfflineLogs: cannot read xlog dir: %m");
        closedir(xldir);
}
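
The loop above unlinks entries while the directory stream is still open. One way to test whether that interaction matters on a given filesystem is a two-pass variant that collects the matching names first and unlinks them only after closedir(). The sketch below is hypothetical: remove_old_segments and is_old_segment are invented names, the "/" separator and the 4096-byte path buffer are assumptions, and none of it is PostgreSQL code or a confirmed diagnosis.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SEG_NAME_LEN 16

/* Same filter as MoveOfflineLogs: 16 uppercase hex digits, name <= lastoff. */
int
is_old_segment(const char *name, const char *lastoff)
{
    return strlen(name) == SEG_NAME_LEN &&
           strspn(name, "0123456789ABCDEF") == SEG_NAME_LEN &&
           strcmp(name, lastoff) <= 0;
}

/*
 * Pass 1 reads the whole directory and remembers matching names.
 * Pass 2 unlinks them after the directory stream is closed.
 * Returns the number of files removed, or -1 on any scan error.
 */
int
remove_old_segments(const char *dirpath, const char *lastoff)
{
    DIR           *dir;
    struct dirent *de;
    char         (*names)[SEG_NAME_LEN + 1] = NULL;
    size_t         n = 0, cap = 0, i;
    int            removed = 0;
    char           path[4096];

    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    errno = 0;
    while ((de = readdir(dir)) != NULL)
    {
        if (is_old_segment(de->d_name, lastoff))
        {
            if (n == cap)
            {
                size_t newcap = cap ? cap * 2 : 8;
                void  *tmp = realloc(names, newcap * sizeof(*names));

                if (tmp == NULL)
                {
                    free(names);
                    closedir(dir);
                    return -1;
                }
                names = tmp;
                cap = newcap;
            }
            /* The filter guarantees exactly 16 characters plus the NUL. */
            memcpy(names[n], de->d_name, SEG_NAME_LEN + 1);
            n++;
        }
        errno = 0;
    }
    if (errno != 0)
    {
        free(names);
        closedir(dir);
        return -1;
    }
    closedir(dir);

    /* Nothing is deleted until the scan has finished. */
    for (i = 0; i < n; i++)
    {
        snprintf(path, sizeof(path), "%s/%s", dirpath, names[i]);
        if (unlink(path) == 0)
            removed++;
    }
    free(names);
    return removed;
}
```

If a scan that deletes nothing during the walk completes cleanly where the original loop fails, that would point at the unlink-during-readdir interaction rather than at 'readdir' itself.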


