Re: Why are some WAL files in pg_xlog symlinks to old files?

From: Nigel
Subject: Re: Why are some WAL files in pg_xlog symlinks to old files?
Msg-id: AANLkTi=O8jumOW2Xxt6aWRwFc7cYPpY_bvWUM7CGe8vW@mail.gmail.com
In reply to: Re: Why are some WAL files in pg_xlog symlinks to old files?  (Fujii Masao <masao.fujii@gmail.com>)
Responses: Re: Why are some WAL files in pg_xlog symlinks to old files?  (Fujii Masao <masao.fujii@gmail.com>)
List: pgsql-admin
Thank you for your response!
 
So I guess what's happening is that the old symlink from 3 weeks ago (generated by pg_standby -l) is now stuck in the primary's pg_xlog, and gets repeatedly recycled and renamed to be a new WAL file.  I checked the mod date of the target of the symlink, and confirmed that it's being updated as that file is rewritten with recycled WAL data.
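
The check described above (is the segment a symlink, and when was its target last written?) could be scripted roughly as follows. This is only a sketch: the function name find_symlinked_segments is made up for illustration, and the directory you pass in is whatever your pg_xlog path happens to be. It relies on os.lstat reporting on the link itself and os.stat following the link to its target.

```python
import os


def find_symlinked_segments(xlog_dir):
    """Return (name, target, link_mtime, target_mtime) for each symlink in xlog_dir.

    link_mtime comes from lstat (the link itself); target_mtime comes from
    stat (the file the link points to) and is None if the link is dangling.
    """
    found = []
    for name in sorted(os.listdir(xlog_dir)):
        path = os.path.join(xlog_dir, name)
        if not os.path.islink(path):
            continue  # ordinary WAL segment, nothing to report
        target = os.readlink(path)
        link_mtime = os.lstat(path).st_mtime
        try:
            target_mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            target_mtime = None  # target missing, as on the standby
        found.append((name, target, link_mtime, target_mtime))
    return found


if __name__ == "__main__":
    import sys
    import time

    # Usage (path is an assumption): python check_links.py /path/to/pg_xlog
    for name, target, link_m, target_m in find_symlinked_segments(sys.argv[1]):
        shown = time.ctime(target_m) if target_m is not None else "missing"
        print(f"{name} -> {target}  link: {time.ctime(link_m)}  target: {shown}")
```

A target_mtime that keeps advancing while the link's own mtime stays put would match what I saw on the primary.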
 
To get out of this situation, I guess I should replace the symlink in pg_xlog with the file that's the target of the symlink, renamed with the name of the symlink?  (In other words, "follow" the symlink by hand so the file in pg_xlog is an ordinary file again.)  That would break us out of having that symlink recycled over and over.  And then we'll change the new standby server to not use the -l option with pg_standby anymore.  (-:
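
A minimal sketch of that "follow the symlink by hand" step, under loud assumptions: nobody in this thread has confirmed the approach is safe, the server should not be using the segment while this runs, and the function name and the ".tmp" suffix are invented for illustration. It copies the target rather than moving it, so the old archive directory is left untouched.

```python
import os
import shutil


def materialize_symlink(link_path):
    """Replace a symlink with a regular-file copy of its target, keeping the link's name."""
    if not os.path.islink(link_path):
        raise ValueError(f"{link_path} is not a symlink")
    tmp_path = link_path + ".tmp"  # hypothetical temporary name in the same directory
    # copy2 follows symlinks by default, so this copies the target's contents.
    shutil.copy2(link_path, tmp_path)
    # Renaming over the link replaces the symlink itself, not the file it points to.
    os.replace(tmp_path, link_path)
```

After it runs, os.path.islink(link_path) is False and the name in pg_xlog refers to an ordinary file with the same contents the target had.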
 
Thanks,
Chris
On Tue, Sep 28, 2010 at 10:50 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Sep 29, 2010 at 10:15 AM, Nigel <nigelspleen@gmail.com> wrote:
> Hello,
>
> We're running PG 8.3 in a warm standby configuration.  About 3 weeks ago we
> had to fail over from the primary to the standby.  That worked fine, but
> we're having problems getting standby mode set up again.  On the new
> standby, everything works fine for a little while: WALs were rsynced over
> and processed correctly as far as I can tell.  But every 65-75 minutes (very
> regularly), a WAL file is copied that's actually a symlink.  When the
> standby tries to read the rsynced symlink, it hangs indefinitely, presumably
> because the target of the link doesn't exist on the standby.
>
> In the primary's pg_xlog, I see the expected WAL files with increasing
> numbers and recent modification dates, but every 65-75 files there's one of
> these symlinks. For example:
>
> Sep 28 16:13 0000000300000A5C00000070
> Sep 28 16:15 0000000300000A5C00000071
> Sep 28 16:12 0000000300000A5C00000072
> Sep  5 01:00 0000000300000A5C00000073 -> /srv/db/chdbprod_wal_archives/00000001000009D6000000D6
> Sep 28 16:21 0000000300000A5C00000074
> Sep 28 16:19 0000000300000A5C00000075
>
> The "/srv/db/chdbprod_wal_archives" directory is where incoming WAL files
> used to go, back when the current primary server was the standby.  The
> September 5 date you see above is shortly before the failover was done.  It
> confused me at first until I remembered that it's the mod date of the target
> of the symlink, not the link itself (which in this case was presumably
> created around 16:20).  The target of the symlinks is always the same.
>
> pg_xlog also contains a 00000003.history file, which references the target
> of the symlinks.  Here's its contents:
>
> 1       00000001000009D6000000D6        before transaction 0 at 2000-01-01 00:00:00+00
>
> I gather that my problems here are due to having a primary server that was
> itself formerly a standby, but I'm not sure what action to take.  I don't
> know enough about how the history files work and what the significance of
> the symlinks is.  What purpose do the symlinks serve?  Why are they
> recreated regularly at slightly more than hourly intervals?  Why do they
> point to a directory that was only used back when the primary was a
> standby?  (If it makes any difference, back when the primary server was a
> standby, it was running pg_standby with the -l option.)  Does their presence
> mean that something's wrong on the primary, or should they be ignored when
> copying to the standby?

I guess that the cause is the -l option. "pg_standby -l" creates a symlink to
the archived WAL file in pg_xlog. At failover, unfortunately, that symlink is
renamed to a new WAL file name for recycling, so the symlink to the old
archived WAL file remains in pg_xlog.
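
The effect described above can be reproduced with a toy model. This is not PostgreSQL's recycling code, just an illustration of how renaming a symlink keeps it pointing at the old target, so later writes through the new name land in the old archived file. The function name recycle_demo and the directory layout are made up; the segment names are borrowed from the listing earlier in the thread.

```python
import os


def recycle_demo(workdir):
    """Rename a symlink as 'recycling' would, write through it, and report the result."""
    archive = os.path.join(workdir, "archive")
    xlog = os.path.join(workdir, "pg_xlog")
    os.makedirs(archive)
    os.makedirs(xlog)

    # An archived segment, plus the symlink a "pg_standby -l" style restore leaves behind.
    target = os.path.join(archive, "00000001000009D6000000D6")
    with open(target, "wb") as f:
        f.write(b"old")
    old_name = os.path.join(xlog, "00000001000009D6000000D6")
    os.symlink(target, old_name)

    # "Recycling": the entry is only renamed to a future segment name.
    new_name = os.path.join(xlog, "0000000300000A5C00000073")
    os.rename(old_name, new_name)

    # Writing new WAL data through the recycled name follows the link.
    with open(new_name, "r+b") as f:
        f.write(b"new")

    with open(target, "rb") as f:
        content = f.read()
    return os.path.islink(new_name), os.readlink(new_name) == target, content
```

Run against an empty temporary directory, this returns (True, True, b"new"): the renamed entry is still a symlink to the same archive file, and that file now holds the newly written bytes.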

AFAIR, because of this problem, the -l option was removed from pg_standby.
http://archives.postgresql.org/pgsql-committers/2009-06/msg00323.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
