Hello,
> I think it can so happen that last checkpoint is with old timeline and there
> are operations with new timeline which might have caused the problem Heikki
> has seen.
I think I have seen that.
After adding an SQL command that modifies the DB on standby-B once the
passive (propagated?) promotion of standby-C has taken place,
< pg_ctl -D data-standbyB promote
---
>
> pg_ctl -D data-standbyB -w promote
> sleep 20
> psql -p 5433 postgres -c 'create table t (a int); checkpoint;'
I saw the following complaints from standby-C.
> B 2013-05-09 17:00:06.019 JST 30268 LOG: checkpoint starting: immediate force wait
> C 2013-05-09 17:00:06.211 JST 30277 LOG: invalid magic number 0000 in log segment 000000010000000000000005, offset 4169728
> C 2013-05-09 17:00:06.211 JST 30295 FATAL: terminating walreceiver process due to administrator command
> B 2013-05-09 17:00:06.219 JST 30268 LOG: checkpoint complete: wrote 18 buffers (0.1%); 0 transaction log file(s) added,0 removed, 0 recycled; write=0.000 s, sync=0.157 s, total=0.199 s; sync files=13, longest=0.041 s, average=0.012 s
> CHECKPOINT
> C 2013-05-09 16:41:33.974 JST 29624 LOG: record with zero length at 0/53F8B90
> C 2013-05-09 16:41:33.974 JST 29624 LOG: record with zero length at 0/53F8B90
> C 2013-05-09 16:41:38.980 JST 29624 LOG: record with zero length at 0/53F8B90
.. repeats forever ..
At first glance, this seems to be caused by some kind of timeline
inconsistency. An explicit checkpoint before the modification on
standby-B does not help.
> psql -p 5433 postgres -c 'checkpoint;create table t (a int); checkpoint;'
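As a rough aid for checking which timeline a complaint refers to, the
segment name in the log can be split into its three 8-hex-digit fields
(timeline ID, log number, segment number). The following is only a
minimal sketch; decode_walseg is a made-up helper name for illustration,
not something shipped with PostgreSQL.

```shell
# Hypothetical helper (not part of PostgreSQL): split a 24-hex-digit
# WAL segment file name into timeline ID, log number and segment number.
decode_walseg() {
    name=$1
    # Reject names that contain anything other than hex digits.
    case $name in
        *[!0-9A-Fa-f]*)
            echo "not a WAL segment name: $name" >&2
            return 1
            ;;
    esac
    # Reject names that are not exactly 24 characters long.
    if [ "${#name}" -ne 24 ]; then
        echo "not a WAL segment name: $name" >&2
        return 1
    fi
    tli=$(printf '%s' "$name" | cut -c1-8)
    log=$(printf '%s' "$name" | cut -c9-16)
    seg=$(printf '%s' "$name" | cut -c17-24)
    # Print each field as a decimal number.
    printf 'timeline=%d log=%d seg=%d\n' "0x$tli" "0x$log" "0x$seg"
}

# Segment name taken from the standby-C complaint above.
decode_walseg 000000010000000000000005
```

For the segment named in the complaint this prints "timeline=1 log=0 seg=5",
that is, the segment standby-C was reading belongs to timeline 1. Whether
that alone explains the failure is not established here.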
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center