Hi,
Yesterday we (that's me and my colleague Ricardo Gomez) were working on
an issue where a monitoring script was returning increasing lag
information on a primary instead of a NULL value.
The query involved the following functions (it has since been
amended to work around the issue I'm reporting here):
pg_last_wal_receive_lsn()
pg_last_wal_replay_lsn()
pg_last_xact_replay_timestamp()
Under normal circumstances we would expect all three functions to
return NULL on a primary node, and the code comments back this up.
The problem arises when the node is a standby that was promoted without
a restart, or a node that had to perform crash recovery.
While the node is recovering, the values in `XLogCtl` are updated with
recovery information. Once recovery finishes, either because crash
recovery reached a consistent state or because a standby was promoted,
those values are not reset to their startup defaults.
That's when you start seeing non-NULL values returned by
`pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.
Now, I don't know if we should call this a bug or an undocumented
anomaly. We could fix it by resetting the values in `XLogCtl` once
recovery has finished, or document that these functions might return
non-NULL values in certain cases.
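In the meantime, a monitoring query can avoid the stale values by
checking `pg_is_in_recovery()` first. The following is only a sketch of
that idea, not the exact query from our monitoring script; the column
aliases are made up for illustration:

```sql
-- Sketch only: report replay information only while the node is in recovery.
-- pg_is_in_recovery() returns false on a primary, including a promoted
-- standby, so every column falls through to NULL there (CASE without ELSE).
SELECT
    CASE WHEN pg_is_in_recovery()
         THEN pg_last_wal_receive_lsn()
    END AS receive_lsn,              -- hypothetical alias
    CASE WHEN pg_is_in_recovery()
         THEN pg_last_wal_replay_lsn()
    END AS replay_lsn,               -- hypothetical alias
    CASE WHEN pg_is_in_recovery()
         THEN now() - pg_last_xact_replay_timestamp()
    END AS replay_lag;               -- hypothetical alias
```

With a guard like this the script should report NULL lag on a promoted
standby regardless of what was left behind in `XLogCtl`.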
Regards,
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services