Discussion: Non-null values of recovery functions after promote or crash of primary
Hi,

Yesterday we (my colleague Ricardo Gomez and I) were working on an issue where a monitoring script was returning increasing lag information on a primary instead of a NULL value.

The query used involved the following functions (the function was amended to work around the issue I'm reporting here):

pg_last_wal_receive_lsn()
pg_last_wal_replay_lsn()
pg_last_xact_replay_timestamp()

Under normal circumstances we would expect to receive NULLs from all three functions on a primary node, and the code comments back up my thoughts.

The problem is, what if the node is a standby which was promoted without restarting, or a primary that had to perform crash recovery?

While the node is recovering, the values in `XLogCtl` are updated with recovery information. Once recovery finishes, either because crash recovery reached a consistent state or because a standby was promoted, those values are not reset to their startup defaults.

That's when you start seeing non-NULL values returned by `pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.

Now, I don't know if we should call this a bug or an undocumented anomaly. We could fix it by resetting the values in `XLogCtl` after finishing recovery, or document that we might see non-NULL values in certain cases.

Regards,

--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
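For readers who hit the same symptom in a monitoring script, one possible workaround is sketched below. It is not the amended function mentioned above (the thread does not show it); it only assumes that the script can also ask the server whether it is still in recovery, using the existing `pg_is_in_recovery()` function, and that it ignores the replay timestamp otherwise. The helper name `replay_lag_seconds` and the `LAG_QUERY` constant are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical query a monitoring script could run. Both functions exist in
# PostgreSQL; how the result is fetched (driver, connection) is left out.
LAG_QUERY = """
SELECT pg_is_in_recovery(),
       pg_last_xact_replay_timestamp()
"""


def replay_lag_seconds(
    in_recovery: bool,
    last_replay_ts: Optional[datetime],
    now: datetime,
) -> Optional[float]:
    """Return replay lag in seconds, or None when the node is a primary.

    Assumption: on a node that is no longer in recovery, a non-NULL
    pg_last_xact_replay_timestamp() is a leftover from a promotion or
    from crash recovery and must not be reported as lag.
    """
    if not in_recovery:
        # Primary (including a promoted standby or a crash-recovered
        # primary): ignore any stale replay timestamp.
        return None
    if last_replay_ts is None:
        # Standby that has not replayed any transaction yet.
        return None
    return (now - last_replay_ts).total_seconds()


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    stale = datetime(2020, 1, 1, 11, 0, 0, tzinfo=timezone.utc)
    # Promoted standby: the stale timestamp is ignored.
    print(replay_lag_seconds(False, stale, now))
    # Real standby: lag is computed from the replay timestamp.
    print(replay_lag_seconds(True, stale, now))
```

Guarding on the recovery state keeps the script correct whether or not the server eventually resets the `XLogCtl` values, so it should behave the same on patched and unpatched servers.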
Greetings,

* Martín Marqués (martin@2ndquadrant.com) wrote:
> pg_last_wal_receive_lsn()
> pg_last_wal_replay_lsn()
> pg_last_xact_replay_timestamp()
>
> Under normal circumstances we would expect to receive NULLs from all
> three functions on a primary node, and code comments back up my thoughts.

Agreed.

> The problem is, what if the node is a standby which was promoted without
> restarting, or that had to perform crash recovery?
>
> So during the time it's recovering the values in `XLogCtl` are updated
> with recovery information, and once the recovery finishes, due to crash
> recovery reaching a consistent state, or a promotion of a standby
> happening, those values are not reset to startup defaults.
>
> That's when you start seeing non-null values returned by
> `pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.
>
> Now, I don't know if we should call this a bug, or an undocumented
> anomaly. We could fix the bug by resetting the values from `XLogCtl`
> after finishing recovery, or document that we might see non-NULL values
> in certain cases.

IMV, and not unlike other similar cases I've talked about on another thread, these should be cleared when the system is promoted as they're otherwise confusing and nonsensical.

Thanks,

Stephen
Hi,

> IMV, and not unlike other similar cases I've talked about on another
> thread, these should be cleared when the system is promoted as they're
> otherwise confusing and nonsensical.

Keep in mind that this also happens when the server crashes and has to perform crash recovery. In that case the server was always a primary.

--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services