Thread: Non-null values of recovery functions after promote or crash of primary

Non-null values of recovery functions after promote or crash of primary

From
Martín Marqués
Date:
Hi,

Yesterday we (that's me and my colleague Ricardo Gomez) were working on
an issue where a monitoring script was returning increasing lag
information on a primary instead of a NULL value.

The query used involved the following functions (the function was
amended to work around the issue I'm reporting here):

pg_last_wal_receive_lsn()
pg_last_wal_replay_lsn()
pg_last_xact_replay_timestamp()

Under normal circumstances we would expect to receive NULLs from all
three functions on a primary node, and code comments back up my thoughts.

The problem is, what if the node is a standby which was promoted without
restarting, or that had to perform crash recovery?

So during the time it's recovering the values in `XLogCtl` are updated
with recovery information, and once the recovery finishes, due to crash
recovery reaching a consistent state, or a promotion of a standby
happening, those values are not reset to startup defaults.

That's when you start seeing non-null values returned by
`pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.

Now, I don't know if we should call this a bug, or an undocumented
anomaly. We could fix the bug by resetting the values from `XLogCtl`
after finishing recovery, or document that we might see non-NULL values
in certain cases.
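
As an illustration only (this is not the amended query mentioned above, which is not shown in this thread), a monitoring script could gate the lag calculation on `pg_is_in_recovery()` instead of relying on the recovery functions returning NULL on a primary. The sketch below is a minimal Python example under that assumption; the query text, the column aliases, and the helper name `replay_lag` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical monitoring query; the column aliases are assumptions of this sketch.
LAG_QUERY = """
SELECT pg_is_in_recovery()             AS in_recovery,
       pg_last_xact_replay_timestamp() AS last_replay_ts
"""


def replay_lag(in_recovery: bool,
               last_replay_ts: Optional[datetime],
               now: Optional[datetime] = None) -> Optional[timedelta]:
    """Return the replay lag on a standby, or None when no lag applies."""
    if not in_recovery:
        # Primary (possibly promoted or crash-recovered): ignore any
        # leftover non-NULL value instead of reporting a growing lag.
        return None
    if last_replay_ts is None:
        # Standby that has not replayed any transaction yet.
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_replay_ts
```

The script would run `LAG_QUERY`, pass both columns to `replay_lag`, and treat `None` as "not a standby, nothing to report". Checking the node's role explicitly keeps the result independent of whether the stale values are ever reset.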

Regards,

-- 
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Non-null values of recovery functions after promote or crash of primary

From
Stephen Frost
Date:
Greetings,

* Martín Marqués (martin@2ndquadrant.com) wrote:
> pg_last_wal_receive_lsn()
> pg_last_wal_replay_lsn()
> pg_last_xact_replay_timestamp()
>
> Under normal circumstances we would expect to receive NULLs from all
> three functions on a primary node, and code comments back up my thoughts.

Agreed.

> The problem is, what if the node is a standby which was promoted without
> restarting, or that had to perform crash recovery?
>
> So during the time it's recovering the values in `XLogCtl` are updated
> with recovery information, and once the recovery finishes, due to crash
> recovery reaching a consistent state, or a promotion of a standby
> happening, those values are not reset to startup defaults.
>
> That's when you start seeing non-null values returned by
> `pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.
>
> Now, I don't know if we should call this a bug, or an undocumented
> anomaly. We could fix the bug by resetting the values from `XLogCtl`
> after finishing recovery, or document that we might see non-NULL values
> in certain cases.

IMV, and not unlike other similar cases I've talked about on another
thread, these should be cleared when the system is promoted as they're
otherwise confusing and nonsensical.

Thanks,

Stephen

Re: Non-null values of recovery functions after promote or crash of primary

From
Martín Marqués
Date:
Hi,

> IMV, and not unlike other similar cases I've talked about on another
> thread, these should be cleared when the system is promoted as they're
> otherwise confusing and nonsensical.

Keep in mind that this also happens when the server crashes and has to
perform crash recovery. In that case the server was always a primary.

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


