Обсуждение: Logging replication state changes
Hi hackers,
I noticed that below critical replication state changes are DEBUG1 level logged. Any concern about changing the below two messages log level to LOG? If this is too verbose, we can introduce a new GUC, log_replication_state_changes that logs the replication state changes when enabled irrespective of the log level.
1/
/*
* If we're in catchup state, move to streaming. This is an
* important state change for users to know about, since before
* this point data loss might occur if the primary dies and we
* need to failover to the standby. The state change is also
* important for synchronous replication, since commits that
* started to wait at that point might wait for some time.
*/
if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
{
ereport(DEBUG1,
(errmsg_internal("\"%s\" has now caught up with upstream server",
application_name)));
WalSndSetState(WALSNDSTATE_STREAMING);
}
* If we're in catchup state, move to streaming. This is an
* important state change for users to know about, since before
* this point data loss might occur if the primary dies and we
* need to failover to the standby. The state change is also
* important for synchronous replication, since commits that
* started to wait at that point might wait for some time.
*/
if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
{
ereport(DEBUG1,
(errmsg_internal("\"%s\" has now caught up with upstream server",
application_name)));
WalSndSetState(WALSNDSTATE_STREAMING);
}
2/
ereport(DEBUG1,
(errmsg_internal("standby \"%s\" now has synchronous standby priority %u",
application_name, priority)));
(errmsg_internal("standby \"%s\" now has synchronous standby priority %u",
application_name, priority)));
Thanks,
Satya
SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> writes: > I noticed that below critical replication state changes are DEBUG1 level > logged. Any concern about changing the below two messages log level to LOG? Why? These seem like perfectly routine messages. regards, tom lane
On Wed, Dec 29, 2021 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> writes:
> I noticed that below critical replication state changes are DEBUG1 level
> logged. Any concern about changing the below two messages log level to LOG?
Why? These seem like perfectly routine messages.
Consider a scenario where we have a primary and two sync standby (s1 and s2) where s1 is a preferred failover target and s2 is next with synchronous_standby_names = 'First 1 ('s1','s2')'. In an event, s1 streaming replication is broken and reestablished because of a planned or an unplanned event then s2 participates in the sync commits and makes sure the writes are not stalled on the primary. I would like to know the time window where s1 is not actively acknowledging the commits and the writes are dependent on s2. Also if the service layer decides to failover to s2 instead of s1 because s1 is lagging I need evidence in the log to explain the behavior.
regards, tom lane
On Thu, Dec 30, 2021 at 4:18 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote: > > On Wed, Dec 29, 2021 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> writes: >> > I noticed that below critical replication state changes are DEBUG1 level >> > logged. Any concern about changing the below two messages log level to LOG? >> >> Why? These seem like perfectly routine messages. > > > Consider a scenario where we have a primary and two sync standby (s1 and s2) where s1 is a preferred failover target ands2 is next with synchronous_standby_names = 'First 1 ('s1','s2')'. In an event, s1 streaming replication is broken andreestablished because of a planned or an unplanned event then s2 participates in the sync commits and makes sure the writesare not stalled on the primary. I would like to know the time window where s1 is not actively acknowledging the commitsand the writes are dependent on s2. Also if the service layer decides to failover to s2 instead of s1 because s1 islagging I need evidence in the log to explain the behavior. > Isn't it better to get this information via pg_stat_replication view (via state and sync_priority) columns? -- With Regards, Amit Kapila.
On Sat, Jan 8, 2022 at 3:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 30, 2021 at 4:18 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> On Wed, Dec 29, 2021 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> writes:
>> > I noticed that below critical replication state changes are DEBUG1 level
>> > logged. Any concern about changing the below two messages log level to LOG?
>>
>> Why? These seem like perfectly routine messages.
>
>
> Consider a scenario where we have a primary and two sync standby (s1 and s2) where s1 is a preferred failover target and s2 is next with synchronous_standby_names = 'First 1 ('s1','s2')'. In an event, s1 streaming replication is broken and reestablished because of a planned or an unplanned event then s2 participates in the sync commits and makes sure the writes are not stalled on the primary. I would like to know the time window where s1 is not actively acknowledging the commits and the writes are dependent on s2. Also if the service layer decides to failover to s2 instead of s1 because s1 is lagging I need evidence in the log to explain the behavior.
>
Isn't it better to get this information via pg_stat_replication view
(via state and sync_priority) columns?
We need the historical information to analyze and root cause in addition to the live debugging. It would be good to have better control over replication messages.
--
With Regards,
Amit Kapila.
On Sat, Jan 8, 2022, at 6:29 PM, SATYANARAYANA NARLAPURAM wrote:
We need the historical information to analyze and root cause in addition to the live debugging. It would be good to have better control over replication messages.
I think the answer to this demand is not to change the message level but to
implement a per-module log_min_messages. This idea is in the TODO [1] for more
than a decade. Check the archives.
I agree with Tom that the referred messages are noisy, hence, DEBUG1 is fine
for it.