Re: Add Information during standby recovery conflicts

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Add Information during standby recovery conflicts
Дата
Msg-id CA+fd4k6oBCoF+4sAgWqsC5+B5+EWQfT=FyPg9zVo4wh+zTXhUg@mail.gmail.com
обсуждение исходный текст
Ответ на Add Information during standby recovery conflicts  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Ответы Re: Add Information during standby recovery conflicts  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Список pgsql-hackers
On Thu, 18 Jun 2020 at 16:28, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi hackers,
>
> I've attached a patch to add information during standby recovery conflicts.
> The motive behind is to be able to get some information:
>
> On the apply side
> On the blocker(s) side
>
> Motivation:
>
> When a standby recovery conflict occurs it could be useful to get more information to be able to dive deep on the
rootcause and find a way to avoid/mitigate new occurrences. 

I think this is a good feature. Like log_lock_waits, it will help the
users to investigate recovery conflict issues.

>
> Adding this information would make the investigations easier, it could help answering questions like:
>
> On which LSN was the WAL apply blocked?
> What was the purpose of the bocked WAL record?
> On which relation (if any) was the blocked WAL record related to?
> What was the blocker(s) doing?
> When did the blocker(s) started their queries (if any)?
> What was the blocker(s) waiting for? on which wait event?
>
> Technical context and proposal:
>
> There is 2 points in this patch:
>
> Add the information about the blocked WAL record. This is done in standby.c (ResolveRecoveryConflictWithVirtualXIDs,
ResolveRecoveryConflictWithDatabase,StandbyTimeoutHandler) 

I think we already have the information about the WAL record being
applied in errcontext.

I wonder if we can show the recovery conflict information in the main
LOG message, the blocker information in errdetail, and use errcontext
with regard to WAL record information. For example:

LOG:  process 500 waiting for recovery conflict on snapshot
DETAIL:  conflicting transition id: 100, 200, 300
CONTEXT:  WAL redo at 0/3001970 for Heap2/CLEAN: remxid 506

> Outcome Example:
>
> 2020-06-15 06:48:23.774 UTC [6971] LOG:  wal record apply is blocked by 2 connection(s), reason: User query might
haveneeded to see row versions that must be removed. 
> 2020-06-15 06:48:23.774 UTC [6971] LOG:  blocked wal record rmgr: Heap2, lsn: 0/038E2678, received at: 2020-06-15
06:48:23.774098+00,desc: CLEAN, relation: rel 1663/13586/16652 fork main blk 0 
> 2020-06-15 06:48:54.773 UTC [7088] LOG:  about to interrupt pid: 7088, backend_type: client backend, state: active,
wait_event_type:Timeout, wait_event: PgSleep, query_start: 2020-06-15 06:48:14.87672+00 

I'm concerned that the above information on the process who is about
to be interrupted is very detailed but I'm not sure it will be helpful
for the users. If the blocker is waiting on something lock, the
information should be logged by log_lock_waits. Also the canceled
backend will emit the ERROR log with the message "canceling statement
due to conflict with recovery”, and its pid can be logged at the log
prefix. The user can know who has been canceled by recovery conflicts
and the query.

Probably we need to consider having a time threshold (or boolean to
turn on/off) to emit this information to the server logs. Otherwise,
we will end up writing every conflict information, making the log
dirty needlessly.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: [PATCH] postgres_fdw connection caching - cause remote sessionslinger till the local session exit
Следующее
От: Fujii Masao
Дата:
Сообщение: Re: track_planning causing performance regression