Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value
Дата
Msg-id 20221020.115946.1240575541659272220.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value
Список pgsql-hackers
At Wed, 19 Oct 2022 13:06:08 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Wed, Oct 19, 2022 at 12:39 PM sirisha chamarthi
> <sirichamarthi22@gmail.com> wrote:
> >
> > The current code comment says that the replication stream on a slot with the given targetLSN can't continue after a
restartbut even without a restart the stream cannot continue. The slot is invalidated and the walsender process is
terminatedby the checkpoint process. Attaching a small patch to fix the comment.
 

The description was correct at the early stage of the development of
max_slot_wal_keep_size_mb. At that time the affected processes
(walsenders) are not actively killed.  On the other hand, the current
implement actively kills the affected walsenders before removing
segments.  Thus considering the whole current mechanism of this
feature, WALAVAIL_REMOVED represents that "the segment has been
removed, along with the affected processeses having been already
killed, too".

In short, the proposed fix alone seems fine to me. If we want to show
further details, I would add a bit as follows.

| * * WALAVAIL_REMOVED means it has been removed. A replication stream on
| *   a slot with this LSN cannot continue.  Note that the affected
| *   processes have been terminated by checkpointer, too.


> I think the walsender/replication stream can still continue even
> before the checkpointer signals it to terminate, there's an
> illuminating comment (see [1]) specifying when it can happen. It means

It is a description about the possible advancement of restart_lsn
after the function reads it. So the point is a bit off the said
proposal.
  
> that the GetWALAvailability() can return WALAVAIL_REMOVED but the
> checkpointer hasn't yet signalled/in the process of signalling the
> walsender to terminate.
> 
>  * * WALAVAIL_REMOVED means it has been removed. A replication stream on
>  *   a slot with this LSN cannot continue after a restart.
> 
> The above existing comment, says that the slot isn't usable if
> "someone" (either checkpoitner or walsender or entire server itself)
> got restarted. It looks fine, no?
>
> [1]
>             case WALAVAIL_REMOVED:
> 
>                 /*
>                  * If we read the restart_lsn long enough ago, maybe that file
>                  * has been removed by now.  However, the walsender could have
>                  * moved forward enough that it jumped to another file after
>                  * we looked.  If checkpointer signalled the process to
>                  * termination, then it's definitely lost; but if a process is
>                  * still alive, then "unreserved" seems more appropriate.
>                  *
>                  * If we do change it, save the state for safe_wal_size below.
>                  */
>                 if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
>                 {

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: Documentation for building with meson
Следующее
От: Zhang Mingli
Дата:
Сообщение: Documentation refinement for Parallel Scans