RE: archive status ".ready" files may be created too early

From matsumura.ryo@fujitsu.com
Subject RE: archive status ".ready" files may be created too early
Msg-id OSAPR01MB5027F3C28DBC8B930E15C6A6E8980@OSAPR01MB5027.jpnprd01.prod.outlook.com
In reply to Re: archive status ".ready" files may be created too early  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: archive status ".ready" files may be created too early  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
> On 5/28/20, 11:42 PM, "matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com>
> wrote:
> > I'm preparing a patch that backend inserting segment-crossboundary
> > WAL record leaves its EndRecPtr and someone flushing it checks
> > the EndRecPtr and notifies..


Sorry for the slow progress.

I have attached a patch.
I have also attached a simple targeted test for the primary.


1. Description of the primary side

[Basic problem]
  A process flushing WAL does not know whether the flushed RecPtr is the
  EndRecPtr of a cross-segment-boundary WAL record, because only the process
  that inserted the record knows that, and it never records the information anywhere.

[Basic concept of the patch in primary]
  A process inserting a cross-segment-boundary WAL record stores its EndRecPtr
  (I call it CrossBoundaryEndRecPtr) in a new structure in XLogCtl.
  A flushing process creates .ready (hereafter, I just call this 'notify') for
  the segment preceding the one that contains CrossBoundaryEndRecPtr, only when its
  flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

[Detail of implementation in primary]
* Structure of CrossBoundaryEndRecPtrs
  The requirements for the structure are as follows:
  - The system must remember multiple CrossBoundaryEndRecPtr values.
  - A flushing process must be able to decide easily whether to notify, using only the flushed RecPtr.

  Therefore, I implemented the structure as an array (I call it the CrossBoundaryEndRecPtr array)
  laid out in the same way as the xlblocks array.  Strictly speaking, a length of
  'xbuffers/wal_segment_size' is enough, but I chose 'xbuffers' for simplicity, because it
  lets the flushing process use XLogRecPtrToBufIdx().
  See also the definition of XLogCtl, XLOGShmemSize(), and XLOGShmemInit() in my patch.

* Action of inserting process
  An inserting process stores its CrossBoundaryEndRecPtr in the element of the
  CrossBoundaryEndRecPtr array selected by XLogRecPtrToBufIdx() for that CrossBoundaryEndRecPtr.
  If the WAL record crosses several segments, only the element for the last segment,
  the one containing the EndRecPtr, is set; the others are left unset.
  See also CopyXLogRecordToWAL() in my patch.
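
  As a rough illustration of the array and the inserter's action described
  above, here is a minimal standalone sketch.  It is not taken from the patch:
  the types are stand-ins, the sizes are assumed values, and buf_idx() and
  remember_cross_boundary_end() are hypothetical names.  buf_idx() only mimics
  the page-number-modulo-buffers mapping that XLogRecPtrToBufIdx() is assumed
  to perform.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* stand-in for the PostgreSQL type */

#define BLCKSZ     ((uint64_t) 8192)              /* assumed WAL page size */
#define SEG_SIZE   ((uint64_t) 16 * 1024 * 1024)  /* assumed wal_segment_size */
#define N_BUFFERS  ((uint64_t) 4096)              /* assumed number of WAL buffers */

/* One slot per WAL buffer page; 0 means "nothing recorded". */
static XLogRecPtr cross_boundary_end[N_BUFFERS];

/* Hypothetical analogue of XLogRecPtrToBufIdx(): page number modulo buffers. */
static unsigned buf_idx(XLogRecPtr ptr)
{
    return (unsigned) ((ptr / BLCKSZ) % N_BUFFERS);
}

/*
 * Called by an inserter for a record occupying [start, end), with end > start.
 * Returns 1 if the record crosses a segment boundary and its end was stored,
 * 0 if the record lies within a single segment.
 */
static int remember_cross_boundary_end(XLogRecPtr start, XLogRecPtr end)
{
    if (start / SEG_SIZE == (end - 1) / SEG_SIZE)
        return 0;                       /* record stays within one segment */

    /* Only the slot for the record's end is set, even for multi-segment records. */
    cross_boundary_end[buf_idx(end)] = end;
    return 1;
}
```

  In this sketch a record that ends exactly on a segment boundary is treated
  as not crossing it; whether the patch makes the same choice is not stated
  here.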

* Action of flushing process
  The overview was already given above:
    A flushing process creates .ready (hereafter, I just call this 'notify') for
    the segment preceding the one that contains CrossBoundaryEndRecPtr, only when its
    flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

  An additional detail follows.  The flushing process may have to notify
  several segments if the record crosses several segments, so the process stores
  the latest notified segment number in latestArchiveNotifiedSegNo in XLogCtl.
  The process notifies from latestArchiveNotifiedSegNo + 1 to
  the flushing segment number - 1.

  latestArchiveNotifiedSegNo is set to EndOfLog after the Startup process exits
  the replay loop.  A standby sets it at the same point (= before promotion).

  Mutual exclusion for latestArchiveNotifiedSegNo is not required because
  every access to it already happens inside the WALWriteLock critical section.
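
  The flusher's rule can be sketched as follows.  This is a simplified
  standalone model, not the patch code: notify_segment(), ready_log and
  notify_after_flush() are hypothetical names, the segment size is an assumed
  value, and the upper bound of the loop is taken to be the segment containing
  CrossBoundaryEndRecPtr.  Locking is not modelled, since the description
  above relies on WALWriteLock.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the PostgreSQL type */
typedef uint64_t XLogSegNo;     /* stand-in for the PostgreSQL type */

#define SEG_SIZE  ((uint64_t) 16 * 1024 * 1024)   /* assumed wal_segment_size */
#define LOG_MAX   16

/* Analogue of latestArchiveNotifiedSegNo in XLogCtl. */
static XLogSegNo latest_notified_segno;

/* Hypothetical log standing in for the .ready files that would be created. */
static XLogSegNo ready_log[LOG_MAX];
static int       ready_count;

static void notify_segment(XLogSegNo segno)
{
    if (ready_count < LOG_MAX)
        ready_log[ready_count++] = segno;
}

/*
 * Called after WAL has been flushed up to 'flushed'.  'cross_end' is the
 * stored CrossBoundaryEndRecPtr.  Returns the number of segments notified.
 */
static int notify_after_flush(XLogRecPtr flushed, XLogRecPtr cross_end)
{
    XLogSegNo end_seg = cross_end / SEG_SIZE;
    int       n = 0;

    if (flushed < cross_end)
        return 0;               /* the record's tail is not on disk yet */

    /* Notify latest + 1 .. end_seg - 1, remembering how far we got. */
    while (latest_notified_segno + 1 < end_seg)
    {
        latest_notified_segno++;
        notify_segment(latest_notified_segno);
        n++;
    }
    return n;
}
```

  For example, with segments up to 4 already notified and a record ending at
  offset 100 of segment 7, a flush that stops at the start of segment 7
  notifies nothing, while a flush reaching the record's end notifies
  segments 5 and 6.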


2. Description of the standby side

* Who notifies?
  walreceiver also does not know whether the flushed RecPtr is the EndRecPtr of
  a cross-segment-boundary WAL record.  On a standby, only the Startup process
  knows this, because the information is hidden in the WAL record itself and only
  the Startup process reads and assembles WAL records.

* Action of Startup process
  Therefore, I made walreceiver never notify; the Startup process does it instead.
  In detail, when the Startup process has read one complete WAL record, it notifies
  from the segment that contains the head (ReadRecPtr) of the record up to the segment
  preceding the one that contains the EndRecPtr of the record.

  We must also pay attention to timeline switches.
  The last segment of the previous TimeLineID must be notified before switching.
  This case is handled when RM_XLOG_ID is detected.
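
  The per-record segment range on the standby can be sketched like this.  It
  is only an illustration under assumed names and sizes: segments_to_notify()
  is hypothetical, the types are stand-ins, and the timeline-switch case is
  not modelled.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the PostgreSQL type */
typedef uint64_t XLogSegNo;     /* stand-in for the PostgreSQL type */

#define SEG_SIZE  ((uint64_t) 16 * 1024 * 1024)   /* assumed wal_segment_size */

/*
 * Given a completely read record with head 'read_ptr' (ReadRecPtr) and end
 * 'end_ptr' (EndRecPtr), store the first segment to notify in *first and
 * return how many consecutive segments to notify.  The segment containing
 * end_ptr itself is excluded, because later records may still land in it.
 */
static unsigned segments_to_notify(XLogRecPtr read_ptr, XLogRecPtr end_ptr,
                                   XLogSegNo *first)
{
    XLogSegNo head_seg = read_ptr / SEG_SIZE;
    XLogSegNo end_seg = end_ptr / SEG_SIZE;

    *first = head_seg;
    return (unsigned) (end_seg - head_seg);
}
```

  A record that stays inside one segment yields a count of 0, so nothing is
  notified until a later record reaches the next segment.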


3. About other notifications for segments
Two other kinds of segment notification remain.  They do not need to be fixed.

(1) Notifying for a partial segment
A partial segment does not need to be complete, so it is OK to notify without special consideration.

(2) Re-notifying
Currently, the checkpointer notifies through XLogArchiveCheckDone().
It is a safety net for a failed notification by a backend or the WAL writer.
A backend or the WAL writer does not retry if the notification fails, but the checkpointer
retries it when it removes an old segment.  If the notification fails, the checkpointer does not
remove the segment, so it keeps retrying until the notification succeeds.
In this case too, we can just notify without special consideration,
because the checkpointer guarantees that all WAL records in the segment have already been flushed.


I would appreciate your review and comments.


Regards
Ryo Matsumura
