Re: Race condition in recovery?

From: Dilip Kumar
Subject: Re: Race condition in recovery?
Msg-id CAFiTN-sc+81KjM+ecpnd4jvPv0WQNdNpVZ+uyk2PEYJZpSLthQ@mail.gmail.com
In reply to: Re: Race condition in recovery?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
On Fri, May 21, 2021 at 7:51 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> https://www.postgresql.org/message-id/50E43C57.5050101%40vmware.com
>
> > That leaves one case not covered: If you take a backup with plain
> > "pg_basebackup" from a standby, without -X, and the first WAL segment
> > contains a timeline switch (ie. you take the backup right after a
> > failover), and you try to recover from it without a WAL archive, it
> > doesn't work. This is the original issue that started this thread,
> > except that I used "-x" in my original test case. The problem here is
> > that even though streaming replication will fetch the timeline history
> > file when it connects, at the very beginning of recovery, we expect that
> > we already have the timeline history file corresponding to the initial
> > timeline available, either in pg_xlog or the WAL archive. By the time
> > streaming replication has connected and fetched the history file, we've
> > already initialized expectedTLEs to contain just the latest TLI. To fix
> > that, we should delay calling readTimeLineHistoryFile() until streaming
> > replication has connected and fetched the file.
> > If the first segment read by recovery contains a timeline switch, the first
> > pages have older timeline than segment timeline. However, if
> > expectedTLEs contained only the segment timeline, we cannot know if
> > we can use the record.  In that case the following error is issued.
>
> If expectedTLEs is initialized with the pseudo list,
> tliOfPointInHistory always returns the recoveryTargetTLI regardless of
> the given LSN so the checking about timelines later doesn't work. And
> later ReadRecord is surprised to see a page of an unknown timeline.

From this whole discussion (on the thread you linked), IIUC the issue
arises when the checkpoint LSN does not exist on the timeline
"ControlFile->checkPointCopy.ThisTimeLineID". If that is true, then I
agree that we will initialize expectedTLEs from just that single
entry (ControlFile->checkPointCopy.ThisTimeLineID) and later fail to
find the checkpoint record on this timeline, because the checkpoint
LSN is smaller than the start LSN of this timeline. Right?
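
A minimal sketch of the lookup discussed above may help. It is a
simplified Python model, not the PostgreSQL source: the class, the
function name, and all LSN values are hypothetical stand-ins chosen for
illustration. It assumes a history list ordered newest timeline first,
with 0 standing in for an invalid (unbounded) LSN, and a "pseudo list"
that holds a single unbounded entry for the target timeline.

```python
from dataclasses import dataclass
from typing import List

INVALID_LSN = 0  # stand-in for an unset/unbounded LSN (assumption of this model)


@dataclass
class TimelineEntry:
    tli: int    # timeline ID
    begin: int  # first LSN on this timeline, INVALID_LSN = unbounded
    end: int    # LSN at which this timeline was left, INVALID_LSN = still current


def tli_of_point_in_history(lsn: int, history: List[TimelineEntry]) -> int:
    """Return the timeline that owns `lsn`; `history` is ordered newest first."""
    for entry in history:
        starts_before = entry.begin == INVALID_LSN or entry.begin <= lsn
        ends_after = entry.end == INVALID_LSN or lsn < entry.end
        if starts_before and ends_after:
            return entry.tli
    raise LookupError("LSN not covered by the timeline history")


# Hypothetical scenario: timeline 2 branched off timeline 1 at SWITCH_LSN,
# and the checkpoint record was written before the switch.
SWITCH_LSN = 0x3000000
checkpoint_lsn = 0x2000028

# What the history file for timeline 2 would describe.
full_history = [
    TimelineEntry(tli=2, begin=SWITCH_LSN, end=INVALID_LSN),
    TimelineEntry(tli=1, begin=INVALID_LSN, end=SWITCH_LSN),
]

# What is used when no history file is available: one unbounded entry.
pseudo_history = [TimelineEntry(tli=2, begin=INVALID_LSN, end=INVALID_LSN)]

tli_full = tli_of_point_in_history(checkpoint_lsn, full_history)
tli_pseudo = tli_of_point_in_history(checkpoint_lsn, pseudo_history)
print(tli_full, tli_pseudo)
```

With the full history the checkpoint LSN resolves to timeline 1, while
the single-entry list reports timeline 2 for every LSN, so any later
check that relies on this lookup cannot notice that the checkpoint
record actually lives on the older timeline.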

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


