Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Дата
Msg-id CAB7nPqR+J1Xw+yzfsrehiQ+rh3ac+n5sEUgP7UOQ4_ymFnO9wg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
On Thu, Sep 7, 2017 at 12:33 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <andres@anarazel.de> wrote in
<20170906192353.ufp2dq7wm5fd6qa7@alap3.anarazel.de>
>> I'm not following. All we need to use is the beginning of the relevant
>> records, that's easy enough to keep track of. We don't need to read the
>> WAL or anything.
>
> The beginning is already tracked and nothing more to do.

I have finally allocated time to look at your newly-proposed patch,
sorry for the time it took. Patch 0002 forgot to include sys/stat.h to
allow the debugging tool to compile :)

> The first *problem* was WaitForWALToBecomeAvailable requests the
> beginning of a record, which is not on the page the function has
> been told to fetch. Still tliRecPtr is required to determine the
> TLI to request, it should request RecPtr to be streamed.

[...]

> The rest to do is let XLogPageRead retry other sources
> immediately. To do this I made ValidXLogPageHeader@xlogreader.c
> public (and renamed to XLogReaderValidatePageHeader).
>
> The patch attached fixes the problem and passes recovery
> tests. However, the test for this problem is not added. It needs
> to go to the last page in a segment then put a record continues
> to the next segment, then kill the standby after receiving the
> previous segment but before receiving the whole record.

+typedef struct XLogPageHeaderData *XLogPageHeader;
[...]
+/* Validate a page */
+extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
+                                       XLogRecPtr recptr, XLogPageHeader hdr);
Instead of exposing XLogPageHeaderData, I would recommend just using
char* and remove this declaration. The comment on top of
XLogReaderValidatePageHeader needs to make clear what caller should
provide.

+    if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
+                                      (XLogPageHeader) readBuf))
+        goto next_record_is_invalid;
+
[...]
-                            ptr = tliRecPtr;
+                            ptr = RecPtr;                            tli = tliOfPointInHistory(tliRecPtr,
expectedTLEs);
                            if (curFileTLI > 0 && tli < curFileTLI)
The core of the patch is here (the patch has no comment so it is hard
to understand what's the point of what is being done), and if I
understand that correctly, you allow the receiver to fetch the
portions of a record spawned across multiple segments from different
sources, and I am not sure that we'd want to break that promise.
Shouldn't we instead have the receiver side track the beginning of the
record and send that position for the physical slot's restart_lsn?
This way the receiver would retain WAL segments from the real
beginning of a record. restart_lsn for replication slots is set when
processing the standby message in ProcessStandbyReplyMessage() using
now the flush LSN, so a more correct value should be provided using
that. Andres, what's your take on the matter?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Khandekar
Дата:
Сообщение: Re: [HACKERS] UPDATE of partition key
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Re: [HACKERS] [POC] hash partitioning