Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Дата
Msg-id CABOikdNm=Nz49_nE90eYamGEX-ToFVsjTMt3Z6NuKS=qiEGwFA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Changing WAL Header to reduce contention duringReserveXLogInsertLocation()  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Список pgsql-hackers


On Thu, Feb 1, 2018 at 11:05 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, Jan 13, 2018 at 03:40:01PM +0000, Simon Riggs wrote:
> The new two byte value is protected by CRC. The 2 byte value repeats
> every 32768 WAL files. Any bit error in that value that made it appear
> to be a current value would need to have a rare set of circumstances.

If you use the two lower bytes of the segment number, then this gets
repeated every 64k segments, no?  In terms of space this represents
500GB worth of WAL segments with a default segment size.  Hence the more
PostgreSQL scales, the more there is a risk of collision, and I am
pretty sure that there are already deployments around allocating
hundreds of gigs worth of space for the WAL partition.  There are no
problems of this class if using the 8-byte field xl_prev.  It seems to
me that we don't want to take any risks here.

IMHO we're missing a point here. When xl_prev is changed to a 2-byte value (as the patch suggests), the risk only comes when an old WAL file is recycled for some future WAL file and the old and the future WAL file's segment numbers share the same low order 16-bits. Now as the patch stands, we take precaution to never reuse a WAL file with conflicting low order bits.

This particular piece of the patch deals with that:

@@ -3496,6 +3495,16 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                return false;
            }
            (*segno)++;
+
+           /*
+            * If avoid_conflicting_walid is true, then we must not recycle the
+            * WAL files so that the old and the recycled WAL segnos have the
+            * same lower order 16-bits.
+            */
+           if (avoid_conflicting_walid &&
+               XLogSegNoToSegID(tmpsegno) == XLogSegNoToSegID(*segno))
+               (*segno)++;
+
            XLogFilePath(path, ThisTimeLineID, *segno, wal_segment_size);
        }
    }

For example, old WAL file 000000010000000000000001 will NEVER be reused as  000000010000000000010001. So even if there are any intact old WAL records in the recycled file, they will be detected correctly during replay since the SegID stored in the WAL record will not match the SegID as derived from the WAL file segment number. The replay code does this for every WAL record it sees.

There were some concerns about bit-flipping, which may inadvertently SegID stored in the carried over WAL record so that it now matches with the current WAL files' SegID, but TBH I don't see why we can't trust CRC to detect that. Because if we can't, then there would be other problems during WAL replay.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Yugo Nagata
Дата:
Сообщение: Re: CURRENT OF causes an error when IndexOnlyScan is used
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] REINDEX CONCURRENTLY 2.0