Re: Multiple full page writes in a single checkpoint?

From: Andres Freund
Subject: Re: Multiple full page writes in a single checkpoint?
Msg-id: 20210203232913.t3fng3evt4qucm3g@alap3.anarazel.de
In reply to: Multiple full page writes in a single checkpoint?  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: Multiple full page writes in a single checkpoint?
List: pgsql-hackers

Hi,

On 2021-02-03 18:05:56 -0500, Bruce Momjian wrote:
> log_hint_bits already gives us a unique nonce for the first hint bit
> change on a page during a checkpoint, but we only encrypt on page write
> to the file system, so I am researching if log_hint_bits will already
> generate a unique LSN for every page write to the file system, even if
> there are multiple hint-bit-caused page writes to the file system during
> a single checkpoint.  (We already know this works for multiple
> checkpoints.)

No, it won't:

> However, imagine these steps:
> 
> 1.  checkpoint starts
> 2.  page is modified by row or hint bit change
> 3.  page gets a new LSN and is marked as dirty
> 4.  page image is flushed to WAL
> 5.  page is written to disk and marked as clean
> 6.  page is modified by data or hint bit change
> 7.  page gets a new LSN and is marked as dirty
> 8.  page image is flushed to WAL
> 9.  checkpoint completes
> 10. page is written to disk and marked as clean
> 
> Is the above case valid, and would it cause two full page writes to WAL?
> More specifically, wouldn't it cause every write of the page to the file
> system to use a new LSN?

No. 8) won't happen.  Look e.g. at XLogSaveBufferForHint():

    /*
     * Update RedoRecPtr so that we can make the right decision
     */
    RedoRecPtr = GetRedoRecPtr();

    /*
     * We assume page LSN is first data on *every* page that can be passed to
     * XLogInsert, whether it has the standard page layout or not. Since we're
     * only holding a share-lock on the page, we must take the buffer header
     * lock when we look at the LSN.
     */
    lsn = BufferGetLSNAtomic(buffer);

    if (lsn <= RedoRecPtr)
        /* wal log hint bit */

The RedoRecPtr is determined at 1) and doesn't change between 4) and
8). The LSN for 4) has to be *past* the RedoRecPtr from 1), so at 8) the
lsn <= RedoRecPtr check above fails. Therefore we don't do another FPW.
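
To make the sequence concrete, here is a toy model of that check. It is
only a sketch, not PostgreSQL code: ToyWal, toy_checkpoint_start() and
toy_hint_bit_change() are made-up names, LSNs are plain counters, and
every record is assumed to advance the insert position by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;    /* stand-in for XLogRecPtr */

/* Toy model (hypothetical): one page and one WAL insert position. */
typedef struct {
    ToyLsn redo_rec_ptr;    /* redo pointer of the current checkpoint */
    ToyLsn insert_pos;      /* end of the WAL generated so far */
    ToyLsn page_lsn;        /* LSN stored on the page */
} ToyWal;

/* Step 1: a checkpoint starts and fixes the redo pointer. */
void toy_checkpoint_start(ToyWal *w)
{
    w->redo_rec_ptr = w->insert_pos;
}

/*
 * Steps 2-4 and 6-8: a hint bit is set on the page.
 * Mirrors the "lsn <= RedoRecPtr" test quoted above: a full page image
 * is logged (and the page gets a new LSN) only if the page has not been
 * logged since the redo pointer. Returns true if an image was logged.
 */
bool toy_hint_bit_change(ToyWal *w)
{
    if (w->page_lsn <= w->redo_rec_ptr) {
        w->insert_pos += 1;             /* assumed record size of 1 */
        w->page_lsn = w->insert_pos;    /* page LSN = end of the record */
        return true;
    }
    return false;                       /* page LSN stays unchanged */
}
```

Starting from a page LSN older than the redo pointer, the first
toy_hint_bit_change() after toy_checkpoint_start() logs an image and
moves the page LSN past the redo pointer; the second call within the
same checkpoint returns false and leaves the page LSN as it was, which
is the situation in step 8.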


Changing this is *completely* infeasible. In a lot of workloads it'd
cause a *massive* explosion of WAL volume. Like quadratically. You'll
need to find another way to generate a nonce.

In the non-hint bit case you'll automatically have a higher LSN in 7/8
though. So you won't need to do anything about getting a higher nonce.

For the hint bit case in 8 you could consider just using any LSN generated
after 4 (preferably already flushed to disk) - but that seems somewhat
ugly from a debuggability POV :/. Alternatively you could just create a
tiny WAL record to get a new LSN, but that'll sometimes trigger new WAL
flushes when the pages are dirtied.
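
A rough sketch of the second option and its cost, again as a toy model
rather than PostgreSQL code. ToyState, toy_log_nonce_record() and
toy_write_page() are hypothetical names, and the model assumes the usual
WAL-before-data rule: a page can only be written out once WAL up to its
LSN is on disk. In this model the extra flush shows up when the page is
written out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t NonceLsn;  /* stand-in for XLogRecPtr */

/* Toy model (hypothetical): WAL positions plus one page. */
typedef struct {
    NonceLsn insert_pos;    /* end of the WAL generated so far */
    NonceLsn flush_pos;     /* end of the WAL known to be on disk */
    NonceLsn page_lsn;      /* LSN stored on the page */
} ToyState;

/*
 * Hypothetical: log a tiny record whose only purpose is to give the
 * page a fresh LSN (and therefore a fresh nonce). Record size is
 * assumed to be 1.
 */
NonceLsn toy_log_nonce_record(ToyState *s)
{
    s->insert_pos += 1;
    s->page_lsn = s->insert_pos;
    return s->page_lsn;
}

/*
 * Write the page out. WAL up to the page LSN must be on disk first, so
 * a page LSN beyond flush_pos forces a WAL flush. Returns true if a
 * flush was needed.
 */
bool toy_write_page(ToyState *s)
{
    if (s->flush_pos < s->page_lsn) {
        s->flush_pos = s->page_lsn;
        return true;
    }
    return false;
}
```

With everything already flushed, toy_write_page() needs no flush. After
toy_log_nonce_record() the page LSN is ahead of flush_pos, so the next
toy_write_page() has to flush WAL first.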

Greetings,

Andres Freund


