Re: WAL format and API changes (9.5)

From Tom Lane
Subject Re: WAL format and API changes (9.5)
Msg-id 27502.1396539428@sss.pgh.pa.us
In reply to WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> The big change in creating WAL records is that the buffers involved in 
> the WAL-logged operation are explicitly registered, by calling a new 
> XLogRegisterBuffer function.

Seems reasonable, especially if we can make the buffer numbering business
less error-prone.

> void XLogRegisterBuffer(int blockref_id, Buffer buffer, bool buffer_std)

> blockref_id: An arbitrary ID given to this block reference. It is used 
> in the redo routine to open/restore the same block.
> buffer: the buffer involved
> buffer_std: is the page in "standard" page layout?

> That's for the normal cases. We'll need a couple of variants for also 
> registering buffers that don't need full-page images, and perhaps also a 
> function for registering a page that *always* needs a full-page image, 
> regardless of the LSN. A few existing WAL record types just WAL-log the 
> whole page, so those ad-hoc full-page images could be replaced with this.

Why not just one function with an additional flags argument?
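
To make the single-function idea concrete, here is a minimal standalone sketch, not PostgreSQL code. The flag names (BLKREF_*), register_buffer, needs_full_page_image, and the in-memory table are all hypothetical, and the LSN comparison is only a simplified model of the usual "first change since the last checkpoint" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the names are illustrative, not from the proposal. */
#define BLKREF_STD_LAYOUT   0x01  /* page uses the "standard" page layout */
#define BLKREF_SKIP_IMAGE   0x02  /* never take a full-page image */
#define BLKREF_FORCE_IMAGE  0x04  /* always take a full-page image */

typedef int Buffer;               /* stand-in for the real Buffer type */

#define MAX_BLOCK_REFS 4

typedef struct BlockRef
{
    bool    in_use;
    Buffer  buffer;
    uint8_t flags;
} BlockRef;

static BlockRef registered[MAX_BLOCK_REFS];

/* One entry point with a flags argument instead of several variants. */
void
register_buffer(int blockref_id, Buffer buffer, uint8_t flags)
{
    assert(blockref_id >= 0 && blockref_id < MAX_BLOCK_REFS);
    /* The two image flags contradict each other. */
    assert(!((flags & BLKREF_SKIP_IMAGE) && (flags & BLKREF_FORCE_IMAGE)));

    registered[blockref_id] = (BlockRef) {true, buffer, flags};
}

/*
 * Decide whether this block reference needs a full-page image.
 * Simplified model: without an overriding flag, an image is taken when
 * the page was last modified at or before the checkpoint redo pointer.
 */
bool
needs_full_page_image(int blockref_id, uint64_t page_lsn, uint64_t redo_ptr)
{
    assert(blockref_id >= 0 && blockref_id < MAX_BLOCK_REFS);
    const BlockRef *ref = &registered[blockref_id];

    assert(ref->in_use);
    if (ref->flags & BLKREF_FORCE_IMAGE)
        return true;
    if (ref->flags & BLKREF_SKIP_IMAGE)
        return false;
    return page_lsn <= redo_ptr;
}
```

With this shape, a caller that today WAL-logs a whole page ad hoc would simply pass the force-image flag, and new behaviours could be added later as further bits without growing the function family.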

Also, IIRC there are places that WAL-log full pages that aren't in a
shared buffer at all (btree build does this I think).  How will that fit
into this model?

> (While we're at it, perhaps we should let XLogInsert set the LSN of all 
> the registered buffers, to reduce the amount of boilerplate code).

Yeah, maybe so.  I'm not sure why that was separate to begin with.
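
As a rough illustration of what that would save, the sketch below is a toy model, not the real XLogInsert. Page, register_page, insert_record, and the fixed 64-byte record size are all assumptions made up for the example. The insert routine stamps every registered page itself, so callers no longer repeat the set-LSN step per buffer.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGISTERED 4

/* Toy page: only the LSN field matters for this sketch. */
typedef struct Page
{
    uint64_t lsn;
} Page;

static Page    *registered_pages[MAX_REGISTERED];
static int      n_registered = 0;
static uint64_t insert_pos = 1000;   /* pretend current end of WAL */

/* Remember a page that the next record will modify. */
void
register_page(Page *page)
{
    assert(page != 0);
    assert(n_registered < MAX_REGISTERED);
    registered_pages[n_registered++] = page;
}

/*
 * "Insert" a record: advance the WAL position, stamp the record's end
 * position on every registered page, then clear the registrations.
 */
uint64_t
insert_record(void)
{
    insert_pos += 64;                /* assume each record takes 64 bytes */

    for (int i = 0; i < n_registered; i++)
        registered_pages[i]->lsn = insert_pos;

    n_registered = 0;
    return insert_pos;
}
```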

> Let's simplify that, and have one new function, XLogOpenBuffer, which 
> returns a return code that indicates which of the four cases we're 
> dealing with. A typical redo function looks like this:

>     if (XLogOpenBuffer(0, &buffer) == BLK_REPLAY)
>     {
>         /* Modify the page */
>         ...

>         PageSetLSN(page, lsn);
>         MarkBufferDirty(buffer);
>     }
>     if (BufferIsValid(buffer))
>         UnlockReleaseBuffer(buffer);

> The '0' in the XLogOpenBuffer call is the ID of the block reference 
> specified in the XLogRegisterBuffer call, when the WAL record was created.

+1, but one important step here is finding the data to be replayed.
That is, a large part of the complexity of replay routines has to do
with figuring out which parts of the WAL record were elided due to
full-page-images, and locating the remaining parts.  What can we do
to make that simpler?

Ideally, if XLogOpenBuffer (bad name BTW) returns BLK_REPLAY, it would
also calculate and hand back the address/size of the logged data that
had been pointed to by the associated XLogRecData chain item.  The
trouble here is that there might've been multiple XLogRecData items
pointing to the same buffer.  Perhaps the magic ID number you give to
XLogOpenBuffer should be thought of as identifying an XLogRecData chain
item, not so much a buffer?  It's fairly easy to see what to do when
there's just one chain item per buffer, but I'm not sure what to do
if there's more than one.
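
One way to picture the "ID names a chain item" reading is the standalone sketch below. It is only an illustration under assumed names: DecodedRecord, ChainItem, and record_get_item_data are hypothetical and do not describe any existing or proposed PostgreSQL structure. Each item carries its own ID (its index), remembers which buffer it belongs to, and records whether its data was elided because a full-page image covers it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 8

/* One logged data item; several items may point at the same buffer. */
typedef struct ChainItem
{
    int     buffer_slot;  /* which registered buffer the data belongs to */
    size_t  offset;       /* start of the data within the record payload */
    size_t  len;          /* size of the data in bytes */
    int     elided;       /* nonzero if a full-page image replaced it */
} ChainItem;

typedef struct DecodedRecord
{
    const char *payload;
    int         nitems;
    ChainItem   items[MAX_ITEMS];
} DecodedRecord;

/*
 * Hand back the address and size of the data logged for item_id.
 * Returns NULL (and sets *len to 0) if the ID is unknown or the data
 * was elided, so the redo routine has nothing to locate by hand.
 */
const char *
record_get_item_data(const DecodedRecord *rec, int item_id, size_t *len)
{
    assert(rec != NULL && len != NULL);

    *len = 0;
    if (item_id < 0 || item_id >= rec->nitems)
        return NULL;

    const ChainItem *item = &rec->items[item_id];

    if (item->elided)
        return NULL;

    *len = item->len;
    return rec->payload + item->offset;
}
```

In this model two items attached to the same buffer stay individually addressable, which sidesteps the "more than one item per buffer" question at the cost of making the redo routine ask per item rather than per buffer.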
        regards, tom lane


