Re: WAL format and API changes (9.5)

From: Heikki Linnakangas
Subject: Re: WAL format and API changes (9.5)
Date:
Message-ID: 533D851F.3070608@vmware.com
In reply to: Re: WAL format and API changes (9.5)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: WAL format and API changes (9.5)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 04/03/2014 06:37 PM, Tom Lane wrote:
> Also, IIRC there are places that WAL-log full pages that aren't in a
> shared buffer at all (btree build does this I think).  How will that fit
> into this model?

Hmm. We could provide a function for registering a block with given 
content, without a Buffer. Something like:

XLogRegisterPage(int id, RelFileNode, BlockNumber, Page)
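To make the idea concrete, here is a minimal sketch of what such a registration call might record. Everything in it is an assumption for illustration: the types are simplified stand-ins for the PostgreSQL ones, and the `RegisteredBlock` struct, the `registered` table, and `MAX_BLOCK_REFS` are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in types; the real definitions live in PostgreSQL headers. */
#define BLCKSZ 8192             /* default page size, used by callers */
#define MAX_BLOCK_REFS 4        /* hypothetical per-record limit */

typedef uint32_t BlockNumber;
typedef struct RelFileNode
{
    uint32_t    spcNode;
    uint32_t    dbNode;
    uint32_t    relNode;
} RelFileNode;
typedef char *Page;

/* One registered block reference (hypothetical layout). */
typedef struct RegisteredBlock
{
    int         in_use;
    RelFileNode rnode;
    BlockNumber blkno;
    Page        page;           /* caller-owned page image, no Buffer */
} RegisteredBlock;

static RegisteredBlock registered[MAX_BLOCK_REFS];

/* Register a block by identity and content, without a shared buffer. */
void
XLogRegisterPage(int id, RelFileNode rnode, BlockNumber blkno, Page page)
{
    assert(id >= 0 && id < MAX_BLOCK_REFS);
    assert(!registered[id].in_use);     /* one registration per ID */

    registered[id].in_use = 1;
    registered[id].rnode = rnode;
    registered[id].blkno = blkno;
    registered[id].page = page;
}
```

A caller such as a btree build would pass the page it has in local memory, and the insertion code could then treat the entry like any other block reference that needs a full-page image.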

>> Let's simplify that, and have one new function, XLogOpenBuffer, which
>> returns a return code that indicates which of the four cases we're
>> dealing with. A typical redo function looks like this:
>
>>     if (XLogOpenBuffer(0, &buffer) == BLK_REPLAY)
>>     {
>>         /* Modify the page */
>>         ...
>
>>         PageSetLSN(page, lsn);
>>         MarkBufferDirty(buffer);
>>     }
>>     if (BufferIsValid(buffer))
>>         UnlockReleaseBuffer(buffer);
>
>> The '0' in the XLogOpenBuffer call is the ID of the block reference
>> specified in the XLogRegisterBuffer call, when the WAL record was created.
>
> +1, but one important step here is finding the data to be replayed.
> That is, a large part of the complexity of replay routines has to do
> with figuring out which parts of the WAL record were elided due to
> full-page-images, and locating the remaining parts.  What can we do
> to make that simpler?

We can certainly add more structure to the WAL records, but any extra 
information you add will make the records larger. It might be worth it, 
and would be lost in the noise for more complex records like page 
splits, but we should keep frequently-used records like heap insertions 
as lean as possible.

> Ideally, if XLogOpenBuffer (bad name BTW) returns BLK_REPLAY, it would
> also calculate and hand back the address/size of the logged data that
> had been pointed to by the associated XLogRecData chain item.  The
> trouble here is that there might've been multiple XLogRecData items
> pointing to the same buffer.  Perhaps the magic ID number you give to
> XLogOpenBuffer should be thought of as identifying an XLogRecData chain
> item, not so much a buffer?  It's fairly easy to see what to do when
> there's just one chain item per buffer, but I'm not sure what to do
> if there's more than one.

Hmm. You could register a separate XLogRecData chain for each buffer. 
Along the lines of:

rdata[0].data = data for buffer
rdata[0].len = ...
rdata[0].next = &rdata[1];
rdata[1].data = more data for same buffer
rdata[1].len = ...
rdata[1].next = NULL;

XLogRegisterBuffer(0, buffer, &rdata[0]);

At replay:

if (XLogOpenBuffer(0, &buffer, &xldata, &len) == BLK_REPLAY)
{
    /* xldata points to the data registered for this buffer */
}
}

Plus one more chain for the data not associated with a buffer.
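A rough, self-contained model of that round trip might look like the following. It is only a sketch under stated assumptions: `XLogRecData` is reduced to three fields, `register_block_data`, `open_block_data`, `BlockData`, and `BLK_NOTFOUND` are hypothetical names made up for the example, and the "record" is just an in-memory table instead of real WAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCK_REFS 4        /* hypothetical per-record limit */
#define MAX_BLOCK_DATA 256      /* hypothetical per-block payload cap */

/* Simplified stand-in for PostgreSQL's XLogRecData chain item. */
typedef struct XLogRecData
{
    const char *data;
    size_t      len;
    struct XLogRecData *next;
} XLogRecData;

/* Only BLK_REPLAY appears in the thread; BLK_NOTFOUND is assumed. */
typedef enum { BLK_REPLAY, BLK_NOTFOUND } XLogOpenResult;

/* Per-block payload as it might sit in a decoded record (assumed). */
typedef struct BlockData
{
    int         present;
    size_t      len;
    char        bytes[MAX_BLOCK_DATA];
} BlockData;

static BlockData record_blocks[MAX_BLOCK_REFS];

/* Insert side: flatten the chain registered for block 'id'. */
void
register_block_data(int id, const XLogRecData *chain)
{
    BlockData  *blk = &record_blocks[id];

    blk->present = 1;
    blk->len = 0;
    for (; chain != NULL; chain = chain->next)
    {
        assert(blk->len + chain->len <= MAX_BLOCK_DATA);
        memcpy(blk->bytes + blk->len, chain->data, chain->len);
        blk->len += chain->len;
    }
}

/* Replay side: hand back the address and size of that block's data. */
XLogOpenResult
open_block_data(int id, const char **xldata, size_t *len)
{
    if (!record_blocks[id].present)
        return BLK_NOTFOUND;
    *xldata = record_blocks[id].bytes;
    *len = record_blocks[id].len;
    return BLK_REPLAY;
}
```

Because every chain item for a buffer is concatenated under the same ID, the redo routine gets one contiguous region per block and does not need to know how many chain items were used at insert time.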

- Heikki


