Re: WAL format and API changes (9.5)

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: WAL format and API changes (9.5)
Дата
Msg-id 54527AEF.20506@vmware.com
обсуждение исходный текст
Ответ на Re: WAL format and API changes (9.5)  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: WAL format and API changes (9.5)  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On 10/30/2014 06:02 PM, Andres Freund wrote:
> On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote:
>> On 10/06/2014 02:29 PM, Andres Freund wrote:
>>> I've not yet really looked,
>>> but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't
>>> make me happy...
>>
>> Ok.. Can you elaborate?
>
> To me the split between xloginsert.c doing some of the record assembly,
> and xlog.c doing the lower level part of the assembly is just wrong.

Really? To me, that feels like a natural split. xloginsert.c is 
responsible for constructing the final WAL record, with the backup 
blocks and all. It doesn't access any shared memory (related to WAL; it 
does look at Buffers, to determine what to back up). The function in 
xlog.c inserts the assembled record to the WAL, as is. It handles the 
locking and WAL buffer management related to that.

What would you suggest? I don't think putting everything in one 
XLogInsert function, like we have today, is better. Note that the second 
patch makes xloginsert.c a lot more complicated.

> And it leads to things like the already complicated logic to retry after
> detecting missing fpws is now split across two files seems to confirm
> that. What happens right now is that XLogInsert() (with some helper
> routines) assembles the record. Then hands that off to
> XLogInsertRecData(). Which sometimes returns InvalidXLogRecPtr, and
> returns back to XLogInsert(), which reverses *some* of its work and then
> retries. Brr.

Modifying the chain that was already constructed is not pretty, but it's 
not this patch's fault. I don't think splitting the functions makes that 
any worse. (BTW, the follow-up patch to change the WAL format fixes 
that; the per-buffer data is kept separately from the main data chain).

> Similary the 'page_writes_omitted' logic doesn't make me particularly
> happy. Previously we retried when there actually was a page affected by
> the different RedoRecPtr. Now we do it as soon as our copy of RedoRecPtr
> is out of date? Won't that quite often spuriously trigger retries? Am I
> missing something?
> Arguably this doesn't happen often enough to matter, but it's still
> something that we should explicitly discuss.

The situation where it leads to a spurious retry is when the constructed 
WAL record skipped the FPW of a buffer, and RedoRecPtr was updated, but 
the buffer stilll doesn't need to be FPW according to the updated 
RedoRecPtr. That only happens if the same buffer was updated (by another 
backend) after the new checkpoint began. I believe that's extremely rare.

We could make that more fine-grained, though. Instead of passing a 
boolean 'fpw_sensitive' flag to XLogInsertRecData, pass the lowest LSN 
among the pages whose FPW was skipped. If that's less than the new 
RedoRecPtr, then retry is needed. That would eliminate the spurious retries.

> The implementation of the split seems to change the meaning of
> TRACE_POSTGRESQL_XLOG_INSERT - it's now counting retries. I don't
> particularly care about those tracepoints, but I see no simplification
> due to changing its meaning.

I wasn't sure what to do about that. I don't use tracepoints, and I 
don't know what others use them for.

> I'm not a big fan of the naming for the new split. We have
> * XLogInsert() which is the external interface
> * XLogRecordAssemble() which builds the rdata chain that will be
>    inserted
> * XLogInsertRecData() which actually inserts the data into the xl buffers.
>
> To me that's pretty inconsistent.

Got a better suggestion? I wanted to keep the name XLogInsert() 
unchanged, to avoid code churn, and because it's still a good name for 
that. XLogRecordAssemble is pretty descriptive for what it does, 
although "XLogRecord" implies that it construct an XLogRecord struct. It 
does fill in that too, but the main purpose is to build the XLogRecData 
chain. Perhaps XLogAssembleRecord() would be better.

I'm not very happy with XLogInsertRecData myself. XLogInsertRecord?

> I'm also somewhat confused about the new split of the headers. I'm not
> adverse to splitting them, but it's getting a bit hard to understand:
> * xlog.h - generic mishmash
> * xlogdefs.h - Three typedefs and some #defines
> * xlog_fn.h - SQL functions
> * xlog_internal.h - Pretty random collection of stuff
> * xlogutils.h - "Utilities for replaying WAL records"
> * xlogreader.h - "generic XLog reading facility"
> * xloginsert.h - "Functions for generating WAL records"
> * xlogrecord.h - "Definitions for the WAL record format."
>
> Isn't that a bit too much? Pretty much the only files that have a
> recognizable separation to me are xlog_fn.h and xlogreader.h.

This is a matter of taste, I guess. I find xlog.h, xlogdefs.h and 
xlog_internal.h fairly random, while the rest are have a clear function. 
xlogutils.h is needed by redo functions, and xloginsert.h is needed for 
generating WAL.

Note that the second patch adds more stuff to xloginsert.h and xlogrecord.h.

> You've removed the
> - *
> - * NB: this routine feels free to scribble on the XLogRecData structs,
> - * though not on the data they reference.  This is OK since the
> XLogRecData
> - * structs are always just temporaries in the calling code.
> comment, but we still do, no?

True. XLogInsertRecData doesn't scribble on its input, but XLogInsert 
still does. (this is moot after the second patch though)

> While not this patches blame, I see currently we finish the critical
> section in XLogInsert/XLogInsertRecData pretty early:
>          END_CRIT_SECTION();
>
>          /*
>           * Update shared LogwrtRqst.Write, if we crossed page boundary.
>           */
>          if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
>          {
>                  SpinLockAcquire(&XLogCtl->info_lck);
>                  /* advance global request to include new block(s) */
>                  if (XLogCtl->LogwrtRqst.Write < EndPos)
>                          XLogCtl->LogwrtRqst.Write = EndPos;
>                  /* update local result copy while I have the chance */
>                  LogwrtResult = XLogCtl->LogwrtResult;
>                  SpinLockRelease(&XLogCtl->info_lck);
>          }
> I don't think this is particularly critical, but it still looks wrong to me.

Updating LogwrtRqst.Write is not critical. If it's not done for some 
reason, everything still works. Keeping the critical section as small as 
possible is a good idea, no?

> Hm. I think that's what I see for now.

Thanks!

- Heikki




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: infinite loop in _bt_getstackbuf
Следующее
От: Robert Haas
Дата:
Сообщение: Re: BRIN indexes - TRAP: BadArgument