Re: Improve WALRead() to suck data directly from WAL buffers when possible

Поиск
Список
Период
Сортировка
От Melih Mutlu
Тема Re: Improve WALRead() to suck data directly from WAL buffers when possible
Дата
Msg-id CAGPVpCQrRD5OXjxb=qAP6jfM_S3b=d9iT59HR7wzUxnJbBYE1Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers
Hi Bharath,

Thanks for working on this. It seems like a nice improvement to have.

Here are some comments on 0001 patch.

1-  xlog.c
+ /*
+ * Fast paths for the following reasons: 1) WAL buffers aren't in use when
+ * server is in recovery. 2) WAL is inserted into WAL buffers on current
+ * server's insertion TLI. 3) Invalid starting WAL location.
+ */

Shouldn't the comment be something like "2) WAL is *not* inserted into WAL buffers on current server's insertion TLI" since the condition to break is tli != GetWALInsertionTimeLine() 

2-
+ /*
+ * WAL being read doesn't yet exist i.e. past the current insert position.
+ */
+ if ((startptr + count) > reservedUpto)
+ return ntotal;

This question may not even make sense but I wonder whether we can read from startptr only to reservedUpto in case of startptr+count exceeds reservedUpto? 

3-
+ /* Wait for any in-progress WAL insertions to WAL buffers to finish. */
+ if ((startptr + count) > LogwrtResult.Write &&
+ (startptr + count) <= reservedUpto)
+ WaitXLogInsertionsToFinish(startptr + count);

Do we need to check if (startptr + count) <= reservedUpto as we already verified this condition a few lines above?

4-
+ Assert(nread > 0);
+ memcpy(dst, data, nread);
+
+ /*
+ * Make sure we don't read xlblocks down below before the page
+ * contents up above.
+ */
+ pg_read_barrier();
+
+ /* Recheck if the read page still exists in WAL buffers. */
+ endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
+
+ /* Return if the page got initalized while we were reading it. */
+ if (expectedEndPtr != endptr)
+ break;
+
+ /*
+ * Typically, we must not read a WAL buffer page that just got
+ * initialized. Because we waited enough for the in-progress WAL
+ * insertions to finish above. However, there can exist a slight
+ * window after the above wait finishes in which the read buffer page
+ * can get replaced especially under high WAL generation rates. After
+ * all, we are reading from WAL buffers without any locks here. So,
+ * let's not count such a page in.
+ */
+ phdr = (XLogPageHeader) page;
+ if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC &&
+   phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) &&
+   phdr->xlp_tli == tli))
+ break;

I see that you recheck if the page still exists and so at the end. What would you think about memcpy'ing only after being sure that we will need and use the recently read data? If we break the loop during the recheck, we simply discard the data read in the latest attempt. I guess that this may not be a big deal but the data would be unnecessarily copied into the destination in such a case.

5- xlogreader.c
+ nread = XLogReadFromBuffers(startptr, tli, count, buf);
+
+ if (nread > 0)
+ {
+ /*
+ * Check if its a full read, short read or no read from WAL buffers.
+ * For short read or no read, continue to read the remaining bytes
+ * from WAL file.
+ *
+ * XXX: It might be worth to expose WAL buffer read stats.
+ */
+ if (nread == count) /* full read */
+ return true;
+ else if (nread < count) /* short read */
+ {
+ buf += nread;
+ startptr += nread;
+ count -= nread;
+ }

Typo in the comment. Should be like "Check if *it's* a full read, short read or no read from WAL buffers."

Also I don't think XLogReadFromBuffers() returns anything less than 0 and more than count. Is verifying nread > 0 necessary? I think if nread does not equal to count, we can simply assume that it's a short read. (or no read at all in case nread is 0 which we don't need to handle specifically)

6-
+ /*
+ * We determined how much of the page can be validly read as 'count', read
+ * that much only, not the entire page. Since WALRead() can read the page
+ * from WAL buffers, in which case, the page is not guaranteed to be
+ * zero-padded up to the page boundary because of the concurrent
+ * insertions.
+ */

I'm not sure about pasting this into the most places we call WalRead(). Wouldn't it be better if we mention this somewhere around WALRead() only once? 


Best,
--
Melih Mutlu
Microsoft

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Aleksander Alekseev
Дата:
Сообщение: Re: Remove unused fields in ReorderBufferTupleBuf
Следующее
От: Aleksander Alekseev
Дата:
Сообщение: Re: Remove unused fields in ReorderBufferTupleBuf