Re: WAL prefetch

From: Tomas Vondra
Subject: Re: WAL prefetch
Date:
Msg-id: 8da3c2dd-8577-2141-d64a-d109ac038388@2ndquadrant.com
In response to: Re: WAL prefetch  (Andres Freund <andres@anarazel.de>)
Responses: Re: WAL prefetch  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 06/16/2018 09:02 PM, Andres Freund wrote:
> On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
>>
>>
>> On 06/15/2018 08:01 PM, Andres Freund wrote:
>>> On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
>>>>
>>>>
>>>> On 14.06.2018 09:52, Thomas Munro wrote:
>>>>> On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
>>>>> <k.knizhnik@postgrespro.ru> wrote:
>>>>>> pg_wal_prefetch function will infinitely traverse WAL and prefetch block
>>>>>> references in WAL records
>>>>>> using posix_fadvise(WILLNEED) system call.
>>>>> Hi Konstantin,
>>>>>
>>>>> Why stop at the page cache...  what about shared buffers?
>>>>>
>>>>
>>>> It is a good question. I thought a lot about prefetching directly to shared
>>>> buffers.
>>>
>>> I think that's definitely how this should work.  I'm pretty strongly
>>> opposed to a prefetching implementation that doesn't read into s_b.
>>>
>>
>> Could you elaborate why prefetching into s_b is so much better (I'm sure it
>> has advantages, but I suppose prefetching into page cache would be much
>> easier to implement).
> 
> I think there's a number of issues with just issuing prefetch requests
> via fadvise etc:
> 
> - it leads to guaranteed double buffering, in a way that's just about
>   guaranteed to *never* be useful. Because we'd only prefetch whenever
>   there's an upcoming write, there's simply no benefit in the page
>   staying in the page cache - we'll write out the whole page back to the
>   OS.

How does reading directly into shared buffers substantially change the
behavior? The only difference is that we end up with the double buffering
after performing the write, which is expected to happen pretty quickly
after the read request.

> - reading from the page cache is far from free - so you add costs to the
>   replay process that it doesn't need to bear.
> - you don't have any sort of completion notification, so you basically
>   just have to guess how far ahead you want to read. If you read a bit
>   too much you suddenly get into synchronous blocking land.
> - The OS page cache is actually not particularly scalable to large
>   amounts of data either. Nor are the decisions about what to keep
>   cached likely to be particularly useful.

The posix_fadvise approach is not perfect, no doubt about that. But it
works pretty well for bitmap heap scans, and it's about 13249x better
(rough estimate) than the current solution (no prefetching).
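
For context, the kernel-hint part of that approach is tiny. A minimal
sketch follows (the helper name and the fd handling are mine, not the
patch's; only the posix_fadvise(POSIX_FADV_WILLNEED) call is the actual
mechanism):

#include <fcntl.h>

#define BLCKSZ 8192             /* PostgreSQL block size */

/*
 * Hypothetical helper: hint the kernel that one 8kB block of an already
 * opened relation segment will be read soon, without blocking the caller.
 */
static void
prefetch_block(int seg_fd, unsigned int blkno)
{
    (void) posix_fadvise(seg_fd,
                         (off_t) blkno * BLCKSZ,
                         BLCKSZ,
                         POSIX_FADV_WILLNEED);
}

The kernel then schedules the read into the page cache in the background;
whether the block is still cached when replay actually needs it is exactly
the "how far ahead to read" guessing problem mentioned above.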

> - We imo need to add support for direct IO before long, and adding more
>   and more work to reach feature parity strikes me as a bad move.
> 

IMHO that's unlikely to happen in PG12, though I might be over-estimating
the invasiveness and complexity of the direct I/O change. This patch, on
the other hand, seems pretty doable, and the improvements it brings are
pretty significant.

My point was that I don't think this actually adds a significant amount of
work to the direct IO patch, as we already do prefetching for bitmap heap
scans. That code needs to be written anyway, and I'd expect the two places
to share most of it. So where's the additional work?
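
For reference, the shared-buffers path that bitmap heap scans already go
through is roughly this (PrefetchBuffer() and MAIN_FORKNUM are the
existing names from bufmgr.h / relpath.h; the batching loop is just
illustrative, not the executor code):

#include "postgres.h"
#include "storage/bufmgr.h"

/*
 * Illustrative only: issue prefetch hints for a batch of upcoming blocks.
 * PrefetchBuffer() pins nothing; when built with USE_PREFETCH it boils
 * down to the same posix_fadvise(WILLNEED) hint, issued via smgrprefetch().
 */
static void
prefetch_upcoming_blocks(Relation rel, BlockNumber *blocks, int nblocks)
{
#ifdef USE_PREFETCH
    int         i;

    for (i = 0; i < nblocks; i++)
        PrefetchBuffer(rel, MAIN_FORKNUM, blocks[i]);
#endif
}

A WAL-driven prefetcher that decodes block references would presumably end
up calling the same entry point (or an s_b-reading variant of it), which is
why I'd expect most of the code to be shared.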

I don't think we should reject a patch just because it might add a bit
of work to some not-yet-written future patch ... (which I don't think is
the case here anyway).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

