Re: WAL prefetch

From: Konstantin Knizhnik
Subject: Re: WAL prefetch
Date:
Msg-id: baa76c0f-ac18-851f-8181-316629fc7ee4@postgrespro.ru
In response to: Re: WAL prefetch (Andres Freund <andres@anarazel.de>)
Responses: Re: WAL prefetch (Andres Freund <andres@anarazel.de>)
Re: WAL prefetch (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers

On 19.06.2018 18:50, Andres Freund wrote:
> On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote:
>> I do not think that prefetching into shared buffers requires much more effort
>> or makes the patch more invasive...
>> It even somewhat simplifies it, because there is no need to maintain our own
>> cache of prefetched pages...
>> But it will definitely have a much bigger impact on Postgres performance:
>> contention for buffer locks, throwing away pages accessed by read-only
>> queries, ...
> These arguments seem bogus to me. Otherwise the startup process is going
> to do that work.

There is just one process replaying WAL. Certainly it has some impact on 
hot standby query execution.
But if there are several prefetch workers (128?), this impact will 
increase dramatically.


>
>> Also there are two points which make prefetching into shared buffers more
>> complex:
>> 1. The need to spawn multiple workers to do the prefetch in parallel and
>> somehow distribute the work between them.
> I'm not even convinced that's true. It doesn't seem insane to have a
> queue of, say, 128 requests that are done with posix_fadvise WILLNEED,
> where the oldest request is read into shared buffers by the
> prefetcher, and then discarded from the page cache with DONTNEED.  I
> think we're going to want a queue that's sorted in the prefetch process
> anyway, because there's a high likelihood that we'll otherwise issue
> prefetch requests for the same pages over and over again.
>
> That gets rid of most of the disadvantages: We have backpressure
> (because the read into shared buffers will block if not yet ready),
> we'll prevent double buffering, we'll prevent the startup process from
> doing the victim buffer search.
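
For concreteness, here is a minimal userspace sketch of such a queue (all
names here are mine, just for illustration; a real patch would read into
shared buffers via ReadBuffer() rather than pread(), and would keep the
queue sorted rather than scanning it linearly):

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define QUEUE_DEPTH 128         /* matches the "say, 128 requests" above */
#define BLCKSZ      8192

typedef struct PrefetchRequest
{
    int    fd;                  /* relation segment file */
    off_t  offset;              /* block-aligned offset within it */
} PrefetchRequest;

static PrefetchRequest queue[QUEUE_DEPTH];
static int queue_head;          /* index of the oldest entry */
static int queue_len;

static void
prefetch_block(int fd, off_t offset)
{
    /* Don't issue prefetch requests for the same page over and over. */
    for (int i = 0; i < queue_len; i++)
    {
        PrefetchRequest *r = &queue[(queue_head + i) % QUEUE_DEPTH];

        if (r->fd == fd && r->offset == offset)
            return;
    }

    if (queue_len == QUEUE_DEPTH)
    {
        /*
         * Queue is full: consume the oldest request.  The read blocks if
         * the kernel has not fetched the page yet, which provides the
         * backpressure; DONTNEED then drops the page from the page cache
         * to avoid double buffering.
         */
        PrefetchRequest *r = &queue[queue_head];
        char             page[BLCKSZ];

        if (pread(r->fd, page, BLCKSZ, r->offset) != BLCKSZ)
            perror("pread");
        (void) posix_fadvise(r->fd, r->offset, BLCKSZ, POSIX_FADV_DONTNEED);

        queue_head = (queue_head + 1) % QUEUE_DEPTH;
        queue_len--;
    }

    /* Hint the kernel about the newest block and remember the request. */
    (void) posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_WILLNEED);
    queue[(queue_head + queue_len) % QUEUE_DEPTH] =
        (PrefetchRequest) {fd, offset};
    queue_len++;
}
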
>
>
>> Concerning WAL prefetch, I still seriously doubt whether it is needed at all:
>> if the checkpoint interval is smaller than the amount of free memory in the
>> system, then the redo process should not read much.
> I'm confused. Didn't you propose this?  FWIW, there's a significant
> number of installations where people have observed this problem in
> practice.

Well, originally it was proposed by Sean - the author of pg-prefaulter. 
I just ported it from Go to C using the standard PostgreSQL WAL iterator.
Then I performed some measurements and didn't find any dramatic 
improvement in performance (in the case of synchronous replication) or 
reduction of replication lag (for asynchronous replication), either on my 
desktop (SSD, 16GB RAM, local replication within the same computer, pgbench 
scale 1000) or on a pair of powerful servers connected by
InfiniBand with 3TB NVMe (pgbench with scale 100000).
Also I noticed that the read rate at the replica is almost zero.
This can mean one of the following:
1. I am doing something wrong.
2. posix_fadvise is not that efficient.
3. pgbench is not the right workload to demonstrate the effect of prefetch.
4. The hardware I am using is not typical.

So it makes me think about when such prefetch may actually be needed... 
And it raises new questions:
How frequently is the checkpoint interval much larger than the OS cache?
If we enforce full page writes (let's say after each 1GB of WAL), how does 
it affect WAL size and performance?

It looks difficult to answer the second question without implementing 
some prototype, sketched below.
Maybe I will try to do it.
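
To make that concrete, here is the check I have in mind, as a
self-contained sketch (in a real patch this logic would sit next to the
existing FPW decision in XLogRecordAssemble(); NeedsFullPageWrite and
fpw_distance_threshold are names I made up for illustration):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* byte position in WAL, as in PostgreSQL */

/* Hypothetical GUC: force a full page write when the page was last
 * WAL-logged more than this many bytes of WAL ago (1GB here). */
static const XLogRecPtr fpw_distance_threshold = (XLogRecPtr) 1 << 30;

static bool
NeedsFullPageWrite(XLogRecPtr page_lsn, XLogRecPtr redo_ptr,
                   XLogRecPtr insert_lsn)
{
    /* Existing rule: first modification after the checkpoint redo pointer. */
    if (page_lsn <= redo_ptr)
        return true;

    /*
     * Proposed rule: the page was last WAL-logged so long ago that redo on
     * a replica would probably have to read it from disk; log a full page
     * image instead, turning that random read into a sequential WAL read.
     */
    return insert_lsn - page_lsn > fpw_distance_threshold;
}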
>> And if checkpoint interval is much larger than OS cache (are there cases
>> when it is really needed?)
> Yes, there are.  The percentage of FPWs can cause serious problems, as do
> repeated writeouts by the checkpointer.

One more consideration: data is written to disk as blocks in any 
case. If you update just a few bytes on a page, the whole page still 
has to be written to the database file.
So avoiding full page writes reduces the WAL size and the amount of 
data written to the WAL, but not the amount of data written to the database 
itself.
It means that if we completely eliminate FPWs and transactions update 
random pages, disk traffic is reduced by less than a factor of two: with 
FPWs each such update writes roughly one 8kB page image to the WAL plus 
one 8kB page to the heap, while without FPWs the heap write remains, so 
per-page traffic drops from about 16kB to a little over 8kB.

>
>
>> then a quite small patch (as it seems to me now) forcing a full page write
>> when the distance between the page LSN and the current WAL insertion point
>> exceeds some threshold should eliminate random reads in this case as well.
> I'm pretty sure that will hurt a significant number of installations
> that set the timeout high just so they can avoid FPWs.
Maybe, but I am not so sure. This is why I will try to investigate it further.


> Greetings,
>
> Andres Freund

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


