Re: WAL prefetch

From	Andres Freund
Subject	Re: WAL prefetch
Date	19 June 2018, 19:44:15
Msg-id	20180619164415.ta6q47vwvyzcjwjo@alap3.anarazel.de
In reply to	Re: WAL prefetch (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List	pgsql-hackers

On 2018-06-19 19:34:22 +0300, Konstantin Knizhnik wrote:
> On 19.06.2018 18:50, Andres Freund wrote:
> > On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote:
> > > I do not think that prefetching into shared buffers requires much more effort
> > > or makes the patch more invasive...
> > > It even simplifies it somewhat, because there is no need to maintain a
> > > separate cache of prefetched pages...
> > > But it will definitely have much more impact on Postgres performance:
> > > contention for buffer locks, throwing away pages accessed by read-only
> > > queries,...
> > These arguments seem bogus to me. Otherwise the startup process is going
> > to do that work.
> 
> There is just one process replaying WAL. Certainly it has some impact on hot
> standby query execution.
> But if there are several prefetch workers (128???) then this impact will
> increase dramatically.

Hence me suggesting how you can do that with one process (re locking). I
still entirely fail to see how "throwing away pages accessed by
read-only queries" is meaningful here - the startup process is going to
read the data anyway, and we *do not* want to use a ringbuffer as that'd
make the situation dramatically worse.

> Well, originally it was proposed by Sean - the author of pg-prefaulter. I
> just ported it from Go to C using the standard PostgreSQL WAL iterator.
> Then I performed some measurements and didn't find any dramatic improvement
> in performance (in the case of synchronous replication) or reduction in
> replication lag for asynchronous replication, either on my desktop (SSD,
> 16GB RAM, local replication within the same computer, pgbench scale 1000)
> or on a pair of two powerful servers connected by
> InfiniBand with 3TB NVMe (pgbench with scale 100000).
> Also I noticed that the read rate at the replica is almost zero.

> What does this mean:
> 1. I am doing something wrong.
> 2. posix_prefetch is not so efficient.
> 3. pgbench is not right workload to demonstrate effect of prefetch.
> 4. Hardware which I am using is not typical.

I think it's probably largely a mix of 3 and 4. pgbench with random
distribution probably indeed is a bad test case, because either
everything is in cache or just about every write ends up as a full page
write because of the scale.  You might want to try a) turning off full page
writes b) using a less random distribution.
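
To illustrate the point about scale and distribution, here is a minimal
sketch in Python, not a measurement of PostgreSQL itself. It assumes a
simplified model in which the first write to a page after a checkpoint
produces a full page write and later writes to that page do not. The
function names, the page and write counts, and the "hot set" parameters
are all made up for illustration.

```python
import random


def fpw_fraction(num_pages, num_writes, pick_page, seed=0):
    """Fraction of writes that are the first touch of a page since the
    last checkpoint (in this simplified model, each such write is an FPW)."""
    rng = random.Random(seed)
    touched = set()          # pages already modified in this checkpoint cycle
    first_touches = 0
    for _ in range(num_writes):
        page = pick_page(rng, num_pages)
        if page not in touched:
            touched.add(page)
            first_touches += 1
    return first_touches / num_writes


def uniform(rng, num_pages):
    """Every page is equally likely, like pgbench's default random()."""
    return rng.randrange(num_pages)


def skewed(rng, num_pages, hot_fraction=0.01, hot_probability=0.9):
    """Hypothetical skew: most writes go to a small hot set of pages."""
    hot_pages = max(1, int(num_pages * hot_fraction))
    if rng.random() < hot_probability:
        return rng.randrange(hot_pages)
    return rng.randrange(num_pages)


if __name__ == "__main__":
    pages, writes = 1_000_000, 100_000   # illustrative sizes only
    print("uniform:", fpw_fraction(pages, writes, uniform))
    print("skewed: ", fpw_fraction(pages, writes, skewed))
```

With far more pages than writes per checkpoint cycle, the uniform case
makes nearly every write a first touch, while the skewed case re-touches
hot pages and so yields a much lower fraction. Under this model that is
why a large-scale uniform pgbench run is dominated by full page writes.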

> So it makes me think about when such prefetch may be needed... And it
> raised new questions:
> I wonder how frequently the checkpoint interval is much larger than the OS
> cache?

Extremely common.

> If we enforce full page writes (let's say after each 1GB), how would it
> affect WAL size and performance?

Extremely badly.  If you look at stats of production servers (using
pg_waldump) you can see that a large percentage of the total WAL volume is
FPWs, that FPWs are a storage / bandwidth / write issue, and that higher
FPW rates after a checkpoint correlate strongly negatively with performance.

Greetings,

Andres Freund
