Re: [HACKERS] subscription worker signalling wal writer too much

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Re: [HACKERS] subscription worker signalling wal writer too much
Дата	27 августа 2017 г. 03:45:20
Msg-id	CAMkU=1zP931=v-ypFZMDxHAGGQwuVNUPONZE5_ggV+onkXEdvQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] subscription worker signalling wal writer too much (Jeff Janes <jeff.janes@gmail.com>)
Ответы	Re: [HACKERS] subscription worker signalling wal writer too much (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список	pgsql-hackers

Дерево обсуждения

On Mon, Jul 3, 2017 at 8:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sat, Jun 24, 2017 at 5:09 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:

> Wouldn't it
> better, and much simpler, just to have reverted the first change once the
> second one was done?

I think both can actually happen independently, no? It's pretty common
for the page lsn to be *older* than the corresponding commit. In fact
you'll always hit that case unless there's also concurrent writes also
touching said page.

That is true, but I don't see any residual regression when going from pre-db76b1efbbab2441428 to 7975c5e0a992ae9a4-with-my-patch applied. So I think the hint bit issue only shows up when hitting the same page aggressively, in which case the LSN does get advanced quickly enough that db76b1efbba solves it. Perhaps a different test case could show that issue. I also didn't see any improvement from the original 4de82f7d7c50a81e, so maybe 8 CPUs just isn't enough to detect the problem with setting hint-buts with permanent tables with synchronous_commit=false.

I've refined my benchmarking, using a smaller scaling factor (8, same as my CPU count) and running the tests for longer. The smaller size means there are more intensive updates on individual pgbench_accounts pages, and the longer run times allows be unset hint bits to build up for longer (usually I like short test runs with a large number of randomized repetitions, but that doesn't work well here). Since the scale factor is the same as the number of CPUs, I've bumped the thread count so that it is more likely someone will choose a non-contended value of pgbench_branches.bid to update at any given moment.

pgbench -c16 -j16 -T1200 -M prepared -b tpcb-func --startup='set synchronous_commit=false'

This testing did show the importance of waking up the wal writer so that hint bits can be set:

commit	TPS
cfafd8beadcd6f	22200.48
cfafd8beadcd6f_nowake	30766.83

median of 14 runs reported, p-val on difference of means is 2e-22.

Where nowake is a hackish disabling of the wake up introduced in 4de82f7d7c50a8, forward ported to 9.6 branch. (I still wake it if is is doing the big sleep, I just took out the part that wake it up even from small sleeps)

So my revised benchmark is capable of detecting this effect, even with only 8 CPUs.

When I move to next commit where we set hint bits as long as the page was re-dirtied after, get these numbers:

db76b1efbbab24	30742.69
db76b1efbbab24_nowake	31072.97

The difference between these is not statistically different, nor is the difference between them and cfafd8beadcd6f.

So I think the logic you introduced in db76b1efbbab24 captures all the gain there is to be captured, and 4de82f7d7c50a8 can be reverted. The only reason to wake up the wal writer is if it is taking the big sleep.

Of course, I can't prove a negative. There could always be some test case which demonstrates that 4de82f7d7c50a8 still matters in spite of db76b1efbbab24. So to be ultra-conservative, attached is a patch which should preserve all wake-ups other than the ones known to be useless.

7975c5e0a	27810.66
7975c5e0a_patched	30821.41

So 7975c5 causes a significant regression (10% regression, p-val 7e-16 on difference of the means). It repeatedly wakes the wal-writer up to flush a page which the wal-writer simply refuses to flush. This can't serve any purpose. My patch still wakes it up to write the page if it has not been written yet, and also still wakes it to flush a range of pages of WalWriterFlushAfter is met. But it just doesn't wake it up to flush a page which has already been written and will not get flushed.

I'd prefer the code-simplifying change of just not waking it up anymore except for when it is doing the big sleep, but I can't reliably detect a performance difference between that and the more conservative change posted here.

Cheers,

Jeff