rebased background worker reimplementation prototype

From Andres Freund
Subject rebased background worker reimplementation prototype
Date
Msg-id 20190611032249.kfi7pgqu2ipmlqca@alap3.anarazel.de
Responses Re: rebased background worker reimplementation prototype  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

I've talked a few times about a bgwriter replacement prototype I'd
written a few years back. That happened somewhere deep in another thread
[1], and is thus not easy to find.

Tomas Vondra asked me for a link, but there has been considerable bitrot
since. Attached is a rebased and slightly improved version. It's also
available at [2][3].

The basic observation is that there are some fairly fundamental issues
with the current bgwriter implementation:

1) The pacing logic is complicated, but doesn't work well
2) If most/all buffers have a usagecount, it cannot do anything, because
   it doesn't participate in the clock-sweep
3) Backends have to re-discover the now clean buffers.


The prototype is much simpler - in my opinion of course. It has a
ringbuffer of buffers it thinks are clean (which might be reused
concurrently though). It fills that ringbuffer by performing a
clock-sweep and, if necessary, cleaning buffers with
usagecount=pincount=0. Backends can then pop buffers from that
ringbuffer.
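
As a rough illustration of the fill/pop cycle described above, here is a
single-threaded toy model. It is not the code from the patch: every name
in it (ToyBuffer, ToyRing, sweep_fill, ring_pop, the queued flag) is
made up for this sketch, and locking, concurrent reuse and real IO are
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUFFERS  8   /* toy buffer pool size (assumption) */
#define RING_SIZE 4   /* toy ringbuffer capacity (assumption) */

typedef struct ToyBuffer
{
    int  usagecount;  /* bumped on access, decremented by the sweep */
    int  pincount;    /* > 0 while a backend is using the buffer */
    bool dirty;       /* must be written out before reuse */
    bool queued;      /* already in the ring (toy-only bookkeeping) */
} ToyBuffer;

typedef struct ToyRing
{
    int    slots[RING_SIZE];
    size_t head;      /* index of the oldest entry */
    size_t count;     /* number of queued entries */
} ToyRing;

static ToyBuffer buffers[NBUFFERS];
static size_t    sweep_hand;   /* clock hand */
static int       writes_done;  /* buffers the sweep had to clean */

/* Backend side: take a presumably clean buffer, or -1 if none is queued. */
static int
ring_pop(ToyRing *ring)
{
    int id;

    if (ring->count == 0)
        return -1;
    id = ring->slots[ring->head];
    ring->head = (ring->head + 1) % RING_SIZE;
    ring->count--;
    buffers[id].queued = false;
    return id;
}

/*
 * bgwriter side: advance the clock hand until the ring is full or
 * max_ticks buffers were inspected. Returns the number of ticks used.
 */
static int
sweep_fill(ToyRing *ring, int max_ticks)
{
    int ticks = 0;

    assert(max_ticks >= 0);
    while (ring->count < RING_SIZE && ticks < max_ticks)
    {
        int        id = (int) sweep_hand;
        ToyBuffer *buf = &buffers[id];

        sweep_hand = (sweep_hand + 1) % NBUFFERS;
        ticks++;

        if (buf->pincount > 0 || buf->queued)
            continue;               /* in use or already handed out */
        if (buf->usagecount > 0)
        {
            buf->usagecount--;      /* recently used, give it another lap */
            continue;
        }
        if (buf->dirty)
        {
            buf->dirty = false;     /* stands in for writing the page */
            writes_done++;
        }
        ring->slots[(ring->head + ring->count) % RING_SIZE] = id;
        ring->count++;
        buf->queued = true;
    }
    return ticks;
}
```

A backend would call ring_pop() first and only fall back to running the
sweep itself when it gets -1. In the real system the popped buffer can
have been reused in the meantime, so the consumer has to re-check it.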

Pacing works by bgwriter trying to keep the ringbuffer full, and
backends emptying the ringbuffer. If the ringbuffer is less than 1/4
full, backends wake up bgwriter using the existing latch mechanism.
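
The pacing rule above boils down to a threshold check on the consumer
side. A minimal sketch, with a hypothetical capacity and with a plain
flag standing in for setting bgwriter's latch (in PostgreSQL that would
presumably be a SetLatch() call); the names PacedRing,
backend_take_buffer and bgwriter_refill are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 64    /* hypothetical ringbuffer size */

typedef struct PacedRing
{
    size_t count;           /* clean buffers currently queued */
    bool   wakeup_pending;  /* stands in for bgwriter's latch being set */
} PacedRing;

/* True when the ring is less than 1/4 full (integer arithmetic only). */
static bool
ring_needs_refill(const PacedRing *ring)
{
    return ring->count * 4 < RING_CAPACITY;
}

/* Backend side: consume one buffer, then wake bgwriter if running low. */
static bool
backend_take_buffer(PacedRing *ring)
{
    bool got = false;

    if (ring->count > 0)
    {
        ring->count--;
        got = true;
    }
    if (ring_needs_refill(ring))
        ring->wakeup_pending = true;
    return got;
}

/* bgwriter side: refill to capacity and consume the wakeup. */
static void
bgwriter_refill(PacedRing *ring)
{
    assert(ring->count <= RING_CAPACITY);
    ring->count = RING_CAPACITY;
    ring->wakeup_pending = false;
}
```

With a capacity of 64, the wakeup is requested once the fill level drops
to 15 or fewer entries.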

The ringbuffer is a pretty simplistic lockless (but just obstruction
free, not lock free) implementation, with a lot of unnecessary
constraints.
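
For readers unfamiliar with the terms: obstruction free means a thread
finishes its operation if it runs without interference, lock free means
that some thread always makes progress. The following is NOT the ring
from the patch, just a generic sketch of a lockless ring using C11
atomics, under the assumption of a single producer (bgwriter) and many
consumers (backends) that claim entries with compare-and-swap. The names
CleanRing, clean_ring_push and clean_ring_pop are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8    /* hypothetical capacity */

typedef struct CleanRing
{
    _Atomic uint64_t head;  /* next position to pop, only ever grows */
    _Atomic uint64_t tail;  /* next position to push, producer-owned */
    _Atomic int      slots[RING_SLOTS];
} CleanRing;

static void
clean_ring_init(CleanRing *ring)
{
    atomic_init(&ring->head, 0);
    atomic_init(&ring->tail, 0);
    for (int i = 0; i < RING_SLOTS; i++)
        atomic_init(&ring->slots[i], -1);
}

/* Producer side; must only ever be called from one thread. */
static bool
clean_ring_push(CleanRing *ring, int buf)
{
    uint64_t tail = atomic_load(&ring->tail);
    uint64_t head = atomic_load(&ring->head);

    if (tail - head == RING_SLOTS)
        return false;           /* full */
    atomic_store(&ring->slots[tail % RING_SLOTS], buf);
    atomic_store(&ring->tail, tail + 1);    /* publish the entry */
    return true;
}

/* Consumer side; any number of threads may call this concurrently. */
static bool
clean_ring_pop(CleanRing *ring, int *buf)
{
    uint64_t head = atomic_load(&ring->head);

    for (;;)
    {
        uint64_t tail = atomic_load(&ring->tail);
        int      candidate;

        if (head == tail)
            return false;       /* empty */
        candidate = atomic_load(&ring->slots[head % RING_SLOTS]);
        /* Claim the entry; on failure head is reloaded and we retry. */
        if (atomic_compare_exchange_weak(&ring->head, &head, head + 1))
        {
            *buf = candidate;
            return true;
        }
    }
}
```

The positions are 64 bit counters that never wrap in practice, which
sidesteps ABA problems on head; the slot index is derived with a modulo.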

I've had to improve the current instrumentation for bgwriter
(i.e. pg_stat_bgwriter) considerably - the details in there imo are not
even remotely good enough to actually understand the system (nor are the
names understandable). That needs to be split into a separate commit,
and the half dozen different implementations of the counters need to be
unified.

Obviously this is very prototype-stage code. But I think it's a good
starting point for going forward.

To enable it, one currently has to set the bgwriter_legacy = false GUC.

Some early benchmarks show that in IO heavy cases the result is somewhere
between a very mild regression (close to noise) and a pretty
considerable improvement. To see a benefit one - fairly obviously -
needs a workload that is bigger than shared buffers, because otherwise
checkpointer is going to do all writes (and should, it can sort them
perfectly!).

It's quite possible to saturate what a single bgwriter can write out (as
it is before the replacement). I'm inclined to think the next solution
for that is asynchronous IO, and write-combining, rather than multiple
bgwriters.

Here's an example pg_stat_bgwriter from the middle of a pgbench run
(after resetting it a short while before):

┌─[ RECORD 1 ]───────────────┬───────────────────────────────┐
│ checkpoints_timed          │ 1                             │
│ checkpoints_req            │ 0                             │
│ checkpoint_write_time      │ 179491                        │
│ checkpoint_sync_time       │ 266                           │
│ buffers_written_checkpoint │ 172414                        │
│ buffers_written_bgwriter   │ 475802                        │
│ buffers_written_backend    │ 7140                          │
│ buffers_written_ring       │ 0                             │
│ buffers_fsync_checkpointer │ 137                           │
│ buffers_fsync_bgwriter     │ 0                             │
│ buffers_fsync_backend      │ 0                             │
│ buffers_bgwriter_clean     │ 832616                        │
│ buffers_alloc_preclean     │ 1306572                       │
│ buffers_alloc_free         │ 0                             │
│ buffers_alloc_sweep        │ 4639                          │
│ buffers_alloc_ring         │ 767                           │
│ buffers_ticks_bgwriter     │ 4398290                       │
│ buffers_ticks_backend      │ 17098                         │
│ maxwritten_clean           │ 17                            │
│ stats_reset                │ 2019-06-10 20:17:56.087704-07 │
└────────────────────────────┴───────────────────────────────┘
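
To put the write counters above in proportion, one can compute each
writer's share of the total. The small helper below does just that; it
assumes (going by the column names only) that the four buffers_written_*
counters together cover all buffer writes. The struct and function names
are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The four buffers_written_* counters from pg_stat_bgwriter above. */
typedef struct WriteCounts
{
    uint64_t checkpoint;
    uint64_t bgwriter;
    uint64_t backend;
    uint64_t ring;
} WriteCounts;

static uint64_t
total_writes(const WriteCounts *c)
{
    return c->checkpoint + c->bgwriter + c->backend + c->ring;
}

/* Fraction (0.0 .. 1.0) of all buffer writes that 'part' accounts for. */
static double
write_share(uint64_t part, const WriteCounts *c)
{
    uint64_t total = total_writes(c);

    assert(part <= total);
    return total == 0 ? 0.0 : (double) part / (double) total;
}
```

With the numbers from this run (172414, 475802, 7140 and 0) the total is
655356 writes, of which bgwriter did roughly 73%, checkpointer roughly
26% and backends roughly 1%.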


Note that buffers_written_backend (as buffers_backend before) accounts
for file extensions too - which bgwriter can't offload. We should
replace that by a non-write (i.e. fallocate) anyway.
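
To illustrate what extending a file without a write could look like,
here is a sketch using posix_fallocate(), the portable relative of
Linux's fallocate(). This is not part of the patch; the helper name
extend_file is hypothetical, and error handling is reduced to returning
an error number.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192     /* PostgreSQL's default block size */

/*
 * Grow the file behind fd by nblocks blocks without writing zero pages
 * from userspace. Returns 0 on success, otherwise an error number.
 */
static int
extend_file(int fd, int nblocks)
{
    struct stat st;

    assert(nblocks > 0);
    if (fstat(fd, &st) != 0)
        return errno;
    /* posix_fallocate() returns the error number directly, not via errno */
    return posix_fallocate(fd, st.st_size, (off_t) nblocks * BLOCK_SIZE);
}
```

After a successful call the file size has grown by nblocks * BLOCK_SIZE
and the space is reserved, so a later write into that range should not
fail for lack of disk space.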

Greetings,

Andres Freund

[1] https://postgr.es/m/20160204155458.jrw3crmyscusdqf6%40alap3.anarazel.de
[2] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/bgwriter-rewrite
[3] https://github.com/anarazel/postgres/tree/bgwriter-rewrite
