Re: Reduce/eliminate the impact of FPW

From: Robert Haas
Subject: Re: Reduce/eliminate the impact of FPW
Date:
Msg-id: CA+TgmoaLdyBCSfQb=8+zthN0Oyfs0zE4HQuRE6wR+EibxHnNDQ@mail.gmail.com
In response to: Reduce/eliminate the impact of FPW  (Daniel Wood <hexexpert@comcast.net>)
Responses: Re: Reduce/eliminate the impact of FPW
List: pgsql-hackers
On Mon, Aug 3, 2020 at 5:26 AM Daniel Wood <hexexpert@comcast.net> wrote:
> If we can't eliminate FPW's can we at least solve the impact of it?  Instead of
> writing the before images of pages inline into the WAL, which increases the
> COMMIT latency, write these same images to a separate physical log file.  The
> key idea is that I don't believe that COMMIT's require these buffers to be
> immediately flushed to the physical log.  We only need to flush these before
> the dirty pages are written.  This delay allows the physical before image IO's
> to be decoupled and done in an efficient manner without an impact to COMMIT's.
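Read literally, the ordering the proposal depends on would look something like
this. This is only a rough sketch in plain POSIX terms; the file descriptors
(before_image_fd, wal_fd, data_fd) and function names are invented for
illustration and are not PostgreSQL code:

#include <unistd.h>

extern int before_image_fd;   /* the proposed separate physical log (hypothetical) */
extern int wal_fd;            /* the WAL */
extern int data_fd;           /* some data file */

/* Commit path: only the WAL is forced, as today.  The before image has
 * merely been appended to the physical log's OS buffer earlier, when the
 * page was modified, with no fsync on this path. */
void
commit(const char *wal_rec, size_t len)
{
    write(wal_fd, wal_rec, len);
    fsync(wal_fd);
}

/* Writeback path: the before image must reach disk before the dirty page
 * is allowed to overwrite the old version on disk. */
void
write_back_dirty_page(const char *page, size_t len, off_t offset)
{
    fsync(before_image_fd);
    pwrite(data_fd, page, len, offset);
}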

I think this is what's called a double-write buffer, or what was tried
some years ago under that name.  A significant problem is that you
have to fsync() the double-write buffer before you can write the WAL.
So instead of this:

- write WAL to OS
- fsync WAL

You have to do this:

- write double-write buffer to OS
- fsync double-write buffer
- write WAL to OS
- fsync WAL

Note that you cannot overlap these steps -- the first fsync must be
completed before the second write can begin, else you might try to
replay WAL for which the double-write buffer information is not
available.
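In code, the commit path above would have to look roughly like the following.
Again this uses plain POSIX calls and invented descriptors (dwb_fd, wal_fd),
purely to make the required sequencing explicit:

#include <unistd.h>

extern int dwb_fd;            /* double-write buffer file (hypothetical) */
extern int wal_fd;            /* the WAL */

void
commit_with_double_write(const char *fpw, size_t fpw_len,
                         const char *wal_rec, size_t wal_len)
{
    write(dwb_fd, fpw, fpw_len);        /* write double-write buffer to OS */
    fsync(dwb_fd);                      /* fsync double-write buffer */
    /* Only after the fsync above returns is it safe to emit WAL that
     * recovery might try to replay against these pages. */
    write(wal_fd, wal_rec, wal_len);    /* write WAL to OS */
    fsync(wal_fd);                      /* fsync WAL */
}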

Because of this, I think this is actually quite expensive. COMMIT
requires the WAL to be flushed, unless you configure
synchronous_commit=off. So this would double the number of fsyncs we
have to do. It's not as bad as all that, because the individual fsyncs
would be smaller, and that makes a significant difference. For a big
transaction that writes a lot of WAL, you'd probably not notice much
difference; instead of writing 1000 pages to WAL, you might write 770
pages to the double-write buffer and 270 to WAL,
or something like that. But for short transactions, such as those
performed by pgbench, you'd probably end up with a lot of cases where
you had to write 3 pages instead of 2, and not only that, but the
writes have to be consecutive rather than simultaneous, and to
different parts of the disk rather than sequential. That would likely
suck a lot.
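To put rough numbers on that intuition, here is a toy model; the latencies are
made up purely to show the shape of the argument, not measurements of any real
system:

#include <stdio.h>

int
main(void)
{
    /* Invented figures for illustration only. */
    double fsync_ms = 1.0;    /* latency of one flush on commodity storage */
    double page_ms  = 0.05;   /* cost of transferring one 8kB page */

    /* Today: a short pgbench-style transaction flushes once, writing
     * roughly a commit record plus a full-page image (~2 pages). */
    double today = 1 * fsync_ms + 2 * page_ms;

    /* Double-write scheme: two dependent flushes to different files,
     * ~3 pages in total. */
    double dwb = 2 * fsync_ms + 3 * page_ms;

    printf("today: %.2f ms   double-write: %.2f ms\n", today, dwb);
    return 0;
}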

It's entirely possible that these kinds of problems could be mitigated
through really good engineering, maybe to the point where this kind of
solution outperforms what we have now for some or even all workloads,
but it seems equally possible that it's just always a loser. I don't
really know. It seems like a very difficult project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


