Re: Improve WALRead() to suck data directly from WAL buffers when possible

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: Improve WALRead() to suck data directly from WAL buffers when possible
Дата
Msg-id CALj2ACVFjHq38_rM653yuiAAFVSPskokDzBmwo3P-rm57OCiYw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Thu, Oct 12, 2023 at 4:13 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-10-03 16:05:32 -0700, Jeff Davis wrote:
> >
> > Does this patch still look like a good fit for your (or someone else's)
> > plans for direct IO here? If so, would committing this soon make it
> > easier to make progress on that, or should we wait until it's actually
> > needed?
>
> I think it'd be quite useful to have. Even with the code as of 16, I see
> better performance in some workloads with debug_io_direct=wal,
> wal_sync_method=open_datasync compared to any other configuration. Except of
> course that it makes walsenders more problematic, as they suddenly require
> read IO. Thus having support for walsenders to send directly from wal buffers
> would be beneficial, even without further AIO infrastructure.

Right. Tests show the benefit with WAL DIO + this patch -
https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40mail.gmail.com.

Also, irrespective of WAL DIO, the WAL buffers hit ratio with the
patch stood at 95% for 1 primary, 1 sync standby, 1 async standby,
pgbench --scale=300 --client=32 --time=900. In other words, the
walsenders avoided 95% of the time reading from the file/avoided pread
system calls -
https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com.

> I also think there are other quite desirable features that are made easier by
> this patch. One of the primary problems with using synchronous replication is
> the latency increase, obviously. We can't send out WAL before it has locally
> been wirten out and flushed to disk. For some workloads, we could
> substantially lower synchronous commit latency if we were able to send WAL to
> remote nodes *before* WAL has been made durable locally, even if the receiving
> systems wouldn't be allowed to write that data to disk yet: It takes less time
> to send just "write LSN: %X/%X, flush LSNL: %X/%X" than also having to send
> all the not-yet-durable WAL.
>
> In many OLTP workloads there won't be WAL flushes between generating WAL for
> DML and commit, which means that the amount of WAL that needs to be sent out
> at commit can be of nontrivial size.
>
> E.g. for pgbench, normally a transaction is about ~550 bytes (fitting in a
> single tcp/ip packet), but a pgbench transaction that needs to emit FPIs for
> everything is a lot larger: ~45kB (not fitting in a single packet). Obviously
> many real world workloads OLTP workloads actually do more writes than
> pgbench. Making the commit latency of the latter be closer to the commit
> latency of the former when using syncrep would obviously be great.
>
> Of course this patch is just a relatively small step towards that: We'd also
> need in-memory buffering on the receiving side, the replication protocol would
> need to be improved, we'd likely need an option to explicitly opt into
> receiving unflushed data. But it's still a pretty much required step.

Yes, this patch can pave the way for all of the above features in
future. However, I'm looking forward to getting this in for now.
Later, I'll come up with more concrete thoughts on the above.

Having said above, the latest v10 patch after addressing some of the
review comments is at
https://www.postgresql.org/message-id/CALj2ACU3ZYzjOv4vZTR%2BLFk5PL4ndUnbLS6E1vG2dhDBjQGy2A%40mail.gmail.com.
Any further thoughts on the patch is welcome.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Steele
Дата:
Сообщение: Re: Improving Physical Backup/Restore within the Low Level API
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Improving Physical Backup/Restore within the Low Level API