Re: pg_preadv() and pg_pwritev()

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: pg_preadv() and pg_pwritev()
Дата
Msg-id CA+hUKG+TCoareSYcAu05yaMh9eUck6Hd8FQZt5VEvotNs3YOMQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_preadv() and pg_pwritev()  (Andres Freund <andres@anarazel.de>)
Ответы Re: pg_preadv() and pg_pwritev()  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
On Mon, Dec 21, 2020 at 11:40 AM Andres Freund <andres@anarazel.de> wrote:
> On 2020-12-20 16:26:42 +1300, Thomas Munro wrote:
> > > 1. port.h cannot assume that <limits.h> has already been included;
> > > nor do I want to fix that by including <limits.h> there.  Do we
> > > really need to define a fallback value of IOV_MAX?  If so,
> > > maybe the answer is to put the replacement struct iovec and
> > > IOV_MAX in some new header.
> >
> > Ok, I moved all this stuff into port/pg_uio.h.
>
> Can we come up with a better name than 'uio'? I find that a not exactly
> meaningful name.

Ok, let's try port/pg_iovec.h.

> Or perhaps we could just leave the functions in port/port.h, but extract
> the value of the define at configure time? That way only pread.c /
> pwrite.c would need it.

That won't work for the struct definition, so client code would need
to remember to do:

#ifdef HAVE_SYS_UIO_H
#include <sys/uio.h>
#endif

... which is a little tedious, or port.h would need to do that and
incur an overhead in every translation unit, which Tom objected to.
That's why I liked the separate header idea.

> > > 3. The patch as given won't prove anything except that the code
> > > compiles.  Is it worth fixing at least one code path to make
> > > use of pg_preadv and pg_pwritev, so we can make sure this code
> > > is tested before there's layers of other new code on top?
> >
> > OK, here's a patch to zero-fill fresh WAL segments with pwritev().
>
> I think that's a good idea. However, I see two small issues: 1) If we
> write larger amounts at once, we need to handle partial writes. That's a
> large enough amount of IO that it's much more likely to hit a memory
> shortage or such in the kernel - we had to do that a while a go for WAL
> writes (which can also be larger), if memory serves.
>
> Perhaps we should have pg_pwritev/readv unconditionally go through
> pwrite.c/pread.c and add support for "continuing" partial writes/reads
> in one central place?

Ok, here's a new version with the following changes:

1.  Define PG_IOV_MAX, a reasonable size for local variables, not
larger than IOV_MAX.
2   Use 32 rather than 1024, based on off-list complaint about 1024
potentially swamping the IO system unfairly.
3.  Add a wrapper pg_pwritev_retry() to retry automatically on short
writes.  (I didn't write pg_preadv_retry(), because I don't currently
need it for anything so I didn't want to have to think about EOF vs
short-reads-for-implementation-reasons.)
4.  I considered whether pg_pwrite() should have built-in retry
instead of a separate wrapper, but I thought of an argument against
hiding the "raw" version: the AIO patch set already understands short
reads/writes and knows how to retry at a higher level, as that's
needed for native AIO too, so I think it makes sense to be able to
keep things the same and not solve the same problem twice.  A counter
argument would be that you could get the retry underway sooner with a
tight loop, but I'm not expecting this to be common.

> > I'm drawing a blank on trivial candidate uses for preadv(), without
> > infrastructure from later patches.
>
> Can't immediately think of something either.

One idea I had for the future is for xlogreader.c to read the WAL into
a large multi-page circular buffer, which could wrap around using a
pair of iovecs, but that requires lots more work    .

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: postgres_fdw - cached connection leaks if the associated user mapping/foreign server is dropped
Следующее
От: Peter Smith
Дата:
Сообщение: Re: Single transaction in the tablesync worker?