Re: Streaming I/O, vectored I/O (WIP)

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Streaming I/O, vectored I/O (WIP)
Дата
Msg-id 20230927201333.hmlpwo6h2j5tdom6@alap3.anarazel.de
обсуждение исходный текст
Ответ на Streaming I/O, vectored I/O (WIP)  (Thomas Munro <thomas.munro@gmail.com>)
Ответы Re: Streaming I/O, vectored I/O (WIP)  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
Hi,

On 2023-09-27 21:33:15 +0300, Heikki Linnakangas wrote:
> I'm a bit disappointed and surprised by
> v1-0009-WIP-Use-streaming-reads-in-vacuum.patch though:
> 
>  4 files changed, 244 insertions(+), 78 deletions(-)
> 
> The current prefetching logic in vacuumlazy.c is pretty hairy, so I hoped
> that this would simplify it. I didn't look closely at that patch, so maybe
> it's simpler even though it's more code.

A good chunk of the changes is pretty boring stuff. A good chunk of the
remainder could be simplified a lot - it's partially there because vacuumlazy
changed a lot over the last couple years and because a bit more refactoring is
needed.  I do think it's actually simpler in some ways - besides being more
efficient...


> > v1-0002-Provide-vectored-variants-of-smgrread-and-smgrwri.patch
> 
> No smgrextendv()? I guess none of the patches here needed it.

I can't really imagine needing it anytime soon - due to the desire to avoid
ENOSPC for pages in the buffer pool the common pattern is to extend relations
with zeroes on disk, then populate those buffers in memory. It's possible that
you could use something like smgrextendv() when operating directly on the smgr
level - but then I suspect you're going to be better off to extend the
relation to the right size in one operation and then just use smgrwritev() to
write out the contents.


> > /*
> >  * Prepare to read a block.  The buffer is pinned.  If this is a 'hit', then
> >  * the returned buffer can be used immediately.  Otherwise, a physical read
> >  * should be completed with CompleteReadBuffers().  PrepareReadBuffer()
> >  * followed by CompleteReadBuffers() is equivalent ot ReadBuffer(), but the
> >  * caller has the opportunity to coalesce reads of neighboring blocks into one
> >  * CompleteReadBuffers() call.
> >  *
> >  * *found is set to true for a hit, and false for a miss.
> >  *
> >  * *allocated is set to true for a miss that allocates a buffer for the first
> >  * time.  If there are multiple calls to PrepareReadBuffer() for the same
> >  * block before CompleteReadBuffers() or ReadBuffer_common() finishes the
> >  * read, then only the first such call will receive *allocated == true, which
> >  * the caller might use to issue just one prefetch hint.
> >  */
> > Buffer
> > PrepareReadBuffer(BufferManagerRelation bmr,
> >                   ForkNumber forkNum,
> >                   BlockNumber blockNum,
> >                   BufferAccessStrategy strategy,
> >                   bool *found,
> >                   bool *allocated)
> > 
> 
> If you decide you don't want to perform the read, after all, is there a way
> to abort it without calling CompleteReadBuffers()?

When would that be needed?


> Looking at the later patch that introduces the streaming read API, seems
> that it finishes all the reads, so I suppose we don't need an abort
> function. Does it all get cleaned up correctly on error?

I think it should.  The buffer error handling is one of the areas where I
really would like to have some way of testing the various cases, it's easy to
get things wrong, and basically impossible to write reliable tests for with
our current infrastructure.


> > typedef bool (*PgStreamingReadBufferDetermineNextCB) (PgStreamingRead *pgsr,
> >         uintptr_t pgsr_private,
> >         void *per_io_private,
> >         BufferManagerRelation *bmr,
> >         ForkNumber *forkNum,
> >         BlockNumber *blockNum,
> >         ReadBufferMode *mode);
> 
> I was surprised that 'bmr', 'forkNum' and 'mode' are given separately on
> each read. I see that you used that in the WAL replay prefetching, so I
> guess that makes sense.

Yea, that's the origin - I don't like it, but I don't really have a better
idea.


> > extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
> > extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_io_private);
> > extern void pg_streaming_read_reset(PgStreamingRead *pgsr);
> > extern void pg_streaming_read_free(PgStreamingRead *pgsr);
> 
> Do we need to expose pg_streaming_read_prefetch()? It's only used in the WAL
> replay prefetching patch, and only after calling pg_streaming_read_reset().
> Could pg_streaming_read_reset() call pg_streaming_read_prefetch() directly?
> Is there any need to "reset" the stream, without also starting prefetching?

Heh, I think this is a discussion Thomas I were having before...

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Eager page freeze criteria clarification
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Eager page freeze criteria clarification