Re: Trying out read streams in pgvector (an extension)

Поиск

Список

Период

Сортировка

От	Thomas Munro
Тема	Re: Trying out read streams in pgvector (an extension)
Дата	12 ноября 13:47:16
Msg-id	CA+hUKGL7-Dx8KiUo=G91Y5tfFpwDUFFQJ6=9D8Gr1n=DZxGh+w@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Trying out read streams in pgvector (an extension) (Nazir Bilal Yavuz <byavuz81@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Wed, Nov 12, 2025 at 9:04 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> On Wed, 12 Nov 2025 at 07:12, Thomas Munro <thomas.munro@gmail.com> wrote:
> 0002:
>
> +        /* End-of-stream. */
> +        buf = read_stream_next_buffer(stream, NULL);
> +        Assert(buf == InvalidBuffer);
> +        buf = read_stream_next_buffer(stream, NULL);
> +        Assert(buf == InvalidBuffer);
>
> I noticed there are two 'read_stream_next_buffer()' and
> 'InvalidBuffer' checks. Does having both provide any additional
> validation? I tried removing one of them, and the test still passed.

I wanted to demonstrate that this is a state that the stream is stuck
in until you call _resume().

I suppose an alternative design would be that _next_buffer() returns
InvalidBuffer only once (= the block number callback returns
InvalidBlock once) and then automatically resumes (= it restores the
distance) and then you can call read_stream_next_buffer() again (= the
block number callback will be called again to fill the stream up with
new buffers before waiting for the first one to be ready to give to
you if it isn't already).  That would have the advantage of not
requiring a new function at all and make the patch even shorter, but I
don't know, I guess I thought that would be a bit more fragile in some
way, less explicit.  Hmm, would it actually be better if it worked
like that?

> Also, there is one thing I wanted to clarify about the
> 'read_stream_resume()'. If 'read_stream_next_buffer()' returns an
> 'InvalidBuffer', then we can use 'read_stream_resume()' alone because
> we know that we already consumed all buffers in the stream. For the
> rest, we need to use 'read_stream_resume()' together with the
> 'read_stream_reset()', right?

For the rest, there would be no need to call read_stream_resume().

To recap the uses of read_stream_reset():  the original purpose was to
release any buffers (pins) that the stream is holding internally
because of look-ahead, and put it back to its original state, ready to
be reused.  It is called (1) by read_stream_end() as an implementation
detail (eg if a LIMIT or anything else except ERROR/FATAL ends your
query early, we need to unpin buffers queued in the stream before we
pfree it), (2) explicitly by rescans, (3) in hypothetical code I
thought about that would want to stream blocks speculatively and then
change its mind after predicting incorrectly (I had a few patches like
that, abandoned for now), and then (4) in this case, by places that
temporarily ran out of block numbers, but will have some more again
soon and want to resume the stream.

It was already debatable whether heuristic state like lookahead
distance should survive acoss rescans, or in other words, whether the
expected I/O requirements of the previous scan are a useful prediction
of the requirements of the next scan, but the answer is clearer in
case (4), hence the desire to find a way to separate that use case
from the others.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Trying out read streams in pgvector (an extension)