Re: Streaming read-ready sequential scan code

From: Thomas Munro
Subject: Re: Streaming read-ready sequential scan code
Date:
Msg-id: CA+hUKGLdvzXDMiLeyBd6u25AwNpPFdWTgER2-_JMU6QPNnxjrw@mail.gmail.com
In response to: Re: Streaming read-ready sequential scan code  (Melanie Plageman <melanieplageman@gmail.com>)
List: pgsql-hackers
On Tue, May 21, 2024 at 9:11 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> So, if you are seeing the slow-down mostly go away by reducing
> blocknums array size, does the regression only appear when the scan
> data is fully in shared buffers? Or is this blocknums' other use
> (dealing with short reads)?

That must be true (the blocknums array is normally only "filled" in
the "fast path", where all buffers are found in cache).

> Is your theory that one worker ends up reading 16 blocks that should
> have been distributed across multiple workers?

Yes, it just jiggles the odds around, introducing a bit of extra
unfairness by calling the callback in a tighter loop to build a small
batch, which reveals a pre-existing problem.
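
To illustrate the fairness point with a toy model (this is not the
parallel scan allocator, and toy_next_block is invented): if workers
draw block numbers from a shared counter, drawing one at a time lets
them interleave, while filling a 16-slot batch lets one worker claim
16 consecutive results before anyone else can interject:

#include "postgres.h"
#include "port/atomics.h"
#include "storage/block.h"

/*
 * Toy callback handing out blocks from a shared atomic counter.  With
 * per-block calls, workers interleave roughly fairly; with a 16-deep
 * batch, whichever worker fills its array first walks away with a
 * contiguous run of pages, possibly including the last ones.
 */
static BlockNumber
toy_next_block(pg_atomic_uint32 *shared_next, uint32 nblocks)
{
    uint32      blkno = pg_atomic_fetch_add_u32(shared_next, 1);

    return (blkno < nblocks) ? (BlockNumber) blkno : InvalidBlockNumber;
}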

The mistake in PHJ (problem #2 above) is that, once a worker decides
it would like all workers to stop inserting so it can increase the
number of buckets, it sets a flag to ask them to do that and waits
for them to see it; but a worker that is filtering all tuples out
never checks the "growth" flag, so it scans all the way to the end
while the flag-setter waits.  Normally a worker checks that flag when
it is time to allocate a new chunk of memory, which seemed to make
sense to me at the time: if we've hit the needs-more-buckets (or
needs-more-batches) logic, then surely workers are inserting tuples
and will soon allocate a new chunk!  But, of course, here is the edge
case where that isn't true: we had bad estimates, so the hash table
was too small (problem #1); we got lots of tuples clustered over a few
heap pages and decided to expand the hash table, but right at that
moment the matching tuples ran out, so somebody had to finish the
whole scan without ever checking the flag (problem #2); and that
somebody happened to have all the rest of the pages because we made
the lookahead a bit less fair (problem #3).  A nice confluence of
problems.  I expect #2 and #3 to be easy to fix, and I didn't look at
the estimation problem #1 at all (perhaps a stats puzzle designed by
the TPC to trip us up?).
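
A sketch of the problem #2 shape, with every helper name invented for
illustration (this is not the actual nodeHash.c code): the only check
of the shared growth flag sits on the new-chunk path, so a worker
whose qual rejects every tuple never reaches it:

/*
 * Hypothetical build-side loop.  The growth-flag check lives inside
 * the chunk-allocation branch, so a worker that filters out every
 * tuple skips it entirely and scans to the end while the flag-setter
 * waits at the barrier.
 */
while ((tuple = next_tuple_from_scan(scan)) != NULL)
{
    if (!qual_passes(tuple))
        continue;               /* never inserts, never allocates a
                                 * chunk, never sees the growth flag */

    if (!space_in_current_chunk(hashtable, tuple))
    {
        /* Only here do we notice a pending growth request. */
        if (growth_pending(pstate))
            help_with_growth_and_retry(pstate);
        allocate_new_chunk(hashtable);
    }
    insert_tuple(hashtable, tuple);
}

A fix for #2 would presumably add an occasional growth-flag check on
the filter-everything path as well, so a non-inserting worker can
still notice the request.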


