Re: Parallel Seq Scan vs kernel read ahead

From: David Rowley
Subject: Re: Parallel Seq Scan vs kernel read ahead
Date:
Msg-id: CAApHDvpLCEVbdX_i0Ao18sPxg8zgsAKvESznGzMVLjXJvwfjMQ@mail.gmail.com
In reply to: Re: Parallel Seq Scan vs kernel read ahead  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: Parallel Seq Scan vs kernel read ahead  (Amit Kapila <amit.kapila16@gmail.com>)
           Re: Parallel Seq Scan vs kernel read ahead  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
On Tue, 14 Jul 2020 at 19:13, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Jun 26, 2020 at 3:33 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Tue, Jun 23, 2020 at 11:53 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > > In summary, based on these tests, I don't think we're making anything
> > > worse in regards to synchronize_seqscans if we cap the maximum number
> > > of blocks to allocate to each worker at once to 8192. Perhaps there's
> > > some argument for using something smaller than that for servers with
> > > very little RAM, but I don't personally think so as it still depends
> > > on the table size and it's hard to imagine tables in the hundreds of
> > > GBs on servers that struggle with chunk allocations of 16MB.  The
> > > table needs to be at least ~70GB to get an 8192 chunk size with the
> > > current v2 patch settings.
> >
> > Nice research. That makes me happy. I had a feeling the maximum useful
> > chunk size ought to be more in this range than the larger values we
> > were discussing before, but I didn't even think about the effect on
> > synchronized scans.
>
> +1.  This seems about right to me.  We can always reopen the
> discussion if someone shows up with evidence in favour of a tweak to
> the formula, but this seems to address the basic problem pretty well,
> and also fits nicely with future plans for AIO and DIO.
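For anyone following along, the sizing being discussed can be sketched roughly as below. This is an illustrative sketch, not the patch itself: the divisor of 2048 target chunks and the constant names are my assumptions; only the 8192-block cap comes from the discussion above.

```c
/* Illustrative sketch of per-worker chunk sizing for a parallel seq
 * scan: the chunk grows with table size but is capped at 8192 blocks
 * (64MB with the default 8KB block size).  TARGET_NCHUNKS is an
 * assumed constant, not taken from the patch. */
#include <assert.h>
#include <stdint.h>

#define TARGET_NCHUNKS     2048	/* assumed: aim for ~2048 chunks per scan */
#define MAX_CHUNK_BLOCKS   8192	/* the cap discussed above */

/* Round v up to the next power of two (v >= 1). */
static uint32_t
next_power_of_2(uint32_t v)
{
	uint32_t	p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Chunk size, in blocks, for a table of nblocks blocks. */
static uint32_t
chunk_size_for(uint32_t nblocks)
{
	uint32_t	target = nblocks / TARGET_NCHUNKS;
	uint32_t	size = next_power_of_2(target < 1 ? 1 : target);

	return size > MAX_CHUNK_BLOCKS ? MAX_CHUNK_BLOCKS : size;
}
```

With these assumed constants, a ~70GB table (about 9.2 million 8KB blocks) is where the 8192-block cap is first reached, which lines up with the figure quoted above.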

Thank you both for having a look at the results.

I'm now pretty happy with this too. I do understand that we've not
exactly exhaustively tested all our supported operating systems.
However, we've seen some great speedups with Windows 10 and Linux with
SSDs. Thomas saw great speedups on FreeBSD with the original patch
using chunk sizes of 64 blocks. (I wonder if it's worth verifying that
the speedup increases further with the latest patch, using the same
test from the original email on this thread?)

I'd like to propose that if anyone wants to do further testing on
other operating systems with SSDs or HDDs then it would be good if
that could be done within 1 week of this email. There are various
benchmarking ideas on this thread for inspiration.

If we've not seen any performance regressions within 1 week, then I
propose that we (pending final review) push this to allow wider
testing. It seems we're early enough in the PG14 cycle that there's a
large window of time for us to do something about any reported
performance regressions that come in.

I also have in mind that Amit was keen to see a GUC or reloption to
allow users to control this. My thoughts on that are still that it
would be possible to craft a case where we scan an entire heap to get
a very small number of rows that are all located in the same area of
the table and then call some expensive function on those rows. The
chunk size ramp-down code will help reduce the chances of one worker
running on for much longer than its co-workers, but it won't eliminate
them.  Even the code as it stands today could suffer from this, to a
lesser extent, if all the matching rows are on a single page. My
current thoughts are that this just seems unlikely and that the
granularity of 1 block was never that great for cases like this
anyway. I suppose a more ideal plan shape would "Distribute" matching
rows to allow another set of workers to pick these rows up one by one
and process them. The fact that we've gotten by without such an
operator to date suggests that one parallel worker being tied up with
a large amount of work is not that common.  Based on those thoughts,
I'd like to avoid any GUC/reloption until we see evidence that it's
really needed.
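To make the ramp-down behaviour concrete, here is a rough sketch of the idea; the RAMPDOWN_CHUNKS threshold of 64 and the function shape are my assumptions for illustration, and the real patch's constants and structure may differ. Once the next allocation would land within the last ~64 chunks' worth of blocks, the chunk size is repeatedly halved, so no worker ends up holding a large chunk while the others sit idle.

```c
/* Rough sketch of chunk-size ramp-down near the end of a scan.  The
 * RAMPDOWN_CHUNKS threshold is an assumed constant for illustration. */
#include <assert.h>
#include <stdint.h>

#define RAMPDOWN_CHUNKS 64		/* assumed: start shrinking within this
								 * many chunks of the end of the table */

/*
 * Given the table size, the number of blocks already handed out, and the
 * current chunk size, return the (possibly reduced) size of the next chunk.
 */
static uint32_t
next_chunk_size(uint32_t nblocks, uint32_t nallocated, uint32_t chunk_size)
{
	/* In the tail of the scan, halve the chunk size, down to 1 block. */
	while (chunk_size > 1 &&
		   nallocated + (uint64_t) chunk_size * RAMPDOWN_CHUNKS > nblocks)
		chunk_size >>= 1;
	return chunk_size;
}
```

In this sketch, a worker asking for a chunk late in a scan of 1,000,000 blocks with a 512-block chunk size would get a 128-block chunk instead, shrinking further to 1 block right at the end.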

Any objections to any of the above?

David



In pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Message: Re: avoid bitmapOR-ing indexes with scan condition inconsistent with partition constraint
Next
From: Andres Freund
Date:
Message: Re: Binary support for pgoutput plugin