Обсуждение: Parallel sequential scans

Поиск
Список
Период
Сортировка

Parallel sequential scans

От
Steve Atkins
Дата:
I'm doing some reporting-type work with PG, with the vast
majority of queries hitting upwards of 25% of the table, so
being executed as seq scans.

It's a fairly large set of data, so each pass is taking quite
a while, IO limited. And I'm looking at doing dozens of
passes.

It would be really nice to be able to do all the work with a
single pass over the table, executing all the queries in
parallel in that pass. They're pretty simple queries, mostly,
just some aggregates and a simple where clause.

There are some fairly obvious ways to merge multiple
queries to do that at a SQL level - converting each query
into a function and passing each row from a select * to
each of the functions would be one of the less ugly.

Or I could fire off all the queries simultaneously and hope
they stay in close-enough lockstep through a single pass
through the table to be able to share most of the IO.

Is there a commonly used trick to doing this that I should
know about?

Cheers,
   Steve




Re: Parallel sequential scans

От
Tom Lane
Дата:
Steve Atkins <steve@blighty.com> writes:
> I'm doing some reporting-type work with PG, with the vast
> majority of queries hitting upwards of 25% of the table, so
> being executed as seq scans.
> ...
> It would be really nice to be able to do all the work with a
> single pass over the table, executing all the queries in
> parallel in that pass. They're pretty simple queries, mostly,
> just some aggregates and a simple where clause.

> There are some fairly obvious ways to merge multiple
> queries to do that at a SQL level - converting each query
> into a function and passing each row from a select * to
> each of the functions would be one of the less ugly.

> Or I could fire off all the queries simultaneously and hope
> they stay in close-enough lockstep through a single pass
> through the table to be able to share most of the IO.

I have not tried this sort of thing, but right offhand I like the second
alternative.  The "hope" is more well-founded than you seem to think:
whichever process is currently ahead will be slowed by requesting I/O,
while processes that are behind will find the pages they need already in
shared buffers.  You should definitely see just one read of each table
page as the parallel scans advance, assuming you don't have an
unreasonably small number of buffers.

Another reason, if you have more than one CPU in your machine, is that
multiple processes can make use of multiple CPUs, whereas the
one-fancy-query approach doesn't parallelize (at least not without
Bizgres or some such).

And lastly, you can just try it without sweating hard to convert the
queries ;-).  So try it and let us know how it goes.

            regards, tom lane

Re: Parallel sequential scans

От
"Jim C. Nasby"
Дата:
On Fri, Mar 24, 2006 at 01:44:19AM -0500, Tom Lane wrote:
> > Or I could fire off all the queries simultaneously and hope
> > they stay in close-enough lockstep through a single pass
> > through the table to be able to share most of the IO.
>
> I have not tried this sort of thing, but right offhand I like the second
> alternative.  The "hope" is more well-founded than you seem to think:
> whichever process is currently ahead will be slowed by requesting I/O,
> while processes that are behind will find the pages they need already in
> shared buffers.  You should definitely see just one read of each table
> page as the parallel scans advance, assuming you don't have an
> unreasonably small number of buffers.
>
> Another reason, if you have more than one CPU in your machine, is that
> multiple processes can make use of multiple CPUs, whereas the
> one-fancy-query approach doesn't parallelize (at least not without
> Bizgres or some such).
>
> And lastly, you can just try it without sweating hard to convert the
> queries ;-).  So try it and let us know how it goes.

There was also a patch floating around that allowed seqscans to start in
the middle of a table if it was detected that a seqscan on that table
was already in progress. Not sure if that made it in, but it might be
relevant here.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461