Обсуждение: Parallel sequential scans
I'm doing some reporting-type work with PG, with the vast majority of queries hitting upwards of 25% of the table, so being executed as seq scans. It's a fairly large set of data, so each pass is taking quite a while, IO limited. And I'm looking at doing dozens of passes. It would be really nice to be able to do all the work with a single pass over the table, executing all the queries in parallel in that pass. They're pretty simple queries, mostly, just some aggregates and a simple where clause. There are some fairly obvious ways to merge multiple queries to do that at a SQL level - converting each query into a function and passing each row from a select * to each of the functions would be one of the less ugly. Or I could fire off all the queries simultaneously and hope they stay in close-enough lockstep through a single pass through the table to be able to share most of the IO. Is there a commonly used trick to doing this that I should know about? Cheers, Steve
Steve Atkins <steve@blighty.com> writes: > I'm doing some reporting-type work with PG, with the vast > majority of queries hitting upwards of 25% of the table, so > being executed as seq scans. > ... > It would be really nice to be able to do all the work with a > single pass over the table, executing all the queries in > parallel in that pass. They're pretty simple queries, mostly, > just some aggregates and a simple where clause. > There are some fairly obvious ways to merge multiple > queries to do that at a SQL level - converting each query > into a function and passing each row from a select * to > each of the functions would be one of the less ugly. > Or I could fire off all the queries simultaneously and hope > they stay in close-enough lockstep through a single pass > through the table to be able to share most of the IO. I have not tried this sort of thing, but right offhand I like the second alternative. The "hope" is more well-founded than you seem to think: whichever process is currently ahead will be slowed by requesting I/O, while processes that are behind will find the pages they need already in shared buffers. You should definitely see just one read of each table page as the parallel scans advance, assuming you don't have an unreasonably small number of buffers. Another reason, if you have more than one CPU in your machine, is that multiple processes can make use of multiple CPUs, whereas the one-fancy-query approach doesn't parallelize (at least not without Bizgres or some such). And lastly, you can just try it without sweating hard to convert the queries ;-). So try it and let us know how it goes. regards, tom lane
On Fri, Mar 24, 2006 at 01:44:19AM -0500, Tom Lane wrote: > > Or I could fire off all the queries simultaneously and hope > > they stay in close-enough lockstep through a single pass > > through the table to be able to share most of the IO. > > I have not tried this sort of thing, but right offhand I like the second > alternative. The "hope" is more well-founded than you seem to think: > whichever process is currently ahead will be slowed by requesting I/O, > while processes that are behind will find the pages they need already in > shared buffers. You should definitely see just one read of each table > page as the parallel scans advance, assuming you don't have an > unreasonably small number of buffers. > > Another reason, if you have more than one CPU in your machine, is that > multiple processes can make use of multiple CPUs, whereas the > one-fancy-query approach doesn't parallelize (at least not without > Bizgres or some such). > > And lastly, you can just try it without sweating hard to convert the > queries ;-). So try it and let us know how it goes. There was also a patch floating around that allowed seqscans to start in the middle of a table if it was detected that a seqscan on that table was already in progress. Not sure if that made it in, but it might be relevant here. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461