Обсуждение: Re: PostgreSQL Parallel Processing !
Hi Everyone I just want to illustrate an idea may possible for bringing up parallel process in PostgreSQL at SQL-Query level The PARALLEL option in Oracle really give great improvment in performance, multi-thread concept has great possibilities In Oracle we have hints ( see below ) : SELECT /*+PARALLEL( e, 2 )*/ e.* FROM EMP e ; PostgreSQL ( may if possible in future ) : SELECT e.* FROM EMP PARALLEL ( e, 2) ; *Note: The below syntax does not work with any PostgreSQL versions PostgreSQL Syntax for SELECT ( with PARALLEL ) [ WITH [ RECURSIVE ] with_query [, ...] ] SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ] * | expression [ [ AS ] output_name ] [, ...] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY expression [, ...] ] [ HAVING condition [, ...] ] [ WINDOW window_name AS ( window_definition ) [, ...] ] [ PARALLEL (<alias> | <table> | <index> |<segment> , < no. of threads> ) ] [ { UNION | INTERSECT | EXCEPT } [ ALL ] select ] [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ] [ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ] On 1/24/12, ashish nauriyal <anauriyal@gmail.com> wrote: > Thoughts by Bruce Momjian on Parallel execution in PostgreSQL... > > http://momjian.us/main/blogs/pgblog/2011.html#December_5_2011 > > You can give your thoughts on the blog itself.... > > Thanks, > Ashish Nauriyal >
On Wed, Jan 25, 2012 at 6:18 AM, sridhar bamandlapally <sridhar.bn1@gmail.com> wrote: > I just want to illustrate an idea may possible for bringing up > parallel process in PostgreSQL at SQL-Query level > > The PARALLEL option in Oracle really give great improvment in > performance, multi-thread concept has great possibilities > > In Oracle we have hints ( see below ) : > SELECT /*+PARALLEL( e, 2 )*/ e.* FROM EMP e ; > > PostgreSQL ( may if possible in future ) : > SELECT e.* FROM EMP PARALLEL ( e, 2) ; It makes little sense (and is contrary to pg policy of no hinting) to do it like that. In fact, I've been musing for a long time on leveraging pg's sophisticated planner to do the parallelization: * Synchroscan means whenever a table has to be scanned twice, it can be done with two threads. * Knowing whether a scan will hit mostly disk or memory can help in deciding whether to do them in parallel or not (memory can be parallelized, interleaved memory access isn't so bad, but interleaved disk access is disastrous) * Big sorts can be parallelized quite easily * Number of threads to use can be a tunable or automatically set to the number of processors on the system * Pipelining is another useful plan transformation: parallelize I/O-bound nodes with CPU-bound ones. I know squat about how to implement this, but I've been considering picking the low hanging fruit on that tree and patching up PG to try the concept. Many of the items above would require a thread-safe execution engine, which may be quite hard to get and have a significant performance hit. Some don't, like parallel sort. Also, it is necessary to notice that parallelization will create some priority inversion issues. Simple, non-parallelizable queries will suffer from resource starvation when contending against more complex, parallelizable ones.
Yes
"Hint method" is an alternative solution which does not appear to be
exclusive parallelism solution as it is included in comment block and have no error handling,
and this could be one of the reason against PG policy
"Parameter method" ( which we are thinking about ) can be very exclusive parallelism solution
with proper error handling as it is part of SQL-Query syntax
On Wed, Jan 25, 2012 at 7:13 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Wed, Jan 25, 2012 at 6:18 AM, sridhar bamandlapallyIt makes little sense (and is contrary to pg policy of no hinting) to
<sridhar.bn1@gmail.com> wrote:
> I just want to illustrate an idea may possible for bringing up
> parallel process in PostgreSQL at SQL-Query level
>
> The PARALLEL option in Oracle really give great improvment in
> performance, multi-thread concept has great possibilities
>
> In Oracle we have hints ( see below ) :
> SELECT /*+PARALLEL( e, 2 )*/ e.* FROM EMP e ;
>
> PostgreSQL ( may if possible in future ) :
> SELECT e.* FROM EMP PARALLEL ( e, 2) ;
do it like that.
In fact, I've been musing for a long time on leveraging pg's
sophisticated planner to do the parallelization:
* Synchroscan means whenever a table has to be scanned twice, it can
be done with two threads.
* Knowing whether a scan will hit mostly disk or memory can help in
deciding whether to do them in parallel or not (memory can be
parallelized, interleaved memory access isn't so bad, but interleaved
disk access is disastrous)
* Big sorts can be parallelized quite easily
* Number of threads to use can be a tunable or automatically set to
the number of processors on the system
* Pipelining is another useful plan transformation: parallelize
I/O-bound nodes with CPU-bound ones.
I know squat about how to implement this, but I've been considering
picking the low hanging fruit on that tree and patching up PG to try
the concept. Many of the items above would require a thread-safe
execution engine, which may be quite hard to get and have a
significant performance hit. Some don't, like parallel sort.
Also, it is necessary to notice that parallelization will create some
priority inversion issues. Simple, non-parallelizable queries will
suffer from resource starvation when contending against more complex,
parallelizable ones.
On Wed, Jan 25, 2012 at 7:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote: > I know squat about how to implement this, but I've been considering > picking the low hanging fruit on that tree and patching up PG to try > the concept. Many of the items above would require a thread-safe > execution engine, which may be quite hard to get and have a > significant performance hit. Some don't, like parallel sort. This was just discussed on -hackers yesterday -- see thread 'multithreaded query planner'. In short, judging by the comments of some of the smartest people working on this project, it sounds like using threads to attack this is not going to happen, ever. Note you can probably still get parallel execution in other ways, using processes, shared memory, etc, so I'd consider researching in that direction. merlin
On Wed, Jan 25, 2012 at 5:16 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Jan 25, 2012 at 7:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote: >> I know squat about how to implement this, but I've been considering >> picking the low hanging fruit on that tree and patching up PG to try >> the concept. Many of the items above would require a thread-safe >> execution engine, which may be quite hard to get and have a >> significant performance hit. Some don't, like parallel sort. > > This was just discussed on -hackers yesterday -- see thread > 'multithreaded query planner'. In short, judging by the comments of > some of the smartest people working on this project, it sounds like > using threads to attack this is not going to happen, ever. Note you > can probably still get parallel execution in other ways, using > processes, shared memory, etc, so I'd consider researching in that > direction. If you mean this[0] thread, it doesn't show anything conclusive against, say, parallel sort or pipelining. But I agree, checking the code, it would be really tough to get any more than parallel sorting by primitive types with threads. Processes, however, show promise. [0] http://archives.postgresql.org/pgsql-hackers/2012-01/msg00734.php
Hi ALL
Please have a look into this,
this may help us to think on PARALLEL option
WITHOUT PARALLEL Option
SQL> explain plan for select * from hr.emp ;
Explained.
PLAN
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7444K| 944M| 16077 (4)| 00:03:13 |
| 1 | TABLE ACCESS FULL| EMP | 7444K| 944M| 16077 (4)| 00:03:13 |
--------------------------------------------------------------------------
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7444K| 944M| 16077 (4)| 00:03:13 |
| 1 | TABLE ACCESS FULL| EMP | 7444K| 944M| 16077 (4)| 00:03:13 |
--------------------------------------------------------------------------
WITH PARALLEL Option
SQL> explain plan for select /*+parallel(emp,4)*/ * from hr.emp ;
Explained.
PLAN
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7444K| 944M| 4442 (3)| 00:00:54 |
| 1 | PX COORDINATOR | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 7444K| 944M| 4442 (3)| 00:00:54 |
| 3 | PX BLOCK ITERATOR | | 7444K| 944M| 4442 (3)| 00:00:54 |
| 4 | TABLE ACCESS FULL| EMP | 7444K| 944M| 4442 (3)| 00:00:54 |
---------------------------------------------------------------------------------
In the above plan ( WITH PARALLEL Option )
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7444K| 944M| 4442 (3)| 00:00:54 |
| 1 | PX COORDINATOR | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 7444K| 944M| 4442 (3)| 00:00:54 |
| 3 | PX BLOCK ITERATOR | | 7444K| 944M| 4442 (3)| 00:00:54 |
| 4 | TABLE ACCESS FULL| EMP | 7444K| 944M| 4442 (3)| 00:00:54 |
---------------------------------------------------------------------------------
In the above plan ( WITH PARALLEL Option )
1. "Cost" has been nearly reduced to 1/4th
2. "CPU" has been reduced
3. "Time" has been nearly reduced to 1/3rd
On Thu, Jan 26, 2012 at 2:24 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
If you mean this[0] thread, it doesn't show anything conclusiveOn Wed, Jan 25, 2012 at 5:16 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 7:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> I know squat about how to implement this, but I've been considering
>> picking the low hanging fruit on that tree and patching up PG to try
>> the concept. Many of the items above would require a thread-safe
>> execution engine, which may be quite hard to get and have a
>> significant performance hit. Some don't, like parallel sort.
>
> This was just discussed on -hackers yesterday -- see thread
> 'multithreaded query planner'. In short, judging by the comments of
> some of the smartest people working on this project, it sounds like
> using threads to attack this is not going to happen, ever. Note you
> can probably still get parallel execution in other ways, using
> processes, shared memory, etc, so I'd consider researching in that
> direction.
against, say, parallel sort or pipelining.
But I agree, checking the code, it would be really tough to get any
more than parallel sorting by primitive types with threads.
Processes, however, show promise.
[0] http://archives.postgresql.org/pgsql-hackers/2012-01/msg00734.php
On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally <sridhar.bn1@gmail.com> wrote: > -------------------------------------------------------------------------- > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | > -------------------------------------------------------------------------- > | 0 | SELECT STATEMENT | | 7444K| 944M| 16077 (4)| 00:03:13 | > | 1 | TABLE ACCESS FULL| EMP | 7444K| 944M| 16077 (4)| 00:03:13 | > -------------------------------------------------------------------------- Sorry to take this off topic, but... Seriously, over 3 minutes to read 944 MB of data? That's less than 5 MB/s, what's wrong with your database? :) Regards, Marti
27.01.12 11:06, Marti Raudsepp написав(ла): > On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally > <sridhar.bn1@gmail.com> wrote: >> -------------------------------------------------------------------------- >> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | >> -------------------------------------------------------------------------- >> | 0 | SELECT STATEMENT | | 7444K| 944M| 16077 (4)| 00:03:13 | >> | 1 | TABLE ACCESS FULL| EMP | 7444K| 944M| 16077 (4)| 00:03:13 | >> -------------------------------------------------------------------------- > Sorry to take this off topic, but... Seriously, over 3 minutes to read > 944 MB of data? That's less than 5 MB/s, what's wrong with your > database? :) Actually I'd ask how parallel CPU may help table sequence scan? Usually sequence scan does not take large amount of cpu time, so I see no point in parallelism.
On 27 Leden 2012, 10:06, Marti Raudsepp wrote: > On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally > <sridhar.bn1@gmail.com> wrote: >> -------------------------------------------------------------------------- >> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time >> | >> -------------------------------------------------------------------------- >> | 0 | SELECT STATEMENT | | 7444K| 944M| 16077 (4)| 00:03:13 >> | >> | 1 | TABLE ACCESS FULL| EMP | 7444K| 944M| 16077 (4)| 00:03:13 >> | >> -------------------------------------------------------------------------- > > Sorry to take this off topic, but... Seriously, over 3 minutes to read > 944 MB of data? That's less than 5 MB/s, what's wrong with your > database? :) Yes, those results are quite suspicious. There's probably something interfering with the queries (other queries, different processes, block cleanout, ...) or maybe this is purely due to caching. sridhar, run the queries repeatedly and my quess is the difference will disappear (and the fist query will be a bit faster I guess). Tomas
sridhar bamandlapally, 27.01.2012 05:31: > SQL> explain plan for select * from hr.emp ; > Explained. > PLAN > -------------------------------------------------------------------------- > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | > -------------------------------------------------------------------------- > | 0 | SELECT STATEMENT | | 7444K| 944M| 16077 (4)| 00:03:13 | > | 1 | TABLE ACCESS FULL| EMP | 7444K| 944M| 16077 (4)| 00:03:13 | > -------------------------------------------------------------------------- > *WITH PARALLEL Option* > SQL> explain plan for select /*+parallel(emp,4)*/ * from hr.emp ; > Explained. > PLAN > --------------------------------------------------------------------------------- > | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | > --------------------------------------------------------------------------------- > | 0 | SELECT STATEMENT | | 7444K| 944M| 4442 (3)| 00:00:54 | > | 1 | PX COORDINATOR | | | | | | > | 2 | PX SEND QC (RANDOM)| :TQ10000 | 7444K| 944M| 4442 (3)| 00:00:54 | > | 3 | PX BLOCK ITERATOR | | 7444K| 944M| 4442 (3)| 00:00:54 | > | 4 | TABLE ACCESS FULL| EMP | 7444K| 944M| 4442 (3)| 00:00:54 | > --------------------------------------------------------------------------------- > > In the above plan ( WITH PARALLEL Option ) > 1. "Cost" has been nearly reduced to 1/4th > 2. "CPU" has been reduced > 3. "Time" has been nearly reduced to 1/3rd I have *never* seen the "time" column in the explain plan output come anywhere near the actual execution time in Oracle.
For security reasons, I cannot put real-time senario into loop, the one which I gave is an example, if I have solution for this then same will be applied for real-time senario We have implemented PARALLEL technology into our database and same expecting after migration to PostgreSQL, The real-time SQL-Query is hiting 18000 times per day, and PARALLEL option gave us great performance and big window for all other process Concept is, we need window for every process on database and all together should fit in our window and time-line. We think PostgreSQL should also upgrade PARALLEL technology at SQL-Query level On 1/27/12, Thomas Kellerer <spam_eater@gmx.net> wrote: > sridhar bamandlapally, 27.01.2012 05:31: >> SQL> explain plan for select * from hr.emp ; >> Explained. >> PLAN >> -------------------------------------------------------------------------- >> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | >> -------------------------------------------------------------------------- >> | 0 | SELECT STATEMENT | | 7444K| 944M| 16077 (4)| 00:03:13 | >> | 1 | TABLE ACCESS FULL| EMP | 7444K| 944M| 16077 (4)| 00:03:13 | >> -------------------------------------------------------------------------- >> *WITH PARALLEL Option* >> SQL> explain plan for select /*+parallel(emp,4)*/ * from hr.emp ; >> Explained. >> PLAN >> --------------------------------------------------------------------------------- >> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| >> Time | >> --------------------------------------------------------------------------------- >> | 0 | SELECT STATEMENT | | 7444K| 944M| 4442 (3)| >> 00:00:54 | >> | 1 | PX COORDINATOR | | | | | >> | >> | 2 | PX SEND QC (RANDOM)| :TQ10000 | 7444K| 944M| 4442 (3)| >> 00:00:54 | >> | 3 | PX BLOCK ITERATOR | | 7444K| 944M| 4442 (3)| >> 00:00:54 | >> | 4 | TABLE ACCESS FULL| EMP | 7444K| 944M| 4442 (3)| >> 00:00:54 | >> --------------------------------------------------------------------------------- >> >> In the above plan ( WITH PARALLEL Option ) >> 1. "Cost" has been nearly reduced to 1/4th >> 2. "CPU" has been reduced >> 3. "Time" has been nearly reduced to 1/3rd > > I have *never* seen the "time" column in the explain plan output come > anywhere near the actual execution time in Oracle. > > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
On 27 Leden 2012, 13:16, sridhar bamandlapally wrote: > For security reasons, I cannot put real-time senario into loop, The point of Marti's comment was that the estimates (as presented by EXPLAIN PLAN FOR in Oracle) are inherently imprecise. Don't trust what Oracle is telling you about the expected runtime of the queries. Run the queries repeatedly to see. > the one which I gave is an example, if I have solution for this then > same will be applied for real-time senario There's no way to execute a single query in a parallel manner and it won't be available anytime soon. This is not an issue unless you have a CPU bound query and you have unused CPUs. That's not the case of your example, because the sequential scan is likely to be I/O bound, thus executing it in parallel won't fix the issue. > We have implemented PARALLEL technology into our database and same > expecting after migration to PostgreSQL, Why? Have you tried to run the query on PostgreSQL? > The real-time SQL-Query is hiting 18000 times per day, and PARALLEL > option gave us great performance and big window for all other process Are we still discussing the example you've posted? Because this 18k hits per day means running the query every 5 seconds. And if the query takes more than a few seconds, there will be multiple queries running concurrently, thus eating CPUs. > Concept is, we need window for every process on database and all > together should fit in our window and time-line. Not sure what you mean by window or time-line? > We think PostgreSQL should also upgrade PARALLEL technology at SQL-Query > level That is currently discussed in other threads, but it won't happen any time soon (a few years in the future, maybe). Tomas
>> the one which I gave is an example, if I have solution for this then >> same will be applied for real-time senario > > There's no way to execute a single query in a parallel manner and it won't > be available anytime soon. > > This is not an issue unless you have a CPU bound query and you have unused > CPUs. That's not the case of your example, because the sequential scan is > likely to be I/O bound, thus executing it in parallel won't fix the issue. it is possible to emulate with plproxy for example. > >> We have implemented PARALLEL technology into our database and same >> expecting after migration to PostgreSQL, > > Why? Have you tried to run the query on PostgreSQL? premature optimization ... > >> The real-time SQL-Query is hiting 18000 times per day, and PARALLEL >> option gave us great performance and big window for all other process > > Are we still discussing the example you've posted? Because this 18k hits > per day means running the query every 5 seconds. And if the query takes > more than a few seconds, there will be multiple queries running > concurrently, thus eating CPUs. agreed. > >> Concept is, we need window for every process on database and all >> together should fit in our window and time-line. > > Not sure what you mean by window or time-line? > >> We think PostgreSQL should also upgrade PARALLEL technology at SQL-Query >> level > > That is currently discussed in other threads, but it won't happen any time > soon (a few years in the future, maybe). at the SQL level, I don't see the immediate benefit given that the feature is not implemented: SQL level stuff (planner hint) are here to workaround what the server can not handle on its own. And PostgreSQL policiy is not to allow planner hint, but to fix/improve the server. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation