Обсуждение: Re: PostgreSQL Parallel Processing !

Поиск
Список
Период
Сортировка

Re: PostgreSQL Parallel Processing !

От
sridhar bamandlapally
Дата:
Hi Everyone

I just want to illustrate an idea may possible for bringing up
parallel process in PostgreSQL at SQL-Query level

The PARALLEL option in Oracle really give great improvment in
performance, multi-thread concept has great possibilities

In Oracle we have hints ( see below ) :
SELECT /*+PARALLEL( e, 2 )*/ e.* FROM EMP e ;

PostgreSQL ( may if possible in future ) :
SELECT e.* FROM EMP PARALLEL ( e, 2) ;


*Note: The below syntax does not work with any PostgreSQL versions
PostgreSQL Syntax for SELECT ( with PARALLEL )

[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    * | expression [ [ AS ] output_name ] [, ...]
    [ FROM from_item [, ...] ]
    [ WHERE condition ]
    [ GROUP BY expression [, ...] ]
    [ HAVING condition [, ...] ]
    [ WINDOW window_name AS ( window_definition ) [, ...] ]
    [ PARALLEL (<alias> | <table> | <index> |<segment> , < no. of threads> ) ]
    [ { UNION | INTERSECT | EXCEPT } [ ALL ] select ]
    [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS {
FIRST | LAST } ] [, ...] ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start [ ROW | ROWS ] ]
    [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
    [ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]




On 1/24/12, ashish nauriyal <anauriyal@gmail.com> wrote:
> Thoughts by Bruce Momjian on Parallel execution in PostgreSQL...
>
> http://momjian.us/main/blogs/pgblog/2011.html#December_5_2011
>
> You can give your thoughts on the blog itself....
>
> Thanks,
> Ashish Nauriyal
>

Re: PostgreSQL Parallel Processing !

От
Claudio Freire
Дата:
On Wed, Jan 25, 2012 at 6:18 AM, sridhar bamandlapally
<sridhar.bn1@gmail.com> wrote:
> I just want to illustrate an idea may possible for bringing up
> parallel process in PostgreSQL at SQL-Query level
>
> The PARALLEL option in Oracle really give great improvment in
> performance, multi-thread concept has great possibilities
>
> In Oracle we have hints ( see below ) :
> SELECT /*+PARALLEL( e, 2 )*/ e.* FROM EMP e ;
>
> PostgreSQL ( may if possible in future ) :
> SELECT e.* FROM EMP PARALLEL ( e, 2) ;

It makes little sense (and is contrary to pg policy of no hinting) to
do it like that.

In fact, I've been musing for a long time on leveraging pg's
sophisticated planner to do the parallelization:
 * Synchroscan means whenever a table has to be scanned twice, it can
be done with two threads.
 * Knowing whether a scan will hit mostly disk or memory can help in
deciding whether to do them in parallel or not (memory can be
parallelized, interleaved memory access isn't so bad, but interleaved
disk access is disastrous)
 * Big sorts can be parallelized quite easily
 * Number of threads to use can be a tunable or automatically set to
the number of processors on the system
 * Pipelining is another useful plan transformation: parallelize
I/O-bound nodes with CPU-bound ones.

I know squat about how to implement this, but I've been considering
picking the low hanging fruit on that tree and patching up PG to try
the concept. Many of the items above would require a thread-safe
execution engine, which may be quite hard to get and have a
significant performance hit. Some don't, like parallel sort.

Also, it is necessary to notice that parallelization will create some
priority inversion issues. Simple, non-parallelizable queries will
suffer from resource starvation when contending against more complex,
parallelizable ones.

Re: PostgreSQL Parallel Processing !

От
sridhar bamandlapally
Дата:
Yes
 
"Hint method" is an alternative solution which does not appear to be
exclusive parallelism solution as it is included in comment block and have no error handling,
and this could be one of the reason against PG policy
 
"Parameter method" ( which we are thinking about ) can be very exclusive parallelism solution
with proper error handling as it is part of SQL-Query syntax

On Wed, Jan 25, 2012 at 7:13 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Wed, Jan 25, 2012 at 6:18 AM, sridhar bamandlapally
<sridhar.bn1@gmail.com> wrote:
> I just want to illustrate an idea may possible for bringing up
> parallel process in PostgreSQL at SQL-Query level
>
> The PARALLEL option in Oracle really give great improvment in
> performance, multi-thread concept has great possibilities
>
> In Oracle we have hints ( see below ) :
> SELECT /*+PARALLEL( e, 2 )*/ e.* FROM EMP e ;
>
> PostgreSQL ( may if possible in future ) :
> SELECT e.* FROM EMP PARALLEL ( e, 2) ;

It makes little sense (and is contrary to pg policy of no hinting) to
do it like that.

In fact, I've been musing for a long time on leveraging pg's
sophisticated planner to do the parallelization:
 * Synchroscan means whenever a table has to be scanned twice, it can
be done with two threads.
 * Knowing whether a scan will hit mostly disk or memory can help in
deciding whether to do them in parallel or not (memory can be
parallelized, interleaved memory access isn't so bad, but interleaved
disk access is disastrous)
 * Big sorts can be parallelized quite easily
 * Number of threads to use can be a tunable or automatically set to
the number of processors on the system
 * Pipelining is another useful plan transformation: parallelize
I/O-bound nodes with CPU-bound ones.

I know squat about how to implement this, but I've been considering
picking the low hanging fruit on that tree and patching up PG to try
the concept. Many of the items above would require a thread-safe
execution engine, which may be quite hard to get and have a
significant performance hit. Some don't, like parallel sort.

Also, it is necessary to notice that parallelization will create some
priority inversion issues. Simple, non-parallelizable queries will
suffer from resource starvation when contending against more complex,
parallelizable ones.

Re: PostgreSQL Parallel Processing !

От
Merlin Moncure
Дата:
On Wed, Jan 25, 2012 at 7:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I know squat about how to implement this, but I've been considering
> picking the low hanging fruit on that tree and patching up PG to try
> the concept. Many of the items above would require a thread-safe
> execution engine, which may be quite hard to get and have a
> significant performance hit. Some don't, like parallel sort.

This was just discussed on -hackers yesterday -- see thread
'multithreaded query planner'.  In short, judging by the comments of
some of the smartest people working on this project, it sounds like
using threads to attack this is not going to happen, ever.  Note you
can probably still get parallel execution in other ways, using
processes, shared memory, etc, so I'd consider researching in that
direction.

merlin

Re: PostgreSQL Parallel Processing !

От
Claudio Freire
Дата:
On Wed, Jan 25, 2012 at 5:16 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 7:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> I know squat about how to implement this, but I've been considering
>> picking the low hanging fruit on that tree and patching up PG to try
>> the concept. Many of the items above would require a thread-safe
>> execution engine, which may be quite hard to get and have a
>> significant performance hit. Some don't, like parallel sort.
>
> This was just discussed on -hackers yesterday -- see thread
> 'multithreaded query planner'.  In short, judging by the comments of
> some of the smartest people working on this project, it sounds like
> using threads to attack this is not going to happen, ever.  Note you
> can probably still get parallel execution in other ways, using
> processes, shared memory, etc, so I'd consider researching in that
> direction.

If you mean this[0] thread, it doesn't show anything conclusive
against, say, parallel sort or pipelining.

But I agree, checking the code, it would be really tough to get any
more than parallel sorting by primitive types with threads.

Processes, however, show promise.

[0] http://archives.postgresql.org/pgsql-hackers/2012-01/msg00734.php

Re: PostgreSQL Parallel Processing !

От
sridhar bamandlapally
Дата:
Hi ALL
 
Please have a look into this,
this may help us to think on PARALLEL option
 
WITHOUT PARALLEL Option
SQL> explain plan for select * from hr.emp ;
Explained.
PLAN
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  7444K|   944M| 16077   (4)| 00:03:13 |
|   1 |  TABLE ACCESS FULL| EMP  |  7444K|   944M| 16077   (4)| 00:03:13 |
--------------------------------------------------------------------------
 
 
WITH PARALLEL Option
SQL> explain plan for select /*+parallel(emp,4)*/ * from hr.emp ;
Explained.
PLAN
---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |  7444K|   944M|  4442   (3)| 00:00:54 |
|   1 |  PX COORDINATOR      |          |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |  7444K|   944M|  4442   (3)| 00:00:54 |
|   3 |    PX BLOCK ITERATOR |          |  7444K|   944M|  4442   (3)| 00:00:54 |
|   4 |     TABLE ACCESS FULL| EMP      |  7444K|   944M|  4442   (3)| 00:00:54 |
---------------------------------------------------------------------------------

In the above plan ( WITH PARALLEL Option )
1. "Cost" has been nearly reduced to 1/4th
2. "CPU" has been reduced
3. "Time" has been nearly reduced to 1/3rd
 
 

 
On Thu, Jan 26, 2012 at 2:24 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Wed, Jan 25, 2012 at 5:16 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 7:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> I know squat about how to implement this, but I've been considering
>> picking the low hanging fruit on that tree and patching up PG to try
>> the concept. Many of the items above would require a thread-safe
>> execution engine, which may be quite hard to get and have a
>> significant performance hit. Some don't, like parallel sort.
>
> This was just discussed on -hackers yesterday -- see thread
> 'multithreaded query planner'.  In short, judging by the comments of
> some of the smartest people working on this project, it sounds like
> using threads to attack this is not going to happen, ever.  Note you
> can probably still get parallel execution in other ways, using
> processes, shared memory, etc, so I'd consider researching in that
> direction.

If you mean this[0] thread, it doesn't show anything conclusive
against, say, parallel sort or pipelining.

But I agree, checking the code, it would be really tough to get any
more than parallel sorting by primitive types with threads.

Processes, however, show promise.

[0] http://archives.postgresql.org/pgsql-hackers/2012-01/msg00734.php

Re: PostgreSQL Parallel Processing !

От
Marti Raudsepp
Дата:
On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally
<sridhar.bn1@gmail.com> wrote:
> --------------------------------------------------------------------------
> | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
> --------------------------------------------------------------------------
> |   0 | SELECT STATEMENT  |      |  7444K|   944M| 16077   (4)| 00:03:13 |
> |   1 |  TABLE ACCESS FULL| EMP  |  7444K|   944M| 16077   (4)| 00:03:13 |
> --------------------------------------------------------------------------

Sorry to take this off topic, but... Seriously, over 3 minutes to read
944 MB of data? That's less than 5 MB/s, what's wrong with your
database? :)

Regards,
Marti

Re: PostgreSQL Parallel Processing !

От
Vitalii Tymchyshyn
Дата:
27.01.12 11:06, Marti Raudsepp написав(ла):
> On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally
> <sridhar.bn1@gmail.com>  wrote:
>> --------------------------------------------------------------------------
>> | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
>> --------------------------------------------------------------------------
>> |   0 | SELECT STATEMENT  |      |  7444K|   944M| 16077   (4)| 00:03:13 |
>> |   1 |  TABLE ACCESS FULL| EMP  |  7444K|   944M| 16077   (4)| 00:03:13 |
>> --------------------------------------------------------------------------
> Sorry to take this off topic, but... Seriously, over 3 minutes to read
> 944 MB of data? That's less than 5 MB/s, what's wrong with your
> database? :)
Actually I'd ask how parallel CPU may help table sequence scan? Usually
sequence scan does not take large amount of cpu time, so I see no point
in parallelism.

Re: PostgreSQL Parallel Processing !

От
"Tomas Vondra"
Дата:
On 27 Leden 2012, 10:06, Marti Raudsepp wrote:
> On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally
> <sridhar.bn1@gmail.com> wrote:
>> --------------------------------------------------------------------------
>> | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time    
>> |
>> --------------------------------------------------------------------------
>> |   0 | SELECT STATEMENT  |      |  7444K|   944M| 16077   (4)| 00:03:13
>> |
>> |   1 |  TABLE ACCESS FULL| EMP  |  7444K|   944M| 16077   (4)| 00:03:13
>> |
>> --------------------------------------------------------------------------
>
> Sorry to take this off topic, but... Seriously, over 3 minutes to read
> 944 MB of data? That's less than 5 MB/s, what's wrong with your
> database? :)

Yes, those results are quite suspicious. There's probably something
interfering with the queries (other queries, different processes, block
cleanout, ...) or maybe this is purely due to caching.

sridhar, run the queries repeatedly and my quess is the difference will
disappear (and the fist query will be a bit faster I guess).

Tomas


Re: PostgreSQL Parallel Processing !

От
Thomas Kellerer
Дата:
sridhar bamandlapally, 27.01.2012 05:31:
> SQL> explain plan for select * from hr.emp ;
> Explained.
> PLAN
> --------------------------------------------------------------------------
> | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
> --------------------------------------------------------------------------
> |   0 | SELECT STATEMENT  |      |  7444K|   944M| 16077   (4)| 00:03:13 |
> |   1 |  TABLE ACCESS FULL| EMP  |  7444K|   944M| 16077   (4)| 00:03:13 |
> --------------------------------------------------------------------------
> *WITH PARALLEL Option*
> SQL> explain plan for select /*+parallel(emp,4)*/ * from hr.emp ;
> Explained.
> PLAN
> ---------------------------------------------------------------------------------
> | Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
> ---------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT     |          |  7444K|   944M|  4442   (3)| 00:00:54 |
> |   1 |  PX COORDINATOR      |          |       |       |            |          |
> |   2 |   PX SEND QC (RANDOM)| :TQ10000 |  7444K|   944M|  4442   (3)| 00:00:54 |
> |   3 |    PX BLOCK ITERATOR |          |  7444K|   944M|  4442   (3)| 00:00:54 |
> |   4 |     TABLE ACCESS FULL| EMP      |  7444K|   944M|  4442   (3)| 00:00:54 |
> ---------------------------------------------------------------------------------
>
> In the above plan ( WITH PARALLEL Option )
> 1. "Cost" has been nearly reduced to 1/4th
> 2. "CPU" has been reduced
> 3. "Time" has been nearly reduced to 1/3rd

I have *never* seen the "time" column in the explain plan output come anywhere near the actual execution time in
Oracle.



Re: PostgreSQL Parallel Processing !

От
sridhar bamandlapally
Дата:
For security reasons, I cannot put real-time senario into loop,

the one which I gave is an example, if I have solution for this then
same will be applied for real-time senario

We have implemented PARALLEL technology into our database and same
expecting after migration to PostgreSQL,

The real-time SQL-Query is hiting 18000 times per day, and PARALLEL
option gave us great performance and big window for all other process

Concept is, we need window for every process on database and all
together should fit in our window and time-line.

We think PostgreSQL should also upgrade PARALLEL technology at SQL-Query level





On 1/27/12, Thomas Kellerer <spam_eater@gmx.net> wrote:
> sridhar bamandlapally, 27.01.2012 05:31:
>> SQL> explain plan for select * from hr.emp ;
>> Explained.
>> PLAN
>> --------------------------------------------------------------------------
>> | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
>> --------------------------------------------------------------------------
>> |   0 | SELECT STATEMENT  |      |  7444K|   944M| 16077   (4)| 00:03:13 |
>> |   1 |  TABLE ACCESS FULL| EMP  |  7444K|   944M| 16077   (4)| 00:03:13 |
>> --------------------------------------------------------------------------
>> *WITH PARALLEL Option*
>> SQL> explain plan for select /*+parallel(emp,4)*/ * from hr.emp ;
>> Explained.
>> PLAN
>> ---------------------------------------------------------------------------------
>> | Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)|
>> Time     |
>> ---------------------------------------------------------------------------------
>> |   0 | SELECT STATEMENT     |          |  7444K|   944M|  4442   (3)|
>> 00:00:54 |
>> |   1 |  PX COORDINATOR      |          |       |       |            |
>>      |
>> |   2 |   PX SEND QC (RANDOM)| :TQ10000 |  7444K|   944M|  4442   (3)|
>> 00:00:54 |
>> |   3 |    PX BLOCK ITERATOR |          |  7444K|   944M|  4442   (3)|
>> 00:00:54 |
>> |   4 |     TABLE ACCESS FULL| EMP      |  7444K|   944M|  4442   (3)|
>> 00:00:54 |
>> ---------------------------------------------------------------------------------
>>
>> In the above plan ( WITH PARALLEL Option )
>> 1. "Cost" has been nearly reduced to 1/4th
>> 2. "CPU" has been reduced
>> 3. "Time" has been nearly reduced to 1/3rd
>
> I have *never* seen the "time" column in the explain plan output come
> anywhere near the actual execution time in Oracle.
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: PostgreSQL Parallel Processing !

От
"Tomas Vondra"
Дата:
On 27 Leden 2012, 13:16, sridhar bamandlapally wrote:
> For security reasons, I cannot put real-time senario into loop,

The point of Marti's comment was that the estimates (as presented by
EXPLAIN PLAN FOR in Oracle) are inherently imprecise. Don't trust what
Oracle is telling you about the expected runtime of the queries. Run the
queries repeatedly to see.

> the one which I gave is an example, if I have solution for this then
> same will be applied for real-time senario

There's no way to execute a single query in a parallel manner and it won't
be available anytime soon.

This is not an issue unless you have a CPU bound query and you have unused
CPUs. That's not the case of your example, because the sequential scan is
likely to be I/O bound, thus executing it in parallel won't fix the issue.

> We have implemented PARALLEL technology into our database and same
> expecting after migration to PostgreSQL,

Why? Have you tried to run the query on PostgreSQL?

> The real-time SQL-Query is hiting 18000 times per day, and PARALLEL
> option gave us great performance and big window for all other process

Are we still discussing the example you've posted? Because this 18k hits
per day means running the query every 5 seconds. And if the query takes
more than a few seconds, there will be multiple queries running
concurrently, thus eating CPUs.

> Concept is, we need window for every process on database and all
> together should fit in our window and time-line.

Not sure what you mean by window or time-line?

> We think PostgreSQL should also upgrade PARALLEL technology at SQL-Query
> level

That is currently discussed in other threads, but it won't happen any time
soon (a few years in the future, maybe).

Tomas


Re: PostgreSQL Parallel Processing !

От
Cédric Villemain
Дата:
>> the one which I gave is an example, if I have solution for this then
>> same will be applied for real-time senario
>
> There's no way to execute a single query in a parallel manner and it won't
> be available anytime soon.
>
> This is not an issue unless you have a CPU bound query and you have unused
> CPUs. That's not the case of your example, because the sequential scan is
> likely to be I/O bound, thus executing it in parallel won't fix the issue.

it is possible to emulate with plproxy for example.

>
>> We have implemented PARALLEL technology into our database and same
>> expecting after migration to PostgreSQL,
>
> Why? Have you tried to run the query on PostgreSQL?

premature optimization ...

>
>> The real-time SQL-Query is hiting 18000 times per day, and PARALLEL
>> option gave us great performance and big window for all other process
>
> Are we still discussing the example you've posted? Because this 18k hits
> per day means running the query every 5 seconds. And if the query takes
> more than a few seconds, there will be multiple queries running
> concurrently, thus eating CPUs.

agreed.

>
>> Concept is, we need window for every process on database and all
>> together should fit in our window and time-line.
>
> Not sure what you mean by window or time-line?
>
>> We think PostgreSQL should also upgrade PARALLEL technology at SQL-Query
>> level
>
> That is currently discussed in other threads, but it won't happen any time
> soon (a few years in the future, maybe).

at the SQL level, I don't see the immediate benefit given that the
feature is not implemented: SQL level stuff  (planner hint) are here
to workaround what the server can not handle on its own. And
PostgreSQL policiy is not to allow planner hint, but to fix/improve
the server.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation