Discussion: Utilizing multiple cores for one query
I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3) are able to utilize multiple cores for the execution of a single query?
This is one thing that systems like SQL Server and Oracle have been able to do for quite some time. I haven't seen much in the documentation that hints that this may be possible in PG, nor did I find much in the mailing lists about this. The only thing I found was a topic that discussed some patches that may eventually lead to a sequential scan being handled by multiple cores.
Could someone shed some light on the current or future abilities of PG for making use of multiple cores to execute a single query?
Thanks in advance
On Dec 1, 2007 8:21 AM, henk de wit <henk53602@hotmail.com> wrote:
> I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3)
> are able to utilize multiple cores for the execution of a single query?

Nope.

> This is one thing that systems like SQL Server and Oracle have been able to
> do for quite some time. I haven't seen much in the documentation that hints
> that this may be possible in PG, nor did I find much in the mailinglists
> about this. The only thing I found was a topic that discussed some patches
> that may eventually lead to a sequence scan being handled by multiple cores.

I believe the threads you're talking about were related to scanning, not parallel query. Though, when Qingqing and I were discussing parallel query a little over a year ago, I do seem to recall several uninformed opinions stating that sequential scans were the only thing it could be useful for.

> Could someone shed some light on the current or future abilities of PG for
> making use of multiple cores to execute a single query?

Currently, the only way to parallelize a query in Postgres is to use pgpool-II.

http://pgpool.projects.postgresql.org/

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/
> > I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3)
> > are able to utilize multiple cores for the execution of a single query?
> Nope.
I see, thanks for the clarification.
Btw, in this thread: http://archives.postgresql.org/pgsql-performance/2007-10/msg00159.php
the following is said:
>You can determine what runs in parallel based on the
>indentation of the output.
>Items at the same indentation level under the same
>"parent" line will run in parallel
Wouldn't this offer some opportunities for running things on multiple cores? Based on the above, many people already seem to think that PG is able to utilize multiple cores for 1 query. Of course, it can easily be "proved" that this does not happen by simply watching the CPU utilization graphs while executing a query. Nevertheless, those people may wonder why (some of) those items that already run in parallel do not actually run in parallel using multiple cores.
> Currently, the only way to parallelize a query in Postgres is to use pgpool-II.
>
> http://pgpool.projects.postgresql.org/
Yes, I noticed this project before. At the time it was not really clear how stable and/or how well supported this is. It indeed seems to support parallel queries automatically by being able to rewrite standard queries. It does seem it needs different DB nodes and is thus probably not able to use multiple cores of a single DBMS. Also, I could not really find how well pgpool-II is doing at making judgments of the level of parallelization it's going to use. E.g. when there are 16 nodes in the system with a currently low utilization, a single query may be split into 16 pieces. On the other hand, when 8 of these nodes are heavily utilized, splitting to 8 pieces might be better. etc.
Anyway, are there any plans for postgresql to support parallelizing queries natively?
On Dec 1, 2007 9:42 AM, henk de wit <henk53602@hotmail.com> wrote:
> Wouldn't this offer some opportunities for running things on multiple cores?

No, it's not actually parallel in the same sense.

> Yes, I noticed this project before. At the time it was not really clear how
> stable and/or how well supported this is. It indeed seems to support
> parallel queries automatically by being able to rewrite standard queries. It
> does seem it needs different DB nodes and is thus probably not able to use
> multiple cores of a single DBMS.

I've seen it actually set up to use multiple connections to the same DBMS. How well it would work is pretty much dependent on your application and the amount of parallelization you could actually gain.

> Also, I could not really find how well
> pgpool-II is doing at making judgments of the level of parallelization it's
> going to use. E.g. when there are 16 nodes in the system with a currently
> low utilization, a single query may be split into 16 pieces. On the other
> hand, when 8 of these nodes are heavily utilized, splitting to 8 pieces
> might be better. etc.

IIRC, it doesn't plan parallelization that way. It looks at what is partitioned (by default) on different nodes and parallelizes based on that. As I said earlier, you can partition a single node and put pgpool-II on top of it to gain some parallelization. Unfortunately, it isn't capable of handling things like parallel index builds or other useful maintenance features... but it can do fairly good query result parallelization.

-- 
Jonah H. Harris
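The single-node setup described above (partition the data, then run one piece of the query per connection and combine the results) can be sketched as a scatter/gather step. This is only an illustration of the general technique, not pgpool-II's actual implementation: `PARTITIONS`, `partial_aggregate`, and `parallel_average` are hypothetical names, and in-memory lists stand in for the per-connection queries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a partitioned table. In a real setup each
# partition would be queried over its own database connection, so each
# piece would be handled by a separate server backend.
PARTITIONS = [
    [3, 8, 1],
    [7, 2],
    [9, 4, 6, 5],
]


def partial_aggregate(rows):
    """Per-partition work: roughly SELECT sum(x), count(*) on one piece."""
    return sum(rows), len(rows)


def parallel_average(partitions, max_workers=4):
    """Scatter the partial aggregates over workers, then gather and combine."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # An average over zero rows is undefined; return None like SQL's NULL.
    return total / count if count else None


print(parallel_average(PARTITIONS))  # prints 5.0
```

Each piece returns a partial sum and count rather than its own average, because per-partition averages cannot be combined correctly when the partitions differ in size.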
henk de wit wrote:
>> > I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3)
>> > are able to utilize multiple cores for the execution of a single query?
>> Nope.
>
> I see, thanks for the clarification.
>
> Btw, in this thread:
> http://archives.postgresql.org/pgsql-performance/2007-10/msg00159.php
>
> the following is said:
>
>>You can determine what runs in parallel based on the
>>indentation of the output.
>>Items at the same indentation level under the same
>>"parent" line will run in parallel
>
> Wouldn't this offer some opportunities for running things on multiple
> cores? Based on the above, many people already seem to think that PG is
> able to utilize multiple cores for 1 query.

Of course, it depends on just what you mean. Since postgresql is a client-server system, the client can run on one processor and the server on another. And that _is_ parallelism in a way. For me in one application, my client uses about 20% of a processor and the server uses around 80%. But in more detail,

 VIRT   RES   SHR  SWAP %MEM %CPU    TIME+  P COMMAND
2019m   94m   93m  1.9g  1.2   79  2:29.97  3 postgres: jdbeyer stock [local] INSERT
2019m  813m  813m  1.2g 10.2    2 23:38.67  0 postgres: writer process
2018m   29m   29m  1.9g  0.4    0  4:07.59  3 /usr/bin/postmaster -p 5432 -D ...
 8624   652   264  7972  0.0    0  0:00.10  2 postgres: logger process
 9624  1596   204  8028  0.0    0  0:01.07  2 postgres: stats buffer process
 8892   840   280  8052  0.0    0  0:00.74  1 postgres: stats collector process
 6608  2320  1980  4288  0.0   22  1:56.27  0 /home/jdbeyer/bin/enter

The P column shows the processor the process last ran on. Although I might get away with using one processor in this case, it is clearly using all four.

Now this is not processing a single query on multiple cores (in this case, the "query" is running on core #3 only), but the ancillary stuff is running on multiple cores and some of it should be charged to the query. And the OS kernel takes time for IO and stuff as well.

> Of course, it can be easily
> "proved" that this does not happen by simply watching at the CPU
> utilization graphs when executing a query. Nevertheless, those people
> may wonder why (some of) those items that already run in parallel not
> actually run in parallel using multiple cores?

-- 
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 11:40:01 up 1 day, 2:02, 5 users, load average: 4.15, 4.14, 4.15
On Sat, 1 Dec 2007, Jonah H. Harris wrote:
> I believe the threads you're talking about were related to scanning,
> not parallel query. Though, when Qingqing and I were discussing
> parallel query a little over a year ago, I do seem to recall several
> uninformed opinions stating that sequential scans were the only thing
> it could be useful for.

I would imagine sorting a huge set of results would benefit from multi-threading, because it can be split up into separate tasks. Heck, Postgres *already* splits sorting up into multiple chunks when the results to sort are bigger than fit in memory.

This would benefit a lot of multi-table joins, because being able to sort a table faster would enable merge joins to be used at lower cost. That's particularly valuable when you're doing a large summary multi-table join that uses most of the database contents.

Matthew

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it.
                                                        --Donald Knuth
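The "split the sort into separate tasks" idea above can be sketched as sorting fixed-size chunks independently and then merging the sorted runs. This is a toy illustration of the general approach, not how PostgreSQL's sort code works: `chunked_sort` and its parameters are hypothetical names, and the choice of executor is an assumption.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def chunked_sort(values, chunk_size, executor):
    """Sort `values` by sorting chunks as separate tasks, then merging.

    `executor` is any concurrent.futures.Executor. In CPython, threads will
    not speed up a CPU-bound sort of plain Python objects; a process pool
    would be needed for that. The structure of the work is the same.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Split the input into independent chunks (the "separate tasks").
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    # Each chunk is sorted on its own, with no coordination between tasks.
    sorted_runs = list(executor.map(sorted, chunks))
    # A k-way merge combines the sorted runs into one sorted result.
    return list(heapq.merge(*sorted_runs))


with ThreadPoolExecutor(max_workers=2) as pool:
    print(chunked_sort([5, 3, 9, 1, 4, 8, 2], chunk_size=3, executor=pool))
    # prints [1, 2, 3, 4, 5, 8, 9]
```

Only the chunk sorts run as independent tasks here; the final merge is a single sequential pass.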
On 12/1/07, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> Currently, the only way to parallelize a query in Postgres is to use pgpool-II.

FYI: plproxy issues queries for several nodes in parallel too.

-- 
marko