Обсуждение: Do parallel queries work with only dblink not with fdw?
In order to improve cpu and disk utilization, I am testing parallel queries.
The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference query was reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significant improvement.
The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference query was reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significant improvement.
My approach is fairly easy:
db_link_send_query('conn1', 'statement based on partitioning field');
db_link_send_query('conn2', 'statement based on partitioning field');
...
db_link_send_query('conn2', 'statement based on partitioning field');
...
SELECT
dblink_get_result('conn1')
dblink_get_result('conn1')
UNION ALL
dblink_get_result('conn2')
...
dblink_get_result('conn2')
...
SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
...
On 05/04/2016 01:28 AM, Klaus P. wrote:
> In order to improve cpu and disk utilization, I am testing parallel
> queries.
>
> The approach with dblink_send_query() and dblink_get_result() works in
> my proof-of-concept. Runtime of my reference query was reduced from 55
> seconds to ~20seconds using 4 parallel connections. Not what I had hoped
> but certainly a significant improvement.
>
> My approach is fairly easy:
>
> db_link_send_query('conn1', 'statement based on partitioning field');
> db_link_send_query('conn2', 'statement based on partitioning field');
>
> ...
>
> SELECT
> dblink_get_result('conn1')
> UNION ALL
> dblink_get_result('conn2')
>
> ...
>
>
> However, using fdw foreign data wrappers, I was not able to run any
> query in parallel. I came across this presentation
> http://de.slideshare.net/EnterpriseDB/foreign-data-wrappers-and-you-with-postgres
> where it says on page 12 "parallel FDW access". Is there any example
> available on how to run FDW queries in parallel?
>
> My approach with FDW is similar to the dblink example above:
>
> SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
> UNION ALL
> SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
> ...
>
> My experience is however that the statements are carried out sequentially.
I am not sure of the status of parallel query in FDW, but for those that
are some more information would be helpful:
1) You are using postgres_fdw, correct?
2) What version of the FDW are you using?
3) What version(s) of Postgres are you connecting from/to?
>
> Thanks
>
> Klaus Pieper
--
Adrian Klaver
adrian.klaver@aklaver.com
> -----Ursprüngliche Nachricht----- > Von: Adrian Klaver [mailto:adrian.klaver@aklaver.com] > > I am not sure of the status of parallel query in FDW, but for those that are some > more information would be helpful: > > 1) You are using postgres_fdw, correct? Yes. > > 2) What version of the FDW are you using? Not sure if FDW could be different from the cluster version (see below). Postgres_fdw.dll shows file version 9.5.2.16088- this is the one installed with the Windows PostgreSQL package. > > 3) What version(s) of Postgres are you connecting from/to? For testing I have installed a fresh download 9.5.2 Windows x64 (on Windows Server 2008 R2, if this is of interest) There is only one test database running in this cluster. I am connecting in a "loopback" mode to that database. I could repeat my tests with two different databases or run it on a Linux / Debian machine if that makes a difference. Thanks Klaus
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Klaus P.
>Sent: Mittwoch, 4. Mai 2016 10:28
>To: pgsql-general@postgresql.org
>Subject: [GENERAL] Do parallel queries work with only dblink not with fdw?
>
>In order to improve cpu and disk utilization, I am testing parallel queries.
>
>The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference
querywas reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a
significantimprovement.
>My approach is fairly easy:
>db_link_send_query('conn1', 'statement based on partitioning field');
>db_link_send_query('conn2', 'statement based on partitioning field');
>
>...
>SELECT
>dblink_get_result('conn1')
>UNION ALL
>dblink_get_result('conn2')
>
>...
Hello,
I'm using the db_link approach too, but rather than using dblink_get_result, I store the result in an additional
table.
This was faster, at least for my use case and at the time when I implemented that solution...
something like
db_link_send_query('conn0', 'CREATE UNLOGGED TABLE my_result...
db_link_send_query('conn1', 'INSERT INTO my_result statement based on partitioning field');
db_link_send_query('conn2', 'INSERT INTO my_result statement based on partitioning field');
...
select * from my_result
regards,
Marc Mamin
Hi, Sorry for appending to that thread, but I think this is related: Does anyone have experience with parsel [1] and/or it's extension parallelsql [2]? :Stefan [1] http://geeohspatial.blogspot.ch/2013/12/a-simple-function-for-parallel-queries_18.html [2] https://github.com/k1aus/parallelsql 2016-05-04 16:10 GMT+02:00 Marc Mamin <M.Mamin@intershop.de>: >>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Klaus P. >>Sent: Mittwoch, 4. Mai 2016 10:28 >>To: pgsql-general@postgresql.org >>Subject: [GENERAL] Do parallel queries work with only dblink not with fdw? >> >>In order to improve cpu and disk utilization, I am testing parallel queries. >> >>The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference querywas reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significantimprovement. >>My approach is fairly easy: >>db_link_send_query('conn1', 'statement based on partitioning field'); >>db_link_send_query('conn2', 'statement based on partitioning field'); >> >>... >>SELECT >>dblink_get_result('conn1') >>UNION ALL >>dblink_get_result('conn2') >> >>... > > > Hello, > > I'm using the db_link approach too, but rather than using dblink_get_result, I store the result in an additional table. > This was faster, at least for my use case and at the time when I implemented that solution... > > something like > > db_link_send_query('conn0', 'CREATE UNLOGGED TABLE my_result... > db_link_send_query('conn1', 'INSERT INTO my_result statement based on partitioning field'); > db_link_send_query('conn2', 'INSERT INTO my_result statement based on partitioning field'); > ... > select * from my_result > > regards, > > Marc Mamin > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general