Re: Question about optimising (Postgres_)FDW

From: Etsuro Fujita
Subject: Re: Question about optimising (Postgres_)FDW
Date:
Msg-id: 534E6B0C.6080403@lab.ntt.co.jp
In response to: Question about optimising (Postgres_)FDW  (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses: Re: Question about optimising (Postgres_)FDW  (Hannu Krosing <hannu@krosing.net>)
List: pgsql-hackers
(2014/04/16 6:55), Hannu Krosing wrote:
> ----------------------------------
> CREATE EXTENSION postgres_fdw;
>
> CREATE SERVER loop foreign data wrapper postgres_fdw
>    OPTIONS (port '5432', dbname 'testdb');
>
> CREATE USER MAPPING FOR PUBLIC SERVER loop;
>
> create table onemillion (
>      id serial primary key,
>      inserted timestamp default clock_timestamp(),
>      data text
> );
>
> insert into onemillion(data) select random() from
> generate_series(1,1000000);
>
> CREATE FOREIGN TABLE onemillion_pgfdw (
>      id int,
>      inserted timestamp,
>      data text
> ) SERVER loop
> OPTIONS (table_name 'onemillion',
>           use_remote_estimate 'true');
>
> testdb=# explain analyse
> select * from onemillion_pgfdw where id in (select id from onemillion
> where data > '0.9' limit 100);
>                                                           QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=122.49..10871.06 rows=500000 width=44) (actual time=4.269..93.444 rows=100 loops=1)
>    ->  HashAggregate  (cost=22.06..23.06 rows=100 width=4) (actual time=1.110..1.263 rows=100 loops=1)
>          ->  Limit  (cost=0.00..20.81 rows=100 width=4) (actual time=0.038..1.026 rows=100 loops=1)
>                ->  Seq Scan on onemillion  (cost=0.00..20834.00 rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1)
>                      Filter: (data > '0.9'::text)
>                      Rows Removed by Filter: 805
>    ->  Foreign Scan on onemillion_pgfdw  (cost=100.43..108.47 rows=1 width=29) (actual time=0.772..0.773 rows=1 loops=100)
>  Total runtime: 93.820 ms
> (8 rows)
>
> Time: 97.283 ms
> ------------------------------
>
> ... actually performs 100 distinct "SELECT * FROM onemillion WHERE id = $1" calls on the "remote" side.
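
With use_remote_estimate on, the planner here picks a parameterized Foreign Scan for the inner side of that nested loop, so each of the 100 outer rows triggers its own remote query. If you want to watch those queries arrive on the "remote" side, one way (a minimal sketch, assuming a 9.4-era loopback server and superuser access) is to turn on statement logging there and rerun the query:

-- Log every statement the loopback session executes; requires superuser.
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
-- The server log should now show ~100 executions of a remote query of the form
--   SELECT id, inserted, data FROM public.onemillion WHERE id = $1
-- Reset afterwards:
ALTER SYSTEM RESET log_statement;
SELECT pg_reload_conf();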

Maybe I'm missing something, but I think you can get the plan you're after 
with the following procedure:

postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000');
ALTER SERVER
postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in 
(SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=1023.10..41983.21 rows=100 width=30)
   Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
   Hash Cond: (onemillion_pgsql.id = onemillion.id)
   ->  Foreign Scan on public.onemillion_pgsql  (cost=1000.00..39334.00 rows=1000000 width=29)
         Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
         Remote SQL: SELECT id, inserted, data FROM public.onemillion
   ->  Hash  (cost=21.85..21.85 rows=100 width=4)
         Output: onemillion.id
         ->  Limit  (cost=0.00..20.85 rows=100 width=4)
               Output: onemillion.id
               ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 rows=99918 width=4)
                     Output: onemillion.id
                     Filter: (onemillion.data > '0.9'::text)
 Planning time: 0.690 ms
(14 rows)
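
Bumping fdw_startup_cost makes each remote round trip look expensive, so the planner prefers a single full Foreign Scan plus a local hash semi join over 100 parameterized scans. To undo the experiment later, the option can simply be dropped again; a small sketch, reusing the server name from the quoted example:

-- Remove the override (or use SET to change it to another value),
-- then check which options the server definition still carries.
ALTER SERVER loop OPTIONS (DROP fdw_startup_cost);
SELECT srvname, srvoptions FROM pg_foreign_server WHERE srvname = 'loop';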

Or, as Tom mentioned, you can disable the use_remote_estimate option:

postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET 
use_remote_estimate 'false');
ALTER FOREIGN TABLE
postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in 
(SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=123.10..41083.21 rows=100 width=30)
   Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
   Hash Cond: (onemillion_pgsql.id = onemillion.id)
   ->  Foreign Scan on public.onemillion_pgsql  (cost=100.00..38434.00 rows=1000000 width=30)
         Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
         Remote SQL: SELECT id, inserted, data FROM public.onemillion
   ->  Hash  (cost=21.85..21.85 rows=100 width=4)
         Output: onemillion.id
         ->  Limit  (cost=0.00..20.85 rows=100 width=4)
               Output: onemillion.id
               ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 rows=99918 width=4)
                     Output: onemillion.id
                     Filter: (onemillion.data > '0.9'::text)
 Planning time: 0.215 ms
(14 rows)
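
Note that use_remote_estimate can be set at the server level as well as per foreign table, and the table-level setting overrides the server-level one. To double-check what is currently in effect, a small sketch using the system catalogs (table and server names as above):

-- Table-level FDW options (table_name, use_remote_estimate, ...)
SELECT option_name, option_value
  FROM pg_options_to_table((SELECT ftoptions FROM pg_foreign_table
                            WHERE ftrelid = 'onemillion_pgsql'::regclass));

-- Server-level FDW options (port, dbname, fdw_startup_cost, ...)
SELECT option_name, option_value
  FROM pg_options_to_table((SELECT srvoptions FROM pg_foreign_server
                            WHERE srvname = 'loop'));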

Thanks,

Best regards,
Etsuro Fujita


