Re: Question about optimising (Postgres_)FDW

From: Hannu Krosing
Subject: Re: Question about optimising (Postgres_)FDW
Date:
Msg-id: 534E82B8.9070205@krosing.net
In response to: Re: Question about optimising (Postgres_)FDW  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses: Re: Question about optimising (Postgres_)FDW  (Hannu Krosing <hannu@krosing.net>)
           Re: Question about optimising (Postgres_)FDW  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List: pgsql-hackers
On 04/16/2014 01:35 PM, Etsuro Fujita wrote:
> (2014/04/16 6:55), Hannu Krosing wrote:
...
>
> Maybe I'm missing something, but I think that you can do what I think
> you'd like to do by the following procedure:
No, what I'd like PostgreSQL to do is to:

1. select the id set from the local table
2. select the rows from the remote table with WHERE ID IN (<set selected in step 1>)
3. then join the original set to the selected set, with any suitable join strategy
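
In hand-written SQL, the remote query in step 2 would look something
like this (a sketch only; table and column names are borrowed from the
examples quoted below, and the id list is whatever step 1 produced):

    SELECT id, inserted, data
      FROM public.onemillion
     WHERE id IN (<set selected in step 1>);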

The things I do not want are

A. selecting all rows from the remote table (this is what your examples below do)

or

B. selecting rows from the remote table by single selects using "ID = $"
   (this is something that I managed to do by some tweaking of costs)

as A will always be slow if there are millions of rows in the remote
table, and B is slow(ish) once the id set grows beyond a few hundred ids.
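
To make the contrast concrete, these are roughly the remote queries the
two unwanted strategies produce (A is taken verbatim from the plans
quoted below; B is approximately the parameterised form postgres_fdw
sends per outer row):

    -- A: fetch everything, filter locally
    SELECT id, inserted, data FROM public.onemillion;

    -- B: one remote round trip per id in the outer set
    SELECT id, inserted, data FROM public.onemillion WHERE id = $1;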

I hope this is a bit better explanation than I provided before.

Cheers
Hannu

P.S. I am not sure if this is a limitation of postgres_fdw or of
PostgreSQL itself.

P.P.S. I tested a little with Multicorn, and PostgreSQL did not request
row counts for any IN plans, so it may be that the planner does not
consider this kind of plan at all. (Testing was on PgSQL 9.3.4.)
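
The probe was essentially the following (a sketch; onemillion_mc stands
for a hypothetical Multicorn-backed copy of the table whose wrapper
logs the quals it is asked to estimate):

    EXPLAIN VERBOSE
    SELECT *
      FROM onemillion_mc
     WHERE id IN (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
    -- the wrapper was never asked for a row count with an "id = ANY(...)" qual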

Hannu
>
> postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000');
> ALTER SERVER
> postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in
> (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
>                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------
>
>  Hash Semi Join  (cost=1023.10..41983.21 rows=100 width=30)
>    Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
> onemillion_pgsql.data
>    Hash Cond: (onemillion_pgsql.id = onemillion.id)
>    ->  Foreign Scan on public.onemillion_pgsql 
> (cost=1000.00..39334.00 rows=1000000 width=29)
>          Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
> onemillion_pgsql.data
>          Remote SQL: SELECT id, inserted, data FROM public.onemillion
>    ->  Hash  (cost=21.85..21.85 rows=100 width=4)
>          Output: onemillion.id
>          ->  Limit  (cost=0.00..20.85 rows=100 width=4)
>                Output: onemillion.id
>                ->  Seq Scan on public.onemillion  (cost=0.00..20834.00
> rows=99918 width=4)
>                      Output: onemillion.id
>                      Filter: (onemillion.data > '0.9'::text)
>  Planning time: 0.690 ms
> (14 rows)
>
> or, as Tom mentioned, by disabling the use_remote_estimate option:
>
> postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET
> use_remote_estimate 'false');
> ALTER FOREIGN TABLE
> postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in
> (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------
>
>  Hash Semi Join  (cost=123.10..41083.21 rows=100 width=30)
>    Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
> onemillion_pgsql.data
>    Hash Cond: (onemillion_pgsql.id = onemillion.id)
>    ->  Foreign Scan on public.onemillion_pgsql  (cost=100.00..38434.00
> rows=1000000 width=30)
>          Output: onemillion_pgsql.id, onemillion_pgsql.inserted,
> onemillion_pgsql.data
>          Remote SQL: SELECT id, inserted, data FROM public.onemillion
>    ->  Hash  (cost=21.85..21.85 rows=100 width=4)
>          Output: onemillion.id
>          ->  Limit  (cost=0.00..20.85 rows=100 width=4)
>                Output: onemillion.id
>                ->  Seq Scan on public.onemillion  (cost=0.00..20834.00
> rows=99918 width=4)
>                      Output: onemillion.id
>                      Filter: (onemillion.data > '0.9'::text)
>  Planning time: 0.215 ms
> (14 rows)
>
> Thanks,
>
> Best regards,
> Etsuro Fujita
>
>



