Re: Question about optimising (Postgres_)FDW

Поиск
Список
Период
Сортировка
От Hannu Krosing
Тема Re: Question about optimising (Postgres_)FDW
Дата
Msg-id 534E0870.3050309@2ndQuadrant.com
обсуждение исходный текст
Ответ на Re: Question about optimising (Postgres_)FDW  (Hannu Krosing <hannu@2ndQuadrant.com>)
Список pgsql-hackers
On 04/16/2014 06:12 AM, Hannu Krosing wrote:
> On 04/16/2014 01:25 AM, Tom Lane wrote:
>> Hannu Krosing <hannu@2ndQuadrant.com> writes:
>>> Is there a way to force it to prefer a plan where the results of (select
>>> id from onemillion where data > '0.9' limit 100)
>>> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
>>> all at once ?
>> You could write the query like that:
>>
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
> My actual use-case was about a join between a local and a remote table
> and without rewriting the query (they come from ORM)
>
> I was hoping to be able to nudge postgresql towards a better plan via some
> tuning of table/fdw options or GUCs.
>
> for example, would postgresql use the WHERE id IN (...) query on remote
> side for a query like
>
> select r.data, l.data
>   from onemillion_pgfdw r
>   join onemillion l
>     on r.id = l.id and l.data > '0.999';
>
> if it recognizes that the local side returns only 1000 rows ?
>
> or would it still use 1000 individual WHERE id = $1 queries.
>
> Is getting the foreign data via IN and then turning the data into a hash
> for joining one of the plans it considers at all ?
It sees that could we need an extra tuning parameter for choosing the

ID IN (...) + HASH plan over individual SELECT .. WHERE ID = $1

something between

`fdw_startup_cost` and `fdw_tuple_cost`

to signify that an IN query returning 1000 rows runs faster than 1000 =
queries

as I understan currently they both would be estimated as

fdw_startup_cost + 1000 * fdw_tuple_cost

the new parameter could be fdw_call_cost or fdw_query_cost and would
estimate
how much each individual call to fdw costs, thus favouring calls which
return more
data in one call

Cheers
Hannu
>
> Best
> Hannu
>
>> Or at least you should be able to, except when I try it I get
>>
>> explain analyze
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
>> ERROR:  operator does not exist: integer = integer[]
>> HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
>> CONTEXT:  Remote SQL command: EXPLAIN SELECT id, inserted, data FROM public.onemillion WHERE ((id = ANY ((SELECT
null::integer[]))))
>>
>> so there's something the remote-estimate code is getting wrong here.
>> (It seems to work without remote_estimate, though.)
>>
>>             regards, tom lane
>


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Clock sweep not caching enough B-Tree leaf pages?
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Clock sweep not caching enough B-Tree leaf pages?