Thread: Implementing foreign data wrappers and avoiding n+1 querying

Implementing foreign data wrappers and avoiding n+1 querying

From
David Gilman
Date:
When a fdw table participates in query planning and finds itself as
part of a join it can output a parameterized path. If chosen, Postgres
will dutifully call the fdw over and over via IterateForeignScan to
fetch matching tuples. Many fdw extensions do network traffic, though,
and it would be beneficial to reduce the total number of queries done
or network connections established.

Is there some path that can be emitted by the fdw, or some other
technique, to get the query planner and everything else to handle
batching the tuples returned by the outer relation? For example, once
batched the fdw extension could send the equivalent of a WHERE row IN
(a, b, c), or maybe even WHERE row BETWEEN a AND c to the foreign
system, and either the fdw callback or a subplan does the rechecking
to match up the returned foreign tuples with the local ones.
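
To make the batching idea above concrete, here is a minimal sketch in plain Python rather than against the real FDW C API. It assumes a hypothetical `fetch_by_keys(keys)` callable standing in for one round trip to the foreign system (the equivalent of `WHERE key IN (...)`); the names `batched_join`, `outer_key` and `foreign_key` are also made up for illustration. It only shows the shape of "collect outer keys, query once per batch, then recheck locally".

```python
from collections import defaultdict
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def batched_join(outer_rows, outer_key, fetch_by_keys, foreign_key, batch_size=100):
    """Join outer rows to foreign rows using one remote query per batch.

    `fetch_by_keys` is a hypothetical callable: it takes a list of keys and
    returns the foreign rows (dicts) whose `foreign_key` is in that list.
    """
    for chunk in batched(outer_rows, batch_size):
        # Deduplicate keys so each is sent only once per batch.
        keys = sorted({row[outer_key] for row in chunk})
        # One round trip, the equivalent of: WHERE key IN (k1, k2, ...)
        by_key = defaultdict(list)
        for foreign_row in fetch_by_keys(keys):
            by_key[foreign_row[foreign_key]].append(foreign_row)
        # Recheck step: pair each local row with its matching foreign rows.
        for row in chunk:
            for foreign_row in by_key.get(row[outer_key], []):
                yield row, foreign_row
```

With `batch_size=100`, a thousand outer rows would cost ten remote queries instead of a thousand, at the price of holding one batch of foreign rows in memory for the recheck.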

One thought is that it might be possible to abuse the async support
for fdws to accomplish this. Your fdw could accept async requests, sit
on them until some threshold is crossed, do the actual query and feed
them back into the executor when the results are back. However, from
what I can tell the async interface has no way to tell the ForeignScan
that it won't get any more async requests, so there's no way to force
flush the final batch of queries.
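
The missing "last request" signal described above can be modeled in a few lines of plain Python. This is not the Postgres async FDW interface; `BatchingBuffer`, `send_batch` and `threshold` are hypothetical names used only to illustrate why a threshold-based buffer needs an explicit final flush.

```python
class BatchingBuffer:
    """Collect lookup keys and send them in one query once a threshold is hit."""

    def __init__(self, send_batch, threshold):
        # Hypothetical callable: takes a list of keys, returns a list of results.
        self.send_batch = send_batch
        self.threshold = threshold
        self.pending = []
        self.results = []

    def request(self, key):
        """Queue one lookup; send the batch only when the threshold is reached."""
        self.pending.append(key)
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        """Send whatever is pending. Something must call this after the last request."""
        if self.pending:
            batch, self.pending = self.pending, []
            self.results.extend(self.send_batch(batch))
```

With a threshold of 3 and five requests, the first three keys go out as one batch and the last two stay in `pending` until someone calls `flush()`. That final call is exactly the notification the buffer cannot generate on its own.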



Re: Implementing foreign data wrappers and avoiding n+1 querying

From
David Rowley
Date:
On Thu, 22 Dec 2022 at 13:31, David Gilman <davidgilman1@gmail.com> wrote:
>
> When a fdw table participates in query planning and finds itself as
> part of a join it can output a parameterized path. If chosen, Postgres
> will dutifully call the fdw over and over via IterateForeignScan to
> fetch matching tuples. Many fdw extensions do network traffic, though,
> and it would be beneficial to reduce the total number of queries done
> or network connections established.

Sounds like you might be looking for fdw_startup_cost [1].

David

[1] https://www.postgresql.org/docs/current/postgres-fdw.html



Re: Implementing foreign data wrappers and avoiding n+1 querying

From
David Gilman
Date:
I apologize that my post was not super clear. I am thinking about implementing an fdw from scratch, and the target database is one of those NoSQL databases where you have to send JSON over an HTTP connection for each query.

I have reviewed the postgres_fdw code to see how it works and what's possible. Although it probably wouldn't benefit as much from this sort of thing (yay to Postgres' design!), it could possibly still benefit a bit. That makes me wonder: if this can't be done with the current planner nodes, it might be a worthy improvement to add support for it.


Re: Implementing foreign data wrappers and avoiding n+1 querying

From
Brad White
Date:
We had a similar situation in a completely different context.
Our eventual solution was to fire off a request as soon as one came in. Then we batched further requests until the first returned. Whenever a request returned, we sent any pending requests. 
Any single request not sent immediately was slowed slightly, but overall the system was faster because of the reduced traffic. 
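
A rough sketch of that strategy using Python's asyncio, for illustration only. It is not the system described above and not FDW code; `InFlightBatcher` and `send_batch` are hypothetical names, and `send_batch` is assumed to be an async callable that takes a list of keys and returns a dict mapping each key to its result.

```python
import asyncio


class InFlightBatcher:
    """Send the first request at once; batch later ones until it returns."""

    def __init__(self, send_batch):
        # Hypothetical async callable: list of keys -> {key: result}.
        self.send_batch = send_batch
        self.pending = []        # (key, future) pairs waiting to be sent
        self.in_flight = False   # True while a remote call is outstanding
        self._task = None        # keep a reference so the task is not collected

    async def request(self, key):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((key, fut))
        if not self.in_flight:
            # Nothing outstanding: fire immediately.
            self._send_pending()
        return await fut

    def _send_pending(self):
        batch, self.pending = self.pending, []
        self.in_flight = True
        self._task = asyncio.create_task(self._run(batch))

    async def _run(self, batch):
        try:
            results = await self.send_batch([key for key, _ in batch])
        except Exception as exc:
            # Propagate the failure to every caller in this batch.
            for _, fut in batch:
                fut.set_exception(exc)
        else:
            for key, fut in batch:
                fut.set_result(results[key])
        self.in_flight = False
        # Whenever a request returns, send anything that queued up meanwhile.
        if self.pending:
            self._send_pending()
```

No threshold or timer is needed here: the batch size adapts to how slow the remote side is, and the last batch is always sent because the return of the previous call triggers it.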

Brad
