Обсуждение: Custom FDW - the results of a nested query/join not being passed as qual to the outer query

Поиск
Список
Период
Сортировка

Custom FDW - the results of a nested query/join not being passed as qual to the outer query

От
Kai Daguerre
Дата:
Hi!

(First post. If this is not the appropriate list, please feel free to move or let me know. )

I am developing an FDW which allows various data sources to act as virtual tables, allowing various different APIs to be queried using a consistent SQL interface - a similar concept to Osquery but using Postgres instead of SQLite. It is working pretty well, but we have hit a bit of a roadblock (or bump in the road at least).

We often have virtual tables where a list operation is not viable/possible without providing quals. For example we have implemented a 'whois' table, which will retrieve whois information for specified domains. It is clearly not practical to do an unqualified 'list' of all domains. 

The problem we have is that the results of nested subqueries/joins are not being passed as quals to the outer query.

So for example
   select * from whois_domain where domain in ('google.com', 'yahoo.co.uk')
works fine, and a qual is passed to the fdw with a value of ['google.com', 'yahoo.co.uk']

However the following (assuming a 'domains table containing required domains) does not work:
   select * from whois_domain where domain in (select domain from domains)

In this case, no quals are passed to the fdw, so the select * from whois_domain query therefore fails. What we would like is to ensure the subquery runs first, and for the results to be available to the outer query.

---

Using SQLite, this could be accomplished using cross-joins (https://sqlite.org/optoverview.html#crossjoin). Is there an equivalent (or similar) mechanism in Postgres to ensure query ordering?

Within the FDW, I have tried using the GetForeignPaths function to return a path which returns a single row when the 'key' column is used. This does provide a qual, however it is of type T_Var - I believe I need a constant qual.

Any suggestion welcome as to either a different way to structure the query or whether the FDW can request/enforce the ordering by returning specific planning results.

Many thanks,
Kai Daguerre

Product page: https://steampipe.io


Kai Daguerre <kai@turbot.com> writes:
> We often have virtual tables where a list operation is not viable/possible
> without providing quals. For example we have implemented a 'whois' table,
> which will retrieve whois information for specified domains. It is clearly
> not practical to do an unqualified 'list' of *all* domains.

In that case you're going to have to resign yourself to some queries
failing.  This is unavoidable, consider "select * from whois".  But
just because the query has a WHERE condition doesn't mean that a useful
restriction clause can be extracted for any particular table.

I think the best you can do is (1) fail at runtime if there's no qual
and (2) at plan time, return an extremely high cost estimate for a
qual-less scan, in hopes of discouraging the planner from choosing
that.  (Instead of (2), you could perhaps not generate a scan path
at all, but that's likely to lead to an unintelligible error message.)

Perhaps you should rethink whether you really want a foreign table
rather than a set-returning function.  With the SRF approach it's
automatic that the user must supply the restricting argument(s) you need.

            regards, tom lane



Re: Custom FDW - the results of a nested query/join not being passed as qual to the outer query

От
Kai Daguerre
Дата:
Many thanks for the fast response. 

Using an SRF is an interesting idea, I'll have a play and see if we can make that work.

Cheers,
Kai

On Wed, Jan 27, 2021 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kai Daguerre <kai@turbot.com> writes:
> We often have virtual tables where a list operation is not viable/possible
> without providing quals. For example we have implemented a 'whois' table,
> which will retrieve whois information for specified domains. It is clearly
> not practical to do an unqualified 'list' of *all* domains.

In that case you're going to have to resign yourself to some queries
failing.  This is unavoidable, consider "select * from whois".  But
just because the query has a WHERE condition doesn't mean that a useful
restriction clause can be extracted for any particular table.

I think the best you can do is (1) fail at runtime if there's no qual
and (2) at plan time, return an extremely high cost estimate for a
qual-less scan, in hopes of discouraging the planner from choosing
that.  (Instead of (2), you could perhaps not generate a scan path
at all, but that's likely to lead to an unintelligible error message.)

Perhaps you should rethink whether you really want a foreign table
rather than a set-returning function.  With the SRF approach it's
automatic that the user must supply the restricting argument(s) you need.

                        regards, tom lane