Обсуждение: Projection pushdown to index access method

Поиск
Список
Период
Сортировка

Projection pushdown to index access method

От
Chris Cleveland
Дата:
I'm working on an index access method. I have a function which can appear in a projection list which should be evaluated by the access method itself. Example:

SELECT title, my_special_function(body)
FROM books
WHERE book_id <===> 42;

"<===>" is the operator that invokes the access method. The value returned by my_special_function() gets calculated during the index scan, and depends on  information that exists only in the index.

How do I get the system to pull the value from the index instead of trying to calculate it?

So far, I have created a CustomScan and set it using set_rel_pathlist_hook. The hook function gives us a PlannerInfo, RelOptInfo, Index, and RangeTblEntry. So far as I can tell, only RelOptInfo.reltarget.exprs gives us any info on the SELECT expressions, but unfortunately, the exprs are Var nodes that contain the (title, body) columns from above, and do not say anything about my_special_function().

Where do I find the actual final projection exprs?

Am I using the right hook?

Is there any example code out there on how to do this?

I know this is possible, because the docs for PathTarget say this:

PathTarget
 *
 * This struct contains what we need to know during planning about the
 * targetlist (output columns) that a Path will compute.  Each RelOptInfo
 * includes a default PathTarget, which its individual Paths may simply
 * reference.  However, in some cases a Path may compute outputs different
 * from other Paths, and in that case we make a custom PathTarget for it.
 * For example, an indexscan might return index expressions that would
 * otherwise need to be explicitly calculated.


--
Chris Cleveland
312-339-2677 mobile

Re: Projection pushdown to index access method

От
Robert Haas
Дата:
On Tue, Sep 19, 2023 at 12:35 PM Chris Cleveland
<ccleveland@dieselpoint.com> wrote:
> I'm working on an index access method. I have a function which can appear in a projection list which should be
evaluatedby the access method itself. Example: 
>
> SELECT title, my_special_function(body)
> FROM books
> WHERE book_id <===> 42;
>
> "<===>" is the operator that invokes the access method. The value returned by my_special_function() gets calculated
duringthe index scan, and depends on  information that exists only in the index. 
>
> How do I get the system to pull the value from the index instead of trying to calculate it?

I don't see how you can do this in general, because there's no
guarantee that the plan will be an Index Scan or Index Only Scan
instead of a Seq Scan or Bitmap Heap/Index Scan.

> So far, I have created a CustomScan and set it using set_rel_pathlist_hook. The hook function gives us a PlannerInfo,
RelOptInfo,Index, and RangeTblEntry. So far as I can tell, only RelOptInfo.reltarget.exprs gives us any info on the
SELECTexpressions, but unfortunately, the exprs are Var nodes that contain the (title, body) columns from above, and do
notsay anything about my_special_function(). 

So what does the EXPLAIN plan look like?

I'm not quite sure what's happening here, but the planner likes to
make plans that just fetch attributes from all the relations being
joined (here, there's just one) and then perform the calculation of
any expressions at the very end, as the final step, or at least as the
final step at that subquery level. And if it plans to ask your custom
scan for title, body, and book_id and then compute
my_special_function(body) after the fact, the thing you want to happen
is not going to happen. If the planner can be induced to ask your
custom scan for my_special_function(body), then I *think* you should
be able to arrange to get that value any way you like and just return
it. But I don't quite know how to induce the planner to do that -- and
especially if this query involved more than one table, because of the
planner's tendency to postpone expression evaluation until after joins
are done.

> I know this is possible, because the docs for PathTarget say this:
>
> PathTarget
>  *
>  * This struct contains what we need to know during planning about the
>  * targetlist (output columns) that a Path will compute.  Each RelOptInfo
>  * includes a default PathTarget, which its individual Paths may simply
>  * reference.  However, in some cases a Path may compute outputs different
>  * from other Paths, and in that case we make a custom PathTarget for it.
>  * For example, an indexscan might return index expressions that would
>  * otherwise need to be explicitly calculated.

It's just worth keeping in mind that the planner and the executor are
very tightly bound together here. This may be one of those cases
getting the executor to do what you want is the easy part, and getting
the planner to produce a plan that tells it to do that is the hard
part.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Projection pushdown to index access method

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Sep 19, 2023 at 12:35 PM Chris Cleveland
> <ccleveland@dieselpoint.com> wrote:
>> I'm working on an index access method. I have a function which can appear in a projection list which should be
evaluatedby the access method itself. Example: 
>> ...
>> How do I get the system to pull the value from the index instead of trying to calculate it?

> I don't see how you can do this in general, because there's no
> guarantee that the plan will be an Index Scan or Index Only Scan
> instead of a Seq Scan or Bitmap Heap/Index Scan.

Yeah.  There is some adjacent functionality for indexed expressions,
which maybe you could use, but it has a lot of shortcomings yet.
For example:

regression=# create or replace function f(x int) returns int as $$begin return x+1; end$$ language plpgsql strict
immutablecost 1000; 
CREATE FUNCTION
regression=# create table mytable (id int, x int);
CREATE TABLE
regression=# create index on mytable(x, f(x));
CREATE INDEX
regression=# set enable_seqscan TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# explain verbose select f(x) from mytable;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Only Scan using mytable_x_f_idx on public.mytable  (cost=0.15..5728.06 rows=2260 width=4)
   Output: (f(x))
(2 rows)

If you examine the plan tree closely you can confirm that it is pulling
f(x) from the index rather than recomputing it.  So maybe you could get
somewhere by pretending that my_special_function(body) is an indexed
expression.  However, there are a couple of big gotchas, which this
example illustrates:

1. The index has to also provide x (or for you, "body") or else the
planner fails to detect that an IOS is applicable.  This comes back
to the point Robert made about the planner preferring to think about
pulling individual Vars from tables: we don't believe the index is
usable in an IOS unless it provides all the Vars the query needs from
that table.  This wouldn't be hard to fix exactly; the problem is to
fix it without spending exponential amounts of planning time in
check_index_only.  We'd have to detect that all uses of "x" appear in
the context "f(x)" in order to realize that we don't need to be able
to fetch "x" itself.

2. Costing doesn't account for the fact that we've avoided runtime
computation of f(), thus the IOS plan may not be preferred over
other plan shapes, which is why I had to force it above.  Again,
this is pretty closely tied to the fact that we don't recognize
until very late in the game that we can get f(x) from the index.

3. This only works for an index-only scan, not regular index scans.
There's some early discussion happening about unifying IOS and
regular scans a bit more, which perhaps would allow relaxing that
(and maybe even solve issue #1?).  But it's a long way off yet.

If my_special_function() is supposed to always be applied to an
indexed column, then issue #1 would fortuitously not be a problem
for you.  But #2 is a pain, and #3 might be a deal-breaker for you.

            regards, tom lane