Thread: Avoid computing ORDER BY junk columns unnecessarily
Problem
-------

We are using junk columns for (at least) two slightly different purposes:

1. For passing row IDs and other such data from lower plan nodes to
LockRows / ModifyTable.

2. To represent ORDER BY and GROUP BY columns that don't appear in the
SELECT list. For example, in a query like:

SELECT foo FROM mytable ORDER BY bar;

The parser adds 'bar' to the target list as a junk column. You can see
that with EXPLAIN VERBOSE:

explain (verbose, costs off)
select foo from mytable order by bar;

            QUERY PLAN
----------------------------------
 Sort
   Output: foo, bar
   Sort Key: mytable.bar
   ->  Seq Scan on public.mytable
         Output: foo, bar
(5 rows)

The 'bar' column gets filtered away in the executor, by the so-called
junk filter. That's fine for simple cases like the above, but in some
cases, that causes the ORDER BY value to be computed unnecessarily. For
example:

create table mytable (foo text, bar text);
insert into mytable select g, g from generate_series(1, 10000) g;
create index on mytable (sha256(bar::bytea));

explain verbose
select foo from mytable order by sha256(bar::bytea);

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..737.28 rows=10000 width=64)
   Output: foo, sha256((bar)::bytea)
(2 rows)

The index is used to satisfy the ORDER BY, but the expensive ORDER BY
expression is still computed for every row, just to be thrown away by
the junk filter.

This came up with pgvector, as the vector distance functions are pretty
expensive. All vector operations are expensive, so one extra distance
function call per row doesn't necessarily make that much difference, but
it sure looks silly. See
https://github.com/pgvector/pgvector/issues/359#issuecomment-1840786021
(thanks Matthias for the investigation!).

Solution
--------

The obvious solution is that the planner should not include those junk
columns in the plan. But how exactly to implement that is a different
matter.

I came up with the attached patch set, which adds a projection to all
the paths at the end of planning in grouping_planner(). The projection
filters out the unnecessary junk columns. With that, the plan for the
above example:

postgres=# explain verbose select foo from mytable order by sha256(bar::bytea);
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..662.24 rows=10000 width=4)
   Output: foo
(2 rows)

Problems with the solution
--------------------------

So this seems to work, but I have a few doubts:

1. Because Sort cannot project, this adds an extra Result node on top of
Sort nodes when the ORDER BY is implemented by sorting:

postgres=# explain verbose select foo from mytable order by bar;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=818.39..843.39 rows=10000 width=4)
   Output: foo
   ->  Sort  (cost=818.39..843.39 rows=10000 width=8)
         Output: foo, bar
         Sort Key: mytable.bar
         ->  Seq Scan on public.mytable  (cost=0.00..154.00 rows=10000 width=8)
               Output: foo, bar
(7 rows)

From a performance point of view, I think that's not as bad as it
sounds. Remember that without this patch, the executor needs to execute
the junk filter to filter out the extra column instead. It's not clear
that an extra Result is worse than that, although I haven't tried
benchmarking it.

This makes plans for queries like the above more verbose, though.
Perhaps we should teach Sort (and MergeAppend) to do projection, just to
avoid that?

Another solution would be to continue relying on the junk filter, if
adding the projection in the planner leads to an extra Result node.
That's a bit ugly, because then the final target list of a (sub)query
depends on the Path that's chosen.

2. Instead of tacking on the projection to the paths at the end, I first
tried modifying the code earlier in grouping_planner() that computes the
target lists for the different plan stages. That still feels like a
cleaner approach to me, although I don't think there's any difference in
the generated plans in practice. However, I ran into some problems with
that approach and gave up.

I basically tried to remove the junk columns from 'final_target', and
have create_ordered_paths() create paths with the filtered target list
directly. And if there is no ORDER BY, leave out the junk columns from
'grouping_target' too, and have create_grouping_paths() generate the
final target list directly. However, create_grouping_paths() assumes
that the grouping columns are included in 'grouping_target'. And
similarly in create_ordered_paths(), some partial planning stages assume
that the ordering columns are included in 'final_target'. Those
assumptions could probably be fixed, but I ran out of steam trying to do
that.

3. I also considered whether we should stop using junk columns to
represent ORDER BY / GROUP BY columns like this in the first place.
Perhaps we should have a separate list for those and not stash them in
the target list. But that seems like a much bigger change.

4. It would be nice to get rid of the junk filter in the executor
altogether. With this patch, the junk columns used for RowLocks and
ModifyTable are still included in the final target list, and are still
filtered out by the junk filter. But we could add similar projections
between the RowLocks and ModifyTable stages, to eliminate all the junk
columns at the top of the plan. ExecFilterJunk() isn't a lot of code,
but it would feel cleaner to me. I didn't try to implement that.

5. I'm not sure the categorization of junk columns that I implemented
here is the best one. It might make sense to have more categories, and
distinguish row-id columns from others, for example. And ORDER BY
columns from GROUP BY columns.

Patches
-------

So the attached patches implement that idea, with the above-mentioned
problems. I think this approach is fine as it is, despite those
problems, but my planner-fu is rusty so I really need review and a
second opinion on this.

v1-0001-Add-test-for-Min-Max-optimization-with-kNN-index-.patch
v1-0002-Show-how-ORDER-BY-expression-is-computed-unnecess.patch

These patches just add to existing tests to demonstrate the problem.

v1-0003-Turn-resjunk-into-an-enum.patch

Replace the 'resjunk' boolean with an enum, so that we can distinguish
between different junk columns. The next patch uses that information to
identify junk columns that can be filtered out. It is a separate patch
for easier review.

v1-0004-Omit-columns-from-final-tlist-that-were-only-need.patch

The main patch in this series.

v1-0005-Fix-regression-tests-caused-by-additional-Result-.patch

Regression test output changes, for all the plans with Sort that now
have Sort + Result. See "Problems with the solution" #1.

--
Heikki Linnakangas
Neon (https://neon.tech)
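The effect of the 0004 patch's final projection can be modeled in a few lines (a toy sketch in plain Python, not PostgreSQL code; the column names and 'junk' labels are illustrative, not the actual enum from the 0003 patch):

```python
# Toy model of the final-projection idea: drop target entries that were
# only needed for ORDER BY / GROUP BY, but keep real output columns and
# the row-id junk that LockRows / ModifyTable still need.
tlist = [
    {"name": "foo",         "junk": None},     # real output column
    {"name": "sha256(bar)", "junk": "sort"},   # ORDER BY-only junk
    {"name": "ctid",        "junk": "rowid"},  # needed by LockRows/ModifyTable
]

# The projection keeps everything except sort/group-only junk:
final_tlist = [te for te in tlist if te["junk"] != "sort"]

print([te["name"] for te in final_tlist])  # ['foo', 'ctid']
```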
Hi Heikki,
I haven't dug into your patch yet, but for this problem, I have another idea.
-------
explain verbose
select foo from mytable order by sha256(bar::bytea);
QUERY PLAN
------------------------------------------------------------------------------------------------
Index Scan using mytable_sha256_idx on public.mytable
(cost=0.29..737.28 rows=10000 width=64)
Output: foo, sha256((bar)::bytea)
(2 rows)
The index is used to satisfy the ORDER BY, but the expensive ORDER BY
expression is still computed for every row, just to be thrown away by
the junk filter.
------
How about returning the ORDER BY column value from the index together
with 'xs_heaptid'? Then we could use that value directly, instead of
recomputing it, when an index scan fetches the ordered data.
Another thing I am concerned about is that excluding junk columns from
the sort plan changes its behavior; I'm not sure whether that could lead
to other issues.
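A toy sketch of that idea (plain Python, not PostgreSQL code; the heap/index structures and names are invented): if the index already stores the computed key, the scan can hand it back alongside the TID instead of recomputing it per row.

```python
import hashlib

def sha256_hex(s):
    return hashlib.sha256(s.encode()).hexdigest()

# "Heap" rows keyed by a stand-in TID; the "index" stores the computed
# sha256 key next to each TID, in key order.
heap = {1: "row-one", 2: "row-two", 3: "row-three"}
index = sorted((sha256_hex(bar), tid) for tid, bar in heap.items())

recomputations = 0
scanned = []
for stored_key, tid in index:   # scan in ORDER BY sha256(bar) order
    key = stored_key            # returned with the TID: no recomputation
    scanned.append(tid)
    # Without the optimization we would need:
    #   key = sha256_hex(heap[tid]); recomputations += 1

# The scan order matches ORDER BY sha256(bar), with zero recomputations.
assert scanned == sorted(heap, key=lambda tid: sha256_hex(heap[tid]))
assert recomputations == 0
```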
On Fri, Dec 22, 2023 at 02:39, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 22/12/2023 11:05, Xiaoran Wang wrote:
> I haven't dug into your patch yet, but for this problem, I have another
> idea.
> [...]
> How about adding the orderby column value in 'xs_heaptid' with the
> 'xs_heaptid' together? So that we can use that value directly instead
> of computing it when using an index scan to fetch the ordered data.

Hmm, so return the computed column from the index instead of recomputing
it? Yeah, that makes sense too and would help in this example. It won't
help in all cases though; the index might not store the original value
in the first place.

--
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Hmm, so return the computed column from the index instead of recomputing
> it? Yeah, that makes sense too and would help in this example.

Yeah, that's been on the to-do list for ages. The main problems are
(1) we need the planner to not spend too much effort on looking for
subexpression matches, and (2) amcanreturn ability isn't implemented
by the executor in plain indexscans. There's another thread right now
discussing fixing (2), after which we could perhaps work on this.

> It won't help in all cases though, the index might not store the
> original value in the first place.

I'm a little skeptical that an index could produce an accurate ORDER BY
result if it doesn't store the values-to-be-sorted exactly. Any loss
of information would compromise its ability to sort nearly-identical
values correctly. A more credible argument is that the index might
expose amcanorder ability but not amcanreturn; but what I'm saying is
that that's probably an AM implementation gap that ought to be fixed.

How much of your patchset still makes sense if we assume that we
can always extract the ORDER BY column values from the index?

			regards, tom lane
On 22/12/2023 17:24, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> It won't help in all cases though, the index might not store the
>> original value in the first place.
>
> I'm a little skeptical that an index could produce an accurate ORDER BY
> result if it doesn't store the values-to-be-sorted exactly. Any loss
> of information would compromise its ability to sort nearly-identical
> values correctly.

In the context of pgvector, its ordering is approximate anyway. Aside
from that, there's one trick that it implements: it compares squares of
distances, avoiding a sqrt() calculation. (I wonder if we could do the
same in GiST opclasses.)

> A more credible argument is that the index might
> expose amcanorder ability but not amcanreturn; but what I'm saying is
> that that's probably an AM implementation gap that ought to be fixed.
>
> How much of your patchset still makes sense if we assume that we
> can always extract the ORDER BY column values from the index?

That would make it much less interesting. But I don't think that's a
good assumption. Especially in the kNN case, the ORDER BY value would
not be stored in the index. Most likely the index needs to calculate it
in some form, but it might take shortcuts like avoiding the sqrt().

--
Heikki Linnakangas
Neon (https://neon.tech)
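The sqrt()-avoiding trick mentioned above works because sqrt() is monotonically increasing, so ordering by squared distance gives the same result as ordering by true distance. A quick illustrative check (plain Python, sample points invented):

```python
import math

# Sorting by squared Euclidean distance yields the same order as sorting
# by the true distance, because sqrt() is monotonically increasing.
points = [(3.0, 4.0), (1.0, 1.0), (6.0, 8.0), (0.5, 0.2)]
origin = (0.0, 0.0)

def dist2(p, q):
    # Squared distance: no sqrt() needed.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

by_distance = sorted(points, key=lambda p: math.sqrt(dist2(p, origin)))
by_squared  = sorted(points, key=lambda p: dist2(p, origin))

assert by_distance == by_squared  # same order, one sqrt() per row saved
```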
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 22/12/2023 17:24, Tom Lane wrote:
>> How much of your patchset still makes sense if we assume that we
>> can always extract the ORDER BY column values from the index?

> That would make it much less interesting. But I don't think that's a
> good assumption. Especially in the kNN case, the ORDER BY value would
> not be stored in the index. Most likely the index needs to calculate it
> in some form, but it might take shortcuts like avoiding the sqrt().

Yeah, fair point. I'll try to take a look at your patchset after
the holidays.

			regards, tom lane
On Sat, Dec 23, 2023 at 1:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 22/12/2023 17:24, Tom Lane wrote:
>>> How much of your patchset still makes sense if we assume that we
>>> can always extract the ORDER BY column values from the index?

>> That would make it much less interesting. But I don't think that's a
>> good assumption. Especially in the kNN case, the ORDER BY value would
>> not be stored in the index. Most likely the index needs to calculate it
>> in some form, but it might take shortcuts like avoiding the sqrt().

> Yeah, fair point. I'll try to take a look at your patchset after
> the holidays.
Agreed.
I haven't looked into these patches, but it seems that there is an issue
with how the targetlist is handled for foreign rels. The following test
case for postgres_fdw hits the Assert in apply_tlist_labeling().
contrib_regression=# SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;
server closed the connection unexpectedly
When we create foreign final path in add_foreign_final_paths(), we use
root->upper_targets[UPPERREL_FINAL] as the PathTarget. This PathTarget
is set in grouping_planner(), without removing the junk columns. I
think this is why the above query hits the Assert.
Thanks
Richard
On Fri, Dec 22, 2023 at 2:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> v1-0004-Omit-columns-from-final-tlist-that-were-only-need.patch
>
> The main patch in this series.
This patch filters out the junk columns created for ORDER BY/GROUP BY,
and retains the junk columns created for RowLocks. I'm afraid this may
have a problem with out-of-order resnos. For instance,
create table mytable (foo text, bar text);
# explain select foo from mytable order by bar for update of mytable;
server closed the connection unexpectedly
This query triggers another Assert in apply_tlist_labeling:
Assert(dest_tle->resno == src_tle->resno);
At first there are three TargetEntry items: foo, bar and ctid, with
resnos being 1, 2 and 3. And then the second item 'bar' is removed,
leaving only two items: foo and ctid, with resnos being 1 and 3. So now
we have a missing resno, and finally hit the Assert.
Thanks
Richard
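The resno mismatch described above can be modeled in a few lines (a toy sketch in plain Python, not PostgreSQL code; the function below only mimics the shape of the Assert):

```python
# Toy model of the Assert in apply_tlist_labeling(): the source and
# destination target lists are walked in parallel, and resnos must match.
def apply_tlist_labeling(dest_resnos, src_resnos):
    for dest, src in zip(dest_resnos, src_resnos):
        assert dest == src, f"resno mismatch: {dest} != {src}"

# Before filtering: foo=1, bar=2 (ORDER BY junk), ctid=3 (FOR UPDATE junk).
tlist = {"foo": 1, "bar": 2, "ctid": 3}

# Dropping only the ORDER BY junk leaves a gap: resnos [1, 3].
src_resnos = sorted(v for k, v in tlist.items() if k != "bar")

# The other list is numbered consecutively 1..N, i.e. [1, 2].
dest_resnos = list(range(1, len(src_resnos) + 1))

try:
    apply_tlist_labeling(dest_resnos, src_resnos)
    crashed = False
except AssertionError:
    crashed = True

assert crashed  # the gap trips the Assert, as in the reported crash
```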
I wrote:
> Yeah, fair point. I'll try to take a look at your patchset after
> the holidays.

I spent some time looking at this patch, and I'm not very pleased
with it. My basic complaint is that this is a band-aid that only
touches things at a surface level, whereas I think we need a much
deeper set of changes in order to have a plausible solution.

Specifically, you're only munging the final top-level targetlist,
not anything for lower plan levels. That looks like it works in
simple cases, but it leaves everything on the table as soon as
there's more than one level of plan involved. I didn't stop to
trace the details but I'm pretty sure this is why you're getting
the bogus extra projections shown in the 0005 patch.

Moreover, this isn't doing anything for cost estimates, which means
the planner doesn't really understand that more-desirable plans are
more desirable, and it may not pick an available plan that would
exploit what we want to have happen here.

As an example, say we have an index on f(x) and the query requires
sorting by f(x) but not final output of f(x). If we devise a plan
that uses the index to sort and doesn't emit f(x), we need to not
charge the evaluation cost of f() for that plan --- this might
make the difference between picking that plan and not. Right now
I believe we'll charge all plans for f(), so that some other plan
might look cheaper even when f() is very expensive.

Another example: we might be using an indexscan but not relying on
its sort order, for example because its output is going into a hash
join and then we'll sort afterwards. For expensive f() it would
still be useful if the index could emit f(x) so that we don't have
to calculate f() at the sort step. Right now I don't think we can
even emit that plan shape, never mind recognize why it's cheaper.

I have only vague ideas about how to do this better. It might work
to set up multiple PathTargets for a relation that has such an
index, one for the base case where the scan only emits x and f() is
computed above, one for the case where we don't need either x or
f(x) because we're relying on the index order, and one that emits
f(x) with the expectation that a sort will happen above. Then we'd
potentially generate multiple Paths representing the very same
indexscan but with different PathTargets, and differing targets
would have to become something that add_path treats as a reason to
keep multiple Paths for the same relation. I'm a little frightened
about the possible growth in number of paths considered, but how
else would we keep track of the differing costs of these cases?

So your proposed patch is far from a final solution in this area,
and I'm not even sure that it represents a plausible incremental
step.

I've got mixed feelings about the "make resjunk an enum" idea --- it
seems quite invasive and I'm not sure it's buying enough to risk
breaking non-planner code for. It might be better to leave the
targetlist representation alone and deal with this strictly inside
the planner. We could, for example, add an array to PlannerInfo
that's indexed by query-tlist resno and indicates the reason for a
particular tlist item to exist.

Also, as you mentioned, this categorization of junk columns seems a
little unprincipled. It might help to think about that in terms
similar to the "attr_needed" data we keep for Vars, that is: how far
up in the plan tree is this tlist item needed? I think the
possibilities are:

* needed for final output (ie, not resjunk)

* needed for final grouping/sorting steps (I think all
  resjunk items produced by the parser are this case;
  but do we need to distinguish GROUP BY from ORDER BY?)

* needed at the ModifyTable node (for rowmark columns)

* needed at the SetOp node (for flag columns added in prepunion.c)

It looks to me like these cases are mutually exclusive, so that we
just need an enum value not a bitmask. Only "needed for final
grouping/sorting" is something we can potentially omit from the
plan.

BTW, the hack you invented JUNK_PLANNER_ONLY for, namely that
create_indexscan_plan feeds canreturn flags forward to
set_indexonlyscan_references, could be done another way: we don't
really have to overload resjunk for that. At worst,
set_indexonlyscan_references could re-look-up the IndexOptInfo
using the index OID from the plan node, and get the canreturn flags
directly from that. If we keep resjunk as a bool I'd be inclined to
leave this alone; but if we change resjunk, we don't need the
replacement design to account for this bit.

			regards, tom lane
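A minimal sketch of the categorization described above (plain Python with invented names; the real thing would be a C enum, or a per-resno array in PlannerInfo as suggested):

```python
from enum import Enum, auto

# Illustrative model of the "how far up is this tlist item needed"
# categories; the names are invented, not from any patch.
class TlistNeed(Enum):
    FINAL_OUTPUT = auto()  # needed for final output (not resjunk)
    SORT_GROUP   = auto()  # needed only by final grouping/sorting steps
    ROWMARK      = auto()  # needed at the ModifyTable node
    SETOP_FLAG   = auto()  # needed at the SetOp node (prepunion.c)

def can_omit_from_plan(need):
    # Only grouping/sorting-only columns can potentially be omitted;
    # the rest must survive up to some executor node.
    return need is TlistNeed.SORT_GROUP

assert can_omit_from_plan(TlistNeed.SORT_GROUP)
assert not any(can_omit_from_plan(n) for n in
               (TlistNeed.FINAL_OUTPUT, TlistNeed.ROWMARK,
                TlistNeed.SETOP_FLAG))
```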
On 29/12/2023 01:42, Tom Lane wrote: > I wrote: >> Yeah, fair point. I'll try to take a look at your patchset after >> the holidays. > > I spent some time looking at this patch, and I'm not very pleased > with it. My basic complaint is that this is a band-aid that only > touches things at a surface level, whereas I think we need a much > deeper set of changes in order to have a plausible solution. > Specifically, you're only munging the final top-level targetlist > not anything for lower plan levels. That looks like it works in > simple cases, but it leaves everything on the table as soon as > there's more than one level of plan involved. Yeah, that's fair. > I didn't stop to trace the details but I'm pretty sure this is why > you're getting the bogus extra projections shown in the 0005 patch. They're not bogus. With the patches, projecting away the junk columns is visible in the plan as an extra Result node, while currently it's done as an implicit step in the executor. That seems fine and arguably an even more honest representation of what's happening, although I don't like the extra verbosity in EXPLAIN output. > Moreover, this isn't doing anything for cost estimates, which means > the planner doesn't really understand that more-desirable plans are > more desirable, and it may not pick an available plan that would > exploit what we want to have happen here. > > As an example, say we have an index on f(x) and the query requires > sorting by f(x) but not final output of f(x). If we devise a plan > that uses the index to sort and doesn't emit f(x), we need to not > charge the evaluation cost of f() for that plan --- this might > make the difference between picking that plan and not. Right now > I believe we'll charge all plans for f(), so that some other plan > might look cheaper even when f() is very expensive. 
> > Another example: we might be using an indexscan but not relying on > its sort order, for example because its output is going into a hash > join and then we'll sort afterwards. For expensive f() it would > still be useful if the index could emit f(x) so that we don't have > to calculate f() at the sort step. Right now I don't think we can > even emit that plan shape, never mind recognize why it's cheaper. Related to this, we are not currently costing the target list evaluation correctly for index-only scans. Here's an example: create or replace function expensive(i int) returns int language plpgsql as $$ begin return i; end; $$ immutable cost 1000000; create table atab (i int); insert into atab select g from generate_series(1, 10000) g; create index on atab (i, expensive(i)); postgres=# explain verbose select expensive(i) from atab order by expensive(i); QUERY PLAN ---------------------------------------------------------------------------- Sort (cost=25000809.39..25000834.39 rows=10000 width=4) Output: (expensive(i)) Sort Key: (expensive(atab.i)) -> Seq Scan on public.atab (cost=0.00..25000145.00 rows=10000 width=4) Output: expensive(i) (5 rows) postgres=# set enable_seqscan=off; set enable_bitmapscan=off; SET SET postgres=# explain verbose select expensive(i) from atab order by expensive(i); QUERY PLAN -------------------------------------------------------------------------------------------------------------- Sort (cost=25001114.67..25001139.67 rows=10000 width=4) Output: (expensive(i)) Sort Key: (expensive(atab.i)) -> Index Only Scan using atab_i_expensive_idx on public.atab (cost=0.29..25000450.29 rows=10000 width=4) Output: (expensive(i)) (5 rows) The cost of the index only scan ought to be much lower than the seqscan, as it can return the pre-computed expensive(i) from the index. That could probably be fixed without any of the other changes we've been discussing here, though. > I have only vague ideas about how to do this better. 
It might work > to set up multiple PathTargets for a relation that has such an > index, one for the base case where the scan only emits x and f() is > computed above, one for the case where we don't need either x or > f(x) because we're relying on the index order, and one that emits > f(x) with the expectation that a sort will happen above. Then we'd > potentially generate multiple Paths representing the very same > indexscan but with different PathTargets, and differing targets > would have to become something that add_path treats as a reason to > keep multiple Paths for the same relation. I'm a little frightened > about the possible growth in number of paths considered, but how > else would we keep track of the differing costs of these cases? Hmm, if there are multiple functions like that in the target list, would you need to create different paths for each combination of expressions? That could really explode the number of paths. Perhaps each Path could include "optional" target entries that it can calculate more cheaply, with a separate cost for each such expression. add_path() would treat the presence of optional target entries as a reason to retain the path, but you wouldn't need to keep a separate path for each PathTarget. Another idea is to include f(x) in the PathTarget only if it's "cheap to emit". For example, if it's an expression index on f(x). If it turns out that f(x) is not needed higher up in the plan, it's not a big error in the estimate because it was cheap to emit. That wouldn't work well if the index AM could calculate f(x) more cheaply than the executor, but the cost was still not trivial. I'm not sure if that situation can arise currently. > * needed for final grouping/sorting steps (I think all > resjunk items produced by the parser are this case; > but do we need to distinguish GROUP BY from ORDER BY?) It would seem useful to distinguish GROUP BY and ORDER BY. 
For example:

SELECT COUNT(*) FROM table GROUP BY a, b ORDER BY a;

If this is implemented as HashAgg + Sort, for example, only 'a' would 
be needed by the sort. Including less data in a Sort is good.

I wanted to implement that in my patch already, by removing the junk 
columns needed for grouping but not sorting from sort_input_target. But 
the grouping path generation has some assumptions that the grouping 
output target list includes all the grouping columns. I don't remember 
the exact problem that made me give up on that, but it probably could 
be fixed.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
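[Editor's sketch: the plan shape described above can be reproduced with a script like the following. The table name 'grptab' and the data are made up for illustration; the exact plan depends on the planner's cost estimates and settings.]

```sql
create table grptab (a int, b int);
insert into grptab select g % 10, g % 100 from generate_series(1, 100000) g;
analyze grptab;

-- If this comes out as HashAgg below a Sort, the Sort node's input
-- currently carries both grouping columns a and b, even though only
-- a participates in the Sort Key.
explain (verbose, costs off)
select count(*) from grptab group by a, b order by a;
```

With a HashAgg + Sort plan, EXPLAIN VERBOSE typically shows 'b' in the Sort node's Output list although the Sort Key is only 'a', which is the wasted width Heikki is pointing at.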
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 29/12/2023 01:42, Tom Lane wrote:
>> I didn't stop to trace the details but I'm pretty sure this is why
>> you're getting the bogus extra projections shown in the 0005 patch.

> They're not bogus. With the patches, projecting away the junk columns
> is visible in the plan as an extra Result node, while currently it's
> done as an implicit step in the executor. That seems fine and
> arguably an even more honest representation of what's happening,
> although I don't like the extra verbosity in EXPLAIN output.

I didn't bring it up in my previous message, but I'm not really on 
board with trying to get rid of the executor junkfilter.  It seems to 
me that that code is probably faster than a general-purpose projection, 
and surely faster than an extra Result node, so that is not a goal that 
would improve matters.

However, what I *was* trying to say is that I think those projections 
occur because the lower-level plan node is still outputting the columns 
you want to suppress, and then the planner realizes that it needs a 
projection to create the shortened tlist.  But that's not saving any 
work, because we still computed the expensive function :-(.  We need a 
more thorough treatment of the problem to ensure that the lower-level 
plan nodes don't emit these columns either.

> Related to this, we are not currently costing the target list
> evaluation correctly for index-only scans.

Right, and we're dumb in other ways too: if the index can return f(x) 
but not x, we fail to realize that we can use it for an IOS in the 
first place, because there's an assumption that we'd better be able to 
fetch x.  Currently I think the best way around that might be via the 
other discussion that's going on about unifying regular indexscans and 
index-only scans [1].  If we do that, then we could postpone the 
decision about whether we actually need x itself, and perhaps that 
would simplify getting this right.
I'm kind of inclined to think that it'd be best to push the other 
discussion forward first, and come back to this area when it's done, 
because that will be touching a lot of the same code as well as (maybe) 
simplifying the planner's problem.

			regards, tom lane

[1] https://www.postgresql.org/message-id/20230609000600.syqy447e6metnvyj%40awork3.anarazel.de
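[Editor's sketch: the f(x)-but-not-x limitation Tom describes can be seen with an expression-only index. The table name 'ftab' and the data are made up for illustration, and the observed behavior is as of the PostgreSQL versions under discussion.]

```sql
create table ftab (x text);
insert into ftab select g::text from generate_series(1, 10000) g;
create index ftab_sha_idx on ftab (sha256(x::bytea));
vacuum analyze ftab;

-- The index stores sha256(x), which is everything this query needs,
-- but the planner does not consider an index-only scan: it assumes
-- it must be able to fetch the bare column x from the index.
explain (verbose, costs off)
select sha256(x::bytea) from ftab order by sha256(x::bytea);
```

The expected outcome is a plain Index Scan that satisfies the ORDER BY but recomputes sha256(x) in the scan's target list, rather than an Index Only Scan returning the stored value.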
2024-01 Commitfest.

Hi, this patch has a CF status of "Needs Review", but it seems like 
there were some CFbot test failures last time it was run [1]. Please 
have a look and post an updated version if necessary.

======
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4717