Avoid computing ORDER BY junk columns unnecessarily

From: Heikki Linnakangas

Problem
-------

We are using junk columns for (at least) two slightly different purposes:

1. For passing row IDs and other such data from lower plan nodes to 
LockRows / ModifyTable.

2. To represent ORDER BY and GROUP BY columns that don't appear in the 
SELECT list. For example, in a query like:

     SELECT foo FROM mytable ORDER BY bar;

The parser adds 'bar' to the target list as a junk column. You can see 
that with EXPLAIN VERBOSE:

explain (verbose, costs off)
   select foo from mytable order by bar;

             QUERY PLAN
----------------------------------
  Sort
    Output: foo, bar
    Sort Key: mytable.bar
    ->  Seq Scan on public.mytable
          Output: foo, bar
(5 rows)

The 'bar' column gets filtered away in the executor, by the so-called 
junk filter. That's fine for simple cases like the above, but in some 
cases, that causes the ORDER BY value to be computed unnecessarily. For 
example:

create table mytable (foo text, bar text);
insert into mytable select g, g from generate_series(1, 10000) g;
create index on mytable (sha256(bar::bytea));
explain verbose
  select foo from mytable order by sha256(bar::bytea);

                                            QUERY PLAN
------------------------------------------------------------------------------------------------
  Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..737.28 rows=10000 width=64)
    Output: foo, sha256((bar)::bytea)
(2 rows)

The index is used to satisfy the ORDER BY, but the expensive ORDER BY 
expression is still computed for every row, just to be thrown away by 
the junk filter.

This came up with pgvector, as the vector distance functions are pretty 
expensive. All vector operations are expensive, so one extra distance 
function call per row doesn't necessarily make that much difference, but 
it sure looks silly. See 
https://github.com/pgvector/pgvector/issues/359#issuecomment-1840786021 
(thanks Matthias for the investigation!).

Solution
--------

The obvious solution is that the planner should not include those junk 
columns in the plan. But how exactly to implement that is a different 
matter.

I came up with the attached patch set, which adds a projection to all 
the paths at the end of planning in grouping_planner(). The projection 
filters out the unnecessary junk columns. With that, the plan for the 
above example:

postgres=# explain verbose select foo from mytable order by sha256(bar::bytea);
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
  Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..662.24 rows=10000 width=4)
    Output: foo
(2 rows)
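
To illustrate the mechanics, here is a simplified sketch of that final
step in grouping_planner(). This is not the actual patch;
resno_is_sortgroup_junk() is a made-up stand-in for the real
junk-column classification:

static void
strip_sortgroup_junk_columns(PlannerInfo *root, RelOptInfo *final_rel,
                             PathTarget *final_target)
{
    PathTarget *stripped = create_empty_pathtarget();
    ListCell   *lc;
    int         colno = 0;

    /* Copy the final target, leaving out sort/group-only junk columns. */
    foreach(lc, final_target->exprs)
    {
        Expr       *expr = (Expr *) lfirst(lc);

        if (!resno_is_sortgroup_junk(root, colno + 1))      /* hypothetical */
            add_column_to_pathtarget(stripped, expr,
                                     get_pathtarget_sortgroupref(final_target,
                                                                 colno));
        colno++;
    }
    set_pathtarget_cost_width(root, stripped);

    /* Wrap every surviving path in a projection emitting that target. */
    foreach(lc, final_rel->pathlist)
    {
        Path       *path = (Path *) lfirst(lc);

        /*
         * create_projection_path() inserts a Result node only when the
         * subpath cannot evaluate the new target itself; Sort cannot
         * project, hence the extra Result discussed below.
         */
        lfirst(lc) = create_projection_path(root, final_rel, path, stripped);
    }
}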


Problems with the solution
--------------------------

So this seems to work, but I have a few doubts:

1. Because Sort cannot project, this adds an extra Result node on top of 
Sort nodes when the ORDER BY is implemented by sorting:

postgres=# explain verbose select foo from mytable order by bar;
                                    QUERY PLAN
--------------------------------------------------------------------------------
  Result  (cost=818.39..843.39 rows=10000 width=4)
    Output: foo
    ->  Sort  (cost=818.39..843.39 rows=10000 width=8)
          Output: foo, bar
          Sort Key: mytable.bar
          ->  Seq Scan on public.mytable  (cost=0.00..154.00 rows=10000 width=8)
                Output: foo, bar
(7 rows)

From a performance point of view, I think that's not as bad as it 
sounds. Remember that without this patch, the executor needs to execute 
the junk filter to filter out the extra column instead. It's not clear 
that an extra Result is worse than that, although I haven't tried 
benchmarking it.

This makes plans for queries like the above more verbose, though. Perhaps we 
should teach Sort (and MergeAppend) to do projection, just to avoid that?

Another solution would be to continue relying on the junk filter, if 
adding the projection in the planner leads to an extra Result node. 
That's a bit ugly, because then the final target list of a (sub)query 
depends on the Path that's chosen.

2. Instead of tacking on the projection to the paths at the end, I first 
tried modifying the code earlier in grouping_planner() that computes the 
target lists for the different plan stages. That still feels like a 
cleaner approach to me, although I don't think there's any difference in 
the generated plans in practice. However I ran into some problems with 
that approach and gave up.

I basically tried to remove the junk columns from 'final_target', and 
have create_ordered_paths() create paths with the filtered target list 
directly. And if there is no ORDER BY, leave out the junk columns from 
'grouping_target' too, and have create_grouping_paths() generate the 
final target list directly. However, create_grouping_paths() assumes 
that the grouping columns are included in 'grouping_target'. And 
similarly in create_ordered_paths(), some partial planning stages assume 
that the ordering columns are included in 'final_target'. Those 
assumptions could probably be fixed, but I ran out of steam trying to do 
that.

3. I also considered if we should stop using junk columns to represent 
ORDER BY / GROUP BY columns like this in the first place. Perhaps we 
should have a separate list for those and not stash them in the target 
list. But that seems like a much bigger change.

4. It would be nice to get rid of the junk filter in the executor 
altogether. With this patch, the junk columns used for RowLocks and 
ModifyTable are still included in the final target list, and are still 
filtered out by the junk filter. But we could add similar projections 
between the RowLocks and ModifyTable stages, to eliminate all the junk 
columns at the top of the plan. ExecFilterJunk() isn't a lot of code 
(see the simplified sketch after this list), but it would feel cleaner 
to me. I didn't try to implement that.

5. I'm not sure the categorization of junk columns that I implemented 
here is the best one. It might make sense to have more categories, and 
distinguish row-id columns from others for example. And ORDER BY columns 
from GROUP BY columns.
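
For reference, the junk filter mentioned in point 4 is essentially just
an attribute remap. Here is a simplified rendering of ExecFilterJunk()
(see src/backend/executor/execJunk.c for the real thing; some details
are omitted):

TupleTableSlot *
ExecFilterJunk(JunkFilter *junkfilter, TupleTableSlot *slot)
{
    TupleTableSlot *resultSlot = junkfilter->jf_resultSlot;
    TupleDesc       cleanTupType = junkfilter->jf_cleanTupType;
    AttrNumber     *cleanMap = junkfilter->jf_cleanMap;
    int             i;

    /* make all attributes of the source tuple directly accessible */
    slot_getallattrs(slot);

    /* build a virtual tuple containing only the non-junk columns */
    ExecClearTuple(resultSlot);
    for (i = 0; i < cleanTupType->natts; i++)
    {
        AttrNumber  j = cleanMap[i];    /* source attno of output column i */

        resultSlot->tts_values[i] = slot->tts_values[j - 1];
        resultSlot->tts_isnull[i] = slot->tts_isnull[j - 1];
    }
    return ExecStoreVirtualTuple(resultSlot);
}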

Patches
-------

So the attached patches implement that idea, with the above-mentioned 
problems. I think this approach is fine as it is, despite those 
problems, but my planner-fu is rusty so I really need review and a 
second opinion on this.

v1-0001-Add-test-for-Min-Max-optimization-with-kNN-index-.patch
v1-0002-Show-how-ORDER-BY-expression-is-computed-unnecess.patch

These patches just add to existing tests to demonstrate the problem.

v1-0003-Turn-resjunk-into-an-enum.patch

Replace 'resjunk' boolean with an enum, so that we can distinguish 
between different junk columns. The next patch uses that information to 
identify junk columns that can be filtered out. It is a separate patch 
for easier review.
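
In rough outline, the enum could look something like this. The names
are illustrative and the actual patch may differ, although it does
include a JUNK_PLANNER_ONLY case that comes up later in the thread:

typedef enum ResJunkType
{
    JUNK_NO = 0,            /* not junk: part of the final output */
    JUNK_SORT_GROUP_BY,     /* needed only for ORDER BY / GROUP BY; can
                             * be omitted from the final target list */
    JUNK_ROWID,             /* row identity for LockRows / ModifyTable */
    JUNK_PLANNER_ONLY       /* planner-internal bookkeeping */
} ResJunkType;

In TargetEntry, "bool resjunk" becomes "ResJunkType resjunk"; with
JUNK_NO defined as zero, code that only cares whether a column is junk
at all can keep testing "if (tle->resjunk)".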

v1-0004-Omit-columns-from-final-tlist-that-were-only-need.patch

The main patch in this series.

v1-0005-Fix-regression-tests-caused-by-additional-Result-.patch

Regression test output changes, for all the plans with Sort that now 
have Sort + Result. See "Problems with the solution" #1.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Avoid computing ORDER BY junk columns unnecessarily

From: Xiaoran Wang

Hi Heikki,
I haven't dug into your patch yet, but for this problem, I have another idea.
-------
explain verbose
  select foo from mytable order by sha256(bar::bytea);

                                            QUERY PLAN
------------------------------------------------------------------------------------------------
  Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..737.28 rows=10000 width=64)
    Output: foo, sha256((bar)::bytea)
(2 rows)

The index is used to satisfy the ORDER BY, but the expensive ORDER BY
expression is still computed for every row, just to be thrown away by
the junk filter.
------
How about returning the ORDER BY column value together with
'xs_heaptid'? So that we can use that value directly instead of
computing it when using an index scan to fetch the ordered data.
Another problem I am concerned about is that if we exclude junk
columns in the sort plan, we may change its behavior. I'm not sure
whether that could lead to other issues.

Re: Avoid computing ORDER BY junk columns unnecessarily

From: Heikki Linnakangas

On 22/12/2023 11:05, Xiaoran Wang wrote:
> I haven't dug into your patch yet, but for this problem, I have another 
> idea.
> -------
> explain verbose
>    select foo from mytable order by sha256(bar::bytea);
> 
>                                             QUERY PLAN
> ------------------------------------------------------------------------------------------------
>   Index Scan using mytable_sha256_idx on public.mytable  (cost=0.29..737.28 rows=10000 width=64)
>     Output: foo, sha256((bar)::bytea)
> (2 rows)
> 
> The index is used to satisfy the ORDER BY, but the expensive ORDER BY
> expression is still computed for every row, just to be thrown away by
> the junk filter.
> ------
> How about returning the ORDER BY column value together with
> 'xs_heaptid'? So that we can use that value directly instead of
> computing it when using an index scan to fetch the ordered data.

Hmm, so return the computed column from the index instead of recomputing 
it? Yeah, that makes sense too and would help in this example. It won't 
help in all cases though, the index might not store the original value 
in the first place.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Avoid computing ORDER BY junk columns unnecessarily

From: Tom Lane

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Hmm, so return the computed column from the index instead of recomputing 
> it? Yeah, that makes sense too and would help in this example.

Yeah, that's been on the to-do list for ages.  The main problems are
(1) we need the planner to not spend too much effort on looking for
subexpression matches, and (2) amcanreturn ability isn't implemented
by the executor in plain indexscans.  There's another thread right now
discussing fixing (2), after which we could perhaps work on this.

> It won't 
> help in all cases though, the index might not store the original value 
> in the first place.

I'm a little skeptical that an index could produce an accurate ORDER BY
result if it doesn't store the values-to-be-sorted exactly.  Any loss
of information would compromise its ability to sort nearly-identical
values correctly.  A more credible argument is that the index might
expose amcanorder ability but not amcanreturn; but what I'm saying is
that that's probably an AM implementation gap that ought to be fixed.

How much of your patchset still makes sense if we assume that we
can always extract the ORDER BY column values from the index?

            regards, tom lane



Re: Avoid computing ORDER BY junk columns unnecessarily

From: Heikki Linnakangas

On 22/12/2023 17:24, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> It won't
>> help in all cases though, the index might not store the original value
>> in the first place.
> 
> I'm a little skeptical that an index could produce an accurate ORDER BY
> result if it doesn't store the values-to-be-sorted exactly.  Any loss
> of information would compromise its ability to sort nearly-identical
> values correctly.

In the context of pgvector, its ordering is approximate anyway. Aside 
from that, there's one trick that it implements: it compares squares of 
distances, avoiding a sqrt() calculation. (I wonder if we could do the 
same in GiST opclasses)

> A more credible argument is that the index might
> expose amcanorder ability but not amcanreturn; but what I'm saying is
> that that's probably an AM implementation gap that ought to be fixed.
> 
> How much of your patchset still makes sense if we assume that we
> can always extract the ORDER BY column values from the index?

That would make it much less interesting. But I don't think that's a 
good assumption. Especially in the kNN case, the ORDER BY value would 
not be stored in the index. Most likely the index needs to calculate it 
in some form, but it might take shortcuts like avoiding the sqrt().

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Avoid computing ORDER BY junk columns unnecessarily

From: Tom Lane

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 22/12/2023 17:24, Tom Lane wrote:
>> How much of your patchset still makes sense if we assume that we
>> can always extract the ORDER BY column values from the index?

> That would make it much less interesting. But I don't think that's a 
> good assumption. Especially in the kNN case, the ORDER BY value would 
> not be stored in the index. Most likely the index needs to calculate it 
> in some form, but it might take shortcuts like avoiding the sqrt().

Yeah, fair point.  I'll try to take a look at your patchset after
the holidays.

            regards, tom lane



Re: Avoid computing ORDER BY junk columns unnecessarily

From: Richard Guo

On Sat, Dec 23, 2023 at 1:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 22/12/2023 17:24, Tom Lane wrote:
>>> How much of your patchset still makes sense if we assume that we
>>> can always extract the ORDER BY column values from the index?
>
>> That would make it much less interesting. But I don't think that's a
>> good assumption. Especially in the kNN case, the ORDER BY value would
>> not be stored in the index. Most likely the index needs to calculate it
>> in some form, but it might take shortcuts like avoiding the sqrt().
>
> Yeah, fair point.  I'll try to take a look at your patchset after
> the holidays.

Agreed.

I haven't looked into these patches, but it seems that there is an issue
with how the targetlist is handled for foreign rels.  The following test
case for postgres_fdw hits the Assert in apply_tlist_labeling().

contrib_regression=# SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;
server closed the connection unexpectedly

When we create foreign final path in add_foreign_final_paths(), we use
root->upper_targets[UPPERREL_FINAL] as the PathTarget.  This PathTarget
is set in grouping_planner(), without removing the junk columns.  I
think this is why the above query hits the Assert.

Thanks
Richard

Re: Avoid computing ORDER BY junk columns unnecessarily

From: Richard Guo

On Fri, Dec 22, 2023 at 2:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> v1-0004-Omit-columns-from-final-tlist-that-were-only-need.patch
>
> The main patch in this series.

This patch filters out the junk columns created for ORDER BY/GROUP BY,
and retains the junk columns created for RowLocks.  I'm afraid this may
have a problem with out-of-order resnos.  For instance,

create table mytable (foo text, bar text);

# explain select foo from mytable order by bar for update of mytable;
server closed the connection unexpectedly

This query triggers another Assert in apply_tlist_labeling:

    Assert(dest_tle->resno == src_tle->resno);

At first there are three TargetEntry items: foo, bar and ctid, with
resnos being 1, 2 and 3.  And then the second item 'bar' is removed,
leaving only two items: foo and ctid, with resnos being 1 and 3.  So now
we have a missing resno, and finally hit the Assert.
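
For context, apply_tlist_labeling() walks the two targetlists pairwise,
which is why the gap in resnos trips the Assert. Simplified from
src/backend/optimizer/util/tlist.c:

void
apply_tlist_labeling(List *dest_tlist, List *src_tlist)
{
    ListCell   *ld,
               *ls;

    Assert(list_length(dest_tlist) == list_length(src_tlist));
    forboth(ld, dest_tlist, ls, src_tlist)
    {
        TargetEntry *dest_tle = (TargetEntry *) lfirst(ld);
        TargetEntry *src_tle = (TargetEntry *) lfirst(ls);

        Assert(dest_tle->resno == src_tle->resno);
        /* ... copy resname, ressortgroupref, etc. from src to dest ... */
    }
}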

Thanks
Richard

Re: Avoid computing ORDER BY junk columns unnecessarily

From: Tom Lane

I wrote:
> Yeah, fair point.  I'll try to take a look at your patchset after
> the holidays.

I spent some time looking at this patch, and I'm not very pleased
with it.  My basic complaint is that this is a band-aid that only
touches things at a surface level, whereas I think we need a much
deeper set of changes in order to have a plausible solution.
Specifically, you're only munging the final top-level targetlist
not anything for lower plan levels.  That looks like it works in
simple cases, but it leaves everything on the table as soon as
there's more than one level of plan involved.  I didn't stop to
trace the details but I'm pretty sure this is why you're getting the
bogus extra projections shown in the 0005 patch.  Moreover, this
isn't doing anything for cost estimates, which means the planner
doesn't really understand that more-desirable plans are more
desirable, and it may not pick an available plan that would exploit
what we want to have happen here.

As an example, say we have an index on f(x) and the query requires
sorting by f(x) but not final output of f(x).  If we devise a plan
that uses the index to sort and doesn't emit f(x), we need to not
charge the evaluation cost of f() for that plan --- this might
make the difference between picking that plan and not.  Right now
I believe we'll charge all plans for f(), so that some other plan
might look cheaper even when f() is very expensive.

Another example: we might be using an indexscan but not relying on
its sort order, for example because its output is going into a hash
join and then we'll sort afterwards.  For expensive f() it would
still be useful if the index could emit f(x) so that we don't have
to calculate f() at the sort step.  Right now I don't think we can
even emit that plan shape, never mind recognize why it's cheaper.

I have only vague ideas about how to do this better.  It might work
to set up multiple PathTargets for a relation that has such an
index, one for the base case where the scan only emits x and f() is
computed above, one for the case where we don't need either x or
f(x) because we're relying on the index order, and one that emits
f(x) with the expectation that a sort will happen above.  Then we'd
potentially generate multiple Paths representing the very same
indexscan but with different PathTargets, and differing targets
would have to become something that add_path treats as a reason to
keep multiple Paths for the same relation.  I'm a little frightened
about the possible growth in number of paths considered, but how
else would we keep track of the differing costs of these cases?

So your proposed patch is far from a final solution in this area,
and I'm not even sure that it represents a plausible incremental
step.  I've got mixed feelings about the "make resjunk an enum"
idea --- it seems quite invasive and I'm not sure it's buying
enough to risk breaking non-planner code for.  It might be
better to leave the targetlist representation alone and deal
with this strictly inside the planner.  We could, for example,
add an array to PlannerInfo that's indexed by query-tlist resno
and indicates the reason for a particular tlist item to exist.

Also, as you mentioned, this categorization of junk columns
seems a little unprincipled.  It might help to think about that
in terms similar to the "attr_needed" data we keep for Vars,
that is: how far up in the plan tree is this tlist item needed?
I think the possibilities are:

* needed for final output (ie, not resjunk)

* needed for final grouping/sorting steps (I think all
  resjunk items produced by the parser are this case;
  but do we need to distinguish GROUP BY from ORDER BY?)
  
* needed at the ModifyTable node (for rowmark columns)

* needed at the SetOp node (for flag columns added in
  prepunion.c)

It looks to me like these cases are mutually exclusive, so
that we just need an enum value not a bitmask.  Only
"needed for final grouping/sorting" is something we can
potentially omit from the plan.
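
In code form, that might look something like this (all names invented):

typedef enum TlistNeedLevel
{
    TLIST_NEEDED_OUTPUT,        /* needed for final output (not resjunk) */
    TLIST_NEEDED_SORT_GROUP,    /* final grouping/sorting steps only */
    TLIST_NEEDED_MODIFYTABLE,   /* rowmark columns for ModifyTable */
    TLIST_NEEDED_SETOP          /* flag columns added in prepunion.c */
} TlistNeedLevel;

with PlannerInfo growing an array indexed by query-tlist resno:

    TlistNeedLevel *tlist_need;     /* one entry per final-tlist column */

Only TLIST_NEEDED_SORT_GROUP entries would be candidates for omission
from the plan's output targetlists.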

BTW, the hack you invented JUNK_PLANNER_ONLY for, namely
that create_indexscan_plan feeds canreturn flags forward to
set_indexonlyscan_references, could be done another way:
we don't really have to overload resjunk for that.  At worst,
set_indexonlyscan_references could re-look-up the IndexOptInfo
using the index OID from the plan node, and get the canreturn
flags directly from that.  If we keep resjunk as a bool I'd be
inclined to leave this alone; but if we change resjunk, we
don't need the replacement design to account for this bit.

            regards, tom lane



Re: Avoid computing ORDER BY junk columns unnecessarily

From: Heikki Linnakangas

On 29/12/2023 01:42, Tom Lane wrote:
> I wrote:
>> Yeah, fair point.  I'll try to take a look at your patchset after
>> the holidays.
> 
> I spent some time looking at this patch, and I'm not very pleased
> with it.  My basic complaint is that this is a band-aid that only
> touches things at a surface level, whereas I think we need a much
> deeper set of changes in order to have a plausible solution.
> Specifically, you're only munging the final top-level targetlist
> not anything for lower plan levels.  That looks like it works in
> simple cases, but it leaves everything on the table as soon as
> there's more than one level of plan involved.

Yeah, that's fair.

> I didn't stop to trace the details but I'm pretty sure this is why
> you're getting the bogus extra projections shown in the 0005 patch.

They're not bogus. With the patches, projecting away the junk columns is 
visible in the plan as an extra Result node, while currently it's done 
as an implicit step in the executor. That seems fine and arguably an 
even more honest representation of what's happening, although I don't 
like the extra verbosity in EXPLAIN output.

> Moreover, this isn't doing anything for cost estimates, which means
> the planner doesn't really understand that more-desirable plans are
> more desirable, and it may not pick an available plan that would
> exploit what we want to have happen here.
> 
> As an example, say we have an index on f(x) and the query requires
> sorting by f(x) but not final output of f(x).  If we devise a plan
> that uses the index to sort and doesn't emit f(x), we need to not
> charge the evaluation cost of f() for that plan --- this might
> make the difference between picking that plan and not.  Right now
> I believe we'll charge all plans for f(), so that some other plan
> might look cheaper even when f() is very expensive.
>
> Another example: we might be using an indexscan but not relying on
> its sort order, for example because its output is going into a hash
> join and then we'll sort afterwards.  For expensive f() it would
> still be useful if the index could emit f(x) so that we don't have
> to calculate f() at the sort step.  Right now I don't think we can
> even emit that plan shape, never mind recognize why it's cheaper.

Related to this, we are not currently costing the target list evaluation 
correctly for index-only scans. Here's an example:

create or replace function expensive(i int) returns int
language plpgsql as
$$
   begin return i; end;
$$
immutable cost 1000000;

create table atab (i int);
insert into atab select g from generate_series(1, 10000) g;
create index on atab (i, expensive(i));

postgres=# explain verbose select expensive(i) from atab order by expensive(i);
                                  QUERY PLAN
----------------------------------------------------------------------------
  Sort  (cost=25000809.39..25000834.39 rows=10000 width=4)
    Output: (expensive(i))
    Sort Key: (expensive(atab.i))
    ->  Seq Scan on public.atab  (cost=0.00..25000145.00 rows=10000 width=4)
          Output: expensive(i)
(5 rows)

postgres=# set enable_seqscan=off; set enable_bitmapscan=off;
SET
SET
postgres=# explain verbose select expensive(i) from atab order by expensive(i);
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
  Sort  (cost=25001114.67..25001139.67 rows=10000 width=4)
    Output: (expensive(i))
    Sort Key: (expensive(atab.i))
    ->  Index Only Scan using atab_i_expensive_idx on public.atab  (cost=0.29..25000450.29 rows=10000 width=4)
          Output: (expensive(i))
(5 rows)

The cost of the index only scan ought to be much lower than the seqscan, 
as it can return the pre-computed expensive(i) from the index.

That could probably be fixed without any of the other changes we've been 
discussing here, though.

> I have only vague ideas about how to do this better.  It might work
> to set up multiple PathTargets for a relation that has such an
> index, one for the base case where the scan only emits x and f() is
> computed above, one for the case where we don't need either x or
> f(x) because we're relying on the index order, and one that emits
> f(x) with the expectation that a sort will happen above.  Then we'd
> potentially generate multiple Paths representing the very same
> indexscan but with different PathTargets, and differing targets
> would have to become something that add_path treats as a reason to
> keep multiple Paths for the same relation.  I'm a little frightened
> about the possible growth in number of paths considered, but how
> else would we keep track of the differing costs of these cases?

Hmm, if there are multiple functions like that in the target list, would 
you need to create different paths for each combination of expressions? 
That could really explode the number of paths.

Perhaps each Path could include "optional" target entries that it can 
calculate more cheaply, with a separate cost for each such expression. 
add_path() would treat the presence of optional target entries as a 
reason to retain the path, but you wouldn't need to keep a separate path 
for each PathTarget.

Another idea is to include f(x) in the PathTarget only if it's "cheap to 
emit". For example, if it's an expression index on f(x). If it turns out 
that f(x) is not needed higher up in the plan, it's not a big error in 
the estimate because it was cheap to emit. That wouldn't work well if 
the index AM could calculate f(x) more cheaply than the executor, but 
the cost was still not trivial. I'm not sure if that situation can arise 
currently.

> * needed for final grouping/sorting steps (I think all
>    resjunk items produced by the parser are this case;
>    but do we need to distinguish GROUP BY from ORDER BY?)

It would seem useful to distinguish GROUP BY and ORDER BY. For example:

SELECT COUNT(*) FROM table GROUP BY a, b ORDER BY a;

If this is implemented as HashAgg + Sort for example, only 'a' would be 
needed by the sort. Including less data in a Sort is good.

I wanted to implement that in my patch already, by removing the junk 
columns needed for grouping but not sorting from sort_input_target. But 
the grouping path generation has some assumptions that the grouping 
output target list includes all the grouping columns. I don't remember 
the exact problem that made me give up on that, but it probably could be 
fixed.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Avoid computing ORDER BY junk columns unnecessarily

From: Tom Lane

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 29/12/2023 01:42, Tom Lane wrote:
>> I didn't stop to trace the details but I'm pretty sure this is why
>> you're getting the bogus extra projections shown in the 0005 patch.

> They're not bogus. With the patches, projecting away the junk columns is
> visible in the plan as an extra Result node, while currently it's done
> as an implicit step in the executor. That seems fine and arguably an
> even more honest representation of what's happening, although I don't
> like the extra verbosity in EXPLAIN output.

I didn't bring it up in my previous message, but I'm not really on
board with trying to get rid of the executor junkfilter.  It seems
to me that that code is probably faster than a general-purpose
projection, and surely faster than an extra Result node, so I think
that is not a goal that would improve matters.

However, what I *was* trying to say is that I think those projections
occur because the lower-level plan node is still outputting the
columns you want to suppress, and then the planner realizes that it
needs a projection to create the shortened tlist.  But that's not
saving any work, because we still computed the expensive function :-(.
We need a more thorough treatment of the problem to ensure that the
lower-level plan nodes don't emit these columns either.

> Related to this, we are not currently costing the target list evaluation
> correctly for index-only scans.

Right, and we're dumb in other ways too: if the index can return
f(x) but not x, we fail to realize that we can use it for an IOS
in the first place, because there's an assumption that we'd better
be able to fetch x.  Currently I think the best way around that
might be via the other discussion that's going on about unifying
regular indexscans and index-only scans [1].  If we do that then
we could postpone the decision about whether we actually need x
itself, and perhaps that would simplify getting this right.

I'm kind of inclined to think that it'd be best to push the
other discussion forward first, and come back to this area
when it's done, because that will be touching a lot of the
same code as well as (maybe) simplifying the planner's problem.

            regards, tom lane

[1] https://www.postgresql.org/message-id/20230609000600.syqy447e6metnvyj%40awork3.anarazel.de



Re: Avoid computing ORDER BY junk columns unnecessarily

From: Peter Smith

2024-01 Commitfest.

Hi, this patch has a CF status of "Needs Review", but it seems like
there were some CFbot test failures last time it was run [1]. Please
have a look and post an updated version if necessary.

======
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4717