Discussion: MergeAppend costing


MergeAppend costing

From
Robert Haas
Date:
See the attached test case.  With that setup, this uses MergeAppend:

explain select * from ma_parent order by id limit 10;

But this one does not:

explain select * from ma_parent order by name limit 10;

...which seems odd, because the index on ma_child1 and sorting the
other two ought to still be better than appending all three and
sorting the whole thing.  If you drop ma_child2, you get MergeAppend
again:

begin;
drop table ma_child2;
explain select * from ma_parent order by name limit 10;
rollback;

...which makes me wonder if our costing model is off?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments
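
The attached test case itself isn't reproduced in this archive; a setup along the following lines appears consistent with the behavior described above (row counts, the ma_child3 name, and the use of inheritance are assumptions, not taken from the attachment):

create table ma_parent (id integer, name text);
create table ma_child1 () inherits (ma_parent);
create table ma_child2 () inherits (ma_parent);
create table ma_child3 () inherits (ma_parent);

-- ma_child2 is made much larger than its siblings, so a full sort of it
-- is expensive while a top-N sort of the whole appended result is not.
insert into ma_child1 select g, 'name ' || g from generate_series(1, 10000) g;
insert into ma_child2 select g, 'name ' || g from generate_series(1, 1000000) g;
insert into ma_child3 select g, 'name ' || g from generate_series(1, 10000) g;

-- every child is indexed on id, but only ma_child1 is indexed on name
create index on ma_child1 (id);
create index on ma_child2 (id);
create index on ma_child3 (id);
create index on ma_child1 (name);

analyze;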

Re: MergeAppend costing

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> See the attached test case.  With that setup, this uses MergeAppend:
> explain select * from ma_parent order by id limit 10;

> But this one does not:

> explain select * from ma_parent order by name limit 10;

> ...which seems odd, because the index on ma_child1 and sorting the
> other two ought to still be better than appending all three and
> sorting the whole thing.

Not really; what you're not accounting for is that the top-level sort
is a lot cheaper than a full sort of the large child relation would be,
because it gets hacked to do a top-N sort instead of a full sort.
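
As a side note on reproducing this: with the limit in place, the bounded sort shows up directly in explain analyze output, where the top-level Sort node reports a "top-N heapsort" sort method instead of a full quicksort or external sort (assuming tables like the ones sketched earlier):

explain analyze select * from ma_parent order by name limit 10;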

What this example suggests is that we should consider ways to pass
down the top-N-ness to sorts executed as part of a MergeAppend tree.
That seems a tad messy though, both in the executor and the planner.
        regards, tom lane


Re: MergeAppend costing

From
Tom Lane
Date:
I wrote:
> What this example suggests is that we should consider ways to pass
> down the top-N-ness to sorts executed as part of a MergeAppend tree.
> That seems a tad messy though, both in the executor and the planner.

Actually the executor side of it doesn't seem too bad.  A Limit node
already pokes the limit value into its child Sort node, and if that
doesn't make you gag, then recursing down through MergeAppend to poke
grandchild Sort nodes shouldn't either.

The planner is a bit messier: we really need to know the effective limit
value in create_merge_append_path in order to generate the right path
cost estimates, and that's many call levels removed from where the limit
is currently available.  I think the least invasive way to do it is to
add the constant limit value (if known) as a field in PlannerInfo.
If that's set, *and* the MergeAppendPath is being made for the only "base
relation" in the query (ie, no joins allowed) then apply the limit while
costing any sorts needed.
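
A hypothetical example of why the no-joins restriction matters (other_tab and its parent_id column are invented for illustration): the limit on the joined result puts no fixed bound on how many sorted rows each MergeAppend child must deliver, because join matching can discard or multiply rows, so the limit can't be used when costing the child sorts:

select *
from ma_parent p
join other_tab o on o.parent_id = p.id
order by p.name
limit 10;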

Any objections to that approach?

BTW, while looking at this I came to the conclusion that the limit value
is being improperly applied in query_planner() ATM.  We factor it into
a cost_sort call there, even in cases where the query has GROUP BY or
DISTINCT; but in such cases the tuples emitted from the join tree don't
necessarily each produce a result tuple, so it's wrong to assume that
"LIMIT n" means we can stop after n tuples.  The effects of this are
probably relatively marginal, but it could cause us to make the wrong
decision about whether sorting is cheaper than a presorted path.
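
A hypothetical example of the kind of query affected: each output group can absorb arbitrarily many tuples from the join tree, so the limit puts no useful bound on the sort that feeds the aggregation:

select name, count(*)
from ma_parent
group by name
order by name
limit 5;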
        regards, tom lane