> So the costing was fairly trivial, we simply do something like
>
> comparison_cost = 2.0 * cpu_operator_cost;
>
> sort_cost = comparison_cost * tuples * LOG2(tuples);
>
> which essentially ignores that there might be multiple columns, or that
> the columns may have sort operator with different costs.
Agree. And distribution of keys.
>
> The question is how reliable the heuristics can be. The current patch
> uses just plain ndistinct, but that seems rather unreliable but I don't
> have a clear idea how to improve that - we may have MCV for the columns
> and perhaps some extended statistics, but I'm not sure how far we can
> run with that.
v8 already uses another algorithm.
>
> Essentially what we need to estimate the number of comparisons for each
> column, to compute better comparison_cost.
Exactly
>> Priorization of the user-provided order can be as simple as giving
>> that comparison_cost a small handicap.
>
> I see no point in doing that, and I don't recall a single place in the
> planner where we do that. If the user specified ORDER BY, we'll slap an
> explicit Sort on top when needed (which acts as the handicap, but in a
> clear way). Otherwise we don't do such things - it'd be just plain
> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
> types, ndistinct etc. but unexpectedly different costs). Also, what
> would be a good value for the handicap?
Again agree. If we have fixed order of columns (ORDER BY) then we should not try
to reorder it. Current patch follows that if I didn't a mistake.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/