Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Дата
Msg-id 20190708135933.oco3gyloeacdj7yy@development
обсуждение исходный текст
Ответ на Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Ответы Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Список pgsql-hackers
On Mon, Jul 08, 2019 at 09:22:39AM -0400, James Coleman wrote:
>On Sun, Jul 7, 2019 at 5:02 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> We're running query like this:
>>
>>   SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3
>>
>> but we're trying to add the incremental sort *before* the aggregation,
>> because the optimizer also considers group aggregate with a sorted
>> input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
>> actually can do this, but clearly that's nonsense, because we can't
>> possibly know the aggregates yet. Hence the error.
>>
>> If this is the actual issue, we need to ensure we actually can evaluate
>> all the pathkeys. I don't know how to do that yet. I thought that maybe
>> we should modify pathkeys_common_contained_in() to set presorted_keys to
>> 0 in this case.
>>
>> But then I started wondering why we don't see this issue even for
>> regular (non-incremental-sort) paths built in create_ordered_paths().
>> How come we don't see these failures there? I've modified costing to
>> make all incremental sort paths very cheap, and still nothing.
>
>I assume you mean you modified costing to make regular sort paths very cheap?
>

No, I mean costing of incremental sort paths, so that they end up being
the cheapest ones. If some other path is cheaper, we won't see the error
because it only happens when building plan from the cheapest path.

>> So presumably there's a check elsewhere (either implicit or explicit),
>> because create_ordered_paths() uses pathkeys_common_contained_in() and
>> does not have the same issue.
>
>Given this comment in create_ordered_paths():
>
>  generate_gather_paths() will have already generated a simple Gather
>  path for the best parallel path, if any, and the loop above will have
>  considered sorting it.  Similarly, generate_gather_paths() will also
>  have generated order-preserving Gather Merge plans which can be used
>  without sorting if they happen to match the sort_pathkeys, and the loop
>  above will have handled those as well.  However, there's one more
>  possibility: it may make sense to sort the cheapest partial path
>  according to the required output order and then use Gather Merge.
>
>my understanding is that generate_gather_paths() only considers paths
>that already happen to be sorted (not explicit sorts), so I'm
>wondering if it would make more sense for the incremental sort path
>creation for this case to live alongside the explicit ordered path
>creation in create_ordered_paths() rather than in
>generate_gather_paths().
>

How would that solve the issue? Also, we're building a gather path, so
I think generate_gather_paths() is the right place where to do it. And
we're not changing the semantics of generate_gather_paths() - the result
path should be sorted "correctly" with respect to sort_pathkeys.

>If I'm understanding what you're saying properly, I think you'd
>expected create_ordered_paths() to be roughly similar in what it
>considers as partial paths and so have the same problem, and I haven't
>yet read enough of the code to understand if my proposed change
>actually has any impact on the issue we're discussing, but it seems to
>me that it at least fits more with what the comments imply.
>

Roughly. AFAICS the problem is that we're trying to use pathkeys that are
only valid for the (aggregated) upper relation, not before it.

I have to admit I've never quite understood this pathkeys business. I
mean, I know what pathkeys are for, but clearly it's not valid to look at
root->sort_pathkeys at this place. Maybe there's some other field we
should be looking at instead, or maybe it's ensured by grouping_planner in
some implicit way ...

I.e. the question is - when you do a query like

    SELECT a, count(*) FROM t GROUP BY a ORDER BY a, count(*);

and cost the incremental sort extremely cheap, how come we don't end up
with the same issue?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




В списке pgsql-hackers по дате отправления:

Предыдущее
От: James Coleman
Дата:
Сообщение: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11