Re: enable_incremental_sort changes query behavior

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: enable_incremental_sort changes query behavior
Дата
Msg-id 124400.1606159456@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: enable_incremental_sort changes query behavior  (James Coleman <jtc331@gmail.com>)
Ответы Re: enable_incremental_sort changes query behavior
Список pgsql-hackers
James Coleman <jtc331@gmail.com> writes:
> But I think that still leaves something missing: after all,
> prepare_sort_from_pathkeys does know how to insert new target list
> entries, so something either there (or in the caller/how its called)
> has to be enforcing an apparently implicit rule about what point in
> the tree it's safe to do such. Or even just no path generation had
> ever considered it before (that feels unsatisfactory as an explanation
> to me, because it feels more implicit than I'd like, but...)

Hi guys,

I hadn't been paying any attention to this thread, but Robert asked
me to take a look.  A few comments:

1. James was wondering, far upthread, why we would do projections
pre-sort or post-sort.  I think the answers can be found by studying
planner.c's make_sort_input_target(), which separates out what we want
to do pre-sort and post-sort.  (There, the "sort" is envisioned as a
standalone Sort node atop all the scan/join steps; but its decisions
nonetheless constrain what will be considered for sorting that's
implemented further down.)  It has a *very* long introductory comment
laying out all the considerations.

2. Robert is correct that under normal circumstances, the targetlists of
both baserels and join rels contain only Vars.  Any computations we have
to do are postponed to the final top-level targetlist, whether they are
volatile or not.  The fact that this policy is applied regardless of
volatility may explain why James isn't seeing volatility checks where he
expected them.  The core (and, I think, only) exception to this is that
an expression that is a sort or group key has to be evaluated earlier.
But even then, it won't be pushed down as far as the reltargets of any
scan or join relations; it'll be computed after the final join.

3. If we have a volatile sort/group key, that constrains the set of plans
we can validly generate, because the key expression mustn't be evaluated
redundantly.  With the addition of parallelism, a non-parallel-safe
sort/group key expression likewise constrains the set of valid plans,
even if it's not considered volatile.

4. In the past, the only way we'd consider non-Var sort keys in any way
during scan/join planning was if (a) a relation had an expression index
matching a requested sort key; of course, that's a-fortiori going to be
a non-volatile expression.  Or (b) we needed to sort in order to provide
suitable input for a merge join ... but again, volatile expressions
aren't considered candidates for mergejoin.  So there basically wasn't
any need to consider sorting by volatiles below the top level sort/group
processing, and that again contributes to why you don't see explicit
tests in that area.  I'm not sure offhand whether the parallel-query
patches added anything to guard against nonvolatile-but-parallel-unsafe
sort expressions.  If not, there might be live bugs there independently
of incremental sort; or that might be unreachable because of overall
limitations on parallelism.

5. While I've not dug far enough into the code to identify the exact
issue, the impression I have about this bug and the parallel-unsafe
subplan bug is that the incremental sort code is generating Paths
without due regard to point 3.  If it is making paths based on explicit
sorts for final-stage pathkeys, independently of either expression
indexes or mergejoin requirements, that could well explain why it's
got problems that weren't seen before.

6. I did look at ebb7ae839, and I suspect that the last part of
find_em_expr_usable_for_sorting_rel(), ie the search of the reltarget
list, is dead code.  There is no case where a volatile expression would
appear there, per point 2.  The committed regression test cases
certainly do not provide a counterexample: a look at the code coverage
report will show you that that loop never finds a match in any
regression test case.  I'd be inclined to flush that code and just
say that we won't return volatile expressions here.

            regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Corbit, Dann"
Дата:
Сообщение: Re: Connection using ODBC and SSL
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: error_severity of brin work item