Re: upper planner path-ification

From Simon Riggs
Subject Re: upper planner path-ification
Date
Msg-id CANP8+jKeGV0oF2SaOR3HyiE_KjcwF6GWT3N4nDVZNuUyG6BKbQ@mail.gmail.com
In response to Re: upper planner path-ification  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: upper planner path-ification  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 18 May 2015 at 14:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, May 17, 2015 at 12:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Rather than adding tlists per se to Paths, I've been vaguely toying with
>> a notion of identifying all the "interesting" subexpressions in a query
>> (expensive functions, aggregates, etc), giving them indexes 1..n, and then
>> marking Paths with bitmapsets showing which interesting subexpressions
>> they can produce values for.  This would make things like "does this Path
>> compute all the needed aggregates" much cheaper to deal with than a raw
>> tlist representation would do.  But maybe that's still not the best way.

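The bitmapset idea quoted above can be illustrated with a small sketch. This is only a toy model under stated assumptions: the expression names, the `Path` class, and the helpers `bitmap_of` and `provides_all` are hypothetical and are not PostgreSQL code. It shows why a bitmap test is cheaper than comparing raw tlists: "does this Path compute all the needed aggregates" becomes a single mask operation.

```python
# Hypothetical sketch: names and structure are illustrative, not PostgreSQL code.
from dataclasses import dataclass

# Give each "interesting" subexpression an index 1..n (bit 0 is left unused).
INTERESTING = ["sum(x)", "count(*)", "expensive_fn(y)"]
INDEX = {expr: i for i, expr in enumerate(INTERESTING, start=1)}


def bitmap_of(exprs):
    """Build an integer bitmap with one bit set per subexpression index."""
    bits = 0
    for expr in exprs:
        bits |= 1 << INDEX[expr]
    return bits


@dataclass
class Path:
    name: str
    computes: int  # bitmap of the subexpressions this path can produce


def provides_all(path, needed):
    """True if every bit in `needed` is also set in the path's bitmap."""
    return needed & ~path.computes == 0


needed_aggs = bitmap_of(["sum(x)", "count(*)"])
scan = Path("seqscan", bitmap_of(["expensive_fn(y)"]))
agg = Path("hashagg", bitmap_of(["sum(x)", "count(*)", "expensive_fn(y)"]))

print(provides_all(scan, needed_aggs))
print(provides_all(agg, needed_aggs))
```
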
> I don't know, but it seems like this might be pulling in the opposite
> direction from your previously-stated desire to get subquery_planner
> to output Paths rather than Plans as soon as possible.

Sorry, I didn't mean to suggest that that necessarily had to happen right
away.

What we do need right away, though, is *some* design for distinguishing
Paths for the different possible upper-level steps.  I won't cry if we
change it around later, but we have to have something to start with.

So for the moment, let's assume that we still rigidly follow the sequence
of upper-level steps currently embodied in grouping_planner.  (I'm not
sure if it even makes sense to consider other orderings of those
processing steps, but in any case we don't need to allow it on day zero.)
Then, make a dummy RelOptInfo corresponding to the result of each step,
and insert links to those in new fields in PlannerInfo.  (We create these
*before* starting scan/join planning, so that FDWs, custom scans, etc, can
inject paths into these RelOptInfos if they want, so as to represent cases
like remote aggregation.)  Then just use add_path with the appropriate
target RelOptInfo when producing different ways to do grouping etc.
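
As a rough illustration of the scheme described above, here is a minimal sketch, assuming one dummy rel per upper-level step. The step names in `UpperStep`, the `upper_rels` field, and `make_upper_rels` are hypothetical. The `add_path` here is a simplified stand-in that only orders paths by total cost, whereas the planner's real add_path compares more than that.

```python
# Hypothetical sketch: a toy model of the proposal, not PostgreSQL code.
from dataclasses import dataclass, field
from enum import Enum


class UpperStep(Enum):
    # Assumed step names, following the fixed order of grouping_planner.
    GROUPING = 1
    WINDOW = 2
    DISTINCT = 3
    SORT = 4
    LIMIT = 5


@dataclass
class Path:
    description: str
    total_cost: float


@dataclass
class RelOptInfo:
    step: UpperStep
    pathlist: list = field(default_factory=list)


@dataclass
class PlannerInfo:
    # Assumed new field: links to one dummy rel per upper-level step.
    upper_rels: dict = field(default_factory=dict)


def make_upper_rels(root):
    """Create the dummy rels up front, before scan/join planning starts."""
    for step in UpperStep:
        root.upper_rels[step] = RelOptInfo(step)


def add_path(rel, new_path):
    """Simplified stand-in: keep the rel's paths ordered by total cost."""
    rel.pathlist.append(new_path)
    rel.pathlist.sort(key=lambda p: p.total_cost)


root = PlannerInfo()
make_upper_rels(root)
grouping_rel = root.upper_rels[UpperStep.GROUPING]

# An FDW or custom scan provider could inject a path early, e.g. remote aggregation.
add_path(grouping_rel, Path("remote aggregate via FDW", 120.0))
# Later, the core planner adds its own ways of doing the same step.
add_path(grouping_rel, Path("HashAggregate over join", 300.0))
add_path(grouping_rel, Path("GroupAggregate over sorted join", 450.0))

cheapest = grouping_rel.pathlist[0]
print(cheapest.description)
```

Because every strategy for a given step lands in the same rel, a path injected by an extension competes on cost with the locally generated ones.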

This is a bit ad-hoc but it would be a place to start.

Comments?

My thinking was to push aggregation down to the lowest level possible in the plan, hopefully a single relation. That way we can generate paths for the current grouping_planner options as well as for others, such as these:

* Push down aggregate prior to a join (which might then affect join planning)
* Allow parallel queries to follow a scan-aggregate-collectfromslaves-aggregate strategy (hence need for double aggregation semantics)
* Allow a lookaside to a Mat View rather than do a scan-aggregate (assume for now these are maintained correctly)
* Allow a lookaside to an alternate datastore/mechanism via CustomScan (assume these are maintained correctly)

all of which need to be costed against each other and the current strategies (aggregate last).
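
The "double aggregation semantics" mentioned for the parallel case can be sketched as follows. This is a toy illustration, assuming an average computed in two stages. The function names `partial_avg`, `combine`, and `finalize` are hypothetical and do not correspond to any existing PostgreSQL API. Each worker produces a (sum, count) state per group, and the upper aggregate merges those states before computing the final value.

```python
# Hypothetical sketch of two-stage aggregation; not PostgreSQL code.
from collections import defaultdict


def partial_avg(rows):
    """Stage 1, per worker: map each group key to a (sum, count) state."""
    state = defaultdict(lambda: (0, 0))
    for key, value in rows:
        s, c = state[key]
        state[key] = (s + value, c + 1)
    return dict(state)


def combine(states):
    """Stage 2: merge the per-worker states group by group."""
    merged = defaultdict(lambda: (0, 0))
    for st in states:
        for key, (s, c) in st.items():
            ms, mc = merged[key]
            merged[key] = (ms + s, mc + c)
    return dict(merged)


def finalize(merged):
    """Turn each merged (sum, count) state into the final average."""
    return {key: s / c for key, (s, c) in merged.items()}


worker_a = [("x", 10), ("y", 4)]
worker_b = [("x", 20), ("x", 30), ("y", 6)]

# Scan and aggregate in each worker, collect, then aggregate again.
result = finalize(combine([partial_avg(worker_a), partial_avg(worker_b)]))

# Aggregating everything in one place must give the same answer.
single = finalize(combine([partial_avg(worker_a + worker_b)]))
print(result == single)
```

Averaging the per-worker averages directly would be wrong when group sizes differ, which is why the intermediate state has to carry both the sum and the count.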

The above proposal sounds like it will do that, but I'm not completely sure.

I'm assuming the O(N^2) Mat View planning problem can be solved in part by recognizing that many MVs are just single-table plus aggregates, and that we'd have a small enough number of MVs in play that search would not be a problem in practice.

I'm also aware that LIMIT is still very badly optimized, so I'm hoping it helps there also.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
