Re: [HACKERS] Performance improvement for joins where outer side is unique
| От | Antonin Houska | 
|---|---|
| Тема | Re: [HACKERS] Performance improvement for joins where outer side is unique | 
| Дата | |
| Msg-id | 18764.1485522844@localhost обсуждение исходный текст | 
| Ответ на | Re: Performance improvement for joins where outer side is unique (David Rowley <david.rowley@2ndquadrant.com>) | 
| Список | pgsql-hackers | 
I thought about the patch from the perspective of "grouped relations" (especially [1]). When looking for the appropriate context within the thread, I picked this message. David Rowley <david.rowley@2ndquadrant.com> wrote: > On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > It seems like the major intellectual complexity here is to figure out > > how to detect inner-side-unique at reasonable cost. I see that for > > LEFT joins you're caching that in the SpecialJoinInfos, which is probably > > fine. But for INNER joins it looks like you're just doing it over again > > for every candidate join, and that seems mighty expensive. > ... I'll look into that. > > The other thing I thought of was to add a dedicated list for unique > indexes in RelOptInfo, this would also allow > rel_supports_distinctness() to do something a bit smarter than just > return false if there's no indexes. That might not buy us much though, > but at least relations tend to have very little unique indexes, even > when they have lots of indexes. I'm thinking of a concept of "unique keys", similar to path keys that the planner already uses. Besides the current evaluation of uniqueness of the inner side of a join, the planner would (kind of) union the unique keys of the joined rels, ie compute a list of expressions which generates an unique row throughout the new join result. (Requirement is that each key must be usable in join expression, as opposed to filter.) To figure out whether at most one inner row exists per outer row, each unique key of the inner relation which references the outer relation needs to match an unique key of the outer relation (but it's probably wrong if multiple unique keys of the inner rel reference the same key of the outer rel). Like path key, the unique key would also point to an equivalence class. Thus mere equality of the EC pointers could perhaps be used to evaluate the match of the inner and outer keys. Given that rel_is_distinct_for() currently does not accept joins, this change would make the patch more generic. (BTW, with this approach, unique_rels and non_unique_rels caches would have to be stored per-relation (RelOptInfo), as opposed to PlannerInfo.) The reason I'd like the unique keys is that - from the "grouped relation" point of view - the relation uniqueness also needs to be checked against the GROUP BY clause. Thus the "unique keys" concept seem to me like an useful abstraction. Does this proposal seem to have a serious flaw? [1] https://www.postgresql.org/message-id/CAKJS1f_h1CLff92B%3D%2BbdrMK2Nf3EfGWaJu2WbzQUYcSBUi02ag%40mail.gmail.com -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
В списке pgsql-hackers по дате отправления: