Re: Index Skip Scan
От | Jesper Pedersen |
---|---|
Тема | Re: Index Skip Scan |
Дата | |
Msg-id | 9f4ef571-2f9e-2f5f-e2e3-c1632f3a3d7b@redhat.com обсуждение исходный текст |
Ответ на | Re: Index Skip Scan (David Rowley <david.rowley@2ndquadrant.com>) |
Список | pgsql-hackers |
Hi Thomas and David, Thanks for the feedback ! On 7/2/19 8:27 AM, David Rowley wrote: > On Tue, 2 Jul 2019 at 21:00, Thomas Munro <thomas.munro@gmail.com> wrote: >> I took this for a quick spin today. The DISTINCT ON support is nice >> and I think it will be very useful. I've signed up to review it and >> will have more to say later. But today I had a couple of thoughts >> after looking into how src/backend/optimizer/plan/planagg.c works and >> wondering how to do some more skipping tricks with the existing >> machinery. >> >> 1. SELECT COUNT(DISTINCT i) FROM t could benefit from this. (Or >> AVG(DISTINCT ...) or any other aggregate). Right now you get a seq >> scan, with the sort/unique logic inside the Aggregate node. If you >> write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get >> a skip scan that is much faster in good cases. I suppose you could >> have a process_distinct_aggregates() in planagg.c that recognises >> queries of the right form and generates extra paths a bit like >> build_minmax_path() does. I think it's probably better to consider >> that in the grouping planner proper instead. I'm not sure. > > I think to make that happen we'd need to do a bit of an overhaul in > nodeAgg.c to allow it to make use of presorted results instead of > having the code blindly sort rows for each aggregate that has a > DISTINCT or ORDER BY. The planner would also then need to start > requesting paths with pathkeys that suit the aggregate and also > probably dictate the order the AggRefs should be evaluated to allow > all AggRefs to be evaluated that can be for each sort order. Once > that part is done then the aggregates could then also request paths > with certain "UniqueKeys" (a feature I mentioned in [1]), however we'd > need to be pretty careful with that one since there may be other parts > of the query that require that all rows are fed in, not just 1 row per > value of "i", e.g SELECT COUNT(DISTINCT i) FROM t WHERE z > 0; can't > just feed through 1 row for each "i" value, since we need only the > ones that have "z > 0". Getting the first part of this solved is much > more important than making skip scans work here, I'd say. I think we > need to be able to walk before we can run with DISTINCT / ORDER BY > aggs. > I agree that the above is outside of scope for the first patch -- I think the goal should be the simple use-cases for IndexScan and IndexOnlyScan. Maybe we should expand [1] with possible cases, so we don't lose track of the ideas. >> 2. SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if >> you're allowed to go forwards. Same for SELECT i, MAX(j) FROM t GROUP >> BY i if you're allowed to go backwards. Those queries are equivalent >> to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC] >> (though as Floris noted, the backwards version gives the wrong answers >> with v20). That does seem like a much more specific thing applicable >> only to MIN and MAX, and I think preprocess_minmax_aggregates() could >> be taught to handle that sort of query, building an index only scan >> path with skip scan in build_minmax_path(). > > For the MIN query you just need a path with Pathkeys: { i ASC, j ASC > }, UniqueKeys: { i, j }, doing the MAX query you just need j DESC. > Ok. > The more I think about these UniqueKeys, the more I think they need to > be a separate concept to PathKeys. For example, UniqueKeys: { x, y } > should be equivalent to { y, x }, but with PathKeys, that's not the > case, since the order of each key matters. UniqueKeys equivalent > version of pathkeys_contained_in() would not care about the order of > individual keys, it would say things like, { a, b, c } is contained in > { b, a }, since if the path is unique on columns { b, a } then it must > also be unique on { a, b, c }. > I'm looking at this, and will keep this in mind. Thanks ! [1] https://wiki.postgresql.org/wiki/Loose_indexscan Best regards, Jesper
В списке pgsql-hackers по дате отправления: