Re: Index Skip Scan

Поиск
Список
Период
Сортировка
От Jesper Pedersen
Тема Re: Index Skip Scan
Дата
Msg-id 9b36ad5c-53c2-47cc-2cc7-10588d24cb47@redhat.com
обсуждение исходный текст
Ответ на Re: Index Skip Scan  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
Hi David,

On 6/14/19 3:19 AM, David Rowley wrote:
> I read over this thread a few weeks ago while travelling back from
> PGCon. (I wish I'd read it on the outward trip instead since it would
> have been good to talk about it in person.)
> 
> First off. I think this is a pretty great feature. It certainly seems
> worthwhile working on it.
> 
> I've looked over the patch just to get a feel for how the planner part
> works and I have a few ideas to share.
> 
> The code in create_distinct_paths() I think should work a different
> way. I think it would be much better to add a new field to Path and
> allow a path to know what keys it is distinct for.  This sort of goes
> back to an idea I thought about when developing unique joins
> (9c7f5229ad) about an easier way to detect fields that a relation is
> unique for. I've been calling these "UniqueKeys" in a few emails [1].
> The idea was to tag these onto RelOptInfo to mention which columns or
> exprs a relation is unique by so that we didn't continuously need to
> look at unique indexes in all the places that call
> relation_has_unique_index_for().  The idea there was that unique joins
> would know when a join was unable to duplicate rows. If the outer side
> of a join didn't duplicate the inner side, then the join RelOptInfo
> could keep the UniqueKeys from the inner rel, and vice-versa. If both
> didn't duplicate then the join rel would obtain the UniqueKeys from
> both sides of the join.  The idea here is that this would be a better
> way to detect unique joins, and also when it came to the grouping
> planner we'd also know if the distinct or group by should be a no-op.
> DISTINCT could be skipped, and GROUP BY could do a group aggregate
> without any sort.
> 
> I think these UniqueKeys ties into this work, perhaps not adding
> UniqueKeys to RelOptInfo, but just to Path so that we create paths
> that have UniqueKeys during create_index_paths() based on some
> uniquekeys that are stored in PlannerInfo, similar to how we create
> index paths in build_index_paths() by checking if the index
> has_useful_pathkeys().  Doing it this way would open up more
> opportunities to use skip scans. For example, semi-joins and
> anti-joins could make use of them if the uniquekeys covered the entire
> join condition. With this idea, the code you've added in
> create_distinct_paths() can just search for the cheapest path that has
> the correct uniquekeys, or if none exist then just do the normal
> sort->unique or hash agg.  I'm not entirely certain how we'd instruct
> a semi/anti joined relation to build such paths, but that seems like a
> problem that could be dealt with when someone does the work to allow
> skip scans to be used for those.
> 
> Also, I'm not entirely sure that these UniqueKeys should make use of
> PathKey since there's no real need to know about pk_opfamily,
> pk_strategy, pk_nulls_first as those all just describe how the keys
> are ordered. We just need to know if they're distinct or not. All
> that's left after removing those fields is pk_eclass, so could
> UniqueKeys just be a list of EquivalenceClass? or perhaps even a
> Bitmapset with indexes into PlannerInfo->ec_classes (that might be
> premature for not since we've not yet got
> https://commitfest.postgresql.org/23/1984/ or
> https://commitfest.postgresql.org/23/2019/ )  However, if we did use
> PathKey, that does allow us to quickly check if the UniqueKeys are
> contained within the PathKeys, since pathkeys are canonical which
> allows us just to compare their memory address to know if two are
> equal, however, if you're storing eclasses we could probably get the
> same just by comparing the address of the eclass to the pathkey's
> pk_eclass.
> 
> Otherwise, I think how you're making use of paths in
> create_distinct_paths() and create_skipscan_unique_path() kind of
> contradicts how they're meant to be used.
> 

Thank you very much for this feedback ! Will need to revise the patch 
based on this.

> I also agree with James that this should not be limited to Index Only
> Scans. From testing the patch, the following seems pretty strange to
> me:
> 
> # create table abc (a int, b int, c int);
> CREATE TABLE
> # insert into abc select a,b,1 from generate_Series(1,1000) a,
> generate_Series(1,1000) b;
> INSERT 0 1000000
> # create index on abc(a,b);
> CREATE INDEX
> # explain analyze select distinct on (a) a,b from abc order by a,b; --
> this is fast.
>                                                           QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------
>   Index Only Scan using abc_a_b_idx on abc  (cost=0.42..85.00 rows=200
> width=8) (actual time=0.260..20.518 rows=1000 loops=1)
>     Scan mode: Skip scan
>     Heap Fetches: 1000
>   Planning Time: 5.616 ms
>   Execution Time: 21.791 ms
> (5 rows)
> 
> 
> # explain analyze select distinct on (a) a,b,c from abc order by a,b;
> -- Add one more column and it's slow.
>                                                                  QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------
>   Unique  (cost=0.42..50104.43 rows=200 width=12) (actual
> time=1.201..555.280 rows=1000 loops=1)
>     ->  Index Scan using abc_a_b_idx on abc  (cost=0.42..47604.43
> rows=1000000 width=12) (actual time=1.197..447.683 rows=1000000
> loops=1)
>   Planning Time: 0.102 ms
>   Execution Time: 555.407 ms
> (4 rows)
> 
> [1] https://www.postgresql.org/search/?m=1&q=uniquekeys&l=1&d=-1&s=r
> 

Ok, understood.

I have put the CF entry into "Waiting on Author".

Best regards,
  Jesper



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jesper Pedersen
Дата:
Сообщение: Re: Index Skip Scan
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: [PATCH] Incremental sort (was: PoC: Partial sort)