Обсуждение: BUG #12763: Planner refuses to do index-only-scan on partial indexes despite having all the required data

Поиск
Список
Период
Сортировка

BUG #12763: Planner refuses to do index-only-scan on partial indexes despite having all the required data

От
toruvinn@lain.pl
Дата:
The following bug has been logged on the website:

Bug reference:      12763
Logged by:          toruvinn
Email address:      toruvinn@lain.pl
PostgreSQL version: 9.3.6
Operating system:   linux (debian wheezy; pg compiled from sources)
Description:

Ran into it doing:
# SELECT COUNT(*) FROM foos f JOIN items i ON (i.id=f.id) WHERE f.user = xxx
AND f.type=2::smallint AND i.type=1;

johto told me on irc it may be worth reporting it here - thanks!

Everything works fine if I include the "type" column in the index (even
though it's limited to type = 1 - create index whatever on items (id, type)
where type = 1):
# explain analyze SELECT COUNT(*) FROM foos f JOIN items i ON (i.id=f.id)
WHERE f.user = xxx AND f.type=2::smallint AND i.type=1;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=79432.93..79432.94 rows=1 width=0) (actual
time=1067.016..1067.017 rows=1 loops=1)
   ->  Merge Join  (cost=547.51..79141.32 rows=116644 width=0) (actual
time=0.152..1058.394 rows=126993 loops=1)
         Merge Cond: (f.id = i.id)
         ->  Index Only Scan using fooindex on foos f  (cost=0.56..3030.56
rows=123032 width=8) (actual time=0.022..45.369 rows=137840 loops=1)
               Index Cond: ((user = xxx) AND (type = 2::smallint))
               Heap Fetches: 11958
         ->  Index Only Scan using items_id_type_fortype1 on items i
(cost=0.43..70622.28 rows=1638331 width=8) (actual time=0.028..871.310
rows=1637824 loops=1)
               Index Cond: (type = 1)
               Heap Fetches: 826872
 Total runtime: 1067.057 ms

However, an index on (id) WHERE type = 1 won't be considered for an
index-only-scan:

\d items:
(...)
    "items_id_type" btree (id, type)
    "items_id_type1" btree (id) WHERE type = 1

# explain analyze SELECT COUNT(*) FROM foos f JOIN items i ON (i.id=f.id)
WHERE f.user = xxx AND f.type=2::smallint AND i.type=1;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=80475.69..80475.70 rows=1 width=0) (actual
time=986.580..986.580 rows=1 loops=1)
   ->  Merge Join  (cost=547.83..80184.08 rows=116645 width=0) (actual
time=0.099..978.766 rows=126997 loops=1)
         Merge Cond: (f.id = i.id)
         ->  Index Only Scan using fooindex on foos f  (cost=0.56..3032.54
rows=123033 width=8) (actual time=0.016..41.495 rows=137844 loops=1)
               Index Cond: ((user = xxx) AND (type = 2::smallint))
               Heap Fetches: 11981
         ->  Index Only Scan using items_id_type on items i
(cost=0.43..71664.18 rows=1638331 width=8) (actual time=0.016..801.884
rows=1637822 loops=1)
               Index Cond: (type = 1)
               Heap Fetches: 844599
 Total runtime: 986.613 ms

- ignored the "..._type1" index. Now let's drop "items_id_type".

# drop index items_id_type;
DROP INDEX
# explain analyze SELECT COUNT(*) FROM foos f JOIN items i ON (i.id=f.id)
WHERE f.user = xxx AND f.type=2::smallint AND i.type=1;
                                                                       QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=112051.53..112051.54 rows=1 width=0) (actual
time=1829.931..1829.932 rows=1 loops=1)
   ->  Merge Join  (cost=547.83..111759.91 rows=116646 width=0) (actual
time=0.166..1821.902 rows=126997 loops=1)
         Merge Cond: (f.id = i.id)
         ->  Index Only Scan using fooindex on foos f  (cost=0.56..3032.56
rows=123034 width=8) (actual time=0.015..44.455 rows=137844 loops=1)
               Index Cond: ((user = xxx) AND (type = 2::smallint))
               Heap Fetches: 11984
         ->  Index Scan using items_id_type1 on items i
(cost=0.43..103274.40 rows=1638331 width=8) (actual time=0.008..1327.167
rows=1637822 loops=1)
 Total runtime: 1829.970 ms
(8 rows)

Not an indexONLYscan anymore.

Hopefully I didn't fail my attempt at anonymizing and all the column names
match, although should be pretty obvious even without that.

Kind regards,
me.
toruvinn@lain.pl writes:
> johto told me on irc it may be worth reporting it here - thanks!

Not really.  See comment in check_index_only():

    /*
     * Check that all needed attributes of the relation are available from the
     * index.
     *
     * XXX this is overly conservative for partial indexes, since we will
     * consider attributes involved in the index predicate as required even
     * though the predicate won't need to be checked at runtime.  (The same is
     * true for attributes used only in index quals, if we are certain that
     * the index is not lossy.)  However, it would be quite expensive to
     * determine that accurately at this point, so for now we take the easy
     * way out.
     */

This is the same reason why expression indexes are difficult to use in
index-only scans: if the query refers to f(x), the information available
to check_index_only() says that the index must be able to return the value
of "x", not "f(x)".  It'd be nice to improve, but the only obvious ways to
do that would result in entirely unacceptable hits to planning speed.

            regards, tom lane