Обсуждение: Partial hash index is not used for implied qual.

Поиск
Список
Период
Сортировка

Partial hash index is not used for implied qual.

От
Sergei Glukhov
Дата:
Hi!

Partial hash index is not used if qual is an implied qual
since this qual is not added to indrestrictinfo and we cannot
get the keys needed to make hash index scan possible.
Suggested fix is to add implied qual for the indexes
which requires the presence of a key to scan the index.

How to repeat:

CREATE TABLE hash_partial(x) AS
        SELECT x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
EXPLAIN (COSTS OFF) SELECT x FROM hash_partial WHERE x = 1;
...
          QUERY PLAN
--------------------------
  Seq Scan on hash_partial
    Filter: (x = 1)
  ...

  Regards,
  Sergei Glukhov


Вложения

Re: Partial hash index is not used for implied qual.

От
David Rowley
Дата:
On Mon, 24 Nov 2025 at 20:33, Sergei Glukhov <s.glukhov@postgrespro.ru> wrote:
> Partial hash index is not used if qual is an implied qual
> since this qual is not added to indrestrictinfo and we cannot
> get the keys needed to make hash index scan possible.
> Suggested fix is to add implied qual for the indexes
> which requires the presence of a key to scan the index.
>
> How to repeat:

That's not very good. I see we abort building the index path at:

/*
 * If no clauses match the first index column, check for amoptionalkey
 * restriction.  We can't generate a scan over an index with
 * amoptionalkey = false unless there's at least one index clause.
 * (When working on columns after the first, this test cannot fail. It
 * is always okay for columns after the first to not have any
 * clauses.)
 */
if (index_clauses == NIL && !index->amoptionalkey)
    return NIL;

and that's there due to the fact that Hash doesn't support full
clauseless scans. Without the above, you'd get:

SELECT x FROM hash_partial WHERE x = 1;
ERROR:  hash indexes do not support whole-index scans

so, that leads me to believe the location you're adjusting is probably
the best place to fix this issue.

As for the patch, you didn't update the comment to include the reason
you're keeping the restrictinfo clause. That's not great. I think you
should break that new test out into a new "if" test, maybe like:

/*
 * Keep the restrictinfo for non-amoptionalkey index types as
 * dropping the clause could result in having no clauses to use to
 * scan the index.  That's unsupported by non-amoptionalkey
 * indexes, so if we dropped this qual, we'd fail to build a Path
 * for this index later in planning.
 */
if (!index->amoptionalkey)
    index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);

/* predicate_implied_by() assumes first arg is immutable */
else if (contain_mutable_functions((Node *) rinfo->clause) ||
           !predicate_implied_by(list_make1(rinfo->clause),
                                               index->indpred, false))
    index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);

David



Re: Partial hash index is not used for implied qual.

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> so, that leads me to believe the location you're adjusting is probably
> the best place to fix this issue.

Wouldn't it be better to handle it more like the is_target_rel logic
a few lines further up?  I also object to putting the test between
the contain_mutable_functions and predicate_implied_by calls; that's
both confusing and probably wrong.  We're only calling
contain_mutable_functions to guard an assumption that
predicate_implied_by makes.

A larger point is that I think leaving such quals in indrestrictinfo
probably distorts our estimates of indexscan costs: we are likely to
think they contribute selectivity when they don't.  Maybe that's a
problem to address separately, but it should be looked at.  We skated
past the same problem for is_target_rel cases on the grounds that that
consideration affects all indexes on the rel equally; but as proposed,
this will probably result in an improper bias towards a partial hash
index.

            regards, tom lane



Re: Partial hash index is not used for implied qual.

От
Tom Lane
Дата:
I wrote:
> Wouldn't it be better to handle it more like the is_target_rel logic
> a few lines further up?

Actually, after thinking a bit longer, it'd be better to do something
like the attached so that we don't keep redundant quals unless they'd
*all* be excluded.

There's definitely something fishy about the costing though.
I experimented with this variant of Sergei's example:

regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
SELECT 1000
regression=# ANALYZE hash_partial;
ANALYZE
regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
CREATE INDEX
regression=# set enable_seqscan TO 0;  -- else we'll go for a seqscan
SET
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Bitmap Heap Scan on hash_partial  (cost=24.08..32.56 rows=10 width=4)
   Recheck Cond: (x = 1)
   ->  Bitmap Index Scan on partial_idx  (cost=0.00..24.07 rows=10 width=0)
         Index Cond: (x = 1)
(4 rows)

regression=# drop index partial_idx;
DROP INDEX
regression=# CREATE INDEX ON hash_partial USING hash(x);
CREATE INDEX
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on hash_partial  (cost=4.08..12.56 rows=10 width=4)
   Recheck Cond: (x = 1)
   ->  Bitmap Index Scan on hash_partial_x_idx  (cost=0.00..4.08 rows=10 width=0)
         Index Cond: (x = 1)
(4 rows)

Why are we thinking that a non-partial index would be substantially
cheaper to scan?  That seems surely wrong, and it runs counter to my
intuition about why this fix is incomplete.  (I expected an unfair
bias towards the partial index, not against it.)

            regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c62e3f87724..c2687dec425 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4038,6 +4038,7 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
     foreach(lc, rel->indexlist)
     {
         IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+        List       *newrestrictinfo;
         ListCell   *lcr;

         if (index->indpred == NIL)
@@ -4051,8 +4052,8 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
         if (is_target_rel)
             continue;

-        /* Else compute indrestrictinfo as the non-implied quals */
-        index->indrestrictinfo = NIL;
+        /* Else compute new indrestrictinfo as the non-implied quals */
+        newrestrictinfo = NIL;
         foreach(lcr, rel->baserestrictinfo)
         {
             RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);
@@ -4061,8 +4062,18 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
             if (contain_mutable_functions((Node *) rinfo->clause) ||
                 !predicate_implied_by(list_make1(rinfo->clause),
                                       index->indpred, false))
-                index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);
+                newrestrictinfo = lappend(newrestrictinfo, rinfo);
         }
+
+        /*
+         * If we excluded every qual as implied by the predicate, and the
+         * index cannot do full-index scans, then it's better to leave
+         * indrestrictinfo alone so that we can still build a scan on this
+         * index.  XXX We will underestimate the cost of such an indexscan;
+         * what can we do about that?
+         */
+        if (!(newrestrictinfo == NIL && !index->amoptionalkey))
+            index->indrestrictinfo = newrestrictinfo;
     }
 }


Re: Partial hash index is not used for implied qual.

От
Sergei Glukhov
Дата:
Hi,

On 11/25/25 6:01 AM, Tom Lane wrote:
> I wrote:
>> Wouldn't it be better to handle it more like the is_target_rel logic
>> a few lines further up?
> Actually, after thinking a bit longer, it'd be better to do something
> like the attached so that we don't keep redundant quals unless they'd
> *all* be excluded.
>
> There's definitely something fishy about the costing though.
> I experimented with this variant of Sergei's example:
>
> regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
> SELECT 1000
> regression=# ANALYZE hash_partial;
> ANALYZE
> regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
> CREATE INDEX
> regression=# set enable_seqscan TO 0;  -- else we'll go for a seqscan
> SET
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
>                                   QUERY PLAN
> ----------------------------------------------------------------------------
>   Bitmap Heap Scan on hash_partial  (cost=24.08..32.56 rows=10 width=4)
>     Recheck Cond: (x = 1)
>     ->  Bitmap Index Scan on partial_idx  (cost=0.00..24.07 rows=10 width=0)
>           Index Cond: (x = 1)
> (4 rows)
>
> regression=# drop index partial_idx;
> DROP INDEX
> regression=# CREATE INDEX ON hash_partial USING hash(x);
> CREATE INDEX
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
>                                      QUERY PLAN
> ----------------------------------------------------------------------------------
>   Bitmap Heap Scan on hash_partial  (cost=4.08..12.56 rows=10 width=4)
>     Recheck Cond: (x = 1)
>     ->  Bitmap Index Scan on hash_partial_x_idx  (cost=0.00..4.08 rows=10 width=0)
>           Index Cond: (x = 1)
> (4 rows)
>
> Why are we thinking that a non-partial index would be substantially
> cheaper to scan?  That seems surely wrong, and it runs counter to my
> intuition about why this fix is incomplete.  (I expected an unfair
> bias towards the partial index, not against it.)
>
>             regards, tom lane
>

Thanks for the fix. It seems there is another case for investigation:

DROP TABLE hash_partial;
CREATE TABLE hash_partial(x, y) AS
SELECT x, x + x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx  ON hash_partial USING hash(x) WHERE x = 1;
SET enable_seqscan TO 0;
EXPLAIN SELECT x FROM hash_partial WHERE x = 1 and y < 0;
--------------------------------------------------------------------------------
Seq Scan on hash_partial  (cost=0.00..23.00 rows=1 width=4)
    Disabled: true
    Filter: ((y < 0) AND (x = 1))
(3 rows)


  Regarding strangeness of the cost,
  cost is depends on numIndexPages and
  in genericcostestimate() we calulate numIndexPages:

  numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);

  For non-partial index index->pages = 6 and index->tuples = 1000
  and for partial index index->pages = 6 and index->tuples = 10.
  Number of pages depends on index relation size and
  initial size is 6 * BLCKSZ for both, partial and non-partial hash indexes
  Initial size of the hash index relation, in turn,
  depends on total number of tuples in the table.

  Regards,
  Sergei Glukhov





Re: Partial hash index is not used for implied qual.

От
David Rowley
Дата:
On Tue, 25 Nov 2025 at 15:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Actually, after thinking a bit longer, it'd be better to do something
> like the attached so that we don't keep redundant quals unless they'd
> *all* be excluded.

I think your 1st patch was more along the correct lines. With the 2nd
one, I think you're maybe assuming that the non-empty newrestrictinfo
must contain an indexable qual, but the list we're working with at
that point is the rel's baserestrictinfo, which could contain a bunch
of stuff that'll never match to the index. If you continue to remove
the implied qual when there's a non-indexable base qual in the list,
then we'll still have the same issue that Sergei is trying to fix.
Just try adding an unrelated qual with your 2nd patch. Something like:
select * from hash_partial where x=1 and ctid >= '(0,0)';

I think you might have tried the 2nd method because you didn't see the
point in adding >1 indexable qual to scan the index with when we just
want ==1. If you wanted to do that, then maybe match_clause_to_index()
would be the place to ensure the list_length() doesn't go above 1 for
non-amoptionalkey indexes. Is that really worthwhile? hash indexes
only support equality anyway, so in what scenario would there be more
than 1 qual?

David