Обсуждение: Bool btree_gin index not chosen on equality contraint, but on greater/lower?

Поиск
Список
Период
Сортировка

Bool btree_gin index not chosen on equality contraint, but on greater/lower?

От
Patric Bechtel
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I tried to add bool support to the btree_gin contrib module, and as far as I can tell, it seems to
work (wasn't that complicated, actually).

But now I'm stuck, as PostgreSQL doesn't seem to like to use my new index, if I use equality or
unequality, just with greater and lower than.

My test subject is a table with 13690993 rows, one of them (bar) is a boolean, 376442 are true,
the others are false, no nulls. The index on bar is a btree_gin index. Table is vacuum analyzed
and all, so statistics are fresh and usable, as the estimates within the plans show.

Here's the plan if I ask for 300 rows with d, as in "select id from foo where bar":
Seq Scan on foo  (cost=0.00..684709.82 rows=385495 width=8) (actual time=0.014..2657.326
rows=376442 loops=1)  Filter: bar  Rows Removed by Filter: 13314551Planning time: 0.309 msExecution time: 2672.559 ms

But, if I query "select if from foo where bar>'f'":
Bitmap Heap Scan on foo  (cost=7955.59..313817.94 rows=385495 width=8) (actual
time=220.631..365.299 rows=376442 loops=1)  Recheck Cond: (bar > false)  Heap Blocks: exact=104100  ->  Bitmap Index
Scanon ix_foo_gin  (cost=0.00..7859.21 rows=385495 width=0) (actual
 
time=193.192..193.192 rows=376442 loops=1)        Index Cond: (bar > false)Planning time: 0.400 msExecution time:
377.518ms
 

It starts using the index. The rule seems to be: as long as I'm using <, <=, >= or >, it chooses
the index. If I use = or !=, it doesn't.

Here's my definition of the bool_ops for the gin index (it's very similar to the other indexes in
the btree_gin extension):

CREATE OPERATOR CLASS bool_ops
DEFAULT FOR TYPE bool USING gin
AS   OPERATOR        1       <,   OPERATOR        2       <=,   OPERATOR        3       =,   OPERATOR        4
>=,  OPERATOR        5       >,   FUNCTION        1       btboolcmp(bool,bool),   FUNCTION        2
gin_extract_value_bool(bool,internal),   FUNCTION        3       gin_extract_query_bool(bool, internal, int2, internal,
internal),  FUNCTION        4       gin_btree_consistent(internal, int2, anyelement, int4, internal,
 
internal),   FUNCTION        5       gin_compare_prefix_bool(bool,bool,int2, internal),
STORAGE         bool;

What am I overseeing?

- -- 
Patric
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: GnuPT 2.5.2

iEYEARECAAYFAlbAdg0ACgkQfGgGu8y7ypBHZwCg0g1JSgZTc0OBYsMzrj6w4Zy6
DTQAn38gk8hfqFf86N8hWEzwqc9afjar
=SLMC
-----END PGP SIGNATURE-----



Re: Bool btree_gin index not chosen on equality contraint, but on greater/lower?

От
Tom Lane
Дата:
Patric Bechtel <patric.bechtel@gmail.com> writes:
> I tried to add bool support to the btree_gin contrib module, and as far as I can tell, it seems to
> work (wasn't that complicated, actually).
> But now I'm stuck, as PostgreSQL doesn't seem to like to use my new index, if I use equality or
> unequality, just with greater and lower than.

I think your problem is that the planner won't apply
match_boolean_index_clause() or expand_boolean_index_clause(),
because it has a hard-wired idea of which index opclasses could
possibly benefit from that, cf IsBooleanOpfamily().
        regards, tom lane



Re: Bool btree_gin index not chosen on equality contraint, but on greater/lower?

От
Patric Bechtel
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Tom,

Tom Lane schrieb am 14.02.2016 um 17:51:
> Patric Bechtel <patric.bechtel@gmail.com> writes:
>> I tried to add bool support to the btree_gin contrib module, and as far as I can tell, it
>> seems to work (wasn't that complicated, actually). But now I'm stuck, as PostgreSQL doesn't
>> seem to like to use my new index, if I use equality or unequality, just with greater and
>> lower than.
> 
> I think your problem is that the planner won't apply match_boolean_index_clause() or
> expand_boolean_index_clause(), because it has a hard-wired idea of which index opclasses could 
> possibly benefit from that, cf IsBooleanOpfamily().

oh, sh*t...

My motivation was the size of the bool indexes; they are tiny and really fast. It feels almost
like bitmap indexes.

I hope that's not too far over my head already... but I'll take a look.

If someone might give me a hint where to look, I'd be grateful.

Thanks a lot for the hint,

Patric
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: GnuPT 2.5.2

iEYEARECAAYFAlbA+64ACgkQfGgGu8y7ypCfVwCg81dCY9Mv70+2dk8e3+5xChyO
C7cAn1fRV3NAosi0W3IisKNEmS9K9hZE
=Xd+r
-----END PGP SIGNATURE-----



Re: Bool btree_gin index not chosen on equality contraint, but on greater/lower?

От
Tom Lane
Дата:
Patric Bechtel <patric.bechtel@gmail.com> writes:
> Tom Lane schrieb am 14.02.2016 um 17:51:
>> I think your problem is that the planner won't apply match_boolean_index_clause() or
>> expand_boolean_index_clause(), because it has a hard-wired idea of which index opclasses could 
>> possibly benefit from that, cf IsBooleanOpfamily().

> If someone might give me a hint where to look, I'd be grateful.

In principle, instead of using the hard-wired IsBooleanOpfamily test,
we could check "op_in_opfamily(BooleanEqualOperator, opfamily)" to see
whether those transforms would produce something that works with the
index.  The reason I didn't do it that way is that those tests are made
for every WHERE clause the planner ever considers, so it seemed like
adding a syscache lookup that is usually going to accomplish nothing
would be pretty annoying overhead.  The trick to getting an acceptable
patch here would be to figure out how to arrange things to not cost
much when the optimization doesn't apply.

One idea worth exploring is to redefine IsBooleanOpfamily to inspect
the index's AM OID as well as the opfamily, and code it along the
lines of
((relam) == BTREE_AM_OID ? (opfamily) == BOOL_BTREE_FAM_OID : (relam) == HASH_AM_OID ? (opfamily) == BOOL_HASH_FAM_OID
:op_in_opfamily(BooleanEqualOperator, opfamily))
 

so that the extra syscache lookup doesn't have to happen at all for
btree and hash indexes.  But I doubt that moves the needle far enough.

Another idea is to try to rearrange match_clause_to_indexcol and
expand_indexqual_conditions so that IsBooleanOpfamily() doesn't get hit
quite so easily.  I don't remember at the moment exactly why that's the
first test rather than something further down.  Probably at least part of
the reason is an assumption that IsBooleanOpfamily() is cheap.  It's
possible that match_boolean_index_clause() is actually cheaper than the
full-fledged version of IsBooleanOpfamily() and so the order of those
tests should be reversed; moving them down to after other tests might also
be appropriate.

Another line of thought is to try to cache the result of the
IsBooleanOpfamily() test in the IndexOptInfo structs so it doesn't
have to get done so many times.  But that would be more invasive
and I'm not sure that it helps in simple cases.
        regards, tom lane