Обсуждение: GiST indexing question

Поиск
Список
Период
Сортировка

GiST indexing question

От
Greg Landrum
Дата:
Hi,

I'm attempting to expand an existing postgresql extension and I've run
into a wall with the way operator classes should be defined for GiST
indices.

What I have that works is the following two operators:
CREATE OPERATOR <@ (
    LEFTARG = mol,
    RIGHTARG = mol,
    PROCEDURE = rsubstruct(mol, mol),
    COMMUTATOR = '@>',
    RESTRICT = contsel,
    JOIN = contjoinsel
);
CREATE OPERATOR @> (
    LEFTARG = mol,
    RIGHTARG = mol,
    PROCEDURE = substruct(mol, mol),
    COMMUTATOR = '<@',
    RESTRICT = contsel,
    JOIN = contjoinsel
);

combined into an operator class for use in indexing :

CREATE OPERATOR CLASS mol_ops
DEFAULT FOR TYPE mol USING gist
AS
    OPERATOR    3    @> (mol, mol),
    OPERATOR    4    <@ (mol, mol),
    FUNCTION    1   gmol_consistent (bytea, internal, int4),
    FUNCTION    2   gmol_union (bytea, internal),
    FUNCTION    3   gmol_compress (internal),
    FUNCTION    4   gmol_decompress (internal),
    FUNCTION    5   gmol_penalty (internal, internal, internal),
    FUNCTION    6   gmol_picksplit (internal, internal),
    FUNCTION    7   gmol_same (bytea, bytea, internal),
STORAGE         bytea;

I'm now trying to add an equality operator (==) as follows:

CREATE OPERATOR == (
    LEFTARG = mol,
    RIGHTARG = mol,
    PROCEDURE = mol_eq,
    COMMUTATOR = '=',
    NEGATOR = '<>',
    RESTRICT = eqsel,
    JOIN = eqjoinsel
);

and I want to use this to extend the operator class:

CREATE OPERATOR CLASS mol_ops
DEFAULT FOR TYPE mol USING gist
AS
    OPERATOR    3    @> (mol, mol),
    OPERATOR    4    <@ (mol, mol),
    OPERATOR    6    == (mol, mol),
    FUNCTION    1   gmol_consistent (bytea, internal, int4),
    FUNCTION    2   gmol_union (bytea, internal),
    FUNCTION    3   gmol_compress (internal),
    FUNCTION    4   gmol_decompress (internal),
    FUNCTION    5   gmol_penalty (internal, internal, internal),
    FUNCTION    6   gmol_picksplit (internal, internal),
    FUNCTION    7   gmol_same (bytea, bytea, internal),
STORAGE         bytea;

I made something of a guess as to which strategy I should use.

I can now do basic tests with my == operator:
moltest=# select 'c1nnccc1'::mol=='c1ccnnc1'::mol;
 ?column?
----------
 t
(1 row)

moltest=# select 'c1nnccc1'::mol=='c1cnnnc1'::mol;
 ?column?
----------
 f
(1 row)

(yes, those are both correct).

I can create an index using these definitions:
moltest=# create index molidx on mols using gist(m);
CREATE INDEX


And as far as I can tell it looks like the index should be working:

moltest=# explain select count(*) from mols where m=='c1ncnnc1';
                               QUERY PLAN
-------------------------------------------------------------------------
 Aggregate  (cost=8.27..8.28 rows=1 width=0)
   ->  Index Scan using molidx on mols  (cost=0.00..8.27 rows=1 width=0)
         Index Cond: (m == 'c1cnncn1'::mol)
(3 rows)

But when I actually run a query I get an error:

moltest=# select count(*) from mols where m=='c1ncnnc1';
ERROR:  Unknown strategy: 6

The index isn't totally screwed up though, because an @> query still works:

moltest=# select count(*) from mols where m@>'c1ncnnc1';
 count
-------
     5
(1 row)

I guess I must be doing something stupid, but neither the docs nor the
internet have been particularly helpful in letting me know what.

These tests are all being done using postgresql 9.0.1 built on an
ubuntu 10.10 box.

In case it's helpful, the full extension code (without the
non-functioning attempts at adding == to the index) is here:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/PgSQL/rdkit/

Thanks in advance for any help,
-greg

Re: GiST indexing question

От
Yeb Havinga
Дата:
On 2010-12-11 06:09, Greg Landrum wrote:
> I'm now trying to add an equality operator (==) as follows:
>
> CREATE OPERATOR == (
>      LEFTARG = mol,
>      RIGHTARG = mol,
>      PROCEDURE = mol_eq,
>      COMMUTATOR = '=',
>      NEGATOR = '<>',
>      RESTRICT = eqsel,
>      JOIN = eqjoinsel
> );
Are you sure the commutator and negator exist and are correct?

> and I want to use this to extend the operator class:
>
> CREATE OPERATOR CLASS mol_ops
> DEFAULT FOR TYPE mol USING gist
> AS
>     OPERATOR    3    @>  (mol, mol),
>     OPERATOR    4    <@ (mol, mol),
>     OPERATOR    6    == (mol, mol),
>      FUNCTION    1   gmol_consistent (bytea, internal, int4),
>      FUNCTION    2   gmol_union (bytea, internal),
>      FUNCTION    3   gmol_compress (internal),
>      FUNCTION    4   gmol_decompress (internal),
>      FUNCTION    5   gmol_penalty (internal, internal, internal),
>      FUNCTION    6   gmol_picksplit (internal, internal),
>      FUNCTION    7   gmol_same (bytea, bytea, internal),
> STORAGE         bytea;
This seems to be ok.
> But when I actually run a query I get an error:
>
> moltest=# select count(*) from mols where m=='c1ncnnc1';
> ERROR:  Unknown strategy: 6
I could not find that error exact message in the code, but something
similar gave a hit.
$ grep -r 'Unknown strategy' *
$ grep -ri 'Unknown strategy' *
contrib/intarray/_int_gin.c:                    elog(ERROR,
"ginint4_consistent: unknown strategy number: %d",
src/backend/access/gin/ginarrayproc.c:                  elog(ERROR,
"ginarrayconsistent: unknown strategy number: %d",
src/backend/access/gist/gistproc.c:                     elog(ERROR,
"unknown strategy number: %d", strategy);
src/backend/access/gist/gistproc.c:                     elog(ERROR,
"unknown strategy number: %d", strategy);

both the gistproc hits are in built-in consistent functions. This makes
me believe that the error might be thrown in the rdkit's mol consistent
function.

> I guess I must be doing something stupid, but neither the docs nor the
> internet have been particularly helpful in letting me know what.
Yeah, a hint might help for that error message. But since it doesn't, a
quick way for problems like this is to increase log_error_verbosity to
verbose, trigger the error again. Then the sourcefile and linenumber are
logged as well. If it is then still unclear what's causing the error,
you can use that information in gdb to set a breakpoint on that source
location.
> In case it's helpful, the full extension code (without the
> non-functioning attempts at adding == to the index) is here:
> http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/PgSQL/rdkit/
If you take a look at
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/PgSQL/rdkit/rdkit_gist.c?revision=1481&view=markup
and then line 651: that's the error being thrown. You need to extend
gmol_consistent to handle the strategy 6 number.

regards,
Yeb Havinga


Re: GiST indexing question

От
Greg Landrum
Дата:
Dear Yeb,

Thanks for the answer.

On Sat, Dec 11, 2010 at 3:10 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> On 2010-12-11 06:09, Greg Landrum wrote:
>> I guess I must be doing something stupid, but neither the docs nor the
>> internet have been particularly helpful in letting me know what.
>
> Yeah, a hint might help for that error message. But since it doesn't, a
> quick way for problems like this is to increase log_error_verbosity to
> verbose, trigger the error again. Then the sourcefile and linenumber are
> logged as well. If it is then still unclear what's causing the error, you
> can use that information in gdb to set a breakpoint on that source location.
>>
>> In case it's helpful, the full extension code (without the
>> non-functioning attempts at adding == to the index) is here:
>> http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/PgSQL/rdkit/
>
> If you take a look at
> http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/PgSQL/rdkit/rdkit_gist.c?revision=1481&view=markup
> and then line 651: that's the error being thrown. You need to extend
> gmol_consistent to handle the strategy 6 number.

Doh! That was it. No big wonder I couldn't find anything helpful on
the 'net. Now I'm really glad I included the link to the code.

Thanks again for both the answer and for providing a description of
how to track down future problems.

Best Regards,
-greg