Обсуждение: GIST versus GIN indexes for intarrays

Поиск
Список
Период
Сортировка

GIST versus GIN indexes for intarrays

От
Rusty Conover
Дата:
Hi Guys,

I'm a bit confused when the proper way to use GIST versus GIN indexes with integer arrays.

The documentation states:


The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere. As a rule of thumb, a GIN index is faster to search than a GiST index, but slower to build or update; so GIN is better suited for static data and GiST for often-updated data.

Since 100% of my queries are for retrieval, I should use GIN but it never appears to be used unlike how GIST indexes are:

gearbuyer_ig=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 8.3.6 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
(1 row)

With just a GIN index I get this plan (no use of GIN):

gearbuyer_ig=# explain select count(*) from items where items.fast_colors @> ARRAY[0];
                           QUERY PLAN                            
-----------------------------------------------------------------
 Aggregate  (cost=21194.27..21194.28 rows=1 width=0)
   ->  Seq Scan on items  (cost=0.00..21193.64 rows=251 width=0)
         Filter: (fast_colors @> '{0}'::integer[])
(3 rows)

With a GIST index created like:

gearbuyer_ig=# CREATE INDEX items_fast_colors_rdtree2_idx ON items USING gist (fast_colors gist__int_ops);

gearbuyer_ig=# explain select count(*) from items where items.fast_colors @> ARRAY[0];
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Aggregate  (cost=929.81..929.82 rows=1 width=0)
   ->  Bitmap Heap Scan on items  (cost=14.30..929.18 rows=251 width=0)
         Recheck Cond: (fast_colors @> '{0}'::integer[])
         ->  Bitmap Index Scan on items_fast_colors_rdtree2_idx  (cost=0.00..14.24 rows=251 width=0)
               Index Cond: (fast_colors @> '{0}'::integer[])
(5 rows)

Any insight is greatly appreciated.  Could this be a regression from 8.3.5 and 8.3.6?

Thanks,

Rusty

Re: GIST versus GIN indexes for intarrays

От
Tom Lane
Дата:
Rusty Conover <rconover@infogears.com> writes:
> Since 100% of my queries are for retrieval, I should use GIN but it
> never appears to be used unlike how GIST indexes are:

You haven't shown us either the table or the index declaration,
so it's a bit tough to comment on that.  It's worth noting though
that your GIST example appears to rely on a nonstandard operator class.

            regards, tom lane

Re: GIST versus GIN indexes for intarrays

От
Rusty Conover
Дата:
On Feb 12, 2009, at 1:54 PM, Tom Lane wrote:

> Rusty Conover <rconover@infogears.com> writes:
>> Since 100% of my queries are for retrieval, I should use GIN but it
>> never appears to be used unlike how GIST indexes are:
>
> You haven't shown us either the table or the index declaration,
> so it's a bit tough to comment on that.  It's worth noting though
> that your GIST example appears to rely on a nonstandard operator
> class.
>
>             regards, tom lane
>

Hi Tom,

My apologies, below is the table definition, and the GIN index creation.

The gist__int_ops is the default operator class for integer[] arrays,
as shown at:

http://www.postgresql.org/docs/current/static/intarray.html

gearbuyer_ig=# \d items
                                   Table "public.items"
          Column          |   Type    |                     Modifiers
-------------------------+-----------
+---------------------------------------------------
  item_id                 | integer   | not null default
nextval('generic_seq'::regclass)
  gb_product_url          | text      | not null
  group_id                | integer   |
  category_id             | integer   |
  product_name            | text      | not null
  gender                  | text      | not null
  description_extract     | text      | not null
  sort_price              | real      | not null
  price_range             | text      | not null
  brand_id                | integer   | not null
  xapian_doc_id           | integer   |
  average_rating          | uint1     |
  reviews_count           | smallint  |
  store_count             | uint1     |
  default_image_id        | integer   |
  available_sizes         | integer[] |
  fast_colors             | integer[] |
  has_coupons             | boolean   | not null default false
  age_low                 | uint1     |
  sale_percentage_low     | uint1     |
  store_count_low         | uint1     |
  price_range_low         | smallint  |
  offering_stores         | integer[] |
  subclassification_ids   | integer[] |
  popularity_rank         | integer   |
  default_similarity_type | uint1     |
  default_similarity_id   | integer   |
  gc_lookup_id            | integer   |

The GIN index was created via:

CREATE INDEX items_fast_colors_rdtree_idx ON items USING gin
(fast_colors);

Cheers,

Rusty
--
Rusty Conover
rconover@infogears.com
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
http://www.infogears.com
http://www.gearbuyer.com
http://www.footwearbuyer.com

Re: GIST versus GIN indexes for intarrays

От
Tom Lane
Дата:
Rusty Conover <rconover@infogears.com> writes:
> The gist__int_ops is the default operator class for integer[] arrays,
> as shown at:
> http://www.postgresql.org/docs/current/static/intarray.html

Ah, so you have contrib/intarray installed.

[ pokes at it... ]  Seems like what we have here is another iteration
of this ancient bug:
http://archives.postgresql.org/pgsql-committers/2004-01/msg00073.php
to wit, contrib/intarray is defining its own @> and <@ operators that
conflict with those since added to the core.  In the case Rusty is
showing, the @> gets resolved as intarray's @> (because that's an
exact match, where the core provides anyarray @> anyarray) and then
this operator is NOT a member of the core-provided GIN opclass for
integer arrays.

The short-term workaround for Rusty is probably to create his GIN
index using the intarray-provided gin__int_ops opclass.  But it
seems to me that we ought to get rid of intarray's @> and <@ operators
and have the module depend on the core anyarray operators, just as we
have already done for = and <>.  Comments?

            regards, tom lane

Re: GIST versus GIN indexes for intarrays

От
Rusty Conover
Дата:

On Feb 12, 2009, at 2:29 PM, Tom Lane wrote:

Rusty Conover <rconover@infogears.com> writes:
The gist__int_ops is the default operator class for integer[] arrays,  
as shown at:
http://www.postgresql.org/docs/current/static/intarray.html

Ah, so you have contrib/intarray installed.

[ pokes at it... ]  Seems like what we have here is another iteration
of this ancient bug:
http://archives.postgresql.org/pgsql-committers/2004-01/msg00073.php
to wit, contrib/intarray is defining its own @> and <@ operators that
conflict with those since added to the core.  In the case Rusty is
showing, the @> gets resolved as intarray's @> (because that's an
exact match, where the core provides anyarray @> anyarray) and then
this operator is NOT a member of the core-provided GIN opclass for
integer arrays.

The short-term workaround for Rusty is probably to create his GIN
index using the intarray-provided gin__int_ops opclass.  But it
seems to me that we ought to get rid of intarray's @> and <@ operators
and have the module depend on the core anyarray operators, just as we
have already done for = and <>.  Comments?

Hi Tom,

For the record using the GIN opclass does resolve the problem for me.  The indexes are now seeing usage.

Thanks for the help,

Rusty
--

Re: [HACKERS] GIST versus GIN indexes for intarrays

От
Kenneth Marshall
Дата:
On Fri, Feb 13, 2009 at 04:12:53PM +0300, Teodor Sigaev wrote:
>> The short-term workaround for Rusty is probably to create his GIN
>> index using the intarray-provided gin__int_ops opclass.  But it
> Right
>> seems to me that we ought to get rid of intarray's @> and <@ operators
>> and have the module depend on the core anyarray operators, just as we
>> have already done for = and <>.  Comments?
> Agree, will do. Although built-in anyarray operators have ~N^2 behaviour
> while intarray's version - only N*log(N)
Is there a way to have the buily-in anyarray opeators be N*log(N)?

Ken

Re: [HACKERS] GIST versus GIN indexes for intarrays

От
Tom Lane
Дата:
Teodor Sigaev <teodor@sigaev.ru> writes:
>> seems to me that we ought to get rid of intarray's @> and <@ operators
>> and have the module depend on the core anyarray operators, just as we
>> have already done for = and <>.  Comments?

> Agree, will do. Although built-in anyarray operators have ~N^2 behaviour while
> intarray's version - only N*log(N)

Really?  isort() looks like a bubble sort to me.

But in any case, a pre-sort is probably actually *slower* for small
numbers of array elements.  I wonder where the crossover is.  In
principle we could make the core implementation do a sort when working
with a sortable datatype, but I'm unsure it's worth the trouble.

            regards, tom lane