Обсуждение: Indexing a Bit String column

Поиск
Список
Период
Сортировка

Indexing a Bit String column

От
George Oakman
Дата:

Hi all,

 

I am planning to use the Bit String data type for a large number of binary strings, e.g.

 

   CREATE TABLE myTable (myBitStringCol BIT(3));

 

I will need to perform & (bitwise AND) operations using SELECT on this column, e.g.

 

   SELECT * FROM myTable WHERE myBitStringCol & B'101' = myBitStringCol;

 

To optimise this type of SELECT statement, I guess I’ll have to build an index on the Bit String column, e.g.

 

   CREATE INDEX myBitStringCol_idx ON myTable (myBitStringCol);

 

 Is it all I need to do? Will PgSQL know how to index properly a Bit String column? Should I build the index using a special method, e.g.

 

   CREATE INDEX myBitStringCol_idx ON myTable USING gist(myBitStringCol);

 

Since we’re already talking of a Bit String column, the USING gist() statement looks a bit redundant to me. Basically, I though I would ask if I need to do anything special when indexing a BIT column.

 

Thanks for your comments.

 

George.

 

 

 


Share your photos with Windows Live Photos – Free Find out more!

Re: Indexing a Bit String column

От
Gregory Stark
Дата:
George Oakman <oakmang@hotmail.com> writes:

>  Is it all I need to do? Will PgSQL know how to index properly a Bit String
>  column? Should I build the index using a special method, e.g.
>     CREATE INDEX myBitStringCol_idx ON myTable USING gist(myBitStringCol);

No, the default will be to build a btree index which won't help these types of
queries at all.

You would want a GIST index if there was a built-in GIST opclass for these
kinds of queries, but sadly there isn't. You could add one fairly easily but
it would require C code. I think it would be a valuable addition to Postgres
if you do write one.

Note that something like "WHERE myBitStringCol & B'101'" might be selecting
too much of your table to make an index useful anyways. If each bit is set in
half the table then you're talking about selecting 3/4 of the table in which
case a full table scan would be more efficient than any index.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: Indexing a Bit String column

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
> Note that something like "WHERE myBitStringCol & B'101'" might be selecting
> too much of your table to make an index useful anyways. If each bit is set in
> half the table then you're talking about selecting 3/4 of the table in which
> case a full table scan would be more efficient than any index.

If the expectation is that the bitstring is mostly zeroes, I wonder
whether the OP wouldn't be better off using an integer-array
representation, ie instead of '00000101' store '{6,8}'.  Then he could
use the existing GIST or GIN support for integer array operations.

            regards, tom lane

Re: Indexing a Bit String column

От
George Oakman
Дата:
Hi,

Thanks for the info. I new it would have been too easy! :)
 
Sorry, I made a mistake earlier, my queries will actually be more like
 
   SELECT * FROM myTable WHERE myBitStringCol & B'101' = B'101';
 
This doesn't make much difference for the indexing problem, but it may help address the very good point you raised regarding the predicted number of results, and therefore the justification of needing an index at all. In practice, my BitStrings will be 1024 bits long - both A and B in "WHERE A & B = B" will be 1024 bits long.
 
Assuming that bits are independents and randomly distributed in the database, am I right in thinking that a typical search is expected to return N*0.5^n, where N is the total number of rows in the table and n is the number of bits set to 1 in B?
 
If this calculation is correct, even if 1% of the bits are set to 1 in B, then this would give N*0.5^10 results, i.e. roughly 0.1% of the database.
 
So it looks like I'll need the index in the end!
 
I am actually new to PgSQL - I would be very grateful if you could point me to resources/tutorials to help me build an index extension in C?
 
Many thanks for your help.
 
George.
 


> To: oakmang@hotmail.com
> CC: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Indexing a Bit String column
> From: stark@enterprisedb.com
> Date: Tue, 24 Feb 2009 15:35:58 +0000
>
>
> George Oakman <oakmang@hotmail.com> writes:
>
> > Is it all I need to do? Will PgSQL know how to index properly a Bit String
> > column? Should I build the index using a special method, e.g.
> > CREATE INDEX myBitStringCol_idx ON myTable USING gist(myBitStringCol);
>
> No, the default will be to build a btree index which won't help these types of
> queries at all.
>
> You would want a GIST index if there was a built-in GIST opclass for these
> kinds of queries, but sadly there isn't. You could add one fairly easily but
> it would require C code. I think it would be a valuable addition to Postgres
> if you do write one.
>
> Note that something like "WHERE myBitStringCol & B'101'" might be selecting
> too much of your table to make an index useful anyways. If each bit is set in
> half the table then you're talking about selecting 3/4 of the table in which
> case a full table scan would be more efficient than any index.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's On-Demand Production Tuning
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Share your photos with Windows Live Photos - Free Try it Now!