Re: Special bloom index of INT, BIGINT, BIT, VARBIT for bitwiseoperation

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Special bloom index of INT, BIGINT, BIT, VARBIT for bitwiseoperation
Дата
Msg-id fb22d486-13f3-8ec2-be0e-4ad95f94416a@2ndquadrant.com
обсуждение исходный текст
Ответ на Special bloom index of INT, BIGINT, BIT, VARBIT for bitwise operation  (Takao Magoori <mago@magomago.jp>)
Список pgsql-performance

On 07/11/2018 08:02 AM, Takao Magoori wrote:
> Hello,
> 
> I have a table which has billions of rows and I want to select it by
> bitwise operation like,
> 
> =# CREATE TABLE IF NOT EXISTS t_bitwise (
>    id INTEGER NOT NULL
>    ,status_bigint BITINT NOT NULL
>    ,status_bit BIT(32) NOT NULL
> );
> 
> =# INSERT INTO t_bitwise (id, status_bigint, status_bit) SELECT
>    id
>    ,(random() * 4294967295)::BIGINT
>    ,(random() * 4294967295)::BIGINT::BIT(32)
> FROM generate_series(1, 3000000000) as t(id);
> 
> =# SELECT * FROM t_bitwise WHERE
>    status_bigint & 170 = 170
>    OR status_bigint & 256 = 256;
> 
> =# SELECT * FROM t_bitwise WHERE
>    status_bit & b'00000000000000000000000010101010'::BIT(32) =
> b'00000000000000000000000010101010'::BIT(32)
>    OR status_bit & b'00000000000000000000000100000000'::BIT(32) =
> b'00000000000000000000000100000000'::BIT(32);
> 
> Yes, these SELECT statements scan all rows. I guess possible index types are
> 
> - Expression indexes ?
> - Partial Indexes ?
> - GIN ?
> - GIST ?
> - bloom index ?
> 
> I googled but I feel there is no good solution and it would be good if
> I hava "bloom index specific for bitwise operation".
> 
> In case of normal bloom index, a value is hashed into a few bits which
> is mapped to a signature (default 80 bits).
> This is a lossy representation of the original value, and as such is
> prone to reporting false positives which requires "Recheck" process at
> SELECT. The more rows or the more complex condition, the more
> execution time.
> 
> My idea is that, in case of index for bitwise operation, each bit
> should be mapped to exactly same bit on a signature (One to one
> mapping). No false positives. No "Recheck" process is required. If the
> target coulmn is BIT(32), just 32 bits signature lengh is enough.
> 

So you're not computing a hash at all, so calling this "bloom index" is 
rather misleading.

Another question is where do you expect the performance advantage to 
come from? If the table is as narrow as this, such index will have 
almost the same size, and it won't allow fast lookups.

> Is there any index module like this ? Since I am not familiar with C
> and Postgresql, I can not write my own module.
> 

I doubt there's a suitable index available currently. Perhaps a simple 
btree index could help, assuming it's (a) smaller than the table, (b) we 
can use IOS and (c) only a small fraction of the table matches.

Another idea is to use partial indexes - one for each bit, i.e.

   CREATE INDEX ON t_bitwise (id)
    WHERE status_bit & b'10000000'::BIT(32) = b'10000000'::BIT(32);

And then build the query accordingly.

I wonder if it might work with BRIN indexes of some kind. If the range 
summary is defined as OR of the values, that might help, depending on 
variability within the page range. But that would probably require some 
development.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-performance по дате отправления:

Предыдущее
От: Fabio Pardi
Дата:
Сообщение: Re: Why HDD performance is better than SSD in this case
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: Improving Performance of Query ~ Filter by A, Sort by B