Re: On-disk bitmap index patch

Поиск
Список
Период
Сортировка
От Mark Kirkwood
Тема Re: On-disk bitmap index patch
Дата
Msg-id 44C6C8CB.9030802@paradise.net.nz
обсуждение исходный текст
Ответ на Re: On-disk bitmap index patch  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: On-disk bitmap index patch  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Tom Lane wrote:

> 
> And your point is?  Assuming a reasonable datatype like int4, a btree
> index on this table would occupy say 20GB (16 bytes per entry plus
> fillfactor overhead).  The bitmap version would require 10,000 bitmaps
> of 1G bits apiece, or 1250GB (not even counting overhead).  You need
> some wildly optimistic assumptions about the compressibility of the
> bitmaps to get even within hailing distance of the btree, let alone
> fit in RAM.  A realistic assumption for the numbers you mention is
> that the bitmaps have 1-bits about 100 bits apart, which means you
> could get maybe 3-to-1 compression using the runlength scheme that's
> in there ... leaving the bitmap a factor of 20 bigger than the btree.
> 
> AFAICS "low cardinality" has to mean just that, a few dozen distinct
> values at most, for this scheme to have any hope.
> 

I did a little testing of this when Jie first submitted the patch - for
a basically Zipfian distribution of int4's the bitmap is still clearly
smaller even at 1000 distinct values (see below). It is twice as big at
10000, so the crossover point is somewhere in the 1000-10000 range (for
this test - however the results seem to be reasonably typical).

In DSS distinct values < 1000 is very common (days of week, months of
year, lineitems of order, sex of animal...) so the applicability of this
range is perhaps larger than it seems at first sight.

Cheers

Mark
------------------------------------------------------------------------

bitmap=# \d bitmaptest  Table "public.bitmaptest" Column |  Type   | Modifiers
--------+---------+----------- id     | integer | not null val0   | integer | val1   | integer | val2   | integer |
val3  | integer | fil    | text    |
 


bitmap=# select count(distinct val0),                count(distinct val1),                count(distinct val2),
      count(distinct val3)         from bitmaptest; count | count | count | count
 
-------+-------+-------+-------    10 |   100 |  1000 | 10000


bitmap=# \i relsizes.sql  (BTREE)     relname     | relpages
-----------------+---------- bitmaptest      |    93458 bitmaptest_val0 |    21899 bitmaptest_val1 |    21899
bitmaptest_val2|    21899 bitmaptest_val3 |    21899
 


bitmap=# \i relsizes.sql (BITMAP)     relname     | relpages
-----------------+---------- bitmaptest      |    93458 bitmaptest_val0 |     1286 bitmaptest_val1 |     2313
bitmaptest_val2|     5726 bitmaptest_val3 |    41292
 


For the interested, the data generator, schema, index files are here:
http://homepages.paradise.net.nz/markir/download/bizgres/bitmaptest.tar.gz



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [PATCHES] Resurrecting per-page cleaner for btree
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [PATCHES] Resurrecting per-page cleaner for btree