Re: On-disk bitmap index patch

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: On-disk bitmap index patch
Дата
Msg-id 24309.1153860389@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: On-disk bitmap index patch  (Josh Berkus <josh@agliodbs.com>)
Ответы Re: On-disk bitmap index patch  (Mark Kirkwood <markir@paradise.net.nz>)
Список pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
> One particular compelling situation for on-disk bitmaps is for terabyte 
> tables where a btree index would not fit into memory.   Index 
> performance for an index which is 10x or more the size of RAM really 
> sucks ... I can come up with some test results if you doubt that.

Sure...

> Also note that "low cardinality" is relative.  For a 1 billion row 
> table, a column with 10,000 values is "low-cardinality", having around 
> 100,000 rows per value ... but that's still 0.01% of the table per 
> value, making index use still applicable.

And your point is?  Assuming a reasonable datatype like int4, a btree
index on this table would occupy say 20GB (16 bytes per entry plus
fillfactor overhead).  The bitmap version would require 10,000 bitmaps
of 1G bits apiece, or 1250GB (not even counting overhead).  You need
some wildly optimistic assumptions about the compressibility of the
bitmaps to get even within hailing distance of the btree, let alone
fit in RAM.  A realistic assumption for the numbers you mention is
that the bitmaps have 1-bits about 100 bits apart, which means you
could get maybe 3-to-1 compression using the runlength scheme that's
in there ... leaving the bitmap a factor of 20 bigger than the btree.

AFAICS "low cardinality" has to mean just that, a few dozen distinct
values at most, for this scheme to have any hope.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Luke Lonergan"
Дата:
Сообщение: Re: On-disk bitmap index patch
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: effective_cache_size is a real?