Re: On-disk bitmap index patch

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: On-disk bitmap index patch
Дата
Msg-id 1593.1153923961@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: On-disk bitmap index patch  (Mark Kirkwood <markir@paradise.net.nz>)
Ответы Re: On-disk bitmap index patch  (Mark Kirkwood <markir@paradise.net.nz>)
Re: On-disk bitmap index patch  ("Luke Lonergan" <llonergan@greenplum.com>)
Список pgsql-hackers
I wrote:
>> ...  A realistic assumption for the numbers you mention is
>> that the bitmaps have 1-bits about 100 bits apart, which means you
>> could get maybe 3-to-1 compression using the runlength scheme that's
>> in there ... leaving the bitmap a factor of 20 bigger than the btree.

I'm surprised no one caught me making this bogus computation.  I
realized this morning it's wrong: if there are 10000 distinct values
then on the average the 1-bits would be about 10000 bits apart, not 100.
At that rate there *is* substantial compression.  The representation
used in the bitmap patch isn't amazingly efficient for this (I wonder
whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs) but it
still manages to get down to something like 11 bytes per 1-bit.  Since
each 1-bit corresponds to a btree entry, this means the bitmap does come
out somewhat smaller than a btree (16 bytes per entry ignoring overhead)
--- at least if overhead such as wasted space on a page doesn't kill it.

Still, this isn't going to make the difference between fitting in RAM or
not.  For small numbers of distinct values, like less than 100, the
bitmap representation looks more plausible.  In that range you're down
to a byte or two per entry and so there is (very roughly) a 10x storage
savings over btree.  I don't believe the 100x numbers that have been
bandied around in this discussion, but 10x is plenty enough to be
interesting.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Diogo Biazus"
Дата:
Сообщение: xlogdump behaviour translating dropped relations
Следующее
От: "jkzhao"
Дата:
Сообщение: default lower case of identifier