Re: On-disk bitmap index patch

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: On-disk bitmap index patch
Дата
Msg-id 748.1153983035@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: On-disk bitmap index patch  ("Jie Zhang" <jzhang@greenplum.com>)
Ответы Re: On-disk bitmap index patch  ("Jie Zhang" <jzhang@greenplum.com>)
Список pgsql-hackers
"Jie Zhang" <jzhang@greenplum.com> writes:
> On 7/26/06 10:14 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> ... A nonuniform distribution would probably mean that some
>> of the bitmaps compress better-than-expected and others worse.  I have
>> no idea how to model that and guess what the overall result is ...

> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
> Wu et al gave an approximate answer for this question. Assume that there are
> c distinct values. Let the i-th value has a probability of p_i, the number
> of rows r, and the word size w. then the total size of the compressed bitmap
> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in
> both \sum's, i is from 1 to c.

Hm, but that's still begging the question no?  It's still assuming that
any one value is uniformly distributed.  ISTM the cases that would break
my simplistic calculation involve clustering of particular values, such
that some areas of the bitmap are all-zero while other areas have lots
of ones.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jie Zhang"
Дата:
Сообщение: Re: On-disk bitmap index patch
Следующее
От: Michael Glaesemann
Дата:
Сообщение: Re: GUC with units, details