Re: CLUSTER and indisclustered

Поиск
Список
Период
Сортировка
От Hannu Krosing
Тема Re: CLUSTER and indisclustered
Дата
Msg-id 1028737470.13419.182.camel@taru.tm.ee
обсуждение исходный текст
Ответ на Re: CLUSTER and indisclustered  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, 2002-08-07 at 16:26, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
> >> Right.  One form of the "lossy compression" idea I suggested is to
> >> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
> >> too large to work with.  
> 
> > If it is a real bitmap, should it not be easyeast to allocate at the
> > start ?
> 
> But it isn't a "real bitmap".  That would be a really poor
> implementation, both for space and speed --- do you really want to scan
> over a couple of megs of zeroes to find the few one-bits you care about,
> in the typical case?

I guess that depends on data. The typical case should be somthing the
stats process will find out so the optimiser can use it

The bitmap must be less than 1/48 (size of TID) full for best
uncompressed "active-tid-list" to be smaller than plain bitmap. If there
were some structure above list then this ratio would be even higher.

I have had good experience using "compressed delta lists", which will
scale well ofer the whole "fullness" spectrum of bitmap, but this is for
storage, not for initial constructing of lists.

>  "Bitmap" is a convenient term because it describes
> the abstract behavior we want, but the actual data structure will
> probably be nontrivial.  If I recall Ann's description correctly,
> Firebird's implementation uses run length coding of some kind (anyone
> care to dig in their source and get all the details?).

Plain RLL is probably a good way to store it and for merging two or more
bitmaps, but not as good for constructing it bit-by-bit. I guess the
most effective structure for updating is often still a plain bitmap
(maybe not if it is very sparse and all of it does not fit in cache),
followed by some kind of balanced tree (maybe rb-tree).

If the bitmap is relatively full then the plain bitmap is almost always
the most effective to update.

> If we tried anything in the way of lossy compression then there'd
> be even more stuff lurking under the hood.

Having three-valued (0,1,maybe) RLL-encoded "tritmap" would be a good
way to represent lossy compression, and it would also be quite
straightforward to merge two of these using AND or OR. It may even be
possible to easily construct it using a fixed-length b-tree and going
from 1 to "maybe" for nodes that get too dense.

---------------
Hannu



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Neil Conway
Дата:
Сообщение: Re: Open 7.3 items
Следующее
От: Tom Lane
Дата:
Сообщение: Re: PITR, checkpoint, and local relations