Re: Question about indexes

Поиск
Список
Период
Сортировка
От Hannu Krosing
Тема Re: Question about indexes
Дата
Msg-id 1075502634.4007.32.camel@fuji.krosing.net
обсуждение исходный текст
Ответ на Re: Question about indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Question about indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Tom Lane kirjutas R, 30.01.2004 kell 16:48:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Also, what does an in-memory bitmapped index look like?
>
> One idea that might work: a binary search tree in which each node
> represents a single page of the table, and contains a bit array with
> one bit for each possible item number on the page.  You could not need
> more than BLCKSZ/(sizeof(HeapTupleHeaderData)+sizeof(ItemIdData)) bits
> in a node, or about 36 bytes at default BLCKSZ --- for most tables you
> could probably prove it would be a great deal less.  You only allocate
> nodes for pages that have at least one interesting row.

Another idea would be using bitmaps where we have just one bit per
database page and do a seq scan but just over marked pages.

Even when allocating them in full such indexes would occupy just
1/(8k*8bit) of the amount they describe, so index for 1GB table would be
1G/(8k*8bit) = 16 kilobytes (2 pages)

Also, such indexes, if persistent, could also be used (together with
FSM) when deciding placement of new tuples, so they provide a form of
clustering.

This would of course be most useful for data-warehouse type operations,
where database is significantöy bigger than memory.

And the seqscan over bitmap should not be done in simple page order, but
rather in two passes -1. over those pages which are already in cache (either postgresqls    or systems (if we find a
wayto get such info from the system))2. in sequential order over the rest. 

> I think this would represent a reasonable compromise between size and
> insertion speed.  It would only get large if the indexscan output
> demanded visiting many different pages --- but at some point you could
> abandon index usage and do a sequential scan, so I think that property
> is okay.

One case where almost full intermediate bitmap could be needed is when
doing a star join or just AND of several conditions, where each single
index spans a significant part of the table, but the result does not.

> A variant is to make the per-page bit arrays be entries in a hash table
> with page number as hash key.  This would reduce insertion to a nearly
> constant-time operation, but the drawback is that you'd need an explicit
> sort at the end to put the per-page entries into page number order
> before you scan 'em.  You might come out ahead anyway, not sure.
>
> Or we could try a true linear bitmap (indexed by page number times
> max-items-per-page plus item number) that's compressed in some fashion,
> probably just by eliminating large runs of zeroes.  The difficulty here
> is that inserting a new one-bit could be pretty expensive, and we need
> it to be cheap.
>
> Perhaps someone can come up with other better ideas ...

I have also contemplated a scenario, where we could use some
not-quite-max power-of-2 bits-per-page linear bitmap and mark intra-page
wraps (when we tried to mark a point past that not-quite-max number in a
page) in high bit (or another bitmap) making info for that page folded.
AN example would be setting bit 40 in 32-bits/page index - this would
set bit 40&31 and mark the page folded.

When combining such indexes using AND or OR, we need some spcial
handling of folded pages, but could still get non-folded (0) results out
from AND of 2 folded pages if the bits are distributed nicely.

--------------
Hannu















В списке pgsql-hackers по дате отправления:

Предыдущее
От: Scott Lamb
Дата:
Сообщение: Re: Mixing threaded and non-threaded
Следующее
От: Hannu Krosing
Дата:
Сообщение: Re: returning PGresult as xml