Implementing Bitmap Indexes

From: Victor Y. Yegorov
Subject: Implementing Bitmap Indexes
Date:
Msg-id: 41FB79DC.4040805@mits.lv
Replies: Re: Implementing Bitmap Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
         Re: Implementing Bitmap Indexes  ("Jim C. Nasby" <decibel@decibel.org>)
List: pgsql-hackers

Hello.

I'd like to implement bitmap indexes and want your comments. Here is
the essence of what I've found out about bitmaps over the last month.

Consider the following table      So, the bitmap for attribute A will be
with 1 attribute A(int2):         the following:

  # | A                           Val | Bitmap(s)
----+---                          ----+---------------
  1 | 1                             1 | 11011001 0111
  2 | 1                             2 | 00100100 1000
  3 | 2                             3 | 00000010 0000
  4 | 1
  5 | 1
  6 | 2
  7 | 3
  8 | 1
  9 | 2
 10 | 1
 11 | 1
 12 | 1
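
The construction can be sketched in a few lines of Python. This is only an
illustration of the idea, not PostgreSQL code; the function name
build_bitmaps is made up for the example, and the column values are the
twelve rows of the example table.

```python
from collections import defaultdict

def build_bitmaps(column):
    """Return {value: bit string} with one bit per row, in row order."""
    bitmaps = defaultdict(lambda: ["0"] * len(column))
    for row, value in enumerate(column):
        bitmaps[value][row] = "1"   # mark this row in the bitmap of its value
    return {value: "".join(bits) for value, bits in bitmaps.items()}

# Attribute A of the example table, rows 1..12.
A = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]
bitmaps = build_bitmaps(A)

for value in sorted(bitmaps):
    print(value, bitmaps[value])
```

Each distinct value gets exactly one bitmap, and every row sets exactly one
bit across all of them.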
 

Some points:
1) If a new value (say, 4) is inserted at some point, a new bitmap will be
added for it. The same goes for NULLs (if the attribute has no NOT NULL
constraint): one more bitmap. Or should we require NOT NULL for bitmapped
attributes?

2) Queries like "where A = 1" or "where A != 2" will require only 1 scan of
the index, while "where A < 3" will require 2 stages: first, create a list
of the values less than 3; second, OR together the bitmaps for all those
values. For high-cardinality attributes this can take a lot of time.

3) Each bitmap is only a bitmap, so there should be an array of the
corresponding ctid pointers. Maybe some more arrays as well (for pages; I
don't know yet).
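
The three query shapes from point 2 can be sketched as follows. This is a
rough Python illustration under stated assumptions, not a proposed
implementation: bitmaps are plain integers with bit (row - 1) set for
matching rows, the helper names eq, ne, lt and rows are invented for the
example, and NULLs (point 1) are kept in a separate bitmap so that "!="
does not return them.

```python
A = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]   # None would mark a NULL
NROWS = len(A)
ALL_ROWS = (1 << NROWS) - 1

# One integer bitmap per distinct value; bit (row - 1) is set for that row.
index = {}
for pos, value in enumerate(A):
    index[value] = index.get(value, 0) | (1 << pos)

nulls = index.pop(None, 0)   # point 1: NULLs get a bitmap of their own

def rows(bitmap):
    """Row numbers (1-based) whose bit is set; stands in for the ctid array."""
    return [pos + 1 for pos in range(NROWS) if (bitmap >> pos) & 1]

def eq(v):
    return index.get(v, 0)                      # a single bitmap lookup

def ne(v):
    # NOT of a single bitmap, limited to existing rows and excluding NULLs.
    return ~index.get(v, 0) & ALL_ROWS & ~nulls

def lt(v):
    result = 0
    for value in [k for k in index if k < v]:   # stage 1: list the values
        result |= index[value]                  # stage 2: OR their bitmaps
    return result

print(rows(eq(1)), rows(ne(2)), rows(lt(3)))
```

The loop in lt touches one bitmap per qualifying value, which is where the
cost for high-cardinality attributes comes from.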

For point 2) there are techniques that give better performance for "A < 3"
queries at the cost of increased storage space (see here for details:
http://delab.csd.auth.gr/papers/ADBIS03mmnm.pdf) and increased response time
for simple queries. I don't know whether they should be implemented; maybe
later.
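
One well-known scheme with this kind of trade-off is range encoding, where
the bitmap for value v marks every row with A <= v. The sketch below uses
it purely as an illustration; it is an assumption that it resembles the
technique in the linked paper, and the helper names are invented. A range
query then reads a single bitmap, while an equality query needs two.

```python
A = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]
values = sorted(set(A))

# Range encoding: le[v] has a 1 for every row whose value is <= v.
le = {v: "".join("1" if a <= v else "0" for a in A) for v in values}

def and_not(x, y):
    """Bitwise x AND NOT y on two equal-length bit strings."""
    return "".join("1" if a == "1" and b == "0" else "0" for a, b in zip(x, y))

def less_than(v):
    # One bitmap read: the bitmap of the largest indexed value below v.
    below = [k for k in values if k < v]
    return le[below[-1]] if below else "0" * len(A)

def equal_to(v):
    # Two bitmap reads (assumes v is an indexed value).
    i = values.index(v)
    return le[v] if i == 0 else and_not(le[v], le[values[i - 1]])

print(less_than(3))
print(equal_to(2))
```

These bitmaps are denser than the equality-encoded ones, so they would
probably compress less well.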

The trickiest part will be combining multiple index scans on several
attributes. As Neil Conway said on #postgresql, this will be tricky, as
some modifications will be needed in the index scan API. I remember Tom Lane
suggested on-disk bitmaps; implementing a bitmap index access method
would be of much use not only for bitmap indexes, I think.
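
To make the idea of combining scans concrete, here is a small Python sketch.
Everything in it is hypothetical: the second attribute B, the scan and fetch
helpers, and the ctid layout (4 tuples per heap page) are assumptions for
the example and say nothing about how the index scan API would look.

```python
A = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]
# Hypothetical second attribute on the same 12 rows.
B = ["x", "y", "x", "x", "y", "y", "x", "x", "x", "y", "x", "y"]

def scan(column, value):
    """Stand-in for one bitmap index scan: returns an integer bitmap."""
    bitmap = 0
    for pos, v in enumerate(column):
        if v == value:
            bitmap |= 1 << pos
    return bitmap

# Hypothetical ctids as (block, offset), assuming 4 tuples per heap page.
ctids = [(pos // 4, pos % 4 + 1) for pos in range(len(A))]

def fetch(bitmap):
    """Translate set bits into ctids, in heap order."""
    return [ctids[pos] for pos in range(len(ctids)) if (bitmap >> pos) & 1]

both = scan(A, 1) & scan(B, "x")      # where A = 1 and B = 'x'
either = scan(A, 3) | scan(B, "y")    # where A = 3 or  B = 'y'

print(fetch(both))
print(fetch(either))
```

The heap is visited only once, after the bitmaps have been combined, and in
physical order.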

The WAH compression method should be used for bitmaps (to my mind). Also,
there is a method of reordering heap tuples for better compression of
bitmaps; I thought it might be possible to implement it as an option to the
existing CLUSTER command. Papers:
WAH: http://www-library.lbl.gov/docs/LBNL/496/26/PDF/LBNL-49626.pdf
CLUSTER: http://www.cse.ohio-state.edu/~hakan/publications/reordering.pdf
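
For readers unfamiliar with WAH (Word-Aligned Hybrid), the following Python
sketch shows the basic word layout with 32-bit words: a literal word carries
31 bitmap bits, and a fill word stores a run of identical all-0 or all-1
31-bit groups as a count. It is a simplified illustration (the trailing
partial group is just zero-padded, and the function names are made up), not
the exact format from the paper.

```python
GROUP_BITS = 31                      # payload bits in a 32-bit literal word
FILL_FLAG = 1 << 31                  # top bit set: the word is a fill
FILL_ONES = 1 << 30                  # next bit: fill consists of 1s, else 0s
MAX_RUN = (1 << 30) - 1              # low 30 bits: run length in groups
ALL_ONES = (1 << GROUP_BITS) - 1

def wah_encode(bits):
    """Compress a '0'/'1' string into a list of 32-bit words."""
    padded = bits + "0" * (-len(bits) % GROUP_BITS)
    words = []
    for i in range(0, len(padded), GROUP_BITS):
        group = int(padded[i:i + GROUP_BITS], 2)
        if group not in (0, ALL_ONES):
            words.append(group)                  # mixed group: literal word
            continue
        fill = FILL_FLAG | (FILL_ONES if group else 0)
        last = words[-1] if words else 0
        if (last & ~MAX_RUN) == fill and (last & MAX_RUN) < MAX_RUN:
            words[-1] = last + 1                 # extend the previous fill
        else:
            words.append(fill | 1)               # start a new fill of 1 group
    return words

def wah_decode(words, nbits):
    """Expand the words back into a bit string of length nbits."""
    out = []
    for w in words:
        if w & FILL_FLAG:
            bit = "1" if w & FILL_ONES else "0"
            out.append(bit * (GROUP_BITS * (w & MAX_RUN)))
        else:
            out.append(format(w, "031b"))
    return "".join(out)[:nbits]

sparse = "0" * 310 + "1" + "0" * 309             # 620 bits, a single 1
words = wah_encode(sparse)
print([hex(w) for w in words])
```

Long runs of identical bits, which reordering the heap tuples is meant to
produce, collapse into single fill words.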

I'd like to hear from you before I start doing anything.

-- 

Victor

