Re: Minmax indexes

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Minmax indexes
Дата
Msg-id 52495B7E.8080905@vmware.com
обсуждение исходный текст
Ответ на Re: Minmax indexes  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
On 27.09.2013 21:43, Greg Stark wrote:
> On Fri, Sep 27, 2013 at 7:22 PM, Jim Nasby<jim@nasby.net>  wrote:
>>
>> Yeah, we obviously kept things simpler when adding forks in order to get the feature out the door. There's
improvementsthat need to be made. But IMHO that's not reason to automatically avoid forks; we need to consider the cost
ofimproving them vs what we gain by using them.
 
>
> We think this gives short change to the decision to introduce forks.
> If you go back to the discussion at the time it was a topic of debate
> and the argument which won the day is that interleaving different
> streams of data in one storage system is exactly what the file system
> is designed to do and we would just be reinventing the wheel if we
> tried to do it ourselves. I think that makes a lot of sense for things
> like the fsm or vm which grow indefinitely and are maintained by a
> different piece of code from the main heap.
>
> The tradeoff might be somewhat different for the pieces of a data
> structure like a bitmap index or gin index where the code responsible
> for maintaining it is all the same.

There are quite a dfew cases where we have several "streams" of data, 
all related to a single relation. We've solved them all in slightly 
different ways:

1. TOAST. A separate heap relation with accompanying b-tree index is 
created.

2. GIN. GIN contains a b-tree, and data pages (and somer other kinds of 
pages too IIRC). It would be natural to use the regular B-tree code for 
the B-tree, but instead it contains a completely separate 
implementation. All the different kinds of streams are stored in the 
main fork.

3. Free space map. Stored as a separate fork.

4. Visibility map. Stored as a separate fork.

And upcoming:

5. Minmax indexes, with the linearly-addressed range reverse map and 
variable lenghth index tuples.

6. Bitmap indexes. Like in GIN, there's a B-tree and the data pages 
containing the bitmaps.


A nice property of the VM and FSM forks currently is that they are just 
auxiliary information to speed things up. You can safely remove them 
(when the server is shut down), and the system will recreate them on 
next vacuum. It's not carved in stone that it has to be that way for all 
extra forks, but it is today and I like it.

I feel we need a new kind of a relation fork, something more 
heavy-weight than the current forks, but not as heavy-weight as the way 
TOAST does it. It would be nice if GIN and bitmap indexes could use the 
regular nbtree code. Or any other index type - imagine a bitmap index 
using a SP-GiST index instead of a B-tree! You could create a bitmap 
index for 2d points, and use it to speed up operations like overlap for 
example.

The nbtree code expects the data to be in the main fork and uses the FSM 
fork too. Maybe it could be abstracted, so that the regular b-tree could 
be used as part of another index type. Same with other indexams.

Perhaps relation forks need to be made more flexible, allowing access 
methods to define what forks exists. IOW, let's not avoid using relation 
forks, let's make them better instead.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ian Lawrence Barwick
Дата:
Сообщение: Re: review: psql and pset without any arguments
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Minmax indexes