Re: GSoC 2014 proposal

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GSoC 2014 proposal
Дата
Msg-id CAPpHfdv=N2YVM8xYp3O-Kco1u9e8CDupdrhmMn_5+nYhcDY_PA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GSoC 2014 proposal  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: GSoC 2014 proposal  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
The BIRCH algorithm as described in the paper describes building a tree in memory. If I understood correctly, you're suggesting to use a pre-built GiST index instead. Interesting idea!

There are a couple of signifcant differences between the CF tree described in the paper and GiST:

1. In GiST, a leaf item always represents one heap tuple. In the CF tree, a leaf item represents a cluster, which consists of one or more tuples. So the CF tree doesn't store an entry for every input tuple, which makes it possible to keep it in memory.

2. In the CF tree, "all entries in a leaf node must satisfy a threshold requirement, with respect to a threshold value T: the diameter (or radius) has to be less than T". GiST imposes no such restrictions. An item can legally be placed anywhere in the tree; placing it badly will just lead to degraded search performance, but it's still a legal GiST tree.

3. A GiST index, like any other index in PostgreSQL, holds entries also for deleted tuples, until the index is vacuumed. So you cannot just use information from a non-leaf node and use it in the result, as the information summarized at a non-leaf level includes noise from the dead tuples.

Can you elaborate how you are planning to use a GiST index to implement BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more strict in where in the tree an item can be stored, and lets the operator class to specify exactly when a node is split etc.

Hmmm, it's likely I've imagined something quite outside of this paper, and even already suggested it to Ivan... :)
I need a little time to rethink it.

------
With best regards,
Alexander Korotkov. 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Including replication slot data in base backups
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Including replication slot data in base backups