Re: Clustered index to preserve data locality in a multitenant application?

Поиск
Список
Период
Сортировка
От Nicolas Grilly
Тема Re: Clustered index to preserve data locality in a multitenant application?
Дата
Msg-id CAG3yVS5DL-DJyLpM-Df8yUhE9MuKhuJu8d_kQ752XGqKxDrxdg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Clustered index to preserve data locality in a multitenant application?  (Igor Neyman <ineyman@perceptron.com>)
Список pgsql-general
On Thu, Sep 1, 2016 at 3:08 PM, Igor Neyman <ineyman@perceptron.com> wrote:

Don’t know about plans to implement clustered indexes in PostgreSQL.


It was discussed on the mailing list in the past. 

I found an interesting thread dated from 2012 about integrating pg_reorg (the ancestor of pg_repack) in PostgreSQL core:


There is also an item titled "Automatically maintain clustering on a table" in the TODO list:

 

Not sure if this was mentioned, MS SQL Server has clustered indexes, where heap row is just stored on the leaf level of the index.

Oracle also has similar feature: IOT, Index Organized Table.

 

It seems to me (may be I’m wrong), that in PostgreSQL it should be much harder to implement clustered index (with the heap row stored in the index leaf) because of the way how MVCC implemented: multiple row versions are stored in the table itself (e.g. Oracle for that purpose keeps table “clean” and stores multiple row versions in UNDO tablespace/segment).


DB2, like PostgreSQL, stores rows in a heap, and not in the leafs of a Btree. But it's possible to define a "clustering" key for a table. When it is defined, DB2 tries to keep the rows in the heap ordered according to the clustering key. If DB2 can’t find space on the page where the row should go, then it looks a few pages before and after and puts it there, and if it still can’t find space then puts it at the end. There is also a feature called "multidimensional clustering" which is even more sophisticated. There is also a command REORG, which would be the equivalent of a non-blocking CLUSTER in PostgreSQL.

I think DB2's approach is interesting because it shows that maintaining spatial coherency is possible with a heap, without having to store rows in a Btree (like InnoDB).

В списке pgsql-general по дате отправления:

Предыдущее
От: Brent Douglas
Дата:
Сообщение: Re: PG_MODULE_MAGIC issue with small extension
Следующее
От: Tom Lane
Дата:
Сообщение: Re: PG_MODULE_MAGIC issue with small extension