Re: Feature request for adoptive indexes

Поиск
Список
Период
Сортировка
От Pavel Borisov
Тема Re: Feature request for adoptive indexes
Дата
Msg-id CALT9ZEHoH1PLQqh=YLtA67jtAsKpTWEY5i+7P2RzkyaZA60OCg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Feature request for adoptive indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Feature request for adoptive indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
I've already answered OP but it seems in the wrong thread, so I copy it here:

I think now in many cases you can effectively use covering index to have fast index-only scans without index duplication. It will help if you don't have great selectivity on the last column (most probably you don't). E.g.:

CREATE INDEX ON table_name (`job`,`nlp`,`year`) INCLUDE (`scan_id`, `issue_flag`, `sequence`)

But I consider the feature can be useful when there is a very little selectivity in the first index columns. I.e. if (job`,`nlp`,`year') has many repeats and the most selection is done in the last column. I am not sure how often this can arise but in general, I see it as a useful b-tree generalization.

I'm not sure how it should be done. In my view, we need to add an ordered posting tree as a leaf element if b-tree and now we have index storage only for tuples. The change of on-disk format was previously not easy in nbtree and if we consider the change, we need an extra bit to mark posting trees among index tuples. Maybe it could be done in a way similar to deduplicated tuples if some bits in the tuple header are still could be freed.

Thoughts? 

If I get what you propose, you want to have a "top" tree for (job, nlp,
year), which "splits" the data set into subsets of ~5000-7000 rows. And
then for each subset you want a separate "small" trees on each of the
other columns, so in this case three trees.

Well, the problem with this is pretty obvious - each of the small trees
requires separate copies of the leaf pages. And remember, in a btree the
internal pages are usually less than 1% of the index, so this pretty
much triples the size of the index. And if you insert a row into the
index, it has to insert the item pointer into each of the small trees,
likely requiring a separate I/O for each.

So I'd bet this is not any different from just having three separate
indexes - it doesn't save space, doesn't save I/O, nothing.
 
Tomas, I really think we should not try realizing this feature using existing index pages that contain only tuples. You are right, it will cause large overhead. If instead we decide and succeed in creating "posting trees" as a new on-disk page entry type we can have an index with space comparable to the abovementioned covering index but with sorting of values in these trees (i.e. all values are sorted, and "key" ones).

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mahendra Singh Thalor
Дата:
Сообщение: Re: Replication & recovery_min_apply_delay
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: XTS cipher mode for cluster file encryption