Re: Feature request for adoptive indexes

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Feature request for adoptive indexes
Дата
Msg-id a2183996-8eba-70be-c121-f76810717308@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Feature request for adoptive indexes  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Feature request for adoptive indexes  (Hayk Manukyan <manukyantt@gmail.com>)
Список pgsql-hackers

On 11/1/21 21:06, Robert Haas wrote:
> On Tue, Oct 26, 2021 at 11:11 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> If I get what you propose, you want to have a "top" tree for (job, nlp,
>> year), which "splits" the data set into subsets of ~5000-7000 rows. And
>> then for each subset you want a separate "small" trees on each of the
>> other columns, so in this case three trees.
>>
>> Well, the problem with this is pretty obvious - each of the small trees
>> requires separate copies of the leaf pages. And remember, in a btree the
>> internal pages are usually less than 1% of the index, so this pretty
>> much triples the size of the index. And if you insert a row into the
>> index, it has to insert the item pointer into each of the small trees,
>> likely requiring a separate I/O for each.
>>
>> So I'd bet this is not any different from just having three separate
>> indexes - it doesn't save space, doesn't save I/O, nothing.
> 
> I agree. In a lot of cases, it's actually useful to define the index
> on fewer columns, like (job, nlp, year) or even just (job, nlp) or
> even just (job) because it makes the index so much smaller and that's
> pretty important. If you have enough duplicate entries in a (job, nlp,
> year) index to justify create a (job, nlp, year, sequence) index, the
> number of distinct (job, nlp, year) tuples has to be small compared to
> the number of (job, nlp, year, sequence) tuples - and that means that
> you wouldn't actually save much by combining your (job, nlp, year,
> sequence) index with a (job, nlp, year, other-stuff) index. As you
> say, the internal pages aren't the problem.
> 
> I don't intend to say that there's no possible use for this kind of
> technology. Peter G. says that people are writing research papers
> about that and they probably wouldn't be doing that unless they'd
> found some case where it's a big win. But this example seems extremely
> unconvincing.
> 

I actually looked at the use case mentioned by Peter G, i.e. merged 
indexes with master-detail clustering (see e.g. [1]), but that seems 
like a rather different thing. The master-detail refers to storing rows 
from multiple tables, interleaved in a way that allows faster joins. So 
it's essentially a denormalization tool.

Perhaps there's something we could learn about efficient storage of the 
small trees, but I haven't found any papers describing that (I haven't 
spent much time on the search, though).

[1] Algorithms for merged indexes, Goetz Graefe
     https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.7709


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
Следующее
От: Ilya Gladyshev
Дата:
Сообщение: Re: Partial aggregates pushdown