Re: Feature request for adoptive indexes

From Tomas Vondra
Subject Re: Feature request for adoptive indexes
Date
Msg-id aef50e7e-55c1-2467-f7b0-3a2f90420a10@enterprisedb.com
In reply to Re: Feature request for adoptive indexes (Hayk Manukyan <manukyantt@gmail.com>)
List pgsql-hackers

On 11/2/21 13:04, Hayk Manukyan wrote:
> Tomas Vondra
>  > Are you suggesting those are not the actual best/worst cases and we
>  > should use some other indexes? If yes, which ones?
> 
> I would say yes.
> In my case I am not querying only sequence column.
> I have the following cases which I want to optimize.
> 1. Select * from Some_table where job = <something> and nlp = <something>
> and year = <something> and *scan_id = <something>*
> 2. Select * from Some_table where job = <something> and nlp = <something>
> and year = <something> and *Issue_flag = <something>*
> 3. Select * from Some_table where job = <something> and nlp = <something>
> and year = <something> and *sequence = <something>*
> Those are the queries that my app sends to the db, which is why I said
> that from a *read perspective* our *best case* is 3 separate indexes on
> *(job, nlp, year, sequence)* AND *(job, nlp, year, Scan_ID)* and *(job,
> nlp, year, issue_flag)*, and any other solution like
> (job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year)
> INCLUDE(sequence, Scan_ID, issue_flag) OR just (job, nlp, year) can be
> considered the *worst case*.

I already explained why using INCLUDE in this case is the wrong thing to
do; it'll harm performance compared to just defining a regular index.
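
To make the comparison concrete, these are roughly the index definitions
being discussed (a sketch only, using the table/column names from your
example; adjust to the real schema):

    -- three "read-optimized" indexes, one per query shape
    CREATE INDEX ON some_table (job, nlp, year, sequence);
    CREATE INDEX ON some_table (job, nlp, year, scan_id);
    CREATE INDEX ON some_table (job, nlp, year, issue_flag);

    -- versus a single covering index; the INCLUDE columns are stored in
    -- the leaf tuples, but they are not part of the key, so they can't
    -- be used to narrow the descent through the tree
    CREATE INDEX ON some_table (job, nlp, year)
        INCLUDE (sequence, scan_id, issue_flag);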

> I will remind you that in a real-world scenario I have ~50M rows and
> about *~5k rows for each (job, nlp, year)*

Well, maybe this is the problem. The generated data set has 50M rows, but
the three columns have too many distinct values: (job, nlp, year) defines
~50M groups, so there's only a single row per group. That'd explain why
the two indexes perform almost equally.

So I guess you need to modify the data generator so that the data set is 
more like the case you're trying to improve.
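
Something along these lines would get the distribution closer to what you
describe (just a sketch; the table and column names/types are assumed from
the discussion, so tune the constants and types as needed):

    -- ~10k (job, nlp, year) groups x ~5k rows each = ~50M rows
    INSERT INTO some_table (job, nlp, year, sequence, scan_id, issue_flag)
    SELECT g % 100,                  -- job: 100 distinct values
           (g / 100) % 10,           -- nlp: 10 distinct values
           2010 + (g / 1000) % 10,   -- year: 10 distinct values
           s,                        -- sequence within the group
           (random() * 1000)::int,   -- scan_id (assumed integer)
           (random() < 0.01)::int    -- issue_flag (assumed 0/1 flag)
    FROM generate_series(0, 9999) AS g,
         generate_series(1, 5000) AS s;

That gives ~5k rows per (job, nlp, year) group, so the benchmark at least
matches the case you're describing.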

>  From a *write perspective*, as far as we want to have only one index,
> our *best case* can be considered any of
> *(job, nlp, year, sequence, Scan_ID, issue_flag)* OR *(job, nlp, year)
> INCLUDE(sequence, Scan_ID, issue_flag)*,
> and the *worst case* will be having 3 separate indexes, like in the read
> perspective:
> (job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp,
> year, issue_flag)
> 

Maybe. It's true that a write with three indexes will require modifying
three leaf pages (on average), while with a single index we have to modify
just one leaf page, depending on where the row gets routed.

But with the proposed "merged" index, the row will have to be inserted
into three smaller trees. If those trees are large enough, they won't fit
into a single leaf page (e.g. 5000 index tuples are guaranteed to need
many pages, even with some smart encoding). So the write will likely need
to modify at least 3 leaf pages, getting much closer to the three separate
indexes. At which point you could just use three indexes.
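
A quick back-of-envelope check, assuming something like 20 bytes per index
tuple (a few small key columns plus item pointer and tuple header):

    -- ~5000 tuples per (job, nlp, year) group vs. an 8kB page
    SELECT ceil(5000 * 20 / 8192.0) AS min_leaf_pages_per_group;  -- ~13

So the per-group data is nowhere near fitting on a single leaf page, and
each of the three orderings ends up touching its own page(s) anyway.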

> So I think the comparison that we did is not right, because we are
> comparing different/wrong things.
> For the right results we need to compare these two cases while doing
> random queries 1, 2, 3 and random writes.
> 

I'm not going to spend any more time tweaking the benchmark, but if you
tweak it to demonstrate the difference / benefits, I'll run it again on
my machine etc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


