Re: Fast insertion indexes: why no developments

Поиск
Список
Период
Сортировка
От Leonardo Francalanci
Тема Re: Fast insertion indexes: why no developments
Дата
Msg-id 1383661732562-5777009.post@n5.nabble.com
обсуждение исходный текст
Ответ на Re: Fast insertion indexes: why no developments  (Claudio Freire <klaussfreire@gmail.com>)
Ответы Re: Fast insertion indexes: why no developments  (Claudio Freire <klaussfreire@gmail.com>)
Re: Fast insertion indexes: why no developments  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
Claudio Freire wrote
> Min-max indexes always require a sequential scan of the min-max index
> itself when querying. 

I'm worried about the number of heap pages that will be scanned.
My understanding is that given the random input, the index will
not be selective enough, and will end up requiring a lot of page 
scanning to get the relevant rows.

That is: the "selectivity" of the min/max values will be very low, given
the high cardinality and randomness of the input; I'm afraid that, in
most "page ranges", the min will be lower than the queried ones,
and the max higher, resulting in lots of heap page scans.

Quick test:
a table with 1M rows, with random values from 1 to 10000000:
create table t as select generate_series(1, 1000000) as i, trunc(random() *
10000000) as n;

suppose a page range contains 100 rows (but my understanding is that minmax
index will use a much bigger row count...), let's find how many "page
ranges"
should be scanned to find a series of 50 values (again, random in the
1-10000000 range):

with cte as (select min(n) as minn, max(n) as maxn, i/100 from t group by i/100),inp as (select generate_series(1, 50)
iinp,trunc(random() * 10000000) as
 
s)select count(*) from inp  left outer join cte on inp.s between minn and maxn group by iinp


I get > 9000 pages for 49 values out of 50... which means scanning 90% of
the table.

Either my sql is not correct (likely), or my understanding of the minmax
index is
not correct (even more likely), or the minmax index is not usable in a
random inputs
scenario.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5777009.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: logical column order and physical column order
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH] configure: add git describe output to PG_VERSION when building a git tree