Re: again on index usage

Поиск
Список
Период
Сортировка
От Daniel Kalchev
Тема Re: again on index usage
Дата
Msg-id 200201080922.LAA02480@dcave.digsys.bg
обсуждение исходный текст
Ответ на Re: again on index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: again on index usage
Список pgsql-hackers
>>>Tom Lane said:> It's difficult to tell from this what it thinks the selectivity of the> ipdate index would be, since
therows estimate includes the effect of> the ipaddr and router restrictions.  What do you get from just> > explain>
SELECTsum(input) FROM iplog_gate200112 > WHERE >   '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02
00:00:00+02'A    ND >   '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02';
 

Same result (sorry, should have included this originally):


Aggregate  (cost=47721.72..47721.72 rows=1 width=8) ->  Seq Scan on iplog_gate200112  (cost=0.00..47579.54 rows=56873
width=8)

> If you say "set enable_seqscan to off", does that change the plan?

Yes. As expected (I no longer have the problem of NaN estimates :)

Aggregate  (cost=100359.71..100359.71 rows=1 width=8) ->  Index Scan using iplog_gate200112_ipdate_idx on
iplog_gate200112 
 
(cost=0.00..100217.52 rows=56873 width=8)

My belief is that the planner does not want to use index due to low value 
dispersion of the indexed attribute. When splitting the table into several 
smaller tables, index is used.

This bites me, because each such query takes at least 3 minutes and the script 
that generates these needs to execute few thousands queries.
> BTW, the planner does not associate function calls with indexes.  If you> want to have the ipaddr index considered
forthis query, you need to write> ipaddr <<= '193.68.240.0/20' not network_subeq(ipaddr, '193.68.240.0/20').> (But
IIRC,that only works in 7.2 anyway, not earlier releases :-()
 

This is what I though too, but using the ipdate index will be sufficient.

I understand my complaint is not a bug, but rather question of proper planner 
optimization (it worked 'as expected' in 7.0). Perhaps the planner should 
consider the total number of rows, as well as the dispersion factor. With the 
dispersion being around 1.5% and total rows 1.7 million this gives about 25k 
rows with the same value - large enough to trigger sequential scan, as far as 
I understand it, but the cost of scanning 1.7 million rows sequentially is 
just too high.

By the way, the same query takes approx 10 sec with set enable_seqscan to off.

Daniel



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephan Szabo
Дата:
Сообщение: Re: ON ERROR triggers
Следующее
От: Holger Krug
Дата:
Сообщение: Re: ON ERROR triggers