Re: 7.0 like selectivity
От | Tom Lane |
---|---|
Тема | Re: 7.0 like selectivity |
Дата | |
Msg-id | 27244.955034968@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | 7.0 like selectivity ("Hiroshi Inoue" <Inoue@tpf.co.jp>) |
Ответы |
RE: 7.0 like selectivity
("Hiroshi Inoue" <Inoue@tpf.co.jp>)
|
Список | pgsql-hackers |
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > For the query > select hoge_cd,shimeinn,tel > from t_hoge > where shimeinn like 'imag%' > and tel like '012%' > order by hoge_cd > limit 100; > 64 rows returned immediately. > And for the query > select hoge_cd,shimeinn,tel > from t_hoge > where shimeinn like 'imag%' > and tel like '012-3%' > order by hoge_cd > limit 100; > 24 rows returned after waiting 8 minutes. So what were the plans for these two queries? Also, has this table been "vacuum analyzed"? > I got the following output from him. > explain select * from t_hoge where tel like '012%'; > Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.23 rows=1981 > width=676) > explain select * from t_hoge where tel like '012-3%'; > Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.00 rows=1981 > width=676) > In fact,count(*) is 342323 and 114741 respectively. > The first problem is that estimated cost is too low. > It seems that the index selectivity of '012-3%' = the index > selectivity of '012%' / (256*256),right ? > If so,does it give more practical estimation than before ? > It doesn't correspond to rows information either. The rows number is fairly bogus (because it's coming from application of eqsel, which is not the right thing; perhaps someday LIKE should have its very own selectivity estimation function). But the cost estimate is driven by the estimated selectivity oftel >= '012-3' AND tel < '012-4' and it would be nice to think that we have some handle on that. It could be that the thing is getting fooled by a very non-uniform distribution of telephone numbers. You indicate that most of the numbers in the DB begin with '012', but if there are a small number that begin with digits as high as 9, the selectivity estimates would be pretty bad. > In reality, * shimeinn like 'imag%' * is much more restrictive > than * tel like '012-3%' *. However I couldn't think of the > way to foresee which is more restrictive. Now I doubt whether > we have enough information to estimate LIKE selectivity > correctly. No, we don't, because we only keep track of the min and max values in each column and assume that the data is uniformly distributed between those limits. Perhaps someday we could keep a histogram instead --- but VACUUM ANALYZE would get a lot slower and more complicated ... regards, tom lane
В списке pgsql-hackers по дате отправления: