Re: [HACKERS] Why is that so slow?

From: Tatsuo Ishii
Subject: Re: [HACKERS] Why is that so slow?
Msg-id: 199903061408.XAA00494@ext16.sra.co.jp
In reply to: Re: [HACKERS] Why is that so slow?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] Why is that so slow?  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> > NOTICE:  QUERY PLAN:
> > 
> > Nested Loop  (cost=98.90 size=1 width=100)
> >   ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
> >   ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)
> > 
> > This is so slooow. Can anybody explain this? Am I missing something?
> 
> and later:
> > I had defined a btree index on pid and it has 2000
> > duplicate entries in average! After I removed the index, the query
> > runs unbelievably fast! Now explain shows:
> 
> > Nested Loop  (cost=933.82 size=1 width=100)
> > -> Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
> > -> Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)
> 
> Hmm.  Removal of the index is just a hack --- the system should have
> been smart enough not to use it.  It looks like the system chose the
> first plan shown above because it thought that selecting postal entries
> matching a particular pid value would on average match only one postal
> tuple (note the "size" fields, which are estimates of the numbers of
> resulting tuples).  But in reality, each scan produced 2000 matching
> entries on average, according to your second message --- and each of
> those entries had to be tested to see if it had the right city name.
> So, very slow.
> 
> The question I have is why didn't the system realize that there would be
> lots of matches on pid?  The "dispersion" statistics it keeps ought to
> have given it a clue that this approach wouldn't be very selective.
> 
> The second example is fast because the scan over postal looking for city
> name matches finds only one match, so prefecture is scanned only once.

Actually, not only one, since I use the ~ operator. Anyway, the number of
matching rows would be reasonably small.
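(Not part of the original thread -- just a back-of-envelope sketch of the
arithmetic behind Tom's explanation, using the numbers quoted above: 47
prefecture rows, and an average of ~2000 duplicate pid entries in postal.)

```python
# Work done by the first (slow) nested-loop plan: for each prefecture row,
# scan pidindex on postal and test every matching tuple against city ~ '^aaa'.

PREFECTURE_ROWS = 47           # outer relation of the nested loop (size=47)
EST_MATCHES_PER_PID = 1        # the planner's "size=1" guess for the index scan
ACTUAL_MATCHES_PER_PID = 2000  # real average number of duplicate pid entries

planner_estimate = PREFECTURE_ROWS * EST_MATCHES_PER_PID
actual_work = PREFECTURE_ROWS * ACTUAL_MATCHES_PER_PID

print(planner_estimate)  # 47 tuples the planner expected to test in total
print(actual_work)       # 94000 tuples actually fetched and tested
```

The roughly 2000x gap between the estimate and the actual tuple count is
what makes the plan "so slooow" despite looking cheap on paper.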

> However the cost estimates there also look bogus --- the system is again
> mis-guessing how many entries will be selected.  It seems to think that
> all 47 prefecture entries will be matched by a scan for a specific pid.
> So, bogus dispersion values again (or bad use of them).
> 
> Something is fishy here.  Have you done a "vacuum analyze" since loading
> the data in these tables?

Oh, I never thought of that. After re-creating the index I had removed (as
described in my previous message) and running vacuum analyze, I got:

Hash Join  (cost=951.50 size=19 width=100)
  ->  Index Scan using cityindex on postal  (cost=944.77 size=19 width=74)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)

This plan looks good (and is actually as fast as the previous one). However,
the size estimate for prefecture is again 47?
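(Again not part of the original thread -- a rough sketch of why the hash
join above avoids the rescans, using the size figures from the plan: 47
prefecture rows hashed once, ~19 postal rows matched by cityindex.)

```python
# Hash join: build a hash table over prefecture once, then probe it once
# per matching postal row -- prefecture is never rescanned, unlike the
# inner side of the original nested loop.

PREFECTURE_ROWS = 47  # seq scan on prefecture, hashed once (size=47)
POSTAL_MATCHES = 19   # postal rows matched via cityindex (size=19)

build_cost = PREFECTURE_ROWS  # each prefecture row hashed exactly once
probe_cost = POSTAL_MATCHES   # each matching postal row probed exactly once

print(build_cost + probe_cost)  # 66 row visits in total
```

This counts row visits only, not the real planner cost model, but it shows
why scanning all 47 prefecture rows once is fine here: that full scan feeds
the hash build, not a per-row rescan.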
--
Tatsuo Ishii

