Re: [HACKERS] Why is that so slow?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Why is that so slow?
Дата
Msg-id 3107.920676281@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Why is that so slow?  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Ответы Re: [HACKERS] Why is that so slow?  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Список pgsql-hackers
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> NOTICE:  QUERY PLAN:
> 
> Nested Loop  (cost=98.90 size=1 width=100)
>   ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
>   ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)
> 
> This is so slooow. Can anybody explain this? Am I missing something?

and later:
> I had defined a btree index on pid and it has 2000
> duplicate entries in average! After I removed the index, the query
> runs unbelievably fast! Now explain shows:

> Nested Loop  (cost=933.82 size=1 width=100)
> -> Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
> -> Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)

Hmm.  Removal of the index is just a hack --- the system should have
been smart enough not to use it.  It looks like the system chose the
first plan shown above because it thought that selecting postal entries
matching a particular pid value would on average match only one postal
tuple (note the "size" fields, which are estimates of the numbers of
resulting tuples).  But in reality, each scan produced 2000 matching
entries on average, according to your second message --- and each of
those entries had to be tested to see if it had the right city name.
So, very slow.

The question I have is why didn't the system realize that there would be
lots of matches on pid?  The "dispersion" statistics it keeps ought to
have given it a clue that this approach wouldn't be very selective.

The second example is fast because the scan over postal looking for city
name matches finds only one match, so prefecture is scanned only once.
However the cost estimates there also look bogus --- the system is again
mis-guessing how many entries will be selected.  It seems to think that
all 47 prefecture entries will be matched by a scan for a specific pid.
So, bogus dispersion values again (or bad use of them).

Something is fishy here.  Have you done a "vacuum analyze" since loading
the data in these tables?
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Davis
Дата:
Сообщение: permissions problem
Следующее
От: Michael Davis
Дата:
Сообщение: RE: [GENERAL] permissions problem