The standard 'why does it take so long' question

Поиск
Список
Период
Сортировка
От Nigel J. Andrews
Тема The standard 'why does it take so long' question
Дата
Msg-id Pine.LNX.4.21.0208090311300.3235-100000@ponder.fairway2k.co.uk
обсуждение исходный текст
Ответы Re: The standard 'why does it take so long' question  (Neil Conway <nconway@klamath.dyndns.org>)
Re: The standard 'why does it take so long' question  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general

I have this query which used to yield this EXPLAIN ANALYZE [reformated] output:

Sort  (cost=12830.25..12830.25 rows=391 width=26)
       (actual time=39966.50..40305.70 rows=26982 loops=1)
  ->  Nested Loop  (cost=0.00..12813.40 rows=391 width=26)
                    (actual time=69.16..34152.78 rows=26982 loops=1)
        ->  Index Scan using chat_user_lcasename_idx
                           , chat_user_lcasename_idx on chat_user u
                             (cost=0.00..7.21 rows=2 width=12)
                              (actual time=1.06..65.62 rows=2 loops=1)
        ->  Index Scan using chat_post_usertime_idx on chat_post p
                             (cost=0.00..6383.72 rows=1638 width=14)
                              (actual time=72.31..10764.40 rows=13491 loops=2)
Total runtime: 40739.10 msec

where chat_post_usertime_idx is from:
  CREATE INDEX blah blah ...  btree(user_id,time)


After adding another index chat_post_time_user_idx, which reverses the order of
the two columns in the above index, EXPLAIN ANALYZE yields:

Sort  (cost=9677.13..9677.13 rows=391 width=26)
       (actual time=62267.30..62554.49 rows=26982 loops=1)
  ->  Nested Loop  (cost=0.00..9660.28 rows=391 width=26)
                    (actual time=174.77..55699.39 rows=26982 loops=1)
        ->  Index Scan using chat_user_lcasename_idx
                           , chat_user_lcasename_idx on chat_user u
                             (cost=0.00..7.21 rows=2 width=12)
                              (actual time=37.82..97.44 rows=2 loops=1)
        ->  Index Scan using chat_post_timeuser_idx on chat_post p
                             (cost=0.00..4806.89 rows=1638 width=14)
                              (actual time=328.81..19932.71 rows=13491 loops=2)
Total runtime: 62983.64 msec


A 50% performance loss. I'm just trying to understand where the extra time
comes from.

Rearranging these explains into pairs of lines such that the first line of a
pair is from the first explain and the second line from the second (works best
in a _wide_ window) we can more easily compare things.
Sort  (cost=12830.25..12830.25 rows=391 width=26) (actual time=39966.50..40305.70 rows=26982 loops=1)
Sort  (cost=9677.13..9677.13 rows=391 width=26) (actual time=62267.30..62554.49 rows=26982 loops=1)

  ->  Nested Loop  (cost=0.00..12813.40 rows=391 width=26) (actual time=69.16..34152.78 rows=26982 loops=1)
  ->  Nested Loop  (cost=0.00..9660.28 rows=391 width=26) (actual time=174.77..55699.39 rows=26982 loops=1)

        ->  Index Scan using chat_user_lcasename_idx, chat_user_lcasename_idx on chat_user u  (cost=0.00..7.21 rows=2
width=12)(actual time=1.06..65.62 rows=2 loops=1) 
        ->  Index Scan using chat_user_lcasename_idx, chat_user_lcasename_idx on chat_user u  (cost=0.00..7.21 rows=2
width=12)(actual time=37.82..97.44 rows=2 loops=1) 

        ->  Index Scan using chat_post_usertime_idx on chat_post p  (cost=0.00..6383.72 rows=1638 width=14) (actual
time=72.31..10764.40rows=13491 loops=2) 
        ->  Index Scan using chat_post_timeuser_idx on chat_post p  (cost=0.00..4806.89 rows=1638 width=14) (actual
time=328.81..19932.71rows=13491 loops=2) 

Total runtime: 40739.10 msec
Total runtime: 62983.64 msec

It's pretty obvious from this that the index scan on chat_user takes longer
when the new index is used on the other table. The index scan of the new index
obviously takes longer as well.

The first question is why would the index scan on chat_user take significantly
longer than before? All I could think of is that the use of the new index has
forced more disk reads. Does that sound reasonable and what are the
alternatives?

The next question is why would the index scan of the new index take
significantly longer than before? From pg_class the new index uses 3814 pages
and the preexisting one 4191. Therefore, one would have thought that the scan
of the new index would take less time since less disk reads...uh oh, I think I
may have just thought this out clearer...the new index takes longer to scan
because it presumbly has to read more pages of the index to cover the required
time interval than the older index because the older index has already narrowed
the selection down by the user. Does that sound reasonable? That would then
support my supposition about the first question, right?

My ignorance will show once more now; why would the index scan on chat_user be
impacted by a loss of index pages from shared memory/disk cache? Isn't this
scan finished before the one on chat_post starts? Or would that scenario have
meant a hash join would be in the explain output? This would go some way to
explaining the extra time taken on chat_post if I'm missunderstanding this. I
think I'll look up the explain output in the manual again as well.

Oh, and before it's asked, the DB has been analyzed after the last data load
and there was no change in data between these runs. Also:

                               version
----------------------------------------------------------------------
 PostgreSQL 7.2.1 on i386-unknown-freebsd3.3, compiled by GCC 2.7.2.3


I hope I haven't put y'all to sleep.


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants


В списке pgsql-general по дате отправления:

Предыдущее
От: Rob Brown-Bayliss
Дата:
Сообщение: uncommited question
Следующее
От: Neil Conway
Дата:
Сообщение: Re: The standard 'why does it take so long' question