Обсуждение: Why is that index not used?

Поиск
Список
Период
Сортировка

Why is that index not used?

От
Michael Monnerie
Дата:
I know I'm pedantic today but I have logs enabled to see every query
done, and those who are called often should be as quick as possible,
where they aren't - and I want to understand why...

Can somebody explain me why on the last line, "dbmail_physmessage",
there is a seq. scan going on and no index used? The query says "where
pm.id = msg.physmessage_id", and there's an index on physmessage_id, so
what's the problem? This takes 700 of the 900ms of the query, as there
are 230k entries in physmessage. I see that that query is parallel to
the index scan, but that's not a good choice I think. Wouldn't it be
better to wait for the results of the "Filter message_idnr AND status"
and then search only the fitting physmessage_id's?

EXPLAIN ANALYZE SELECT seen_flag, answered_flag, deleted_flag,
flagged_flag, draft_flag, recent_flag, TO_CHAR(internal_date, 'YYYY-MM-
DD HH24:MI:SS' ), rfcsize, message_idnr FROM dbmail_messages msg,
dbmail_physmessage pm WHERE pm.id = msg.physmessage_id AND message_idnr
BETWEEN 3196318 AND 3619184 AND mailbox_idnr = 3241 AND status IN
(0,1,2) ORDER BY message_idnr ASC;
                                                                           QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=10466.09..10494.28 rows=11279 width=36) (actual
time=856.412..872.783 rows=13258 loops=1)
   Sort Key: msg.message_idnr
   Sort Method:  quicksort  Memory: 1420kB
   ->  Hash Join  (cost=6880.89..9706.93 rows=11279 width=36) (actual
time=702.001..822.022 rows=13258 loops=1)
         Hash Cond: (msg.physmessage_id = pm.id)
         ->  Index Scan using dbmail_messages_1 on dbmail_messages msg
(cost=0.00..2550.96 rows=11920 width=28) (actual time=0.123..30.881
rows=13258 loops=1)
               Index Cond: (mailbox_idnr = 3241)
               Filter: ((message_idnr >= 3196318) AND (message_idnr <=
3619184) AND (status = ANY ('{0,1,2}'::integer[])))
         ->  Hash  (cost=4004.84..4004.84 rows=230084 width=24) (actual
time=701.458..701.458 rows=229876 loops=1)
               ->  Seq Scan on dbmail_physmessage pm
(cost=0.00..4004.84 rows=230084 width=24) (actual time=0.015..319.395
rows=229876 loops=1)
 Total runtime: 897.722 ms

\d dbmail_physmessage;
 id            | bigint                      | not null default
nextval('dbmail_physmessage_id_seq'::regclass)
 messagesize   | bigint                      | not null default
(0)::bigint
 rfcsize       | bigint                      | not null default
(0)::bigint
 internal_date | timestamp without time zone |
Indexe:
    »dbmail_physmessage_pkey« PRIMARY KEY, btree (id) CLUSTER


mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Re: Why is that index not used?

От
Alvaro Herrera
Дата:
Michael Monnerie wrote:
> I know I'm pedantic today but I have logs enabled to see every query
> done, and those who are called often should be as quick as possible,
> where they aren't - and I want to understand why...

What Postgres version is this?



--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Why is that index not used?

От
Tom Lane
Дата:
Michael Monnerie <michael.monnerie@is.it-management.at> writes:
> Can somebody explain me why on the last line, "dbmail_physmessage",
> there is a seq. scan going on and no index used?

Because it chose to use a hash instead.  Given the rowcounts I don't
think that's wrong, but if you want to see the other plan you can try
setting enable_hashjoin = off (and maybe enable mergejoin = off if
it goes to a merge join next).

As for your other question, the reason it likes the dbmail_messages_1
index is probably that it's a lot smaller than dbmail_messages_7.
The only thing I'm seeing that seems curious is that when forced to
use dbmail_messages_7, it isn't using what seems to be an available
index condition.

            regards, tom lane

Re: Why is that index not used?

От
Michael Monnerie
Дата:
On Mittwoch 21 Januar 2009 Alvaro Herrera wrote:
> What Postgres version is this?

8.3 again, the same server as before (openSUSE 11.1)

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Re: Why is that index not used?

От
Michael Monnerie
Дата:
On Mittwoch 21 Januar 2009 Tom Lane wrote:
> > Can somebody explain me why on the last line, "dbmail_physmessage",
> > there is a seq. scan going on and no index used?
>
> Because it chose to use a hash instead.  Given the rowcounts I don't
> think that's wrong,

You mean reading 10k out of 234k entries from the table itself is
quicker than looking at the index and reading only those values needed
from the table?

> but if you want to see the other plan you can try
> setting enable_hashjoin = off (and maybe enable mergejoin = off if
> it goes to a merge join next).

Even worse:
# set enable_hashjoin = off ;
 Sort  (cost=12400.45..12427.22 rows=10707 width=36) (actual
time=922.751..934.921 rows=9935 loops=1)
   Sort Key: msg.message_idnr
   Sort Method:  quicksort  Memory: 1161kB
   ->  Merge Join  (cost=3240.39..11683.82 rows=10707 width=36) (actual
time=169.998..895.009 rows=9935 loops=1)
         Merge Cond: (pm.id = msg.physmessage_id)
         ->  Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage pm  (cost=0.00..7681.79 rows=230034 width=24) (actual
time=0.026..407.428 rows=229873 loops=1)
         ->  Sort  (cost=3239.57..3266.75 rows=10874 width=28) (actual
time=43.959..68.333 rows=9935 loops=1)
               Sort Key: msg.physmessage_id
               Sort Method:  quicksort  Memory: 1161kB
               ->  Index Scan using dbmail_messages_1 on dbmail_messages
msg  (cost=0.00..2510.54 rows=10874 width=28) (actual time=2.025..23.155
rows=9935 loops=1)
                     Index Cond: (mailbox_idnr = 3241)
                     Filter: ((message_idnr >= 3196318) AND
(message_idnr <= 3619184) AND (status = ANY ('{0,1,2}'::integer[])))
 Total runtime: 946.016 ms

Then in addition:
# set enable_mergejoin = off;
 Sort  (cost=15878.43..15905.19 rows=10707 width=36) (actual
time=343.430..358.492 rows=9935 loops=1)
   Sort Key: msg.message_idnr
   Sort Method:  quicksort  Memory: 1161kB
   ->  Nested Loop  (cost=0.00..15161.79 rows=10707 width=36) (actual
time=2.040..314.101 rows=9935 loops=1)
         ->  Index Scan using dbmail_messages_1 on dbmail_messages msg
(cost=0.00..2510.54 rows=10874 width=28) (actual time=1.953..30.407
rows=9935 loops=1)
               Index Cond: (mailbox_idnr = 3241)
               Filter: ((message_idnr >= 3196318) AND (message_idnr <=
3619184) AND (status = ANY ('{0,1,2}'::integer[])))
         ->  Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage pm  (cost=0.00..1.15 rows=1 width=24) (actual
time=0.016..0.018 rows=1 loops=9935)
               Index Cond: (pm.id = msg.physmessage_id)
 Total runtime: 372.209 ms

> As for your other question, the reason it likes the dbmail_messages_1
> index is probably that it's a lot smaller than dbmail_messages_7.

Sounds reasonable.

> The only thing I'm seeing that seems curious is that when forced to
> use dbmail_messages_7, it isn't using what seems to be an available
> index condition.

Yes, I simply deleted the _1 key. And I expected that _7 or _8 are used,
because they provide the same first field order, so there's no
difference in the sort order for that row. Especially since it's UNIQUE.

A general question again, because I would like to understand it:
Why can't I just delete index _1, if anyway I have
index _7 with the same field, just plus additional fields. I thought
that would be redundant:

Index _1 = mailbox_idnr
Index _7 = mailbox_idnr,status,seen_flag

So I would guess that Index _1 is redundant, and I can delete it because
the planner will use _7 instead. After all, for searching any
mailbox_idnr in that table (~234k entries) it doesn't really matter if
you use index _1 or _7, the sort is the same (if you don't care about
the other fields).

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4