Обсуждение: Question on Index usage
http://www.postgresql.org/docs/8.3/interactive/using-explain.html I tried reading that page, but it's still not clear to me, why the index dbmail_messages_1 is better than dbmail_messages_7: \d dbmail_messages Tabelle »public.dbmail_messages« Spalte | Typ | Attribute ----------------+-----------------------+--------------------------------------------------------------- message_idnr | bigint | not null default nextval('dbmail_message_idnr_seq'::regclass) mailbox_idnr | bigint | physmessage_id | bigint | <snip other colums not important> Indexe: »dbmail_messages_pkey« PRIMARY KEY, btree (message_idnr) »dbmail_messages_1« btree (mailbox_idnr) »dbmail_messages_2« btree (physmessage_id) »dbmail_messages_3« btree (seen_flag) »dbmail_messages_4« btree (unique_id) »dbmail_messages_5« btree (status) »dbmail_messages_6« btree (status) WHERE status < 2::smallint »dbmail_messages_7« btree (mailbox_idnr, status, seen_flag) CLUSTER »dbmail_messages_8« btree (mailbox_idnr, status, recent_flag) Then I do a SELECT that joins this with another table: EXPLAIN ANALYZE SELECT 1 FROM dbmail_messages msg JOIN dbmail_physmessage pm ON ( pm.id = msg.physmessage_id ) WHERE message_idnr BETWEEN 3178782 AND 3616157 AND mailbox_idnr = 3236 AND status IN (0,1,2) ORDER BY message_idnr ASC; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=8593.80..8605.30 rows=4599 width=8) (actual time=707.878..713.738 rows=5228 loops=1) Sort Key: msg.message_idnr Sort Method: quicksort Memory: 438kB -> Hash Join (cost=6891.33..8314.02 rows=4599 width=8) (actual time=667.133..697.308 rows=5228 loops=1) Hash Cond: (msg.physmessage_id = pm.id) -> Index Scan using dbmail_messages_1 on dbmail_messages msg (cost=0.00..1324.96 rows=4599 width=16) (actual time=0.094..10.958 rows=5228 loops=1) Index Cond: (mailbox_idnr = 3236) Filter: ((message_idnr >= 3178782) AND (message_idnr <= 3616157) AND (status = ANY ('{0,1,2}'::integer[]))) -> Hash (cost=4008.37..4008.37 rows=230637 width=8) (actual time=666.628..666.628 rows=229809 loops=1) -> Seq Scan on dbmail_physmessage pm (cost=0.00..4008.37 rows=230637 width=8) (actual time=0.012..303.139 rows=229809 loops=1) Total runtime: 729.972 ms Good, and now I DROP INDEX dbmail_messages_1; because anyway there are the _7 and _8 indices which both have mailbox_idnr as their first column, so sorting is the same as in the _1 index. And _7 is even used by CLUSTER. But that makes QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=10601.66..10613.20 rows=4615 width=8) (actual time=759.415..765.292 rows=5228 loops=1) Sort Key: msg.message_idnr Sort Method: quicksort Memory: 438kB -> Hash Join (cost=7036.45..10320.79 rows=4615 width=8) (actual time=720.352..748.912 rows=5228 loops=1) Hash Cond: (msg.physmessage_id = pm.id) -> Bitmap Heap Scan on dbmail_messages msg (cost=145.12..3330.99 rows=4650 width=16) (actual time=50.689..60.132 rows=5228 loops=1) Recheck Cond: (mailbox_idnr = 3236) Filter: ((message_idnr >= 3178782) AND (message_idnr <= 3616157) AND (status = ANY ('{0,1,2}'::integer[]))) -> Bitmap Index Scan on dbmail_messages_7 (cost=0.00..143.96 rows=5288 width=0) (actual time=50.628..50.628 rows=15759 loops=1) Index Cond: (mailbox_idnr = 3236) -> Hash (cost=4008.37..4008.37 rows=230637 width=8) (actual time=669.219..669.219 rows=229809 loops=1) -> Seq Scan on dbmail_physmessage pm (cost=0.00..4008.37 rows=230637 width=8) (actual time=0.014..301.793 rows=229809 loops=1) Total runtime: 781.238 ms So while with the _1 index an "Index Scan" is used, with the _7 it needs a "Bitmap Index Scan" plus a "Bitmap Heap Scan". Can somebody explain why the _1 index cannot be deleted without loosing performance? The plain "Index Scan" could be used with _7 or _8 anyway. mfg zmi -- // Michael Monnerie, Ing.BSc ----- http://it-management.at // Tel: 0660 / 415 65 31 .network.your.ideas. // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import" // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4 // Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
Michael Monnerie wrote: > EXPLAIN ANALYZE SELECT 1 FROM dbmail_messages msg JOIN > dbmail_physmessage pm ON ( pm.id = msg.physmessage_id ) WHERE > message_idnr BETWEEN 3178782 AND 3616157 AND mailbox_idnr = 3236 AND > status IN (0,1,2) ORDER BY message_idnr ASC; [...] > -> Bitmap Heap Scan on dbmail_messages msg > (cost=145.12..3330.99 rows=4650 width=16) (actual time=50.689..60.132 > rows=5228 loops=1) > Recheck Cond: (mailbox_idnr = 3236) > Filter: ((message_idnr >= 3178782) AND (message_idnr <= > 3616157) AND (status = ANY ('{0,1,2}'::integer[]))) > -> Bitmap Index Scan on dbmail_messages_7 > (cost=0.00..143.96 rows=5288 width=0) (actual time=50.628..50.628 > rows=15759 loops=1) > Index Cond: (mailbox_idnr = 3236) > -> Hash (cost=4008.37..4008.37 rows=230637 width=8) (actual > time=669.219..669.219 rows=229809 loops=1) [...] > So while with the _1 index an "Index Scan" is used, with the _7 it needs > a "Bitmap Index Scan" plus a "Bitmap Heap Scan". Can somebody explain > why the _1 index cannot be deleted without loosing performance? The > plain "Index Scan" could be used with _7 or _8 anyway. Seems like the problem is that it is not pushing the "status IN" condition as part of the index condition for some reason, and instead using it as a filter. Maybe something to do with the selectivity of that clause? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mittwoch 21 Januar 2009 Alvaro Herrera wrote: > Seems like the problem is that it is not pushing the "status IN" > condition as part of the index condition for some reason, and instead > using it as a filter. Maybe something to do with the selectivity of > that clause? I was reading your answer several times, and I think I understand now what you mean. You mean "it doesn't use index _7 because the status IN (0,1,2) isn't usable". I didn't think about that, but that's another good point. Instead, I meant: Why can't I just delete index _1, if anyway I have index _7 with the same field, just plus additional fields. I thought that would be redundant: Index _1 = mailbox_idnr Index _7 = mailbox_idnr,status,seen_flag So I would guess that Index _1 is redundant, and I can delete it because the planner will use _7 instead. After all, for searching any mailbox_idnr in that table (~234k entries) it doesn't really matter if you use index _1 or _7, the sort is the same (if you don't care about the other fields). mfg zmi -- // Michael Monnerie, Ing.BSc ----- http://it-management.at // Tel: 0660 / 415 65 31 .network.your.ideas. // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import" // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4 // Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
Alvaro Herrera <alvherre@commandprompt.com> writes: > Seems like the problem is that it is not pushing the "status IN" > condition as part of the index condition for some reason, and instead > using it as a filter. Maybe something to do with the selectivity of > that clause? No, AFAIR the planner will *always* include every possibly relevant condition for a given index. If the condition is useless or nearly so, that might prompt it to pick a different index instead, but not to omit the indexqual. I'm thinking it's not believing it can use the status condition with that index, perhaps for datatype reasons. What is status declared as, exactly? Also, what PG version is this? regards, tom lane
On Mittwoch 21 Januar 2009 Tom Lane wrote: > No, AFAIR the planner will *always* include every possibly relevant > condition for a given index. If the condition is useless or nearly > so, that might prompt it to pick a different index instead, but not > to omit the indexqual. I'm thinking it's not believing it can use > the status condition with that index, perhaps for datatype reasons. > What is status declared as, exactly? OK, now I'm sure you're speaking in a direction I didn't mean ;-) But still, it could use the index _7 for the status field, as that's sorted and 0,1,2 would be all together. status | smallint | not null default (0)::smallint Maybe it doesn't do because it could well be 0,1,3 also, and it's not checking if all the values in the IN() statement appear in a row? > Also, what PG version is this? 8.3 on openSUSE 11.1: select version(); PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291] mfg zmi -- // Michael Monnerie, Ing.BSc ----- http://it-management.at // Tel: 0660 / 415 65 31 .network.your.ideas. // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import" // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4 // Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
On Mittwoch 21 Januar 2009 Alvaro Herrera wrote: > Seems like the problem is that it is not pushing the "status IN" > condition as part of the index condition for some reason, and instead > using it as a filter. Maybe something to do with the selectivity of > that clause? I was reading your answer several times, and I think I understand now what you mean. You mean "it doesn't use index _7 because the status IN (0,1,2) isn't usable". I didn't think about that, but that's another good point. Instead, I meant: Why can't I just delete index _1, if anyway I have index _7 with the same field, just plus additional fields. I thought that would be redundant: Index _1 = mailbox_idnr Index _7 = mailbox_idnr,status,seen_flag So I would guess that Index _1 is redundant, and I can delete it because the planner will use _7 instead. After all, for searching any mailbox_idnr in that table (~234k entries) it doesn't really matter if you use index _1 or _7, the sort is the same (if you don't care about the other fields). mfg zmi -- // Michael Monnerie, Ing.BSc ----- http://it-management.at // Tel: 0660 / 415 65 31 .network.your.ideas. // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import" // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4 // Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4