Обсуждение: Left Join Not Using Index?
I have a left join that doesn't seem to be using an index I created, and the query's performance needs to improve. I have two tables that model a message board: Table "public.message_board_topics" Column | Type | Modifiers ---------------------+-----------------------------+----------- rec_num | integer | not null topic_name | character varying(255) | topic_body | text | topic_author | character varying(20) | topic_author_email | character varying(50) | topic_date | date | topic_updated | timestamp without time zone | administrator_topic | boolean | number_of_comments | integer | Indexes: message_board_topics_pkey primary key btree (rec_num) Triggers: RI_ConstraintTrigger_819942, RI_ConstraintTrigger_819943 Table "public.message_board_comments" Column | Type | Modifiers ----------------------+------------------------+----------- rec_num | integer | not null topic_id | integer | comment_parent | integer | comment_name | character varying(255) | comment_body | text | comment_author | character varying(20) | comment_author_email | character varying(50) | comment_date | date | Indexes: message_board_comments_pkey primary key btree (rec_num), message_board_comments_topic_id btree (topic_id) Triggers: RI_ConstraintTrigger_819941 The query is: SELECT DISTINCT message_board_topics.rec_num, message_board_topics.topic_name, message_board_topics.topic_body, message_board_topics.topic_author, message_board_topics.topic_author_email, message_board_topics.topic_updated, message_board_topics.administrator_topic, message_board_topics.number_of_comments, to_char(topic_date, 'MM.DD.YYYY') as formatted_date FROM message_board_topics left join message_board_comments on (message_board_comments.topic_id=message_board_topics.rec_num) WHERE upper(topic_name) LIKE upper('madbrowser') OR upper(topic_body) LIKE upper('madbrowser') OR upper(topic_author) LIKE upper('madbrowser') OR upper(message_board_comments.comment_author) LIKE upper('madbrowser') ORDER BY message_board_topics.rec_num DESC Explain outputs: Unique (cost=34847.38..35145.38 rows=1192 width=293) -> Sort (cost=34847.38..34877.18 rows=11920 width=293) Sort Key: message_board_topics.rec_num, message_board_topics.topic_name, message_board_topics.topic_body, message_board_topics.topic_author, message_board_topics.topic_author_email, message_board_topics.topic_updated, message_board_topics.administrator_topic, message_board_topics.number_of_comments, to_char((message_board_topics.topic_date)::timestamp with time zone, 'MM.DD.YYYY'::text) -> Merge Join (cost=26858.21..33007.14 rows=11920 width=293) Merge Cond: ("outer".rec_num = "inner".topic_id) Filter: ((upper(("outer".topic_name)::text) ~~ 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ 'MADBROWSER'::text) OR (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) -> Sort (cost=2446.01..2475.81 rows=11920 width=265) Sort Key: message_board_topics.rec_num -> Seq Scan on message_board_topics (cost=0.00..712.20 rows=11920 width=265) -> Sort (cost=24412.20..24818.15 rows=162382 width=28) Sort Key: message_board_comments.topic_id -> Seq Scan on message_board_comments (cost=0.00..7203.82 rows=162382 width=28) It doesn't seem to be using the index in topic_id... What can I do to help the planner figure out about that index? Thanks, Hunter
On Tue, 22 Apr 2003, Hunter Hillegas wrote: > I have a left join that doesn't seem to be using an index I created, and the > query's performance needs to improve. > > I have two tables that model a message board: > > Table "public.message_board_topics" > Column | Type | Modifiers > ---------------------+-----------------------------+----------- > rec_num | integer | not null > topic_name | character varying(255) | > topic_body | text | > topic_author | character varying(20) | > topic_author_email | character varying(50) | > topic_date | date | > topic_updated | timestamp without time zone | > administrator_topic | boolean | > number_of_comments | integer | > Indexes: message_board_topics_pkey primary key btree (rec_num) > Triggers: RI_ConstraintTrigger_819942, > RI_ConstraintTrigger_819943 > > > Table "public.message_board_comments" > Column | Type | Modifiers > ----------------------+------------------------+----------- > rec_num | integer | not null > topic_id | integer | > comment_parent | integer | > comment_name | character varying(255) | > comment_body | text | > comment_author | character varying(20) | > comment_author_email | character varying(50) | > comment_date | date | > Indexes: message_board_comments_pkey primary key btree (rec_num), > message_board_comments_topic_id btree (topic_id) > Triggers: RI_ConstraintTrigger_819941 > > The query is: > > SELECT DISTINCT message_board_topics.rec_num, > message_board_topics.topic_name, message_board_topics.topic_body, > message_board_topics.topic_author, message_board_topics.topic_author_email, > message_board_topics.topic_updated, > message_board_topics.administrator_topic, > message_board_topics.number_of_comments, to_char(topic_date, 'MM.DD.YYYY') > as formatted_date FROM message_board_topics left join message_board_comments > on (message_board_comments.topic_id=message_board_topics.rec_num) WHERE > upper(topic_name) LIKE upper('madbrowser') OR upper(topic_body) LIKE > upper('madbrowser') OR upper(topic_author) LIKE upper('madbrowser') OR > upper(message_board_comments.comment_author) LIKE upper('madbrowser') ORDER > BY message_board_topics.rec_num DESC > > Explain outputs: > > Unique (cost=34847.38..35145.38 rows=1192 width=293) > -> Sort (cost=34847.38..34877.18 rows=11920 width=293) > Sort Key: message_board_topics.rec_num, > message_board_topics.topic_name, message_board_topics.topic_body, > message_board_topics.topic_author, message_board_topics.topic_author_email, > message_board_topics.topic_updated, > message_board_topics.administrator_topic, > message_board_topics.number_of_comments, > to_char((message_board_topics.topic_date)::timestamp with time zone, > 'MM.DD.YYYY'::text) > -> Merge Join (cost=26858.21..33007.14 rows=11920 width=293) > Merge Cond: ("outer".rec_num = "inner".topic_id) > Filter: ((upper(("outer".topic_name)::text) ~~ > 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ 'MADBROWSER'::text) OR > (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR > (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) > -> Sort (cost=2446.01..2475.81 rows=11920 width=265) > Sort Key: message_board_topics.rec_num > -> Seq Scan on message_board_topics > (cost=0.00..712.20 rows=11920 width=265) > -> Sort (cost=24412.20..24818.15 rows=162382 width=28) > Sort Key: message_board_comments.topic_id > -> Seq Scan on message_board_comments > (cost=0.00..7203.82 rows=162382 width=28) > > It doesn't seem to be using the index in topic_id... What can I do to help > the planner figure out about that index? It's deciding that seq scan + sort of all the rows is faster than the index scan over all the rows (which may very well be true). What does explain analyze say with enable_seqscan set to true and false? If you only had extra conditions on one of the two tables and had appropriate indexes (and reworked the query a little) I'd think you might be able to get a better plan. As it is, I'm not sure what'd work best I'd think that a query doing the left join with only the message_board_topics conditions unioned with an inner join and the message_board_comments condition would give the same results with some massaging, but I don't really know if that'd ever perform better anyway.
Thanks for responding... With enable_seqscan = false: Unique (cost=545747.57..546045.57 rows=1192 width=293) (actual time=40851.49..40854.80 rows=51 loops=1) -> Sort (cost=545747.57..545777.37 rows=11920 width=293) (actual time=40851.48..40852.09 rows=292 loops=1) Sort Key: message_board_topics.rec_num, message_board_topics.topic_name, message_board_topics.topic_body, message_board_topics.topic_author, message_board_topics.topic_author_email, message_board_topics.topic_updated, message_board_topics.administrator_topic, message_board_topics.number_of_comments, to_char((message_board_topics.topic_date)::timestamp with time zone, 'MM.DD.YYYY'::text) -> Merge Join (cost=0.00..543907.33 rows=11920 width=293) (actual time=482.05..40847.19 rows=292 loops=1) Merge Cond: ("outer".rec_num = "inner".topic_id) Filter: ((upper(("outer".topic_name)::text) ~~ 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ 'MADBROWSER'::text) OR (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) -> Index Scan using message_board_topics_pkey on message_board_topics (cost=0.00..2532.12 rows=11920 width=265) (actual time=428.32..4893.13 rows=11920 loops=1) -> Index Scan using message_board_comments_topic_id on message_board_comments (cost=0.00..535662.04 rows=162382 width=28) (actual time=28.45..32163.18 rows=162382 loops=1) Total runtime: 40855.59 msec With enable_seqscan = true: Unique (cost=34847.38..35145.38 rows=1192 width=293) (actual time=13005.13..13008.51 rows=51 loops=1) -> Sort (cost=34847.38..34877.18 rows=11920 width=293) (actual time=13005.12..13005.73 rows=292 loops=1) Sort Key: message_board_topics.rec_num, message_board_topics.topic_name, message_board_topics.topic_body, message_board_topics.topic_author, message_board_topics.topic_author_email, message_board_topics.topic_updated, message_board_topics.administrator_topic, message_board_topics.number_of_comments, to_char((message_board_topics.topic_date)::timestamp with time zone, 'MM.DD.YYYY'::text) -> Merge Join (cost=26858.21..33007.14 rows=11920 width=293) (actual time=4930.32..12949.93 rows=292 loops=1) Merge Cond: ("outer".rec_num = "inner".topic_id) Filter: ((upper(("outer".topic_name)::text) ~~ 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ 'MADBROWSER'::text) OR (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) -> Sort (cost=2446.01..2475.81 rows=11920 width=265) (actual time=628.30..953.50 rows=11920 loops=1) Sort Key: message_board_topics.rec_num -> Seq Scan on message_board_topics (cost=0.00..712.20 rows=11920 width=265) (actual time=0.10..223.96 rows=11920 loops=1) -> Sort (cost=24412.20..24818.15 rows=162382 width=28) (actual time=4301.14..5788.66 rows=162382 loops=1) Sort Key: message_board_comments.topic_id -> Seq Scan on message_board_comments (cost=0.00..7203.82 rows=162382 width=28) (actual time=0.10..1335.26 rows=162382 loops=1) Total runtime: 13108.33 msec (13 rows) Your suggestion didn't really make a whole lot of sense to me... Based on this info, what do you think? Hunter > From: Stephan Szabo <sszabo@megazone23.bigpanda.com> > Date: Tue, 22 Apr 2003 21:42:02 -0700 (PDT) > To: Hunter Hillegas <lists@lastonepicked.com> > Cc: PostgreSQL <pgsql-general@postgresql.org> > Subject: Re: [GENERAL] Left Join Not Using Index? > > On Tue, 22 Apr 2003, Hunter Hillegas wrote: > >> I have a left join that doesn't seem to be using an index I created, and the >> query's performance needs to improve. >> >> I have two tables that model a message board: >> >> Table "public.message_board_topics" >> Column | Type | Modifiers >> ---------------------+-----------------------------+----------- >> rec_num | integer | not null >> topic_name | character varying(255) | >> topic_body | text | >> topic_author | character varying(20) | >> topic_author_email | character varying(50) | >> topic_date | date | >> topic_updated | timestamp without time zone | >> administrator_topic | boolean | >> number_of_comments | integer | >> Indexes: message_board_topics_pkey primary key btree (rec_num) >> Triggers: RI_ConstraintTrigger_819942, >> RI_ConstraintTrigger_819943 >> >> >> Table "public.message_board_comments" >> Column | Type | Modifiers >> ----------------------+------------------------+----------- >> rec_num | integer | not null >> topic_id | integer | >> comment_parent | integer | >> comment_name | character varying(255) | >> comment_body | text | >> comment_author | character varying(20) | >> comment_author_email | character varying(50) | >> comment_date | date | >> Indexes: message_board_comments_pkey primary key btree (rec_num), >> message_board_comments_topic_id btree (topic_id) >> Triggers: RI_ConstraintTrigger_819941 >> >> The query is: >> >> SELECT DISTINCT message_board_topics.rec_num, >> message_board_topics.topic_name, message_board_topics.topic_body, >> message_board_topics.topic_author, message_board_topics.topic_author_email, >> message_board_topics.topic_updated, >> message_board_topics.administrator_topic, >> message_board_topics.number_of_comments, to_char(topic_date, 'MM.DD.YYYY') >> as formatted_date FROM message_board_topics left join message_board_comments >> on (message_board_comments.topic_id=message_board_topics.rec_num) WHERE >> upper(topic_name) LIKE upper('madbrowser') OR upper(topic_body) LIKE >> upper('madbrowser') OR upper(topic_author) LIKE upper('madbrowser') OR >> upper(message_board_comments.comment_author) LIKE upper('madbrowser') ORDER >> BY message_board_topics.rec_num DESC >> >> Explain outputs: >> >> Unique (cost=34847.38..35145.38 rows=1192 width=293) >> -> Sort (cost=34847.38..34877.18 rows=11920 width=293) >> Sort Key: message_board_topics.rec_num, >> message_board_topics.topic_name, message_board_topics.topic_body, >> message_board_topics.topic_author, message_board_topics.topic_author_email, >> message_board_topics.topic_updated, >> message_board_topics.administrator_topic, >> message_board_topics.number_of_comments, >> to_char((message_board_topics.topic_date)::timestamp with time zone, >> 'MM.DD.YYYY'::text) >> -> Merge Join (cost=26858.21..33007.14 rows=11920 width=293) >> Merge Cond: ("outer".rec_num = "inner".topic_id) >> Filter: ((upper(("outer".topic_name)::text) ~~ >> 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ 'MADBROWSER'::text) OR >> (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR >> (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) >> -> Sort (cost=2446.01..2475.81 rows=11920 width=265) >> Sort Key: message_board_topics.rec_num >> -> Seq Scan on message_board_topics >> (cost=0.00..712.20 rows=11920 width=265) >> -> Sort (cost=24412.20..24818.15 rows=162382 width=28) >> Sort Key: message_board_comments.topic_id >> -> Seq Scan on message_board_comments >> (cost=0.00..7203.82 rows=162382 width=28) >> >> It doesn't seem to be using the index in topic_id... What can I do to help >> the planner figure out about that index? > > It's deciding that seq scan + sort of all the rows is faster than the > index scan over all the rows (which may very well be true). What does > explain analyze say with enable_seqscan set to true and false? > > If you only had extra conditions on one of the two tables and had > appropriate indexes (and reworked the query a little) I'd think you might > be able to get a better plan. As it is, I'm not sure what'd work best > I'd think that a query doing the left join with only the > message_board_topics conditions unioned with an inner join and the > message_board_comments condition would give the same results with some > massaging, but I don't really know if that'd ever perform better anyway. > >
> -----Original Message----- > From: Hunter Hillegas [mailto:lists@lastonepicked.com] > Sent: Tuesday, April 22, 2003 9:55 PM > To: Stephan Szabo > Cc: PostgreSQL > Subject: Re: [GENERAL] Left Join Not Using Index? > > > Thanks for responding... > > With enable_seqscan = false: > > Unique (cost=545747.57..546045.57 rows=1192 width=293) > (actual time=40851.49..40854.80 rows=51 loops=1) > -> Sort (cost=545747.57..545777.37 rows=11920 width=293) > (actual time=40851.48..40852.09 rows=292 loops=1) > Sort Key: message_board_topics.rec_num, > message_board_topics.topic_name, > message_board_topics.topic_body, > message_board_topics.topic_author, > message_board_topics.topic_author_email, > message_board_topics.topic_updated, > message_board_topics.administrator_topic, > message_board_topics.number_of_comments, > to_char((message_board_topics.topic_date)::timestamp with time zone, > 'MM.DD.YYYY'::text) > -> Merge Join (cost=0.00..543907.33 rows=11920 > width=293) (actual time=482.05..40847.19 rows=292 loops=1) > Merge Cond: ("outer".rec_num = "inner".topic_id) > Filter: ((upper(("outer".topic_name)::text) ~~ > 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ > 'MADBROWSER'::text) OR > (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR > (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) > -> Index Scan using message_board_topics_pkey > on message_board_topics (cost=0.00..2532.12 rows=11920 > width=265) (actual time=428.32..4893.13 rows=11920 loops=1) > -> Index Scan using > message_board_comments_topic_id on message_board_comments > (cost=0.00..535662.04 rows=162382 width=28) (actual > time=28.45..32163.18 rows=162382 loops=1) Total runtime: > 40855.59 msec > > With enable_seqscan = true: > > Unique (cost=34847.38..35145.38 rows=1192 width=293) > (actual time=13005.13..13008.51 rows=51 loops=1) > -> Sort (cost=34847.38..34877.18 rows=11920 width=293) > (actual time=13005.12..13005.73 rows=292 loops=1) > Sort Key: message_board_topics.rec_num, > message_board_topics.topic_name, > message_board_topics.topic_body, > message_board_topics.topic_author, > message_board_topics.topic_author_email, > message_board_topics.topic_updated, > message_board_topics.administrator_topic, > message_board_topics.number_of_comments, > to_char((message_board_topics.topic_date)::timestamp with time zone, > 'MM.DD.YYYY'::text) > -> Merge Join (cost=26858.21..33007.14 rows=11920 > width=293) (actual time=4930.32..12949.93 rows=292 loops=1) > Merge Cond: ("outer".rec_num = "inner".topic_id) > Filter: ((upper(("outer".topic_name)::text) ~~ > 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ > 'MADBROWSER'::text) OR > (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR > (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) > -> Sort (cost=2446.01..2475.81 rows=11920 > width=265) (actual time=628.30..953.50 rows=11920 loops=1) > Sort Key: message_board_topics.rec_num > -> Seq Scan on message_board_topics > (cost=0.00..712.20 rows=11920 width=265) (actual > time=0.10..223.96 rows=11920 loops=1) > -> Sort (cost=24412.20..24818.15 rows=162382 > width=28) (actual time=4301.14..5788.66 rows=162382 loops=1) > Sort Key: message_board_comments.topic_id > -> Seq Scan on message_board_comments > (cost=0.00..7203.82 rows=162382 width=28) (actual > time=0.10..1335.26 rows=162382 loops=1) Total runtime: > 13108.33 msec (13 rows) > > Your suggestion didn't really make a whole lot of sense to > me... Based on this info, what do you think? Suggestion: Perform the actual query with seqscan enabled/disabled and see which one is literally faster. This guess: > 40855.59 msec Certainly seems slower than this one: > 13108.33 msec (13 rows) Indicating that the strategy originally chosen should be correct. How accurate is the estimate on your machine?
Those plans are from 'EXPLAIN ANALYZE', not 'EXPLAIN'... So as I understand it, that's the actual run time for the query. Having seqscan turned on is a lot faster, but isn't that the default? The reason I was given this query to look at was that it's not running fast enough right now in our production application... So, anyone have any suggestion about anything else I could do? Those indices don't help at all? Hunter > From: "Dann Corbit" <DCorbit@connx.com> > Date: Tue, 22 Apr 2003 22:00:12 -0700 > To: "Hunter Hillegas" <lists@lastonepicked.com>, "Stephan Szabo" > <sszabo@megazone23.bigpanda.com> > Cc: "PostgreSQL" <pgsql-general@postgresql.org> > Subject: RE: [GENERAL] Left Join Not Using Index? > >> -----Original Message----- >> From: Hunter Hillegas [mailto:lists@lastonepicked.com] >> Sent: Tuesday, April 22, 2003 9:55 PM >> To: Stephan Szabo >> Cc: PostgreSQL >> Subject: Re: [GENERAL] Left Join Not Using Index? >> >> >> Thanks for responding... >> >> With enable_seqscan = false: >> >> Unique (cost=545747.57..546045.57 rows=1192 width=293) >> (actual time=40851.49..40854.80 rows=51 loops=1) >> -> Sort (cost=545747.57..545777.37 rows=11920 width=293) >> (actual time=40851.48..40852.09 rows=292 loops=1) >> Sort Key: message_board_topics.rec_num, >> message_board_topics.topic_name, >> message_board_topics.topic_body, >> message_board_topics.topic_author, >> message_board_topics.topic_author_email, >> message_board_topics.topic_updated, >> message_board_topics.administrator_topic, >> message_board_topics.number_of_comments, >> to_char((message_board_topics.topic_date)::timestamp with time zone, >> 'MM.DD.YYYY'::text) >> -> Merge Join (cost=0.00..543907.33 rows=11920 >> width=293) (actual time=482.05..40847.19 rows=292 loops=1) >> Merge Cond: ("outer".rec_num = "inner".topic_id) >> Filter: ((upper(("outer".topic_name)::text) ~~ >> 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ >> 'MADBROWSER'::text) OR >> (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR >> (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) >> -> Index Scan using message_board_topics_pkey >> on message_board_topics (cost=0.00..2532.12 rows=11920 >> width=265) (actual time=428.32..4893.13 rows=11920 loops=1) >> -> Index Scan using >> message_board_comments_topic_id on message_board_comments >> (cost=0.00..535662.04 rows=162382 width=28) (actual >> time=28.45..32163.18 rows=162382 loops=1) Total runtime: >> 40855.59 msec >> >> With enable_seqscan = true: >> >> Unique (cost=34847.38..35145.38 rows=1192 width=293) >> (actual time=13005.13..13008.51 rows=51 loops=1) >> -> Sort (cost=34847.38..34877.18 rows=11920 width=293) >> (actual time=13005.12..13005.73 rows=292 loops=1) >> Sort Key: message_board_topics.rec_num, >> message_board_topics.topic_name, >> message_board_topics.topic_body, >> message_board_topics.topic_author, >> message_board_topics.topic_author_email, >> message_board_topics.topic_updated, >> message_board_topics.administrator_topic, >> message_board_topics.number_of_comments, >> to_char((message_board_topics.topic_date)::timestamp with time zone, >> 'MM.DD.YYYY'::text) >> -> Merge Join (cost=26858.21..33007.14 rows=11920 >> width=293) (actual time=4930.32..12949.93 rows=292 loops=1) >> Merge Cond: ("outer".rec_num = "inner".topic_id) >> Filter: ((upper(("outer".topic_name)::text) ~~ >> 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ >> 'MADBROWSER'::text) OR >> (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR >> (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) >> -> Sort (cost=2446.01..2475.81 rows=11920 >> width=265) (actual time=628.30..953.50 rows=11920 loops=1) >> Sort Key: message_board_topics.rec_num >> -> Seq Scan on message_board_topics >> (cost=0.00..712.20 rows=11920 width=265) (actual >> time=0.10..223.96 rows=11920 loops=1) >> -> Sort (cost=24412.20..24818.15 rows=162382 >> width=28) (actual time=4301.14..5788.66 rows=162382 loops=1) >> Sort Key: message_board_comments.topic_id >> -> Seq Scan on message_board_comments >> (cost=0.00..7203.82 rows=162382 width=28) (actual >> time=0.10..1335.26 rows=162382 loops=1) Total runtime: >> 13108.33 msec (13 rows) >> >> Your suggestion didn't really make a whole lot of sense to >> me... Based on this info, what do you think? > > Suggestion: > Perform the actual query with seqscan enabled/disabled and see which one > is literally faster. > > This guess: >> 40855.59 msec > > Certainly seems slower than this one: >> 13108.33 msec (13 rows) > > Indicating that the strategy originally chosen should be correct. > > How accurate is the estimate on your machine?
"Dann Corbit" <DCorbit@connx.com> writes: > This guess: >> 40855.59 msec > Certainly seems slower than this one: >> 13108.33 msec Those are not guesses, those are measurements. Translation: the planner made the right choice here. (Hot diggety ;-)) regards, tom lane
Okay... So the planner is in great shape and chose the right solution... Can anyone point me in another direction to optimize this for some more speed? Other query suggestions? System setting changes? My impatient users don't like to wait. :-) Any help is very much appreciated. Hunter > From: Tom Lane <tgl@sss.pgh.pa.us> > Date: Wed, 23 Apr 2003 01:09:39 -0400 > To: "Dann Corbit" <DCorbit@connx.com> > Cc: "Hunter Hillegas" <lists@lastonepicked.com>, "Stephan Szabo" > <sszabo@megazone23.bigpanda.com>, "PostgreSQL" <pgsql-general@postgresql.org> > Subject: Re: [GENERAL] Left Join Not Using Index? > > "Dann Corbit" <DCorbit@connx.com> writes: >> This guess: >>> 40855.59 msec >> Certainly seems slower than this one: >>> 13108.33 msec > > Those are not guesses, those are measurements. Translation: the planner > made the right choice here. (Hot diggety ;-)) > > regards, tom lane
Hunter Hillegas <lists@lastonepicked.com> writes: > Can anyone point me in another direction to optimize this AFAICS you cannot improve that without changing the query structure and/or the database layout. Because the WHERE clause is a bunch of OR'd conditions, it's useless for restricting either individual table scan making up the join --- there is really no implementation short of forming the entire join described by the FROM ... JOIN ... ON ... part of the query and then testing each individual row against the WHERE clause. While that is the abstract semantic model implied by the SQL spec, it's not exactly how you want a query to really be done :-(. regards, tom lane
So, what you're basically saying is that my best bet for improving the speed of a query to get this result set is to tweak my settings and get faster hardware? > From: Tom Lane <tgl@sss.pgh.pa.us> > Date: Wed, 23 Apr 2003 01:30:05 -0400 > To: Hunter Hillegas <lists@lastonepicked.com> > Cc: Dann Corbit <DCorbit@connx.com>, Stephan Szabo > <sszabo@megazone23.bigpanda.com>, PostgreSQL <pgsql-general@postgresql.org> > Subject: Re: [GENERAL] Left Join Not Using Index? > > Hunter Hillegas <lists@lastonepicked.com> writes: >> Can anyone point me in another direction to optimize this > > AFAICS you cannot improve that without changing the query structure > and/or the database layout. Because the WHERE clause is a bunch of OR'd > conditions, it's useless for restricting either individual table scan > making up the join --- there is really no implementation short of > forming the entire join described by the FROM ... JOIN ... ON ... part > of the query and then testing each individual row against the WHERE > clause. While that is the abstract semantic model implied by the SQL > spec, it's not exactly how you want a query to really be done :-(. > > regards, tom lane
I think he said you have to change your query or your database layout to get better speed. Can you think of a different query or set of queries that will do what you want? What are you trying to achieve with your query? Link. At 10:35 PM 4/22/2003 -0700, Hunter Hillegas wrote: >So, what you're basically saying is that my best bet for improving the speed >of a query to get this result set is to tweak my settings and get faster >hardware? > > > From: Tom Lane <tgl@sss.pgh.pa.us> > > Date: Wed, 23 Apr 2003 01:30:05 -0400 > > To: Hunter Hillegas <lists@lastonepicked.com> > > Cc: Dann Corbit <DCorbit@connx.com>, Stephan Szabo > > <sszabo@megazone23.bigpanda.com>, PostgreSQL <pgsql-general@postgresql.org> > > Subject: Re: [GENERAL] Left Join Not Using Index? > > > > Hunter Hillegas <lists@lastonepicked.com> writes: > >> Can anyone point me in another direction to optimize this > > > > AFAICS you cannot improve that without changing the query structure > > and/or the database layout. Because the WHERE clause is a bunch of OR'd > > conditions, it's useless for restricting either individual table scan > > making up the join --- there is really no implementation short of
On Tue, 22 Apr 2003, Hunter Hillegas wrote: > Your suggestion didn't really make a whole lot of sense to me... Based on > this info, what do you think? I was wondering if something like (columns removed because I'd go insane otherwise, but I think this illustrates it): select message_board_topics.rec_num from message_board_topics where upper(topic_name) LIKE upper('madbrowser') union select message_board_topics.rec_num from message_board_topics where upper(topic_body) LIKE upper('madbrowser') union select message_board_topics.rec_num from message_board_topics where upper(topic_author) LIKE upper('madbrowser') union select message_board_topics.rec_num from message_board_topics,message_board_comments where message_board_comments.topic_id=message_board_topics.rec_num and upper(message_board_comments.comment_author) LIKE upper('madbrowser') order by 1 desc; with indexes on upper(topic_name), upper(topic_body), etc... was both the same and faster. However, the best solution is probably some sort of full text indexing solution. Putting the keywords from the various columns you want to index along with the rec_num (or topic_id) of the row and an index on the text. Then you could join message_board_topics with that and probably get a much better plan.
On Tuesday, April 22, 2003, at 08:44 PM, Hunter Hillegas wrote: > I have a left join that doesn't seem to be using an index I created, > and the > query's performance needs to improve. [snip] > The query is: > > SELECT DISTINCT message_board_topics.rec_num, > message_board_topics.topic_name, message_board_topics.topic_body, > message_board_topics.topic_author, > message_board_topics.topic_author_email, > message_board_topics.topic_updated, > message_board_topics.administrator_topic, > message_board_topics.number_of_comments, to_char(topic_date, > 'MM.DD.YYYY') > as formatted_date FROM message_board_topics left join > message_board_comments > on (message_board_comments.topic_id=message_board_topics.rec_num) WHERE > upper(topic_name) LIKE upper('madbrowser') OR upper(topic_body) LIKE > upper('madbrowser') OR upper(topic_author) LIKE upper('madbrowser') OR > upper(message_board_comments.comment_author) LIKE upper('madbrowser') > ORDER > BY message_board_topics.rec_num DESC > I'm not very good at reading EXPLAIN output in Postgres yet, but it seems like it's all those "upper(table.foo) LIKE upper('madbrowser')" conditions that are causing the slowness. For starters, change it to "upper(table.foo) LIKE 'MADBROWSER'". Then since you're not using wildcards there, change it to "upper(table.foo) = 'MADBROWSER'". -Ken
I think he is saying that efforts in data design, and query design are called for. Hunter Hillegas wrote: > So, what you're basically saying is that my best bet for improving the speed > of a query to get this result set is to tweak my settings and get faster > hardware? > > >>From: Tom Lane <tgl@sss.pgh.pa.us> >>Date: Wed, 23 Apr 2003 01:30:05 -0400 >>To: Hunter Hillegas <lists@lastonepicked.com> >>Cc: Dann Corbit <DCorbit@connx.com>, Stephan Szabo >><sszabo@megazone23.bigpanda.com>, PostgreSQL <pgsql-general@postgresql.org> >>Subject: Re: [GENERAL] Left Join Not Using Index? >> >>Hunter Hillegas <lists@lastonepicked.com> writes: >> >>>Can anyone point me in another direction to optimize this >> >>AFAICS you cannot improve that without changing the query structure >>and/or the database layout. Because the WHERE clause is a bunch of OR'd >>conditions, it's useless for restricting either individual table scan >>making up the join --- there is really no implementation short of >>forming the entire join described by the FROM ... JOIN ... ON ... part >>of the query and then testing each individual row against the WHERE >>clause. While that is the abstract semantic model implied by the SQL >>spec, it's not exactly how you want a query to really be done :-(. >> >>regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
On Tue, 22 Apr 2003, Hunter Hillegas wrote: > Thanks for responding... > > With enable_seqscan = false: SNIP > Total runtime: 40855.59 msec > > With enable_seqscan = true: > > Unique (cost=34847.38..35145.38 rows=1192 width=293) (actual > time=13005.13..13008.51 rows=51 loops=1) > -> Sort (cost=34847.38..34877.18 rows=11920 width=293) (actual > time=13005.12..13005.73 rows=292 loops=1) > Sort Key: message_board_topics.rec_num, > message_board_topics.topic_name, message_board_topics.topic_body, > message_board_topics.topic_author, message_board_topics.topic_author_email, > message_board_topics.topic_updated, > message_board_topics.administrator_topic, > message_board_topics.number_of_comments, > to_char((message_board_topics.topic_date)::timestamp with time zone, > 'MM.DD.YYYY'::text) Note the actual time on the merge join here (about 8000 msec): > -> Merge Join (cost=26858.21..33007.14 rows=11920 width=293) > (actual time=4930.32..12949.93 rows=292 loops=1) > Merge Cond: ("outer".rec_num = "inner".topic_id) > Filter: ((upper(("outer".topic_name)::text) ~~ > 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ 'MADBROWSER'::text) OR > (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR > (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text)) > -> Sort (cost=2446.01..2475.81 rows=11920 width=265) > (actual time=628.30..953.50 rows=11920 loops=1) > Sort Key: message_board_topics.rec_num > -> Seq Scan on message_board_topics > (cost=0.00..712.20 rows=11920 width=265) (actual time=0.10..223.96 > rows=11920 loops=1) Note the sort is showing ~ 4000 to 5000 msec > -> Sort (cost=24412.20..24818.15 rows=162382 width=28) > (actual time=4301.14..5788.66 rows=162382 loops=1) > Sort Key: message_board_comments.topic_id > -> Seq Scan on message_board_comments > (cost=0.00..7203.82 rows=162382 width=28) (actual time=0.10..1335.26 > rows=162382 loops=1) Run time: > Total runtime: 13108.33 msec > (13 rows) So, this query, using the seq scan, is 3 times faster. I.e. the planner made the right move on the seq scan versus the index scan. There may be a faster way than using a merge join, but more than likely, your biggest gain will come from tuning postgresql and your OS to handle more data at a time therefore faster. What are your settings for sort_mem, shared_buffers, effective_cache_size? They're all explained in the 3.4.2. Planner and Optimizer Tuning section of the 7.3.2 docs quite well. Look at changing any of these to off and see how it affects the planner as well. enable_seqscan enable_indexscan enable_tidscan enable_sort enable_nestloop enable_mergejoin enable_hashjoin > Your suggestion didn't really make a whole lot of sense to me... Based on > this info, what do you think? Well, the problem with the suggestion was that while it did turn on index scans, it actually resulted in slower performance, since a seq scan proved 3 times faster. Barring other minor errors in the query planner, you can either make the machine faster / tune postgresql, change your query to something that runs faster by the nature of how it's written. Maybe having a more selective where clause or using one in a subselect that will be run first, thus knocking down the amount of data your database has to sling around, or go to full text indexing. There were two or three solutions in the contrib directory last I looked that both were way faster than the typical "roll your own" solutions.
Basically this query searches those two tables, which model a message board, for text strings... I'll have to take a closer look and see. > From: Lincoln Yeoh <lyeoh@pop.jaring.my> > Date: Wed, 23 Apr 2003 14:00:42 +0800 > To: Hunter Hillegas <lists@lastonepicked.com> > Cc: PostgreSQL <pgsql-general@postgresql.org> > Subject: Re: [GENERAL] Left Join Not Using Index? > > I think he said you have to change your query or your database layout to > get better speed. > > Can you think of a different query or set of queries that will do what you > want? > > What are you trying to achieve with your query? > > Link. > > At 10:35 PM 4/22/2003 -0700, Hunter Hillegas wrote: > >> So, what you're basically saying is that my best bet for improving the speed >> of a query to get this result set is to tweak my settings and get faster >> hardware? >> >>> From: Tom Lane <tgl@sss.pgh.pa.us> >>> Date: Wed, 23 Apr 2003 01:30:05 -0400 >>> To: Hunter Hillegas <lists@lastonepicked.com> >>> Cc: Dann Corbit <DCorbit@connx.com>, Stephan Szabo >>> <sszabo@megazone23.bigpanda.com>, PostgreSQL <pgsql-general@postgresql.org> >>> Subject: Re: [GENERAL] Left Join Not Using Index? >>> >>> Hunter Hillegas <lists@lastonepicked.com> writes: >>>> Can anyone point me in another direction to optimize this >>> >>> AFAICS you cannot improve that without changing the query structure >>> and/or the database layout. Because the WHERE clause is a bunch of OR'd >>> conditions, it's useless for restricting either individual table scan >>> making up the join --- there is really no implementation short of > >
Wow! This query is MUCH faster than my old query... About 1/8 of the time. > From: Stephan Szabo <sszabo@megazone23.bigpanda.com> > Date: Wed, 23 Apr 2003 06:41:27 -0700 (PDT) > To: Hunter Hillegas <lists@lastonepicked.com> > Cc: PostgreSQL <pgsql-general@postgresql.org> > Subject: Re: [GENERAL] Left Join Not Using Index? > > > On Tue, 22 Apr 2003, Hunter Hillegas wrote: > >> Your suggestion didn't really make a whole lot of sense to me... Based on >> this info, what do you think? > > I was wondering if something like (columns removed because I'd go insane > otherwise, but I think this illustrates it): > > select message_board_topics.rec_num from > message_board_topics where upper(topic_name) LIKE upper('madbrowser') > union > select message_board_topics.rec_num from > message_board_topics where upper(topic_body) LIKE upper('madbrowser') > union > select message_board_topics.rec_num from > message_board_topics where upper(topic_author) LIKE upper('madbrowser') > union > select message_board_topics.rec_num from > message_board_topics,message_board_comments where > message_board_comments.topic_id=message_board_topics.rec_num > and upper(message_board_comments.comment_author) LIKE upper('madbrowser') > order by 1 desc; > > with indexes on upper(topic_name), upper(topic_body), etc... was > both the same and faster. > > However, the best solution is probably some sort of full text indexing > solution. Putting the keywords from the various columns you want to index > along with the rec_num (or topic_id) of the row and an index on the text. > Then you could join message_board_topics with that and probably get a much > better plan.