Обсуждение: query becomes fas on 'SET enable_hashjoin TO off;'
Hi , I have a query in which two huge tables (A,B) are joined using an indexed column and a search is made on tsvector on some column on B. Very limited rows of B are expected to match the query on tsvector column. With default planner settings the query takes too long ( > 100 secs) , but with hashjoin off it returns almost immediately. The question is , is it is advisable to tweak planner settings for specific queries in application ? The plans are as follows. 1. With default settings explain select lead_id from general.trade_leads join general.profile_master as pm using(profile_id) where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Limit (cost=4109.11..11127.78 rows=20 width=4) -> Hash Join (cost=4109.11..90789.72 rows=247 width=4) Hash Cond: (trade_leads.profile_id = pm.profile_id) -> Seq Scan on trade_leads (cost=0.00..85752.52 rows=246832 width=8) Filter: ((status)::text = 'm'::text) -> Hash (cost=4095.68..4095.68 rows=1074 width=4) -> Bitmap Heap Scan on profile_master pm (cost=40.89..4095.68 rows=1074 width=4) Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) -> Bitmap Index Scan on profile_master_co_name_vec (cost=0.00..40.62 rows=1074 width=0) Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) (10 rows) 2. with SET enable_hashjoin TO off; explain analyze select lead_id from general.trade_leads join general.profile_master as pm using(profile_id) where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=3.42..13080.44 rows=20 width=4) (actual time=1530.039..1530.039 rows=0 loops=1) -> Nested Loop (cost=3.42..161504.56 rows=247 width=4) (actual time=1530.037..1530.037 rows=0 loops=1) -> Index Scan using profile_master_co_name_vec on profile_master pm (cost=0.00..4335.36 rows=1074 width=4) (actual time=220.821..1014.501 rows=7 loops=1) Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) -> Bitmap Heap Scan on trade_leads (cost=3.42..145.75 rows=47 width=8) (actual time=73.640..73.640 rows=0 loops=7) Recheck Cond: (trade_leads.profile_id = pm.profile_id) Filter: ((status)::text = 'm'::text) -> Bitmap Index Scan on trade_leads_profile_id (cost=0.00..3.41 rows=47 width=0) (actual time=73.579..73.579 rows=0 loops=7) Index Cond: (trade_leads.profile_id = pm.profile_id) Total runtime: 1530.137 ms regds mallah.
On Tue, Feb 10, 2009 at 5:31 AM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote: > I have a query in which two huge tables (A,B) are joined using an indexed > column and a search is made on tsvector on some column on B. Very limited > rows of B are expected to match the query on tsvector column. > > With default planner settings the query takes too long ( > 100 secs) , but > with hashjoin off it returns almost immediately. The question is , is > it is advisable to > tweak planner settings for specific queries in application ? The ones that start with "enable_" usually shouldn't be changed. They're mostly for debugging and finding problems. > The plans are as follows. It's a little hard to figure out what's gone wrong here because you've only included EXPLAIN ANALYZE output for one of the plans - the other is just regular EXPLAIN. Can you send that, along with the output of the following query: SELECT SUM(1) FROM trade_leads WHERE status = 'm' I'm guessing that the problem is that the selectivity estimate for co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm not real familiar with full text search, so I'm not sure whether there's anything sensible you can do about it. ...Robert
Dear Robert, thanks for ur interest. Our server was too loaded what i posted my last observation, now the other explain analyze can also be run and i am posting both the result , as you can see latter is 55ms versus 3000 ms . explain analyze select lead_id from general.trade_leads join general.profile_master as pm using(profile_id) where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=4109.11..11127.78 rows=20 width=4) (actual time=3076.059..3076.059 rows=0 loops=1) -> Hash Join (cost=4109.11..90789.72 rows=247 width=4) (actual time=3076.057..3076.057 rows=0 loops=1) Hash Cond: (trade_leads.profile_id = pm.profile_id) -> Seq Scan on trade_leads (cost=0.00..85752.52 rows=246832 width=8) (actual time=0.020..2972.446 rows=127371 loops=1) Filter: ((status)::text = 'm'::text) -> Hash (cost=4095.68..4095.68 rows=1074 width=4) (actual time=42.368..42.368 rows=7 loops=1) -> Bitmap Heap Scan on profile_master pm (cost=40.89..4095.68 rows=1074 width=4) (actual time=42.287..42.360 rows=7 loops=1) Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) -> Bitmap Index Scan on profile_master_co_name_vec (cost=0.00..40.62 rows=1074 width=0) (actual time=42.252..42.252 rows=7 loops=1) Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) Total runtime: 3076.121 ms (11 rows) tradein_clients=> SET enable_hashjoin TO off; SET tradein_clients=> explain analyze select lead_id from general.trade_leads join general.profile_master as pm using(profile_id) where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=3.42..13080.44 rows=20 width=4) (actual time=55.233..55.233 rows=0 loops=1) -> Nested Loop (cost=3.42..161504.56 rows=247 width=4) (actual time=55.232..55.232 rows=0 loops=1) -> Index Scan using profile_master_co_name_vec on profile_master pm (cost=0.00..4335.36 rows=1074 width=4) (actual time=16.578..46.175 rows=7 loops=1) Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) -> Bitmap Heap Scan on trade_leads (cost=3.42..145.75 rows=47 width=8) (actual time=1.287..1.287 rows=0 loops=7) Recheck Cond: (trade_leads.profile_id = pm.profile_id) Filter: ((status)::text = 'm'::text) -> Bitmap Index Scan on trade_leads_profile_id (cost=0.00..3.41 rows=47 width=0) (actual time=1.285..1.285 rows=0 loops=7) Index Cond: (trade_leads.profile_id = pm.profile_id) Total runtime: 55.333 ms (11 rows) SELECT SUM(1) FROM general.trade_leads WHERE status = 'm'; sum -------- 127371 this constitutes 90% of the total rows. regds mallah. On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Feb 10, 2009 at 5:31 AM, Rajesh Kumar Mallah > <mallah.rajesh@gmail.com> wrote: >> I have a query in which two huge tables (A,B) are joined using an indexed >> column and a search is made on tsvector on some column on B. Very limited >> rows of B are expected to match the query on tsvector column. >> >> With default planner settings the query takes too long ( > 100 secs) , but >> with hashjoin off it returns almost immediately. The question is , is >> it is advisable to >> tweak planner settings for specific queries in application ? > > The ones that start with "enable_" usually shouldn't be changed. > They're mostly for debugging and finding problems. > >> The plans are as follows. > > It's a little hard to figure out what's gone wrong here because you've > only included EXPLAIN ANALYZE output for one of the plans - the other > is just regular EXPLAIN. Can you send that, along with the output of > the following query: > > SELECT SUM(1) FROM trade_leads WHERE status = 'm' > > I'm guessing that the problem is that the selectivity estimate for > co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm > not real familiar with full text search, so I'm not sure whether > there's anything sensible you can do about it. > > ...Robert >
Rajesh Kumar Mallah <mallah.rajesh@gmail.com> writes: > On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I'm guessing that the problem is that the selectivity estimate for >> co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm >> not real familiar with full text search, so I'm not sure whether >> there's anything sensible you can do about it. Yeah, the bad selectivity estimate seems to be the entire problem --- if that were even slightly closer to reality the planner would've preferred the nestloop. I don't think there's a good solution to this in 8.3, because its estimator for @@ is just a stub. There will be a non-toy estimator in 8.4, fwiw. A possibility that seems a bit less crude than turning off hashjoins is to reduce random_page_cost, so as to bias things toward nestloop indexscans in general. regards, tom lane
On Tue, Feb 10, 2009 at 9:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Rajesh Kumar Mallah <mallah.rajesh@gmail.com> writes: >> On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> I'm guessing that the problem is that the selectivity estimate for >>> co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm >>> not real familiar with full text search, so I'm not sure whether >>> there's anything sensible you can do about it. > > Yeah, the bad selectivity estimate seems to be the entire problem --- > if that were even slightly closer to reality the planner would've > preferred the nestloop. > > I don't think there's a good solution to this in 8.3, this is 8.2 server at the moment. >because its > estimator for @@ is just a stub. There will be a non-toy estimator > in 8.4, fwiw. > > A possibility that seems a bit less crude than turning off hashjoins > is to reduce random_page_cost, so as to bias things toward nestloop > indexscans in general. reducing random_page_cost from 4 (default) to 3 does switch the plan in favour of nested loop thanks for the suggestion. SET random_page_cost TO 4; SET tradein_clients=> explain select lead_id from general.trade_leads join general.profile_master as pm using(profile_id) where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Limit (cost=4109.11..11127.78 rows=20 width=4) -> Hash Join (cost=4109.11..90789.72 rows=247 width=4) Hash Cond: (trade_leads.profile_id = pm.profile_id) -> Seq Scan on trade_leads (cost=0.00..85752.52 rows=246832 width=8) Filter: ((status)::text = 'm'::text) -> Hash (cost=4095.68..4095.68 rows=1074 width=4) -> Bitmap Heap Scan on profile_master pm (cost=40.89..4095.68 rows=1074 width=4) Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) -> Bitmap Index Scan on profile_master_co_name_vec (cost=0.00..40.62 rows=1074 width=0) Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) (10 rows) tradein_clients=> SET random_page_cost TO 3; SET tradein_clients=> explain select lead_id from general.trade_leads join general.profile_master as pm using(profile_id) where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..9944.78 rows=20 width=4) -> Nested Loop (cost=0.00..122818.07 rows=247 width=4) -> Index Scan using profile_master_co_name_vec on profile_master pm (cost=0.00..3256.28 rows=1074 width=4) Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery) -> Index Scan using trade_leads_profile_id on trade_leads (cost=0.00..110.76 rows=45 width=8) Index Cond: (trade_leads.profile_id = pm.profile_id) Filter: ((status)::text = 'm'::text) (8 rows) > > regards, tom lane >