Force a merge join?
От | Doug Fields |
---|---|
Тема | Force a merge join? |
Дата | |
Msg-id | 5.1.0.14.2.20020515142918.01f0f718@mail.pexicom.com обсуждение исходный текст |
Ответы |
Re: Force a merge join?
(Martijn van Oosterhout <kleptog@svana.org>)
|
Список | pgsql-general |
Hello, [7.2.1, Debian/woody kernel 2.2.20] I have an extremely slow running query which uses a nested loop. I would like to force it to use a merge join instead, but I have been unable to figure out how to fool the optimizer to do that. Alternately, I would welcome suggestions on how to speed up the query. Yes, I have tried turning off enable_nestloop and playing with the tuple costs somewhat. When I say extremely slowly, I mean it. We're talking 280 seconds for a query which returns 3000 rows out of a possible ~800,000 using two parts of the same table (joined together) of sizes 500 and 15000 or so. Please find below some details on my query, the explain, the table and the indices. Some columns not referenced have been omitted. Note that I just increased the RAM on the database server to 2 gigs and allocated almost 1.4 gigs to shared buffers. All my tables and indices total size (for now) is about 600 megs, so everything in theory could be running from memory. I have a continuous vmstat running and show no swapping, and no "blocks in" activity indicating it is indeed running fully from memory. The INTENT of the query is to take a set of e-mail addresses (list A) and another set (list B) and find the intersection - that is, all entries in A which have a corresponding entry in B based upon having the same e-mail address. This is the last one part of a larger query (which is otherwise very fast) which selects parts of list A and then removes (via combining with EXCEPT) the remaining ones in the query below. These are then inserted into yet another table so the whole query looks like INSERT ... SELECT ... EXCEPT SELECT ... where the last SELECT is the slow query I show below. By breaking the query into a bunch (about 40) of queries without any ORs and UNIONing them, I can get about a 3-4 times improvement, but that still means this is an 80 second query, and is not scalable to the case where it can't be broken into multiple queries. Query: -- Note the odd 745 is because this is used in an INSERT ... SELECT statement -- Also note that there is no difference if I use IN (1,2,3,4) syntax SELECT 745, a.list_entry_id FROM list_entries a, list_entries b WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR a.list_id=144) AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418) AND LOWER(a.email) = LOWER(b.email); psql's \d on the table name and indices: (use fixed-width font) Table "list_entries" Column | Type | Modifiers ---------------+--------------------------+-------------------------------------------------------------------- list_id | integer | not null list_entry_id | integer | not null default nextval('"list_entries_list_entry_id_seq"'::text) email | character varying(64) | (about a dozen more mostly varchar fields not shown) Indexes: list_id_idx, lower_email_idx Primary key: list_entries_pkey Triggers: RI_ConstraintTrigger_1572852 Index "list_id_idx" Column | Type ---------+--------- list_id | integer btree Index "lower_email_idx" Column | Type --------+------ lower | text btree Query analysis: -> Nested Loop (cost=0.00..219086859.00 rows=1585343 width=52) (actual time=0.27..294568.88 rows=2970 loops=1) -> Index Scan using list_id_idx, list_id_idx, list_id_idx, list_id_idx, list_id_idx on list_entries a (cost=0.00..29025.99 rows=14115 width=28) (actual time=0.05..176.79 rows=15859 loops=1) -> Index Scan using lower_email_idx on list_entries b (cost=0.00..15513.40 rows=112 width=24) (actual time=0.69..18.55 rows=0 loops=15859) Database optimizer settings: NOTICE: cpu_operator_cost is 0.0025 NOTICE: cpu_tuple_cost is 0.05 NOTICE: cpu_index_tuple_cost is 0.001 All enable_X options are ON Many thanks, Doug
В списке pgsql-general по дате отправления: