Why is a hash join being used?

From	Tim Jacobs
Subject	Why is a hash join being used?
Date	June 20, 2012, 07:54:22
Msg-id	9D1AF2AE-9605-41B5-8BCC-177B5EF6F1A5@gmail.com
Responses	Re: Why is a hash join being used? (Sergey Konoplev <sergey.konoplev@postgresql-consulting.com>) Re: Why is a hash join being used? ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List	pgsql-performance

I am running the following query:

SELECT res1.x, res1.y, res1.z
FROM test t
JOIN residue_atom_coords res1 ON
        t.struct_id_1 = res1.struct_id AND
        res1.atomno IN (1,2,3,4) AND
        (res1.seqpos BETWEEN t.pair_1_helix_1_begin AND t.pair_1_helix_1_end)
WHERE
t.compare_id BETWEEN 1 AND 10000;

The 'test' table is very large (~270 million rows) as is the residue_atom_coords table (~540 million rows).

The number of compare_ids I select in the 'WHERE' clause determines the join type in the following way:

t.compare_id BETWEEN 1 AND 5000;

 Nested Loop  (cost=766.52..15996963.12 rows=3316307 width=24)
   ->  Index Scan using test_pkey on test t  (cost=0.00..317.20 rows=5372 width=24)
         Index Cond: ((compare_id >= 1) AND (compare_id <= 5000))
   ->  Bitmap Heap Scan on residue_atom_coords res1  (cost=766.52..2966.84 rows=625 width=44)
         Recheck Cond: ((struct_id = t.struct_id_1) AND (seqpos >= t.pair_1_helix_1_begin) AND (seqpos <= t.pair_1_helix_1_end) AND (atomno = ANY ('{1,2,3,4}'::integer[])))
         ->  Bitmap Index Scan on residue_atom_coords_pkey  (cost=0.00..766.36 rows=625 width=0)
               Index Cond: ((struct_id = t.struct_id_1) AND (seqpos >= t.pair_1_helix_1_begin) AND (seqpos <= t.pair_1_helix_1_end) AND (atomno = ANY ('{1,2,3,4}'::integer[])))

t.compare_id BETWEEN 1 AND 10000;

 Hash Join  (cost=16024139.91..20940899.94 rows=6633849 width=24)
   Hash Cond: (t.struct_id_1 = res1.struct_id)
   Join Filter: ((res1.seqpos >= t.pair_1_helix_1_begin) AND (res1.seqpos <= t.pair_1_helix_1_end))
   ->  Index Scan using test_pkey on test t  (cost=0.00..603.68 rows=10746 width=24)
         Index Cond: ((compare_id >= 1) AND (compare_id <= 10000))
   ->  Hash  (cost=13357564.16..13357564.16 rows=125255660 width=44)
         ->  Seq Scan on residue_atom_coords res1  (cost=0.00..13357564.16 rows=125255660 width=44)
               Filter: (atomno = ANY ('{1,2,3,4}'::integer[]))

The nested loop join performs very quickly, whereas the hash join is incredibly slow. If I temporarily disable hash
joins, a nested loop join is used in the second case and the query runs much more quickly. How can I change my
configuration to favor the nested loop join in this case? Is this a bad idea? Alternatively, since I will be doing
selections like this many times, what indexes can be put in place to expedite the query without mucking with the query
optimizer? I've already created an index on the struct_id field of residue_atom_coords (each unique struct_id should
only have a small number of rows in the residue_atom_coords table).
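
A minimal sketch of what disabling the hash join temporarily can look like. The exact commands used are not given above, so this assumes PostgreSQL's enable_hashjoin planner setting, scoped to a single transaction with SET LOCAL so that other sessions and later queries are unaffected:

```sql
BEGIN;

-- Assumption: SET LOCAL keeps the override inside this transaction only;
-- the setting reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_hashjoin = off;

-- EXPLAIN shows which join type the planner picks with hash joins discouraged.
EXPLAIN
SELECT res1.x, res1.y, res1.z
FROM test t
JOIN residue_atom_coords res1 ON
        t.struct_id_1 = res1.struct_id AND
        res1.atomno IN (1,2,3,4) AND
        (res1.seqpos BETWEEN t.pair_1_helix_1_begin AND t.pair_1_helix_1_end)
WHERE
t.compare_id BETWEEN 1 AND 10000;

ROLLBACK;
```

Comparing this plan with the default one for the same compare_id range shows how far apart the two cost estimates are.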

Thanks in advance,
Tim
