Hash Join vs Nested Loops in 7.2.1 ...

Поиск
Список
Период
Сортировка
От Ed Loehr
Тема Hash Join vs Nested Loops in 7.2.1 ...
Дата
Msg-id 3CB26973.5010808@bluepolka.net
обсуждение исходный текст
Ответы Re: Hash Join vs Nested Loops in 7.2.1 ...  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
I have a 7.2.1 query with two peculiar characteristics and wondered if anyone
could offer some insight.

First, my query takes 90 seconds with a hash join, but only 1 second with
nested loops.

Second, the same query sometimes takes 10-50 seconds shortly after possibly a
dump or other high-data-volume queries are executed, after which it then
returns to 1 second execution time.  Getting crowded out of shared memory?

Finally, I am inclined to turn off hash joins altogether.  What should I
expect to lose in doing so?

Schema, query, and plans shown below...

Thanks,
Ed


-- 700,000 rows
CREATE TABLE story (
         value TEXT,
         key   SERIAL
);
CREATE UNIQUE INDEX story_pkey ON story USING btree (key);

-- 700,000 rows
CREATE TABLE dict (
         word VARCHAR,
         key  SERIAL
);
CREATE UNIQUE INDEX dict_pkey ON dict USING btree (key);
CREATE UNIQUE INDEX dict_word_key ON dict USING btree (word);

-- 28,000,000 rows
CREATE TABLE story_dict (
         dictkey INTEGER NOT NULL,
         storykey INTEGER NOT NULL
);
CREATE UNIQUE INDEX story_dict_pkey ON story_dict(dictkey, storykey);
CREATE INDEX story_dict_tk_idx ON story_dict(storykey);

-- Query:
-- ======

SELECT DISTINCT ftd1.storykey
FROM story_dict ftd1, dict d1
WHERE d1.word = 'foo'
   AND d1.key = ftd1.dictkey
   AND EXISTS (
         SELECT ft2.key
         FROM story ft2, story_dict ftd2, dict d2
         WHERE d2.word = 'bar'
           AND d2.key = ftd2.dictkey
           AND ftd2.storykey = ft2.key
           AND ftd2.storykey = ftd1.storykey)
   AND EXISTS (
         SELECT ft3.key
         FROM story ft3, story_dict ftd3, dict d3
         WHERE d3.word = 'baz'
           AND d3.key = ftd3.dictkey
           AND ftd3.storykey = ft3.key
           AND ftd3.storykey = ftd1.storykey)
ORDER BY ftd1.storykey
LIMIT 1000;


Plan with PGOPTIONS = "-fn":

Limit (cost=15409053054.71..15409053054.73 rows=1 width=12)
  -> Unique (cost=15409053054.71..15409053054.73 rows=1 width=12)
   -> Sort (cost=15409053054.71..15409053054.71 rows=9 width=12)
     -> Nested Loop (cost=100000000.00..15409053054.57 rows=9 width=12)
      -> Index Scan using word_idx on dict d1 (cost=0.00..5.98 rows=1 width=4)
      -> Index Scan using story_dict_pkey on story_dict ftd1
(cost=0.00..15309052993.63 rows=4398 width=8)
        SubPlan
        -> Merge Join (cost=429157.62..435130.62 rows=1 width=16)
          -> Sort (cost=13.11..13.11 rows=1 width=12)
           -> Hash Join (cost=5.98..13.10 rows=1 width=12)
             -> Index Scan using story_dict_tk_idx on story_dict ftd2
(cost=0.00..6.59 rows=106 width=8)
             -> Hash (cost=5.98..5.98 rows=1 width=4)
              -> Index Scan using word_idx on dict d2 (cost=0.00..5.98 rows=1
width=4)
          -> Sort (cost=429144.50..429144.50 rows=2389196 width=4)
           -> Seq Scan on story ft2 (cost=0.00..86701.96 rows=2389196 width=4)
        -> Merge Join (cost=429157.62..435130.62 rows=1 width=16)
          -> Sort (cost=13.11..13.11 rows=1 width=12)
           -> Hash Join (cost=5.98..13.10 rows=1 width=12)
             -> Index Scan using story_dict_tk_idx on story_dict ftd3
(cost=0.00..6.59 rows=106 width=8)
             -> Hash (cost=5.98..5.98 rows=1 width=4)
              -> Index Scan using word_idx on dict d3 (cost=0.00..5.98 rows=1
width=4)
          -> Sort (cost=429144.50..429144.50 rows=2389196 width=4)
           -> Seq Scan on story ft3 (cost=0.00..86701.96 rows=2389196 width=4)


Plan with PGOPTIONS = "-fh":

Limit (cost=635283.31..635283.33 rows=1 width=12)
  -> Unique (cost=635283.31..635283.33 rows=1 width=12)
   -> Sort (cost=635283.31..635283.31 rows=9 width=12)
     -> Nested Loop (cost=0.00..635283.17 rows=9 width=12)
      -> Index Scan using word_idx on dict d1 (cost=0.00..5.98 rows=1 width=4)
      -> Index Scan using story_dict_pkey on story_dict ftd1
(cost=0.00..635222.22 rows=4398 width=8)
        SubPlan
        -> Nested Loop (cost=0.00..16.07 rows=1 width=16)
          -> Nested Loop (cost=0.00..12.00 rows=1 width=12)
           -> Index Scan using word_idx on dict d2 (cost=0.00..5.98 rows=1
width=4)
           -> Index Scan using story_dict_pkey on story_dict ftd2
(cost=0.00..6.01 rows=1 width=8)
          -> Index Scan using story_pkey on story ft2 (cost=0.00..4.06 rows=1
width=4)
        -> Nested Loop (cost=0.00..16.07 rows=1 width=16)
          -> Nested Loop (cost=0.00..12.00 rows=1 width=12)
           -> Index Scan using word_idx on dict d3 (cost=0.00..5.98 rows=1
width=4)
           -> Index Scan using story_dict_pkey on story_dict ftd3
(cost=0.00..6.01 rows=1 width=8)
          -> Index Scan using story_pkey on story ft3 (cost=0.00..4.06 rows=1
width=4)



В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: now() AT TIME ZONE 'GMT';
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Hash Join vs Nested Loops in 7.2.1 ...