GROUP BY with reasonable timings in PLAN but unreasonable execution time

Поиск
Список
Период
Сортировка
От Clem Dickey
Тема GROUP BY with reasonable timings in PLAN but unreasonable execution time
Дата
Msg-id iv0h49$1var$1@news.hub.org
обсуждение исходный текст
Ответы Re: GROUP BY with reasonable timings in PLAN but unreasonable execution time  (Clem Dickey <dickeycl@us.ibm.com>)
Список pgsql-performance
I have a query which seems to be taking an extraordinarily long time
(many minutes, at least) when seemingly equivalent queries have
different plans and execute in seconds. naturally, I'd like to know why.

Version is Postgresql 8.4.8. The table, "t", is

  Column |  Type   | Modifiers
--------+---------+-----------
  y      | integer | not null
  x      | integer | not null
  k      | integer | not null
  j      | integer | not null
  z      | integer | not null
Indexes:
     "t_pkey" PRIMARY KEY, btree (j, k, x, y, z)

The table population, in pseudocode, is this:
   for x in 0..9
     for y in 0..9999
       for z in 0..29
         INSERT INTO t VALUES(y,x,0,0,z)

So the table has 300000 entries, with j and k always 0.

The query is:

  SELECT *
    FROM (
     SELECT * FROM t GROUP BY j,k,x,z,y
    ) AS f
    NATURAL JOIN t;

The plan:

  Merge Join  (cost=44508.90..66677.96 rows=1 width=20)
   Merge Cond: ((public.t.j = public.t.j) AND (public.t.k = public.t.k)
                AND (public.t.x = public.t.x))
   Join Filter: ((public.t.y = public.t.y) AND (public.t.z = public.t.z))
   -> Group (cost=44508.90..49008.90 rows=30000 width=20)
      ->  Sort (cost=44508.90..45258.90 rows=300000 width=20)
         Sort Key: public.t.j, public.t.k, public.t.x, public.t.z,
                   public.t.y
         ->  Seq Scan on t  (cost=0.00..4911.00 rows=300000 width=20)
   ->  Index Scan using t_pkey on t  (cost=0.00..14877.18 rows=300000
                                      width=20)

This query runs at least 20 minutes, with postmaster CPU utilization at
99%, without completing. System is a 3.2GHz Zeon, 3GB memory, and not
much else running.

By contrast, placing an intermediate result in a table "u" provides a
result in about 3 seconds:

  CREATE TEMPORARY TABLE u AS SELECT * FROM t GROUP BY j,k,x,z,y;
  SELECT * FROM u NATURAL JOIN t;

Changing the order of the GROUP BY clause varies the plan, sometimes
yielding shorter execution times. For example, this ordering executes in
about 1.5 seconds:

  SELECT *
    FROM (
     SELECT * FROM t GROUP BY j,k,x,y,z
    ) AS f
    NATURAL JOIN t;

With 120 permutations, I didn't try them all.

I should note that the plans tend to have similar costs, so the query
planner presumably does not know that some permutations have
significantly greater execution times.

Clem Dickey

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Matthias Howell"
Дата:
Сообщение: Query in 9.0.2 not using index in 9.0.0 works fine
Следующее
От: Samuel Gendler
Дата:
Сообщение: Re: Query in 9.0.2 not using index in 9.0.0 works fine