Planner having way wrong estimate for group aggregate

Поиск

Список

Период

Сортировка

От	Steinar H. Gunderson
Тема	Planner having way wrong estimate for group aggregate
Дата	18 сентября 2004 г. 18:01:30
Msg-id	20040918170117.GA3354@uio.no обсуждение
Ответы	Re: Planner having way wrong estimate for group aggregate
Список	pgsql-performance

Дерево обсуждения

Hi,

I'm using PostgreSQL 7.4 on a table with ~700.000 rows looking like this:

                                   Table "public.enkeltsalg"
   Column   |           Type           |                       Modifiers
------------+--------------------------+-------------------------------------------------------
 id         | integer                  | not null default nextval('"enkeltsalg_id_seq"'::text)
 kommentar  | text                     | not null default ''::text
 antall     | numeric(14,4)            | not null
 belop      | numeric(10,0)            | not null
 type       | character(1)             | not null
 tid        | timestamp with time zone | default now()
 eksternid  | integer                  |
 kasseid    | integer                  |
 baraapning | integer                  |
 salgspris  | integer                  |
 firma      | integer                  |
 bongid     | integer                  |
Indexes:
    "enkeltsalg_pkey" primary key, btree (id)
    "enkeltsalg_aapn" btree (baraapning)
    "enkeltsalg_aapn_pris" btree (baraapning, salgspris)
    "enkeltsalg_aapn_type" btree (baraapning, "type")
    "enkeltsalg_pris" btree (salgspris)
Check constraints:
    "enkeltsalg_type_valid" CHECK ("type" = 'K'::bpchar OR "type" = 'B'::bpchar OR "type" = 'M'::bpchar OR "type" =
'T'::bpchar)

And I'm doing the query (after VACUUM ANALYZE)

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by
date_trunc('day',tid);
                                                            QUERY PLAN
          

----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=108062.34..114477.98 rows=172735 width=17) (actual time=20977.544..23890.020 rows=361 loops=1)
   ->  Sort  (cost=108062.34..109912.99 rows=740263 width=17) (actual time=20947.372..21627.107 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..18010.29 rows=740263 width=17) (actual time=0.091..7180.528
rows=710720loops=1) 
 Total runtime: 23908.538 ms
(5 rows)

Now, as you can see, the GroupAggregate here is _way_ off, so the planner
makes the wrong choice (it should do a hash aggregate). If I set sort_mem to
131072 instead of 16384, it does a hash aggregate (which is 10 seconds
instead of 24), but I can't have sort_mem that high generally.

Now, my first notion was creating a functional index to help the planner:

smt=# create index enkeltsalg_dag on enkeltsalg ( date_trunc('day',tid) );
CREATE INDEX
smt=# vacuum analyze;
VACUUM

However, this obviously didn't help the planner (this came as a surprise to
me, but probably won't come as a surprise to the more seasoned users here :-)
):

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by
date_trunc('day',tid);
                                                            QUERY PLAN
          

----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=103809.15..110017.11 rows=175512 width=17) (actual time=21061.357..23917.370 rows=361 loops=1)
   ->  Sort  (cost=103809.15..105585.95 rows=710720 width=17) (actual time=21032.239..21695.674 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..17641.00 rows=710720 width=17) (actual time=0.091..7231.387
rows=710720loops=1) 
 Total runtime: 23937.791 ms
(5 rows)

I also tried to increase the statistics on the "tid" column:

smt=# alter table enkeltsalg alter column tid set statistics 500;
ALTER TABLE
smt=# analyze enkeltsalg;
ANALYZE

However, this made the planner only do a _worse_ estimate:

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by
date_trunc('day',tid);
                                                            QUERY PLAN
          

----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=107906.59..114449.09 rows=199715 width=17) (actual time=20947.197..23794.389 rows=361 loops=1)
   ->  Sort  (cost=107906.59..109754.56 rows=739190 width=17) (actual time=20918.001..21588.735 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..17996.88 rows=739190 width=17) (actual time=0.092..7166.488
rows=710720loops=1) 
 Total runtime: 23814.624 ms
(5 rows)

Actually, it seems that the higher I set statistics on "tid", the worse the
estimate becomes.

Also, I was told (on #postgresql :-) ) to include the following information:

smt=# select n_distinct from pg_stats where attname='tid';
 n_distinct
 ------------
  -0.270181
(1 row)

Any ideas for speeding this up?

/* Steinar */
--
Homepage: http://www.sesse.net/

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Planner having way wrong estimate for group aggregate