Bogus startup cost for WindowAgg

From: Ants Aasma
Date:
I hit an issue with window aggregate costing while experimenting with
providing a count of the full match alongside a limited result set.
It seems that the window aggregate node doesn't take into account that
it has to consume the whole input before outputting the first row. When
this is combined with a limit, the resulting cost is wildly
underestimated, leading to suboptimal plans.

Is this a known issue? I couldn't find anything referring to this on
the mailing list or in the TODO list.

Code to reproduce follows:

ants=# CREATE TABLE test (a int, b int);
CREATE TABLE
ants=# INSERT INTO test (a,b) SELECT random()*1e6, random()*1e6 FROM generate_series(1,1000000);
INSERT 0 1000000
ants=# CREATE INDEX a_idx ON test (a);
CREATE INDEX
ants=# CREATE INDEX b_idx ON test (b);
CREATE INDEX
ants=# ANALYZE test;
ANALYZE
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a < 2500 ORDER BY b LIMIT 10;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..195.31 rows=10 width=8) (actual time=728.325..728.339 rows=10 loops=1)
   ->  WindowAgg  (cost=0.00..46209.93 rows=2366 width=8) (actual time=728.324..728.337 rows=10 loops=1)
         ->  Index Scan using b_idx on test  (cost=0.00..46180.36 rows=2366 width=8) (actual time=0.334..727.221 rows=2512 loops=1)
               Filter: (a < 2500)
 Total runtime: 728.401 ms
(5 rows)

ants=# SET enable_indexscan = off;
SET
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a < 2500 ORDER BY b LIMIT 10;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3986.82..3986.85 rows=10 width=8) (actual time=7.186..7.189 rows=10 loops=1)
   ->  Sort  (cost=3986.82..3992.74 rows=2366 width=8) (actual time=7.185..7.187 rows=10 loops=1)
         Sort Key: b
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  WindowAgg  (cost=46.70..3935.69 rows=2366 width=8) (actual time=4.181..6.508 rows=2512 loops=1)
               ->  Bitmap Heap Scan on test  (cost=46.70..3906.12 rows=2366 width=8) (actual time=0.933..3.555 rows=2512 loops=1)
                     Recheck Cond: (a < 2500)
                     ->  Bitmap Index Scan on a_idx  (cost=0.00..46.10 rows=2366 width=0) (actual time=0.512..0.512 rows=2512 loops=1)
                           Index Cond: (a < 2500)
 Total runtime: 7.228 ms
(10 rows)
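
A possible workaround, not discussed in this thread: since WITH queries
are optimization fences in this version of PostgreSQL (the CTE is
materialized in full before the outer query runs), wrapping the window
aggregate in one keeps the planner from pushing the LIMIT into it. A
minimal sketch against the test table above (the alias full_count is
made up for illustration):

    WITH counted AS (
        SELECT *, COUNT(*) OVER () AS full_count
        FROM test
        WHERE a < 2500
    )
    SELECT * FROM counted ORDER BY b LIMIT 10;

This should let the planner pick the bitmap scan on a_idx for the inner
query and apply the ORDER BY ... LIMIT on top of the materialized
result.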

Re: Bogus startup cost for WindowAgg

From: Mladen Gogala
Date:
Ants Aasma wrote:
> I hit an issue with window aggregate costing while experimenting with
> providing a count of the full match alongside a limited result set.
> It seems that the window aggregate node doesn't take into account that
> it has to consume the whole input before outputting the first row. When
> this is combined with a limit, the resulting cost is wildly
> underestimated, leading to suboptimal plans.
>
> Is this a known issue? I couldn't find anything referring to this on
> the mailing list or todo.
>
>
What is your histogram size? That's defined by the
default_statistics_target parameter in your postgresql.conf.
Check the column histograms like this:

    news=> select attname,array_length(most_common_vals,1)
    from pg_stats
    where tablename='moreover_documents_y2010m09';
           attname        | array_length
    ----------------------+--------------
     document_id          |
     dre_reference        |
     headline             |         1024
     author               |          212
     url                  |
     rank                 |           59
     content              |         1024
     stories_like_this    |
     internet_web_site_id |         1024
     harvest_time         |         1024
     valid_time           |         1024
     keyword              |           95
     article_id           |
     media_type           |            5
     source_type          |            1
     created_at           |         1024
     autonomy_fed_at      |         1024
     language             |           37
    (18 rows)

    news=> show default_statistics_target;
     default_statistics_target
    ---------------------------
     1024
    (1 row)

You will see that for most of the columns, the length of the histogram
array corresponds to the value of the default_statistics_target
parameter. For those that are smaller, the size is the total number of
distinct values found in the column in the sample taken by the ANALYZE
command. The longer the histogram, the better the plan. In this case,
the size does matter. Note that there are no histograms for the
document_id and dre_reference columns. Those are the primary and unique
keys, so the optimizer can easily guess the distribution of values.
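
For a single column, the statistics target can also be raised without
changing the global default. A minimal sketch against the test table
from the first message (assuming column a is the one of interest):

    ALTER TABLE test ALTER COLUMN a SET STATISTICS 1000;
    ANALYZE test;
    SELECT attname, array_length(most_common_vals, 1)
    FROM pg_stats
    WHERE tablename = 'test';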


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: Bogus startup cost for WindowAgg

From: Tom Lane
Date:
Ants Aasma <ants.aasma@eesti.ee> writes:
> It seems that the window aggregate node doesn't take into account that
> it has to consume the whole input before outputting the first row.

Well, the reason it doesn't assume that is it's not true ;-).  In this
particular case it's true, but more generally you only have to read the
current input partition, and often not even all of that.
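
For instance (an illustrative query against the test table from the
original post, not something from this message), with a partitioned
window only the rows of the current partition have to be buffered
before that partition's rows can be returned:

    -- Only the current partition (one value of a) must be read before
    -- the first row of that partition can be emitted.
    SELECT *, COUNT(*) OVER (PARTITION BY a) FROM test;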

I'm not sure offhand how much intelligence would have to be added to
make a reasonable estimate of the effects of having to read ahead of the
current input row, but it's probably not trivial.  We haven't spent much
time at all yet on creating a realistic cost model for WindowAgg...

            regards, tom lane

Re: Bogus startup cost for WindowAgg

From: Ants Aasma
Date:
On Wed, Oct 13, 2010 at 10:35 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> You will see that for most of the columns, the length of the histogram array
> corresponds to the value of the default_statistics_target parameter. For
> those that are smaller, the size is the total number of distinct values found
> in the column in the sample taken by the ANALYZE command. The longer the
> histogram, the better the plan. In this case, the size does matter.

The issue here isn't that the statistics are off. The issue is, as Tom
said, that the optimizer doesn't consider them for the cost model of
the window aggregate. The trivial case I put forward wouldn't be too
hard to cover - if there's no partitioning of the window and the frame
is over the full partition, the startup cost should be nearly the same
as the total cost. But outside of the trick I tried, I'm not sure the
trivial case matters much. I can also see how the estimation gets
pretty hairy when partitioning, frames, and real window functions come
into play.

One idea would be to cost three different cases. If the aggregate only
needs to read ahead a bounded (most likely constant) number of rows,
i.e., it is not using an unbounded-following frame, leave the startup
cost as is. If there is partitioning, estimate the number of groups
produced by the partitioning and add one n-th of the difference between
the startup and total cost. Otherwise, if the frame extends to the end
of the partition and there is no partitioning, set the startup cost
equal to the total cost, or in terms of the previous case, n=1. I don't
know how accurate estimating the number of groups would be, or even
whether it is feasible. If those assumptions hold, it seems to me that
this method should at least cover any large O(n) effects.
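
To make the three cases concrete, here are illustrative queries against
the test table from the first message (the bounded frame in the first
one assumes the PostgreSQL 9.0 frame syntax):

    -- Case 1: bounded read-ahead (at most 10 rows); leave the startup
    -- cost as is.
    SELECT *, SUM(b) OVER (ORDER BY b
                           ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING)
    FROM test;

    -- Case 2: partitioned; charge startup + (total - startup)/n, where
    -- n is the estimated number of partitions.
    SELECT *, COUNT(*) OVER (PARTITION BY a) FROM test;

    -- Case 3: no partitioning, frame runs to the end of the input;
    -- startup cost equals total cost (n = 1).
    SELECT *, COUNT(*) OVER () FROM test;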