Re: expensive function in select list vs limit clause

Поиск
Список
Период
Сортировка
От Albe Laurenz
Тема Re: expensive function in select list vs limit clause
Дата
Msg-id A737B7A37273E048B164557ADEF4A58B53A1CE81@ntex2010i.host.magwien.gv.at
обсуждение исходный текст
Ответы Re: expensive function in select list vs limit clause
Список pgsql-general
Chris Mair wrote:
> I've found a (simple) situation where the planner does something I don't understand.
> 
> Below is a complete test case followed by output.
> 
>  From the timings it appears that in the second explain analyze query a function
> call in the select list (expensive()) is evaluated in the sequential scan node
> *for each* row in big, despite the use of limit.
> 
> I would have expected expensive() to be evaluated only for the ten rows
> in the result set. Hence the second explain analyze query shouldn't be more
> expensive than the first one.
> 
> My trust in Postgres' planner goes so far as I feel the planner is right and there
> must be a reason for this :)
> 
> Could someone help me understand this behaviour?
[...]
> create function expensive() returns double precision as
> $$
>      begin
>          for i in 1 .. 15000 loop
>          end loop;
>          return random();
>      end;
> $$ language 'plpgsql';

This is unrelated, but you should set COST for an expensive function
to help the planner.

[...]
> -- now do the same, but add an expensive() column to the result:
> -- takes ~ 29s => WHY?
> 
> explain analyze select r, expensive() from big order by r offset 0 limit 10;
[...]
>                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------
> ------------------------------------
>   Limit  (cost=286034.64..286034.67 rows=10 width=8) (actual time=28932.311..28932.314
> rows=10 loops=1)
>     ->  Sort  (cost=286034.64..288534.64 rows=1000000 width=8) (actual
> time=28932.309..28932.310 rows=10 loops=1)
>           Sort Key: r
>           Sort Method: top-N heapsort  Memory: 25kB
>           ->  Seq Scan on big  (cost=0.00..264425.00 rows=1000000 width=8) (actual
> time=0.062..28822.520 rows=1000000 loops=1)
>   Planning time: 0.038 ms
>   Execution time: 28932.339 ms
> (7 rows)

ORDER BY can only be processed after all rows have been fetched, this
includes the expensive result column.

You can easily avoid that by applying the LIMIT first:

  SELECT r, expensive()
  FROM (SELECT r
        FROM big
        ORDER BY r
        LIMIT 10
       ) inner;

I don't know how hard it would be to only fetch the necessary columns before
the ORDER BY and fetch the others after the LIMIT has been applied, but it
is probably nontrivial and would require processing time for *everybody*
who runs a query with ORDER BY to solve a rare problem that can easily be
worked around.

Yours,
Laurenz Albe

В списке pgsql-general по дате отправления:

Предыдущее
От: Daniel Westermann
Дата:
Сообщение: Re: Query never completes with low work_mem (at least notwithin one hour)
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Unexpected interval comparison