Обсуждение: Sequential scan evaluating function for each row, seemingly needlessly

Поиск
Список
Период
Сортировка

Sequential scan evaluating function for each row, seemingly needlessly

От
Bryce Nesbitt
Дата:
On psql 8.3.9, I ran a limited query limited to 5 results.  There was a 
moderately expensive function call
which I expected to be called 5 times, but was apparently called for 
each row of the sequential scan.  Why?



preproduction=> explain analyze select url(context_key) from extractq 
order by add_date desc limit 5;                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=19654.53..19654.54 rows=5 width=12) (actual 
 
time=10001.976..10001.990 rows=5 loops=1)   ->  Sort  (cost=19654.53..19826.16 rows=68651 width=12) (actual 
time=10001.972..10001.976 rows=5 loops=1)         Sort Key: add_date         Sort Method:  top-N heapsort  Memory: 25kB
       ->  Seq Scan on extractq  (cost=0.00..18514.26 rows=68651 
 
width=12) (actual time=19.145..9770.689 rows=73550 loops=1) Total runtime: 10002.150 ms
(6 rows)


preproduction=> explain analyze select context_key from extractq order 
by add_date desc limit 5;                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
Limit (cost=2491.78..2491.79 rows=5 width=12) (actual 
 
time=250.188..250.203 rows=5 loops=1)   ->  Sort  (cost=2491.78..2663.41 rows=68651 width=12) (actual 
time=250.184..250.188 rows=5 loops=1)         Sort Key: add_date         Sort Method:  top-N heapsort  Memory: 25kB
   ->  Seq Scan on extractq  (cost=0.00..1351.51 rows=68651 
 
width=12) (actual time=0.015..145.432 rows=73557 loops=1) Total runtime: 250.450 ms
(6 rows)




preproduction=> select version();                                            version
------------------------------------------------------------------------------------------------ PostgreSQL 8.3.9 on
x86_64-pc-linux-gnu,compiled by GCC gcc-4.3.real 
 
(Debian 4.3.2-1.1) 4.3.2



Re: Sequential scan evaluating function for each row, seemingly needlessly

От
Tom Lane
Дата:
Bryce Nesbitt <bryce2@obviously.com> writes:
> On psql 8.3.9, I ran a limited query limited to 5 results.  There was a 
> moderately expensive function call
> which I expected to be called 5 times, but was apparently called for 
> each row of the sequential scan.  Why?

Given the plan:

>   Limit  (cost=19654.53..19654.54 rows=5 width=12) (actual 
> time=10001.976..10001.990 rows=5 loops=1)
>     ->  Sort  (cost=19654.53..19826.16 rows=68651 width=12) (actual 
> time=10001.972..10001.976 rows=5 loops=1)
>           Sort Key: add_date
>           Sort Method:  top-N heapsort  Memory: 25kB
>           ->  Seq Scan on extractq  (cost=0.00..18514.26 rows=68651 
> width=12) (actual time=19.145..9770.689 rows=73550 loops=1)
>   Total runtime: 10002.150 ms
> (6 rows)

any interesting work is going to be done at the seqscan level.  Sort
just sorts, and Limit just limits; neither do any user-defined
calculations.  So yeah, your functions got run for every row of the
table.  (This isn't totally a PG aberration, btw: if you read the SQL
spec closely you'll discover that ORDER BY is defined to happen after
any calculations specified in the SELECT list.)

You could try something like
select my_expensive_function(...), etc, etc from    (select * from some-tables order by foo limit n) ss;

where the inner select list just pulls the columns you'll need in
the outer calculations.
        regards, tom lane