Обсуждение: Sequential scan evaluating function for each row, seemingly needlessly
On psql 8.3.9, I ran a limited query limited to 5 results. There was a moderately expensive function call which I expected to be called 5 times, but was apparently called for each row of the sequential scan. Why? preproduction=> explain analyze select url(context_key) from extractq order by add_date desc limit 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Limit (cost=19654.53..19654.54 rows=5 width=12) (actual time=10001.976..10001.990 rows=5 loops=1) -> Sort (cost=19654.53..19826.16 rows=68651 width=12) (actual time=10001.972..10001.976 rows=5 loops=1) Sort Key: add_date Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on extractq (cost=0.00..18514.26 rows=68651 width=12) (actual time=19.145..9770.689 rows=73550 loops=1) Total runtime: 10002.150 ms (6 rows) preproduction=> explain analyze select context_key from extractq order by add_date desc limit 5; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Limit (cost=2491.78..2491.79 rows=5 width=12) (actual time=250.188..250.203 rows=5 loops=1) -> Sort (cost=2491.78..2663.41 rows=68651 width=12) (actual time=250.184..250.188 rows=5 loops=1) Sort Key: add_date Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on extractq (cost=0.00..1351.51 rows=68651 width=12) (actual time=0.015..145.432 rows=73557 loops=1) Total runtime: 250.450 ms (6 rows) preproduction=> select version(); version ------------------------------------------------------------------------------------------------ PostgreSQL 8.3.9 on x86_64-pc-linux-gnu,compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
Bryce Nesbitt <bryce2@obviously.com> writes: > On psql 8.3.9, I ran a limited query limited to 5 results. There was a > moderately expensive function call > which I expected to be called 5 times, but was apparently called for > each row of the sequential scan. Why? Given the plan: > Limit (cost=19654.53..19654.54 rows=5 width=12) (actual > time=10001.976..10001.990 rows=5 loops=1) > -> Sort (cost=19654.53..19826.16 rows=68651 width=12) (actual > time=10001.972..10001.976 rows=5 loops=1) > Sort Key: add_date > Sort Method: top-N heapsort Memory: 25kB > -> Seq Scan on extractq (cost=0.00..18514.26 rows=68651 > width=12) (actual time=19.145..9770.689 rows=73550 loops=1) > Total runtime: 10002.150 ms > (6 rows) any interesting work is going to be done at the seqscan level. Sort just sorts, and Limit just limits; neither do any user-defined calculations. So yeah, your functions got run for every row of the table. (This isn't totally a PG aberration, btw: if you read the SQL spec closely you'll discover that ORDER BY is defined to happen after any calculations specified in the SELECT list.) You could try something like select my_expensive_function(...), etc, etc from (select * from some-tables order by foo limit n) ss; where the inner select list just pulls the columns you'll need in the outer calculations. regards, tom lane