Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries

From: Maxim Boguk
Subject: Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries
Date:
Msg-id: CAK-MWwT5+xaQhydZ0TZo_U+Q=Rb9C2H=m+f4LuVqxkk7BU3CVQ@mail.gmail.com
In reply to: Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries  (Craig Ringer <ringerc@ringerc.id.au>)
List: pgsql-performance
I understand that position.
However, if the assumption "the definition of ORDER BY --- it happens after computing the select list, according to the SQL standard"
is correct,
then plans like:

postgres=# EXPLAIN ANALYZE SELECT * from test order by _data limit 10 offset 1000;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2884.19..2913.03 rows=10 width=8) (actual time=3.584..3.620 rows=10 loops=1)
   ->  Index Scan using random_key on test  (cost=0.00..2884190.16 rows=1000000 width=8) (actual time=0.103..3.354 rows=1010 loops=1)
 Total runtime: 3.663 ms
(3 rows)
should not be used at all.


In reality I was bitten by the following scenario (this is a simplified case):

postgres=# CREATE TABLE test as (select random() as _data from (select * from generate_series(1,1000000)) as t);
SELECT 1000000
postgres=# CREATE INDEX random_key on test(_data);
CREATE INDEX
postgres=# analyze test;
ANALYZE
postgres=# set seq_page_cost to 1;
SET
postgres=# set random_page_cost to 4;
SET
postgres=# set effective_cache_size to '16MB';
SET

Now:
postgres=# EXPLAIN analyze SELECT *,(select pg_sleep(10)) from test order by _data limit 10;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.01..28.85 rows=10 width=8) (actual time=10001.132..10001.198 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=10001.076..10001.078 rows=1 loops=1)
   ->  Index Scan using random_key on test  (cost=0.00..2884190.16 rows=1000000 width=8) (actual time=10001.129..10001.188 rows=10 loops=1)
 Total runtime: 10001.252 ms
(5 rows)

This is OK.

postgres=# EXPLAIN analyze SELECT *,(select pg_sleep(10)) from test order by _data limit 10 offset 10000;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=28841.91..28870.76 rows=10 width=8) (actual time=10037.850..10037.871 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=10001.040..10001.041 rows=1 loops=1)
   ->  Index Scan using random_key on test  (cost=0.00..2884190.16 rows=1000000 width=8) (actual time=10001.094..10036.022 rows=10010 loops=1)
 Total runtime: 10037.919 ms
(5 rows)

This is still OK.


postgres=# EXPLAIN SELECT *,(select pg_sleep(10)) from test order by _data limit 10 offset 100000;
                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=102723.94..102723.96 rows=10 width=8)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Sort  (cost=102473.92..104973.92 rows=1000000 width=8)
         Sort Key: _data
         ->  Seq Scan on test  (cost=0.00..14425.00 rows=1000000 width=8)
(6 rows)

Oops, and here the project is screwed.

And it is not possible to predict in advance where and when you will get hit by that problem.
E.g. statements that are usually fast suddenly become slow as a snail for some argument values, once the DB switches from an index scan to a sort under the top node.

The only way to prevent that is to always write all queries the way you suggested.
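For illustration, that rewrite applied to the simplified test case above would look roughly like this (a sketch; expensive_fn is a placeholder for any costly function or subquery, not something from the examples above):

```sql
-- Sketch of the suggested pattern: do ORDER BY/LIMIT/OFFSET in an inner
-- query first, so the expensive expression is evaluated only for the
-- 10 rows that survive, regardless of which plan the inner query gets.
SELECT ss._data, expensive_fn(ss._data)
FROM (
    SELECT _data
    FROM test
    ORDER BY _data
    LIMIT 10 OFFSET 100000
) AS ss;
```

The downside is that every query touching a potentially expensive select-list expression has to be written defensively this way.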

Kind Regards,
Maksym

On Fri, Nov 25, 2011 at 4:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maxim Boguk <maxim.boguk@gmail.com> writes:
> Is here any reason why Postgresql calculates subqueries/storable procedures
> in select list before applying ORDER BY / LIMIT?

Well, that's the definition of ORDER BY --- it happens after computing
the select list, according to the SQL standard.  We try to optimize this
in some cases but you can't really complain when we don't.  Consider
putting the expensive function outside the ORDER BY/LIMIT, ie

select ..., expensive_fn() from (select ... order by ... limit ...) ss;

                       regards, tom lane



--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
Strength breaks straw, but not everything in our life is straw, and strength is far from everything.
