Re: Possible optimisation: push down SORT and LIMIT nodes

From: Justin Pryzby
Subject: Re: Possible optimisation: push down SORT and LIMIT nodes
Date:
Msg-id: 20180530190231.GH5164@telsasoft.com
In response to: Possible optimisation: push down SORT and LIMIT nodes  (Christopher Wilson <chris.wilson@cantabcapital.com>)
Responses: Re: Possible optimisation: push down SORT and LIMIT nodes  (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-performance
On Wed, May 30, 2018 at 03:46:40PM +0000, Christopher Wilson wrote:
> We have a query which is rather slow (about 10 seconds), and it looks like this:
> 
> The inventory table has the quantity of each asset in the inventory on each
> date (complete SQL to create and populate the tables with dummy data is
> below). The query plan looks like this (the non-parallel version is similar):

Hi,

Thanks for including the test case.
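
For reference, the tables boil down to approximately the following (the names
and row count are taken from the plans; the column types are my guesses, and
the authoritative create/populate script is in your message):

    -- Approximate shape of the test tables; column types are guesses and the
    -- authoritative create/populate script is in the original message.
    CREATE SCHEMA IF NOT EXISTS temp;

    CREATE TABLE temp.asset (
        id   serial PRIMARY KEY,   -- "asset_pkey" in the plans below
        name text
    );

    CREATE TABLE temp.inventory (
        date     date,
        id_asset integer,          -- joined to asset.id
        quantity integer
    );
    -- temp.inventory holds roughly 5 million rows (rows=5006001 in the seq scan).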

> Limit  (cost=217591.77..217603.60 rows=100 width=32) (actual time=9122.235..9122.535 rows=100 loops=1)
...
>          ->  Sort  (cost=216591.73..220628.83 rows=1614839 width=32) (actual time=8879.909..8880.030 rows=727 loops=4)
>                Sort Key: inventory.date, asset.name
>                Sort Method: external merge  Disk: 50904kB
>                Buffers: shared hit=27365, temp read=25943 written=25947

Yep, the sort is expensive and largely wasted..
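
(The query itself got trimmed in the quoting above; judging by the plan -- the
sort key, the limit, and the join via asset_pkey -- and by the rewrite further
down, it is roughly the following, though the original may differ in details
such as the join type:)

    -- Approximate reconstruction of the problem query; the original SQL is in
    -- Christopher's message and may differ (e.g. inner vs. left join).
    SELECT inventory.date, asset.name, inventory.quantity
    FROM temp.inventory
    LEFT JOIN temp.asset ON asset.id = inventory.id_asset
    ORDER BY inventory.date, asset.name
    LIMIT 100;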

> I'm imagining something like a sort-limit-finish node, which sorts its input
> and then returns at least the limit number of rows, but keeps returning rows
> until it exhausts the last sort prefix that it read.
[...]
> Does this sound correct, reasonable and potentially interesting to Postgres
> developers?

I think your analysis may be (?) unnecessarily specific to your particular
problem query.
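
FWIW, something close to the behaviour you describe can be written by hand
today, by pre-filtering inventory on the leading sort column before the join.
This is only a sketch against the reconstructed query above -- it assumes
inventory.date is never NULL and that the table has at least 100 rows, and
whether it actually wins depends on how many rows share each date:

    -- Manual "push-down" sketch: find the 100th-smallest date, keep every
    -- inventory row up to and including that date (so no row that could make
    -- the final top 100 is lost), then join, sort and limit as before.
    -- Not equivalent if inventory has fewer than 100 rows (the scalar
    -- subquery then returns NULL and filters everything out), or if date
    -- can be NULL.
    SELECT i.date, a.name, i.quantity
    FROM (
        SELECT *
        FROM temp.inventory
        WHERE date <= (SELECT date
                       FROM temp.inventory
                       ORDER BY date
                       OFFSET 99 LIMIT 1)
    ) i
    LEFT JOIN temp.asset a ON a.id = i.id_asset
    ORDER BY i.date, a.name
    LIMIT 100;

The general idea is the same as the sort-limit-finish node, just spelled out
manually: a row whose date is beyond the 100th-smallest date can never appear
in the final top 100, so it never has to reach the join or the big sort.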

For diagnostic purposes, I was able to vastly improve the query runtime with a
CTE (WITH):

|postgres=# explain(analyze,buffers) WITH x AS (SELECT inventory.date, asset.name, inventory.quantity FROM temp.inventory LEFT JOIN temp.asset ON asset.id=id_asset LIMIT 99) SELECT * FROM x ORDER BY date, name;

| Sort  (cost=1090.60..1090.85 rows=99 width=40) (actual time=3.764..3.988 rows=99 loops=1)
|   Sort Key: x.date, x.name
|   Sort Method: quicksort  Memory: 32kB
|   Buffers: shared hit=298
|   CTE x
|     ->  Limit  (cost=0.28..889.32 rows=99 width=31) (actual time=0.063..2.385 rows=99 loops=1)
|           Buffers: shared hit=298
|           ->  Nested Loop Left Join  (cost=0.28..44955006.99 rows=5006001 width=31) (actual time=0.058..1.940 rows=99 loops=1)
|                 Buffers: shared hit=298
|                 ->  Seq Scan on inventory  (cost=0.00..5033061.00 rows=5006001 width=12) (actual time=0.020..0.275 rows=99 loops=1)
|                       Buffers: shared hit=1
|                 ->  Index Scan using asset_pkey on asset  (cost=0.28..7.98 rows=1 width=27) (actual time=0.008..0.008 rows=1 loops=99)
|                       Index Cond: (id = inventory.id_asset)
|                       Buffers: shared hit=297
|   ->  CTE Scan on x  (cost=0.00..198.00 rows=99 width=40) (actual time=0.073..2.989 rows=99 loops=1)
|         Buffers: shared hit=298
| Planning time: 0.327 ms
| Execution time: 4.260 ms

It's not clear to me if there's some reason why the planner couldn't know to
use a similar plan (sort-limit-... rather than limit-sort-...).

Justin

