Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Поиск
Список
Период
Сортировка
От James Coleman
Тема Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Дата
Msg-id 153625271389.4078.18405235923135713189.pgcf@coridan.postgresql.org
обсуждение исходный текст
Ответ на Re: Re: [HACKERS] [PATCH] Incremental sort  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Ответы Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

A fairly common planning problem for us is what we call "most recent first" queries; i.e., "the 50 most recent <table>
rowsfor a <foreign key>".
 

Here's a basic setup:

-- created_at has very high cardinality
create table foo(pk serial primary key, owner_fk integer, created_at timestamp);
create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at);

-- technically this data guarantees unique created_at values,
-- but there's no reason it couldn't be modified to have a few
-- random non-unique values to prove the point
insert into foo(owner_fk, created_at)
  select i % 100, now() - (i::text || ' minutes')::interval
  from generate_series(1, 1000000) t(i);


And here's the naive query to get the results we want:

select *
from foo
where owner_fk = 23
-- pk is only here to disambiguate/guarantee a stable sort
-- in the rare case that there are collisions in the other
-- sort field(s)
order by created_at desc, pk desc
limit 50;


On stock Postgres this ends up being pretty terrible for cases where the fk filter represents a large number of rows,
becausethe planner generates a sort node under the limit node and therefore fetches all matches, sorts them, and then
appliesthe limit. Here's the plan:
 

 Limit  (cost=61386.12..61391.95 rows=50 width=16) (actual time=187.814..191.653 rows=50 loops=1)
   ->  Gather Merge  (cost=61386.12..70979.59 rows=82224 width=16) (actual time=187.813..191.647 rows=50 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=60386.10..60488.88 rows=41112 width=16) (actual time=185.639..185.642 rows=42 loops=3)
               Sort Key: created_at DESC, pk DESC
               Sort Method: top-N heapsort  Memory: 27kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 27kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 27kB
               ->  Parallel Bitmap Heap Scan on foo  (cost=3345.24..59020.38 rows=41112 width=16) (actual
time=25.150..181.804rows=33333 loops=3)
 
                     Recheck Cond: (owner_fk = 23)
                     Heap Blocks: exact=18014
                     ->  Bitmap Index Scan on idx_foo_on_owner_and_created_at  (cost=0.00..3320.57 rows=98668 width=0)
(actualtime=16.992..16.992 rows=100000 loops=1)
 
                           Index Cond: (owner_fk = 23)
 Planning Time: 0.384 ms
 Execution Time: 191.704 ms


I have a recursive CTE that implements the algorithm:
- Find first n+1 results
- If result at n+1’s created_at value differs from the n’s value, return first n values.
- If those equal, gather more results until a new created_at value is encountered.
- Sort all results by created_at and a tie-breaker (e.g., pk) and return the first n values.
But nobody wants to use/write that normally (it's quite complex).

This patch solves the problem presented; here's the plan:

 Limit  (cost=2.70..2.76 rows=50 width=16) (actual time=0.233..0.367 rows=50 loops=1)
   ->  Incremental Sort  (cost=2.70..111.72 rows=98668 width=16) (actual time=0.232..0.362 rows=50 loops=1)
         Sort Key: created_at DESC, pk DESC
         Presorted Key: created_at
         Sort Method: quicksort  Memory: 26kB
         Sort Groups: 2
         ->  Index Scan Backward using idx_foo_on_owner_and_created_at on foo  (cost=0.56..210640.79 rows=98668
width=16)(actual time=0.054..0.299 rows=65 loops=1)
 
               Index Cond: (owner_fk = 23)
 Planning Time: 0.428 ms
 Execution Time: 0.393 ms


While check world fails, the only failure appears to be a plan output change in
test/isolation/expected/drop-index-concurrently-1.outthat just needs to be updated (incremental sort is now used in
thisplan); I don't see any functionality breakage. 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: *_to_xml() should copy SPI_processed/SPI_tuptable
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH] Incremental sort (was: PoC: Partial sort)