Re: PoC: Partial sort

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: PoC: Partial sort
Дата
Msg-id CAPpHfdvHBXX5CVsbt8a8ao3RomHe+WZ3Ph06_gHSr93pQo1aXw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PoC: Partial sort  (Andreas Karlsson <andreas@proxel.se>)
Ответы Re: PoC: Partial sort
Список pgsql-hackers
On Tue, Dec 24, 2013 at 6:02 AM, Andreas Karlsson <andreas@proxel.se> wrote:
On 12/22/2013 04:38 PM, Alexander Korotkov wrote:
postgres=# explain analyze select * from test order by v1, id limit 10;
                                                                   QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=11441.77..11442.18 rows=10 width=12) (actual
time=79.980..79.982 rows=10 loops=1)
    ->  Partial sort  (cost=11441.77..53140.44 rows=1000000 width=12)
(actual time=79.978..79.978 rows=10 loops=1)
          Sort Key: v1, id
          Presorted Key: v1
          Sort Method: top-N heapsort  Memory: 25kB
          ->  Index Scan using test_v1_idx on test  (cost=0.42..47038.83
rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
  Total runtime: 81.786 ms
(7 rows)

Have you thought about how do you plan to print which sort method and how much memory was used? Several different sort methods may have been use in the query. Should the largest amount of memory/disk be printed?

Apparently, now amount of memory for sorted last group is printed. Your proposal makes sense: largest amount of memory/disk should be printed.
 
However, work with joins needs more improvements.

That would be really nice to have, but the patch seems useful even without the improvements to joins.

Attached revision of patch implements partial sort usage in merge joins.

create table test1 as (
select id, 
(random()*100)::int as v1, 
(random()*10000)::int as v2
from generate_series(1,1000000) id);

create table test2 as (
select id, 
(random()*100)::int as v1, 
(random()*10000)::int as v2
from generate_series(1,1000000) id);
create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);

create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);

# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 and t1.v2 = t2.v2;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=2257.67..255273.39 rows=983360 width=24)
   Merge Cond: ((t1.v1 = t2.v1) AND (t1.v2 = t2.v2))
   ->  Partial sort  (cost=1128.84..116470.79 rows=1000000 width=12)
         Sort Key: t1.v1, t1.v2
         Presorted Key: t1.v1
         ->  Index Scan using test1_v1_idx on test1 t1  (cost=0.42..47604.01 rows=1000000 width=12)
   ->  Materialize  (cost=1128.83..118969.00 rows=1000000 width=12)
         ->  Partial sort  (cost=1128.83..116469.00 rows=1000000 width=12)
               Sort Key: t2.v1, t2.v2
               Presorted Key: t2.v1
               ->  Index Scan using test2_v1_idx on test2 t2  (cost=0.42..47602.22 rows=1000000 width=12)

I believe now patch covers desired functionality. I'm going to focus on nailing down details, refactoring and documenting.

------
With best regards,
Alexander Korotkov.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "MauMau"
Дата:
Сообщение: Re: [bug fix] connection service file doesn't take effect with ECPG apps
Следующее
От: David Rowley
Дата:
Сообщение: Re: PoC: Partial sort