Re: PoC: Partial sort

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: PoC: Partial sort
Дата
Msg-id CAPpHfdsjKJUDJnwgMTTQek6pZfL_opwoE1DtpgukNVNXSJeOSg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PoC: Partial sort  (Marti Raudsepp <marti@juffo.org>)
Ответы Re: PoC: Partial sort  (Marti Raudsepp <marti@juffo.org>)
Re: PoC: Partial sort  (Marti Raudsepp <marti@juffo.org>)
Список pgsql-hackers
Hi!

On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp <marti@juffo.org> wrote:
First, thanks a lot for working on this feature. This PostgreSQL shortcoming crops up in all the time in web applications that implement paging by multiple sorted columns.
 
Thanks!

I've been trying it out in a few situations. I implemented a new enable_partialsort GUC to make it easier to turn on/off, this way it's a lot easier to test. The attached patch applies on top of partial-sort-5.patch

I though about such option. Generally not because of testing convenience, but because of overhead of planning. This way you implement it is quite naive :) For instance, merge join rely on partial sort which will be replaced with simple sort.
 
I will spend more time reviewing the patch, but some of this planner code is over my head. If there's any way I can help to make sure this lands in the next version, let me know.

----

The patch performs just as well as I would expect it to:

marti=# select ac.name, r.name from artist_credit ac join release r on (ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
Time: 9.830 ms
marti=# set enable_partialsort = off;
marti=# select ac.name, r.name from artist_credit ac join release r on (ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
Time: 1442.815 ms

A difference of almost 150x!

There's a missed opportunity in that the code doesn't consider pushing new Sort steps into subplans. For example, if there's no index on language(name) then this query cannot take advantage partial sorts:

marti=# explain select l.name, r.name from language l join release r on (l.id=r.language) order by l.name, r.name limit 1000;
 Limit  (cost=123203.20..123205.70 rows=1000 width=32)
   ->  Sort  (cost=123203.20..126154.27 rows=1180430 width=32)
         Sort Key: l.name, r.name
         ->  Hash Join  (cost=229.47..58481.49 rows=1180430 width=32)
               Hash Cond: (r.language = l.id)
               ->  Seq Scan on release r  (cost=0.00..31040.10 rows=1232610 width=26)
               ->  Hash  (cost=131.43..131.43 rows=7843 width=14)
                     ->  Seq Scan on language l  (cost=0.00..131.43 rows=7843 width=14)

But because there are only so few languages, it would be a lot faster to sort languages in advance and then do partial sort:
 Limit  (rows=1000 width=31)
   ->  Partial sort  (rows=1180881 width=31)
         Sort Key: l.name, r.name
         Presorted Key: l.name
         ->  Nested Loop  (rows=1180881 width=31)
               ->  Sort  (rows=7843 width=10)
                     Sort Key: name
                     ->  Seq Scan on language  (rows=7843 width=14)
               ->  Index Scan using release_language_idx on release r  (rows=11246 width=25)
                     Index Cond: (language = l.id)

Even an explicit sorted CTE cannot take advantage of partial sorts:
marti=# explain with sorted_lang as (select id, name from language order by name)
marti-# select l.name, r.name from sorted_lang l join release r on (l.id=r.language) order by l.name, r.name limit 1000;
 Limit  (cost=3324368.83..3324371.33 rows=1000 width=240)
   CTE sorted_lang
     ->  Sort  (cost=638.76..658.37 rows=7843 width=14)
           Sort Key: language.name
           ->  Seq Scan on language  (cost=0.00..131.43 rows=7843 width=14)
   ->  Sort  (cost=3323710.46..3439436.82 rows=46290543 width=240)
         Sort Key: l.name, r.name
         ->  Merge Join  (cost=664.62..785649.92 rows=46290543 width=240)
               Merge Cond: (r.language = l.id)
               ->  Index Scan using release_language_idx on release r  (cost=0.43..87546.06 rows=1232610 width=26)
               ->  Sort  (cost=664.19..683.80 rows=7843 width=222)
                     Sort Key: l.id
                     ->  CTE Scan on sorted_lang l  (cost=0.00..156.86 rows=7843 width=222)

But even with these limitations, this will easily be the killer feature of the next release, for me at least.

I see. But I don't think it can be achieved by small changes in planner. Moreover, I didn't check but I think if you remove ordering by r.name you will still not get sorting languages in the inner node. So, this problem is not directly related to partial sort.

------
With best regards,
Alexander Korotkov

 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance