Re: Merging large volumes of data

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Merging large volumes of data
Дата
Msg-id 6357.1178550568@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Merging large volumes of data  (Andreas Kostyrka <andreas@kostyrka.org>)
Список pgsql-performance
Andreas Kostyrka <andreas@kostyrka.org> writes:
> Ambrus Wagner (IJ/ETH) wrote:
>> Is there a way to prevent PostgreSQL from doing a full sort on the result set after the unions have been completed?
Evenif I write 
>>
>> (select a,b,c,d,e from table1 order by a,b) union all
>> (select a,b,c,d,e from table2 order by a,b)  union all
>> etc...
>> (select a,b,c,d,e from tablen order by a,b)  order by a,b;
>>
>> PostgreSQL does not seem to realise (maybe it should not be able to do this trick anyway) that the last "order by"
clauseis merely a final merge step on the ordered data sets. 

At least to a first-order approximation, teaching it this would be a
waste of time.

Assume for simplicity that each of your K sub-selects yields N tuples.
Then there are KN items altogether, so if we just sort the big data set
it takes O(KN*log(KN)) time ... which is the same as O(KN*(log K + log N)).
OTOH, if we sort each sub-select by itself, that takes O(N*log N) time,
or O(KN*log N) for all K sub-sorts.  Now we've got to do a K-way merge,
which will take O(log K) comparisons for each of the KN tuples, ie,
O(KN*log K)).  Net result: exactly the same runtime.

Of course this argument fails if you have some way of obtaining the
sub-select values pre-sorted for free.  But it's never really free.
Historical experience is that full-table indexscans often underperform
explicit sorts, at least when there are enough tuples involved to
make the problem interesting.

So the bottom line is that the use-case for this optimization seems
far too narrow to justify the implementation effort.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Gregory Stark
Дата:
Сообщение: Re: Merging large volumes of data
Следующее
От: Jim Nasby
Дата:
Сообщение: Re: How to Find Cause of Long Vacuum Times - NOOB Question