Re: Re: Abbreviated keys for Datum tuplesort

Поиск
Список
Период
Сортировка
От Andrew Gierth
Тема Re: Re: Abbreviated keys for Datum tuplesort
Дата
Msg-id 87ppa3gx3p.fsf@news-spur.riddles.org.uk
обсуждение исходный текст
Ответ на Re: Abbreviated keys for Datum tuplesort (was: Re: B-Tree support function number 3 (strxfrm() optimization))  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Re: Abbreviated keys for Datum tuplesort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Re: Abbreviated keys for Datum tuplesort  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
>> Here's the cleaned-up version of the patch to allow abbreviated keys>> when sorting a single Datum. This also
removescomments that suggest>> that the caller of tuplesort_begin_datum should ever have to care>> about the
abbreviatedkey optimization.>> >> I'll add this to the CF.
 
Robert> I think this is a good idea.  Do you have a test case thatRobert> shows the benefit?

The best test case for datum sort performance is to use percentile_disc,
since that has almost no overhead beyond performing the actual sort.
(Unlike, say, count(distinct) or mode(), both of which have to do an
additional comparison pass over the sorted data; but count(distinct) is
probably the most common use of the datum sort in the wild, so it's
useful to try that too.)

So given some suitable test data, such as

create table stuff as select random()::text as randtext from generate_series(1,1000000);  -- or however many rows

you can do

select percentile_disc(0) within group (order by randtext) from stuff;

or

select count(distinct randtext) from stuff;

The performance improvements I saw were pretty much exactly as expected
from the improvement in the ORDER BY and CREATE INDEX cases.

The best test case for checking the correct order of results is to use
array_agg(x order by x), for example as follows:

select u, u <= lag(u) over () from (select unnest(a) as u         from (select array_agg(randtext order by randtext)
            from stuff) s1) s2;
 

(note that array_agg(x order by x) uses the datum sort, but
array_agg(x order by y) uses the ordinary heap tuple sort)

-- 
Andrew (irc:RhodiumToad)



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: row_to_array function
Следующее
От: Andrew Gierth
Дата:
Сообщение: Re: New CF app deployment