Re: Abbreviated keys for Numeric

From: Tomas Vondra
Subject: Re: Abbreviated keys for Numeric
Date:
Msg-id: 54E81519.308@2ndquadrant.com
In reply to: Re: Abbreviated keys for Numeric (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Replies: Re: Abbreviated keys for Numeric (Gavin Flower <GavinFlower@archidevsys.co.nz>)
         Re: Abbreviated keys for Numeric (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
Hi,

On 21.2.2015 02:06, Tomas Vondra wrote:
> On 21.2.2015 02:00, Andrew Gierth wrote:
>>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>
>>  >> Right...so don't test a datum sort case, since that isn't supported
>>  >> at all in the master branch. Your test case is invalid for that
>>  >> reason.
>>
>>  Tomas> What do you mean by 'Datum sort case'?
>>
>> A case where the code path goes via tuplesort_begin_datum rather than
>> tuplesort_begin_heap.
>>
>>  Tomas> The test I was using is this:
>>
>>  Tomas>    select percentile_disc(0) within group (order by randnum) from stuff;
>>
>> Sorting single columns in aggregate calls uses the Datum sort path (in
>> fact I think it's currently the only place that does).
>>
>> Do that test with _both_ the Datum and Numeric sort patches in place,
>> and you will see the effect. With only the Numeric patch, the numeric
>> abbrev code is not called.
>
> D'oh! Thanks for the explanation.

OK, so I've repeated the benchmarks with both patches applied, and I
think the results are interesting. I extended the benchmark a bit - see
the attached SQL script.

  1) multiple queries

     select percentile_disc(0) within group (order by val) from stuff

     select count(distinct val) from stuff

     select * from
       (select * from stuff order by val offset 100000000000) foo

  2) multiple data types - int, float, text and numeric

  3) multiple scales - 1M, 2M, 3M, 4M and 5M rows

Each query was executed 10 times and the timings were averaged. I know
some of the data types don't benefit from the patches, but I included
them to get a sense of how noisy the results are.

I did the measurements for

  1) master
  2) master + datum_sort_abbrev.patch
  3) master + datum_sort_abbrev.patch + numeric_sortsup.patch

and then computed the speedup for each type/scale combination (the
impact was almost exactly the same for all three queries).

Complete results are available here: http://bit.ly/1EA4mR9

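I'll post the summary here, although some of the numbers concern the
other abbreviated keys patch. The values are relative to master (above
100% means faster, below 100% means slower), and scale N means N
million rows.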


1) datum_sort_abbrev.patch vs. master

    scale      float      int    numeric     text
    ---------------------------------------------
    1          101%       99%       105%     404%
    2          101%       98%        96%      98%
    3          101%      101%        99%      97%
    4          100%      101%        98%      95%
    5           99%       98%        93%      95%

2) datum_sort_abbrev.patch + numeric_sortsup.patch vs. master

    scale     float       int    numeric     text
    ---------------------------------------------
    1           97%       98%       374%     396%
    2          100%      101%       407%      96%
    3           99%      102%       407%      95%
    4           99%      101%       423%      92%
    5           95%       99%       411%      92%


I think the gains are pretty awesome - I mean, 400% speedup for Numeric
across the board? Yes please!

The gains for text are also very nice, although they only show up at
the smallest scale (1M rows); at larger scales text is actually slower
than current master :-(

It's not just rainbows and unicorns, though. With both patches applied,
text sorts get even slower (up to ~8% slower than master). It also seems
to affect float (up to ~5% slower), although I don't see how that could
happen, so I suspect this might be noise.

I'll repeat the tests on another machine after the weekend, and post an
update on whether the results are the same or significantly different.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments
