Re: Abbreviated keys for Numeric

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Abbreviated keys for Numeric
Дата
Msg-id 54E8AE8D.6030006@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Abbreviated keys for Numeric  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Список pgsql-hackers
Hi Gavin,

On 21.2.2015 06:35, Gavin Flower wrote:
> On 21/02/15 18:18, Tomas Vondra wrote:
>>
>> OK, so I've repeated the benchmarks with both patches applied, and I
>> think the results are interesting. I extended the benchmark a bit - see
>> the SQL script attached.
>>
>>    1) multiple queries
>>
>>       select percentile_disc(0) within group (order by val) from stuff
>>
>>       select count(distinct val) from stuff
>>
>>       select * from
>>         (select * from stuff order by val offset 100000000000) foo
>>
>>    2) multiple data types - int, float, text and numeric
>>
>>    3) multiple scales - 1M, 2M, 3M, 4M and 5M rows
>>
>> Each query was executed 10x, the timings were averaged. I do know some
>> of the data types don't benefit from the patches, but I included them to
>> get a sense of how noisy the results are.
>>
>> I did the measurements for
>>
>>    1) master
>>    2) master + datum_sort_abbrev.patch
>>    3) master + datum_sort_abbrev.patch + numeric_sortsup.patch
>>
>> and then computed the speedup for each type/scale combination (the
>> impact on all the queries is almost exactly the same).
>>
>> Complete results are available here: http://bit.ly/1EA4mR9
>>
>> I'll post all the summary here, although some of the numbers are about
>> the other abbreviated keys patch.
>>
>>
>> 1) datum_sort_abbrev.patch vs. master
>>
>>      scale      float      int    numeric     text
>>      ---------------------------------------------
>>      1          101%       99%       105%     404%
>>      2          101%       98%        96%      98%
>>      3          101%      101%        99%      97%
>>      4          100%      101%        98%      95%
>>      5           99%       98%        93%      95%
>>
>> 2) numeric_sortsup.patch vs. master
>>
>>      scale     float       int    numeric     text
>>      ---------------------------------------------
>>      1           97%       98%       374%     396%
>>      2          100%      101%       407%      96%
>>      3           99%      102%       407%      95%
>>      4           99%      101%       423%      92%
>>      5           95%       99%       411%      92%
>>
>>
>> I think the gains are pretty awesome - I mean, 400% speedup for Numeric
>> accross the board? Yes please!
>>
>> The gains for text are also very nice, although in this case that only
>> happens for the smallest scale (1M rows), and for larger scales it's
>> actually slower than current master :-(
>>
>> It's not just rainbows and unicorns, though. With both patches applied,
>> text sorts get even slower (up to ~8% slower than master), It also seems
>> to impact float (which gets ~5% slower, for some reason), but I don't
>> see how that could happen ... but I suspect this might be noise.
>>
>> I'll repeat the tests on another machine after the weekend, and post an
>> update whether the results are the same or significantly different.
>>
>> regards
>>
>
> What are the standard deviations?

I have checked that the results are consistent before sending the
results to the list, but I didn't want tu dump a huge pile of
stddev/min/max/... numbers into that e-mail. So I just mentioned that
the results are available in a spreadsheet (http://bit.ly/1EA4mR9).

See for example the "details" sheet with all the numbers aggregated.
That's where the numbers for the answers below come from. I'll provide
references to the columns.

For all three cases (master, datum, datum+numeric) STDDEV numbers are
~1-2% of the average. See columns T-V in the table.

> 
> Do the arithmetic means change much if you exclude the 2 fastest & 2
> slowest?

No. The change (compared to plain average) is ~1-2% of the value. See
columns W-Y in the table.

> 
> How do the arithmetic means compare to their respective medians?

I have not included medians into the current table (will do for the next
run), but all the other metrics seem quite consistent.

> Essentially, how consistent are the results, or how great is the
> noise? There may be better indicators than the ones I've suggested
> above.

I believe the results are very consistent (you may check the raw data in
the spreadsheet), but let me check after repeating this. I'll also run
the same test suite on another machine (I wonder how a different CPU
will perform).

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: Bootstrap DATA is a pita
Следующее
От: Andres Freund
Дата:
Сообщение: Re: deparsing utility commands