Re: Optimizing numeric SUM() aggregate

Поиск
Список
Период
Сортировка
От Andrey Borodin
Тема Re: Optimizing numeric SUM() aggregate
Дата
Msg-id 20160803174720.1314.6360.pgcf@coridan.postgresql.org
обсуждение исходный текст
Ответ на Optimizing numeric SUM() aggregate  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: Optimizing numeric SUM() aggregate  (Andrey Borodin <borodin@octonica.com>)
Re: Optimizing numeric SUM() aggregate  (Andres Freund <andres@anarazel.de>)
Re: Optimizing numeric SUM() aggregate  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           tested, failed
Documentation:            tested, failed

This is a review of a patch "Optimizing numeric SUM() aggregate" by Heikki Linnakangas

https://www.postgresql.org/message-id/flat/c0545351-a467-5b76-6d46-4840d1ea8aa4@iki.fi#c0545351-a467-5b76-6d46-4840d1ea8aa4@iki.fi
This review contains summarization of all my posts regarding this patch and a little bit of additional suggestions.

Contents & Purpose
==================
This patch improves performance of aggregates computation by delaying numeric overflow carring between NBASE-digit
arbitrary-lengtharithmetic. This is possible because 32-bit int is used for storage of every NBASE-digit, where NBASE
is10000.
 
Patch changes file numeric.c only. Folder of numeric.c does not contain README. That is why all documentation of a
patchis done in comments. I consider documentation sufficient.
 

Initial Run
===========
Patch can be applied cleanly to current HEAD. Regression tests path before and after patch application.

Performance
===========
I've tested patch with this query
CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM generate_series(1, 1000000) s;
SELECT avg(d) a , avg(d+0) s0 , avg(d+1) s1 , avg(d+2) s2, avg(d+3) s3 , avg(d+4) s4 , avg(d+5) s5, avg(d+6) s6 ,
avg(d+7)s7, avg(d+8) s8 , avg(d+9) s9 FROM avg_test;
 

Test specs were: Ubuntu 14 LTS VM, dynamic RAM, hypervisor Windows
Server 2016, SSD disk, core i5-2500. Configuration: out of the box master make.

On 10 test runs timing of select statement: AVG 3739.8 ms, STD  435.4193
With patch applied (as is) : 3017.8 ms, STD 319.893

I suppose this proves performance impact of a patch. I don't think that sum calculation was a common bottleneck, but
certainlypatch will slightly improve performance of a very huge number of queries.
 

Suggestions
===========
1. Currenlty overflow is carried every 9999 addition. I suggest that it is possibe to do it only every (INT32_MAX -
INT32_MAX/ NBASE) / (NBASE - 1) addition. Please note that with this overflow count it will be neccesary to check
whethertwo finishing carrings are required.
 
2. Aggregates and numeric regression tests do not containt any test case covering overflows. I recommend adding sum
withnumer 1 000 000 of 99 999 999 values. May be you will come up with more clever overflow testing.
 

Conclusion
==========
This patch is important and thorough work, but I'd suggest to consider some polishing on two points listed above.

The new status of this patch is: Waiting on Author

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joshua Bay
Дата:
Сообщение: Re: Way to access LSN (for each transaction) by directly talking to postgres?
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: New version numbering practices