Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id CAApHDvpximY_AEdmDx2XOx2bgf3Qx6Vd4pV728vbSnxqWecHrA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (David Rowley <dgrowleyml@gmail.com>)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
On Tue, Jan 14, 2014 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Florian Pflug <fgp@phlo.org> writes:
> I think it'd be worthwile to get this into 9.4, if that's still an option,
> even if we only support COUNT.

My thought is

(1) we can certainly implement inverse transitions for COUNT() and the
integer variants of SUM(), and that alone would be a killer feature for
some people.


100% Agreed about sum(int) and count, but I've so far managed only to find problems with SUM(numeric), other inverse transition functions seem just fine with numeric types. e.g. AVG(numeric) and STDDEV(numeric), I can't produce any results that differ from the unpatched head... but perhaps someone can give me a case where things could vary?
 
I've been testing with something like:
select avg(n) over w,SUM(3::float) over w
from (values(1,10::numeric),(2,0),(3,0)) v(i,n) 
window w as (order by i rows between current row and unbounded following);

where I included the sum(float) to force the old school brute force transitions to be performed. I then compared the results to the ones given without the sum(float) tagged on the end. I've so far not found anything out of place.

(2) the float and numeric variants should be implemented under nondefault
names (I'm thinking FAST_SUM(), but bikeshed away).  People who need
extra speed and don't mind the slightly different results can alter
their queries to use these variants.

I'm not as keen on this idea, as if someone later thought of a nice way to allow the inverse functions to provide exactly the same result as if they were not used then the would become redundant legacy that would likely have to be supported forever and a day, and I know we already have things like that, but of course we want to minimise that as much as possible.

If AVG(numeric) and the stddev numeric aggregates turn out to be valid then numeric_avg_accum_inv() must remain in core code to support that and this is the exact function that is required for a user to define their own SUM(numeric) with. We could leave it at that and document what a user must do to create their own FAST_SUM then perhaps leave the including these in core decision to later based maybe on the number of complaints about SUM(numeric) being slow when used in windows with a moving frame head.

(For me) It does not seem too unreasonable to think that one day we might decide that:

SELECT AVG(n) FROM (VALUES(10::NUMERIC(10,3)),(0),(0)) v(n);

return 3.333 instead of 3.3333333333333333 like it does now.

If we ever did that then we could support these numeric inverse transitions on SUM() without having to worry about weird extra trailing zeros. Instead we'd just give the user what they asked for, when they asked for it.
Reasons like that make me think that we might be jumping the gun a little on FAST_SUM() as it could end up redundant more quickly than we might initially think.


One reason I'm thinking this is that whatever we do to ameliorate
the semantic issues is going to slow down the forward transition
function --- to no benefit unless the aggregate is being used as a
window function in a moving window.  So I'm less than convinced
that we *should* implement any of these designs in the default
aggregates, even if we get to the point where we arguably *could*
do it with little risk of functional differences.


So far there's only 1 case of possible slowdown of forward transitions and that's with numeric types.
I had to change do_numeric_accum() to add a NaN Counter increment and also change the logic so that it continues summing non NaN values even after the first NaN was encountered.

Would you see this as too big a performance risk to include in core?

If not then I think we might be able to support AVG(numeric) and STDDEV(numeric) functions as the patch sits without having to worry that there might be an extra overhead on forward transitions that include NaN values.


Florian has done some really good work researching the standards around this area. His research seems to indicate that the results should be of the same scale as the inputs, which the current patch does not do, providing you assume that the user is showing us the intended scale based on the numbers that we're working with rather than a scale specified by the column or cast. Florian's idea of how to fix the scale on inverse transition seems pretty valid to me and I think the overhead of it could be made pretty minimal. I'm just not sure that I can implement it in such a way as to make the overhead small enough to look like background noise. But I'm very willing to give it a try and see...

*** some moments pass ***

I think unless anyone has some objections I'm going to remove the inverse transition for SUM(numeric) and modify the documents to tell the user how to build their own FAST_SUM(numeric) using the built in functions to do it. I'm starting to think that playing around with resetting numeric scale will turn a possible 9.4 patch into a 9.5/10.0 patch. I see no reason why what's there so far, minus sum(numeric), can't go in... 

If so then there's 36 aggregate functions ticked off my "create an inverse transition function for" list! I personally think that's a pretty good win. I'd rather do this than battle and miss deadlines for 9.4. I'd find that pretty annoying.

Regards

David Rowley

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hannu Krosing
Дата:
Сообщение: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: Add CREATE support to event triggers