Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id CAApHDvqkf9ZhtY7Bk9OdJdNRq3NG0iz6WVzwFsEitdRH5GS3Zw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Ответы Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug <fgp@phlo.org> wrote:
On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> If you want to play with
>> this, I think the first step has to be to find a set of guarantees that
>> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
>> summands all have the same sign, the error is bound by C * N, where C is some
>> (machine-specific?) constant (about 1e-15 or so), and N is the number of input
>> rows. Or at least so I think from looking at SUMs over floats in descending
>> order, which I guess is the worst case. You could, for example, try to see if
>> you can find a invertibility conditions which guarantees the same, but allows
>> C to be larger. That would put a bound on the number of digits lost by the new
>> SUM(float) compared to the old one.
>
> I guess if the case is that normal (non-window) sum(float) results are undefined
> unless you add an order by to the aggregate then I guess there is no possible
> logic to put in for inverse transitions that will make them behave the same as
> an undefined behaviour.

Actually, if sum(float) was really undefined, it'd be *very* easy to provide an
inverse transition function - just make it a no-op. Heck, you could then even
make the forward transition function a no-op, since the very definition of
"undefined behaviour" is "result can be anything, including setting your house
on fire". The point is, it's *not* undefined - it's just imprecise. And the
imprecision can even be quantified, it just depends on the number of
input rows (the equal-sign requirement is mostly there to make "imprecision"
equivalent to "relative error").


My apologies, I meant to use the term nondeterministic rather than undefined. There's really not any need that I can see to turn things silly here.

My point was more that since sum(float) can give different results if it used an index scan rather than a seq scan, trying to get the inverse transition to match something that gives varying results sounds like a tricky task to take on.
 
>> > If it seems sound enough, then I may implement it in C to see how much
>> > overhead it adds to forward aggregation for floating point types, but even
>> > if it did add too much overhead to forward aggregation it might be worth
>> > allowing aggregates to have 2 forward transition functions and if the 2nd
>> > one exists then it could be used in windowing functions where the frame
>> > does not have "unbounded following".
>>
>> I don't think adding yet another type of aggregation function is the
>> solution here.
>
> Why not?
>
> I thought, if in the cases where we need to burden the forward transition
> functions with extra code to make inverse transitions possible, then why
> not have an extra transition function so that it does not slow down normal
> aggregation?
>
> My testing of sum(numeric) last night does not show any slow down by that
> extra dscale tracking code that was added, but if it did then I'd be thinking
> seriously about 2 forward transition functions to get around the problem.
> I don't understand what would be wrong with that. The core code could just
> make use of the 2nd function if it existed and inverse transitions were
> thought to be required. i.e in a window context and does not
> have UNBOUNDED PRECEDING.

First, every additional function increases the maintenance burden, and at
some point the expected benefit simply isn't worth it. IMHO at least.


There's little point in arguing for this as we've managed to narrow any forward transition over head to background noise so far, my point more was if there was no other way to maintain performance and have an inverse transition function that this may be a solution to that problem.
 
Secondly, you'd really need a second aggregate definition - usually, the
non-invertible aggregate will get away with a much smaller state representation.
Aggregates with state type "internal" could hack their way around that by
simply not using the additional parts of their state structure, but e.g.
plpgsql aggregates would still incur overhead due to repeated copying of
a unnecessary large state value. If it's possible to represent the
forward-only but not the invertible state state with a copy-by-value type,
that overhead could easily be a large part of the total overhead you paid
for having an inverse transition function.


I had imagined they keep the same state type and don't use any extra variables that are defined for the forward transition function that is invertible.
 
And lastly, we could achieve much of the benefit of a second transition
function by simply telling the forward transition function whether to
expect inverse transition function calls. We already expose things like
the aggregation memory context to C-language transition functions, so the
basic mechanism is already there. In fact, I pondered whether to do this -
but then figured I'd leave it to however needs that facility first. Though
it wouldn't be much code - basically, setting a flag in the WindowAggState,
and exporting a function to be used by aggregates to read it, similar
to what AggCheckCallContext does.


Sounds like a good idea, but what would the solution aggregate transition functions that are not written in C?
 
best regards,
Florian Pflug





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Recovery to backup point
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: extension_control_path