Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id CAApHDvouryJ8C2im48Zr7dqoBbvvkg6u+6OmxWHmn++6KL1EZA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Ответы Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jan21, 2014, at 10:53 , David Rowley <dgrowleyml@gmail.com> wrote:
> It came to me that it might be possible to implement inverse transitions
> for floating point aggregates by just detecting if precision has been
> lost during forward transitions.
>
> I've written the test to do this as:
>
> IF state.value + value = state.value AND value <> 0 THEN
>               newstate.precision_lost := true;
>               newstate.value := state.value;
>       ELSE
>               newstate.precision_lost := false;
>               newstate.value := state.value + value;
>       END IF;
>
>
> The inverse transition function checks the precision_lost and if it's true it
> returns NULL. The core code is now implemented (thanks to Florian) to
> re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value
and still have precision_lost = false afterwards. Try summing over 1e16, 1.01.
"SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet
"SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where "1.01"
would have been correct. That's still too much precision loss.

I'm quite certain that the general idea has merit, but the actual
invertibility condition is going to be more involved. If you want to play with
this, I think the first step has to be to find a set of guarantees that
SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
summands all have the same sign, the error is bound by C * N, where C is some
(machine-specific?) constant (about 1e-15 or so), and N is the number of input
rows. Or at least so I think from looking at SUMs over floats in descending
order, which I guess is the worst case. You could, for example, try to see if
you can find a invertibility conditions which guarantees the same, but allows
C to be larger. That would put a bound on the number of digits lost by the new
SUM(float) compared to the old one.


I guess if the case is that normal (non-window) sum(float) results are undefined unless you add an order by to the aggregate then I guess there is no possible logic to put in for inverse transitions that will make them behave the same as an undefined behaviour.
 
I don't have high hopes for this getting int 9.4, though.


Yeah I'm going to drop it. I need to focus on documents and performance testing.
 
> If it seems sound enough, then I may implement it in C to see how much
> overhead it adds to forward aggregation for floating point types, but even
> if it did add too much overhead to forward aggregation it might be worth
> allowing aggregates to have 2 forward transition functions and if the 2nd
> one exists then it could be used in windowing functions where the frame
> does not have "unbounded following".

I don't think adding yet another type of aggregation function is the
solution here.


Why not?
 
I thought, if in the cases where we need to burden the forward transition functions with extra code to make inverse transitions possible, then why not have an extra transition function so that it does not slow down normal aggregation?

My testing of sum(numeric) last night does not show any slow down by that extra dscale tracking code that was added, but if it did then I'd be thinking seriously about 2 forward transition functions to get around the problem. I don't understand what would be wrong with that. The core code could just make use of the 2nd function if it existed and inverse transitions were thought to be required. i.e in a window context and does not have UNBOUNDED PRECEDING.
 
Regards

David Rowley

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Следующее
От: Jim Nasby
Дата:
Сообщение: Re: Hard limit on WAL space used (because PANIC sucks)