Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id CAApHDvqgeouEk+qJETJiJyLo92fpqVFnCLq0HetDT4TCjMEPxw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
On Tue, Jan 14, 2014 at 2:00 PM, Florian Pflug <fgp@phlo.org> wrote:
On Jan10, 2014, at 22:27 , David Rowley <dgrowleyml@gmail.com> wrote:
> As the patch stands at the moment, I currently have a regression test
> which currently fails due to these extra zeros after the decimal point:
>
> -- This test currently fails due extra trailing 0 digits.
> SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
>   FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);
>
> Patched produces:
>  6.01
>  5.00
>  3.00
> Unpatched produces:
>  6.01
>  5
>  3

Hm, that's annoying. I checked the standard to see if it offers any guidance
here, but it doesn't look like it does - the standard just says that SUM() ought
to return a type with the same scale as the input type has. That's fine if
every NUMERIC type has a fixed scale, but that's not the case in postgres -
we allow values of unconstrained NUMERIC types (i.e., numeric types without an
explicitly specified precision or scale) to each define their own scale.


Thanks for digging that out in the standard and thanks for all that information you supplied here http://www.postgresql.org/message-id/0FA6C08E-2166-405B-83F7-63B196B88CA3@phlo.org too. Sorry I've not had the chance to reply yet. I was kind of hoping that the answer would be more in my favour to help with inverse transitions for sum(numeric).

 
To fix this, we'd have to track the maximum scale within the current frame.
That's easier than the general problem of providing an inverse transition
function for MAX, because AFAIK we limit the scale to at most 1000. So it'd
be sufficient to track the number of times we saw each scale, and also the
current maximum. Once we reduce the current maximum's count back to zero
in the inverse transition function, we'd scan from that value to the left to
find the next non-zero entry.


I've been thinking about this, but I had thought that the maximum dscale was bigger than 1000. The docs seem to claim 16383 here --> http://www.postgresql.org/docs/devel/static/datatype-numeric.html I'd go ahead and implement this if that number was smaller, but I'm thinking zeroing out an array of 16383 elements on first call to do_numeric_accum might be too big an overhead to write off as "background noise". If it was 20 or even 100 then I'd probably try for that. 

I think the overhead for each call after that would likely be ok as it would probably just be an operation like
state->scaleCount[X.dscale]++; which I would imagine would be a very small percentage overhead on normal aggregate functions. Of course the inverse would have to do the harder work of looping backwards over the array until it found an element with the count above 0 and setting that as the current maximum. I think this would be a winner if it wasn't for the high initial hit of zeroing that 16383 element array.. Or 1000 whichever.

 
We could also choose to ignore this, although I'm not sure I really like that.
It seems entirely OK at first sight - after all, the values all stay the same,
we just emit a different number of trailing zeros. But it still causes results
to be affected by values, even if only in the number of trailing zeros, which
lie outside the value's range. That seems like more non-determinism than as
database should show.

> With inverse transitions this query still produces correct results, it just does
> not produces the numeric in the same format as it does without performing inverse
> transitions. Personally I'd rather focus on trying to get SUM(numeric) in there
> for 9.4

I think it'd be worthwile to get this into 9.4, if that's still an option,
even if we only support COUNT.

best regards,
Florian Pflug


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeevan Chalke
Дата:
Сообщение: Re: [PATCH] Filter error log statements by sqlstate
Следующее
От: Jeevan Chalke
Дата:
Сообщение: Re: [BUGS] surprising to_timestamp behavior