Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От Florian Pflug
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id E5C7E416-2AC8-405B-B9B2-E8731CF742B0@phlo.org
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
On Jan16, 2014, at 09:07 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug <fgp@phlo.org> wrote:
>> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be
constrainedto STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with NULL
inputshowever they please, no? 
>
> The reason I had to track the notnullcount was because for non-strict was because:
>
> select sum(n) over (order by i rows between current row and unbounded following) from (values(1,1),(2,NULL)) v(i,n);
>
> would otherwise return
> 1
> 0
>
> sum is not strict, so I guess we need to track notnullcount for non-strict functions.
> See the code around if (peraggstate->notnullcount == 0) in retreat_windowaggregate().

Yeah, I figured that was the reason, but you can't fix it that way. See
http://www.postgresql.org/message-id/8E857D95-CBA4-4974-A238-9DD7F61BEA48@phlo.orgfor a detailed explanation why. My
2.2patch allows pairs of non-strict forward and strict inverse transfer functions exactly because of this - i.e.,
basicallyit decrees that if there's an inverse transfer function, the strict setting of the *inverse* function
determinesthe aggregates NULL behaviour. The forward transfer function is then never called 
for NULL inputs, but it *is* called with the NULL state for the first non-NULL input, and *must* then return a non-NULL
state(hence it's technically not strict, it's strict only in the inputs, not in the state). 

BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did that in 2.2. That doesn't matter for the
built-inaggregates since these aren't created with CREATE AGGREGATE, but it's obviously wrong. I'll post a fixed
patched.

>> The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before the
codethat pulls aggregatedbase up to frameheadpos using the inverse aggregation function? 

> I had a look at this and even tried moving the code to before the inverse transitions, but it looks like that would
onlywork if I added more tests to the if condition to ensure that we're not about to perform inverse transitions. To me
itjust seemed more bullet proof the way it is rather than duplicating logic and having to ensure that it stays
duplicated.But saying that I don't think I've fully got my head around why the original code is valid in the first
place.I would have imagined that it should contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that
soundslike complete rubbish then I'll put it down to my head still being fried from work. 

Ok, I get this now. That code really just is asking "is the previous row's frame the same as the current row's frame"
inthat "if" where you added movedagg. What confused me was the rather weird way that condition is expressed, which
turnsout is due to the fact that at the point of the if, we don't know the new frame's end. Now, we could use
update_frametailpos()to find that, but that's potentially costly, especially if the tuple store was spilled to disk. So
instead,the code relies on the fact that only if the frame end is "n FOLLOWING/PRECEDING" can the current row lie
withinthe previous row's frame without the two frame's ends being necessarily the same. 

For added confusion, that "if" never explicitly checks whether the frame's *heads* coincide - the previous code seems
tohave relied on the impossibility of having "aggregatedbase <= current < aggregatedupto" after re-initializing the
aggregation,because then aggregatedbase = aggregatedupto = 0. That's why you can't just move the "if" before the
"frameheadpos== aggregatedbase" check. But you can *if* you also check whether "aggregatedbase == frameheadpos" in the
if- which is clearer than relying on that rather subtle  assumption anyway. 

BTW, the your patch will also fail badly if the frame head ever moves backwards - the invariant that "aggregatedbase ==
frameheadpos"after the inverse transition loop will then be violated. Now, this should never happen, because we require
thatthe offset in "n PRECEDING/FOLLOWING" is constant, but we should probably still check for this and elog(). 

That check was implicit in old code, because it advanced the tuplestore mark, so if the frame head moved backwards,
re-scanningthe frame would have failed. That brings me to another todo - as it stands, that mark gets never advanced if
we'redoing inverse aggregation. To fix that, we need a call to WinSetMarkPosition() somewhere in
eval_windowaggregates().

After doing this things, eval_windowaggregates ended up calling initalize_windowframeaggregates at a single place
again,so I inlined it back into eval_windowaggregates().  I also made it use temp_slot_1 in the inverse aggregation
loop,since that seemed safe - the code assumes some invariants about aggregatedbase and agg_row_slow. 

Updated patch is attached (2.4_fgp).

>> Also, could we easily check whether the frames corresponding to the individual rows are all either identical or
disjoint,and don't use the inverse transfer function then? Currently, for a frame which contains either just the
currentrow, or all the current row's peers, I think we'd use the inverse transfer function to fully un-add the old
frame,and then add back the new frame. 
>
> I didn't know there was a situation where this could happen. Could you give me an example query and I'll run it
throughthe debugger to have a look at what's going on. But sure, if this is possible and I understand what you mean
thenI guess it would be a good optimisation to detect this and throw away the previous results and start fresh. 

The case I had in mind there was "RANGE BETWEEN CURRENT ROW AND CURRENT ROW", i.e. each row's frame consists of all
it'sordering peers. Haven't looked into this yet. 

best regards,
Florian Pflug



Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Patch for fail-back without fresh backup
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: WAL Rate Limiting