Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От Florian Pflug
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id 099A79D6-5CBB-4BF2-A1D4-10F0D43933D3@phlo.org
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On Apr7, 2014, at 17:41 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> I've just finished reading through all the other patches, and they all
> look OK to me. It's mostly straightforward stuff, so despite the size
> it's hopefully all committable once the base patch goes in.

Hm, I'm starting to have second thoughts about the minmax patch. The
inverse transition functions for MIN and MAX have a non-trivial probability
of failure - they trigger a rescan whenever the value that is removed isn't
strictly smaller (respectively strictly larger) then the current maximum
(respectively minimum). Thus, whenever that happens, we both call the
inverse transition function *and* (since it fails) restart the aggregation.

For windows based on ROWS, this isn't too bad - even if we fail every second
time, we still avoid half the rescans, which should be a net win if the
average window size is > 2.

But for RANGE-based windows, more than one call of the inverse transition
function must succeed for us to save anything, since we must successfully
remove *all* peers to avoid one restart. This greatly increases the chance
that using the inverse transition function hurts rather then helps - the
situation is worse the larger the average number of peers is.

I've factored the BOOL_AND,BOOL_OR stuff out into a separate patch
invtrans_bool - it previously was part of invtrans_minmax. Given the
performance risk involved, I think that we probably shouldn't commit
invtrans_minmax at this time. I should have brought this up earlier, but
the issue had slipped my mind :-( Sorry for that.

> I think that you're probably right that optimising
> window_gettupleslot() to eliminate the O(n^2) behaviour can be left to
> a later patch --- the existing performance benefits of this patch are
> enough to justify its inclusion IMO. It would be nice to include the
> trivial optimisation to window_gettupleslot() that I posted upthread,
> since it gives such a big improvement for only a few lines of code
> changed.

Agreed, but since it's independent from the rest of the base patch,
I kept it as a separate patch (invtrans_optimize) instead of merging it
into the base patch. It should probably be committed separately too.

> If you post a new patch set, I'll mark it as ready for committer attention.

New patch set is attached. The only difference to the previous one is that
"Forward Transitions" and "Inverse Transitions" are now scaled with nloops,
and that it includes your window_gettupleslot patch under the name
invtrans_optimize.

Your nested loop query
explain (verbose, analyse)
select * from
(values (10), (20), (30), (40)) v(x),
lateral
(select sum(i) over (rows between 4 preceding and current row)
from generate_series(1, x) i) t
now produces
                                                          QUERY PLAN
             

--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..170.06 rows=4000 width=12) (actual time=0.092..0.257 rows=100 loops=1)
Output: "*VALUES*".column1, (sum(i.i) OVER (?))
->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=4 width=4) (actual time=0.003..0.007 rows=4 loops=1)
  Output: "*VALUES*".column1
->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual time=0.027..0.055 rows=25 loops=4)
  Output: sum(i.i) OVER (?)
  Forward Transitions: 25.0
  Inverse Transitions: 20.0
  ->  Function Scan on pg_catalog.generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.013..0.015
rows=25loops=4) 
        Output: i.i
        Function Call: generate_series(1, "*VALUES*".column1)
Planning time: 0.359 ms
Total runtime: 0.544 ms

The patch dependencies are as follows:

invtrans_{optimize,docs) are independent from the rest

invtrans_{arith,bool,minmax,collecting} are pairwise independent and all
depend on invtrans_base.

As explain above, invtrans_bool is a bit problematic, since it carries
a real risk of performance regressions. It's included for completeness'
sake, and should probably not be committed at this time.

invtrans_optimize was authored by Dean Rasheed.

best regards,
Florian Pflug

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: B-Tree support function number 3 (strxfrm() optimization)
Следующее
От: Florian Pflug
Дата:
Сообщение: Re: [PATCH] Negative Transition Aggregate Functions (WIP)