Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Dean Rasheed
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date:
Msg-id: CAEZATCUK8une3-yAFa5D28K6ThJBVvVWsWXaaUgu3kvY-Om4TA@mail.gmail.com
In reply to: Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Responses: Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
           Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
List: pgsql-hackers
On 1 April 2014 20:58, Florian Pflug <fgp@phlo.org> wrote:
> On Apr1, 2014, at 10:08 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 31 March 2014 01:58, Florian Pflug <fgp@phlo.org> wrote:
>>> Attached are updated patches that include the EXPLAIN changes mentioned
>>> above and updated docs.
>>>
>>
>> These patches need re-basing --- they no longer apply to HEAD.
>
> Rebased to head (554bb3beba27bf4a49edecc40f6c0f249974bc7c). I had to
> re-assign OIDs in the dependent patches again (those which actually
> add the various inverse transition functions).
>

The new explain output is not exactly what I was suggesting upthread.
In particular, in cases where the WindowAgg node
is executed more than once, I think that the reported transition
counts should be the averages per-execution of the node. That way the
number of transition function calls reported for the node is directly
comparable with its "rows" value. Thus I think the output should be
divided by nloops, which would be more consistent with the way other
similar values are reported in explain output (c.f.
show_instrumentation_count).
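
Roughly what I have in mind is something along these lines (a sketch
only, not compiled; "aggfwdtrans"/"agginvtrans" and the labels are
placeholders for whatever the patch actually calls its counters),
mirroring what show_instrumentation_count() does:

    /*
     * Sketch: report the transition counts averaged over the number
     * of executions of the node, in the same way that
     * show_instrumentation_count() reports "Rows Removed by Filter".
     */
    double      nloops = planstate->instrument->nloops;

    if (nloops > 0)
    {
        ExplainPropertyFloat("Transitions",
                             aggfwdtrans / nloops, 0, es);
        ExplainPropertyFloat("Inverse Transitions",
                             agginvtrans / nloops, 0, es);
    }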

I started looking at the "arith" patch, and I got as far as looking at
the changes to count(*) and count(val), which seem reasonable. But
then I started testing performance, and I found cases where the
improvement is not nearly what I expected.

The example cited at the start of this thread is indeed orders of
magnitude faster than HEAD:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1,20000) g(n);

This executes in 20ms on my box, vs 30sec on HEAD, which reflects the
change from an O(n^2) to an O(n) algorithm.

However, if both ends of the frame move, the benefits are far less impressive:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 33ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 173ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 1467ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10000 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 7709ms

This is still exhibiting the behaviour of an O(n^2) algorithm.

The problem lies in window_gettupleslot() which steps one row at a
time from the last position to the new required position. So if both
ends of the frame are moving, it needs to step forwards and backwards
through the entire window, for each row processed - hence the O(n^2)
behaviour.
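
For reference, the seek logic in window_gettupleslot() boils down to
this (simplified; the real code also spools new tuples and handles
re-fetching the current position):

    tuplestore_select_read_pointer(winstate->buffer, winobj->readptr);

    /* step the single read pointer one row at a time to the target */
    while (winobj->seekpos > pos)
    {
        if (!tuplestore_gettupleslot(winstate->buffer, false, true, slot))
            elog(ERROR, "unexpected end of tuplestore");
        winobj->seekpos--;
    }

    while (winobj->seekpos < pos)
    {
        if (!tuplestore_gettupleslot(winstate->buffer, true, true, slot))
            elog(ERROR, "unexpected end of tuplestore");
        winobj->seekpos++;
    }

so a fetch that is k rows away from the previous position costs k
calls to tuplestore_gettupleslot(), each of which copies a tuple.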

Looking at window_gettupleslot, there is an obvious potential
micro-optimisation that can be made if there are multiple rows to be
skipped --- instead of calling tuplestore_gettupleslot() multiple
times, call tuplestore_advance() multiple times, then call
tuplestore_gettupleslot() once to fetch the final required tuple, thus
avoiding a lot of unnecessary tuple copying. That of course won't
eliminate the O(n^2) behaviour, but it will reduce the overall factor,
and so is probably worth doing.
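
Something like this for the forward direction (untested sketch; the
backward case is analogous with forward = false):

    /* skip all but the last row without copying tuples */
    while (winobj->seekpos < pos - 1)
    {
        if (!tuplestore_advance(winstate->buffer, true))
            elog(ERROR, "unexpected end of tuplestore");
        winobj->seekpos++;
    }

    /* materialise only the row actually requested */
    while (winobj->seekpos < pos)
    {
        if (!tuplestore_gettupleslot(winstate->buffer, true, true, slot))
            elog(ERROR, "unexpected end of tuplestore");
        winobj->seekpos++;
    }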

However, to fix the fundamental O(n^2) problem, I think there needs to
be separate tuplestore read pointers for the head and tail of the
frame. There may also be a case for another read pointer for the
current row too, and possibly one for general purpose user-triggered
fetches. One approach might be to have up to a small number N (say 3
or 4) of read pointers, with window_gettupleslot() automatically
choosing the one nearest to the requested row. Possibly there are
better approaches. I think a little more investigation is needed.
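
To illustrate the sort of thing I mean (a very rough sketch; the
struct and function names here are invented, not anything in the
patch): keep a small array of read pointers, each remembering the row
it was last positioned on, and have window_gettupleslot() seek with
whichever one is closest:

    typedef struct WinReadPtr
    {
        int         readptr;    /* tuplestore read pointer index */
        int64       seekpos;    /* row this pointer is currently on */
    } WinReadPtr;

    static int
    choose_nearest_readptr(WinReadPtr *ptrs, int nptrs, int64 pos)
    {
        int         best = 0;
        int64       bestdist = -1;
        int         i;

        for (i = 0; i < nptrs; i++)
        {
            int64       dist;

            dist = (ptrs[i].seekpos > pos) ? ptrs[i].seekpos - pos
                                           : pos - ptrs[i].seekpos;
            if (bestdist < 0 || dist < bestdist)
            {
                best = i;
                bestdist = dist;
            }
        }
        return best;
    }

Each pointer would be allocated up-front with
tuplestore_alloc_read_pointer(), and the chosen one selected with
tuplestore_select_read_pointer() before stepping, so the head and
tail of the frame each stay close to "their" pointer instead of
dragging a single pointer back and forth across the whole window.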

I'm not sure how much additional work is required to sort this out,
but to me it looks more realistic to target 9.5 than 9.4, so at this
point I tend to think that the patch ought to be marked as returned
with feedback.

Thoughts?

Regards,
Dean


