Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От Florian Pflug
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id 607A2B7E-5446-47F9-9383-C58712EDCED2@phlo.org
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (David Rowley <dgrowleyml@gmail.com>)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> My first thought was wow, this is a big patch!
>
> I think it should probably be broken up. It might be overly ambitious
> to try to get all of this committed during this commitfest, and in any
> case, I suspect that the committer would probably choose to commit it
> in stages. Perhaps something like:
>
> <snipped>
>
> Splitting it up this way now should help to focus on getting patch 1
> correct, without being distracted by all the other aggregates that may
> or may not usefully be made to have inverse transition functions. I
> think the value of the feature has been proved, and it is good to see
> that it can be applied to so many aggregates, but let's not try to do
> it all at once.

I've now split this up into

invtrans_base: Basic machinery plus tests with SQL-language aggregates
invtrans_arith: COUNT(),SUM(),AVG(),STDDEV() and the like
invtrans_minmax: MIN(),MAX(),BOOL_AND(),BOOL_OR()
invtrans_collecting: ARRAY_AGG(), STRING_AGG()
invtrans_docs: Documentation

Each of these corresponds to one of the invtrans_* branches in my
github repo at https://github.com/fgp/postgres

Attached are the latest versions of these patches.

> Regarding what would be in patch 1, I've only given it a cursory look,
> but I did immediately notice a problem with the nodeWindowAgg.c
> changes --- for aggregates with non-strict transition functions,
> something equivalent to the not null counting code is needed to make
> it work correctly with FILTER. For example, the following query gives
> the wrong results with this patch:
>
> SELECT array_agg(i) FILTER (WHERE i%3=0)
>       OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
>  FROM generate_series(1,10) g(i);
>
> Perhaps it should just always count the non-skipped values added after
> removal of filtered and null values. Then you might be able to
> simplify things by getting rid of ignore_nulls from
> WindowStatePerAggData and replacing notnullcount with a more general
> valueCount to track the number of values actually added to transValue.

Ugh, very true. Though I don't see how to get rid of ignore_nulls, unless
we rewrite the filter clause to

  (filter_clause) AND arg1 IS NOT NULL AND ... argN IS NOT NULL

for strict aggregates, which would probably be much slower and wouldn't
be right if any of the args call volatile functions.

But I've generalized notnullcount to transValueCount as you suggested,
and got rid of noTransValue, since that's now equivalent to
transValueCount == 0 && transValueIsNull.

string_agg() was also severely broken - it needs to track the number of
non-NULL strings within the state, since it skips rows with a NULL
string, but not with a NULL delimiter, which means we can't mark it
strict to make nodeWindowAgg.c take care of that.

> I haven't read through nodeWindowAgg.c in any detail yet (I think it
> will take me a while to fully get my head round that code) but I think
> it needs some more test cases to verify that the various corner cases
> are covered.

Yeah, I hadn't really gotten around to doing that yet :-(

The base patch now contains tests of the various strict and non-strict
cases, both in the filtered and non-filtered cases.

The invtrans_arith patch contains David's original tests, minus the one
with the logging inverse transition function, since that's tested by the
base patch already. It doesn't test sum(interval) though, I think (though
maybe I missed it, it's quite late already...). I think it maybe should -
not sure if there are any gotchas with SUM(interval) and inverse
transition functions.

The invtrans_collecting patch contains tests of the NULL behaviour of
string_agg() and array_agg() - I hope I've covered all the cases.

The invtrans_minmax patch doesn't contain any patches yet - David, could
you provide some for these functions, and also for bool_and and bool_or?
We don't need to test every datatype, but a few would be nice.

I've also run the regression tests of invtrans_arith and
invtrans_collecting against invtrans_base (i.e, without the actual
inverse transition functions) and they still succeed, meaning the
inverse transition functions don't change the results.

Doing the same against HEAD instead of invtrans_base yields a few
diffs, but all of them look OK - the stuff the explicitly tests
inverse transition functions via those "logging" aggregations that
simply log all calls to both the forward and inverse transition
function in a long string obviously yield different results if
inverse transition functions aren't used.

best regards,
Florian Pflug




Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: pg_get_viewdefs() indentation considered harmful
Следующее
От: Shigeru Hanada
Дата:
Сообщение: Re: [Review] inherit support for foreign tables