Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

Поиск
Список
Период
Сортировка
От Nicholas White
Тема Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Дата
Msg-id CA+=vxNYsqwLKUHBmmSEGEkQP3a2LtFfAKq32xNqCif0Mhe7WtA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls  (Jeff Davis <pgsql@j-davis.com>)
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers
I've attached a revised version that fixes the issues above:

> changing a reference of the form:
>   OVER w
> into:
>   OVER (w)

Fixed (and I've updated the tests).

> It's bad form to modify a list while iterating through it.

Fixed

> We shouldn't create an arbitrary number of duplicate windows

Fixed

> Is there a problem with having two windowdefs in
> the p_windowdefs list with the same name
> ...
> You'll have to be a little careful that any other code knows that names
> can be duplicated in the list though.

I'm not sure I really can verify this - as I'm not sure how much
contrib / other third-party code has access to this data structure.
I'd prefer to be cautious and just create a child window if needed.

> I think we should get rid of the bitmapset entirely
> ...
> Instead of the bitmapset, we can keep track of two offsets

I've modified leadlag_common so it uses your suggested algorithm for
constant offsets (although it turns out you only need to keep a single
int64 index in the context). This algorithm calls
WinGetFuncArgInPartition at least twice per row, once to check whether
the current row is null (and so check if we have to move the leading /
lagged index forward) and either once to get leading / lagging value
or more than once to push the leading / lagged value forwards to the
next non-null value.
I've kept the bitmap solution for the non-constant offset case (i.e.
the random partition access case) as I believe it changes the cost of
calculating the lead / lagged values for every row in the partition to
O(partition size) - whereas a non-caching scan-the-partition solution
would be O(partition size * partition size). Is that OK?

Thanks -

Nick

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sawada Masahiko
Дата:
Сообщение: Re: Patch for fail-back without fresh backup
Следующее
От: Greg Stark
Дата:
Сообщение: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]