Re: How to enumerate partitions from a window function?

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: How to enumerate partitions from a window function?
Дата
Msg-id CAF-3MvPAo6UoKr25kJMi6OF3Czf-UC8kLwhY7gtJCFbjpuzD2w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: How to enumerate partitions from a window function?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: How to enumerate partitions from a window function?  (Michael Lewis <mlewis@entrata.com>)
Список pgsql-general

On Thu, 3 Sep 2020 at 16:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alban Hertroys <haramrae@gmail.com> writes:
> As stated above, I want to enumerate the runs, starting at 1 and
> incrementing by 1 every time a partition from the 'run' window closes,
> Is there a way to achieve this through window functions, or do we need to
> wrap the thing in a subquery to achieve this?

I think this'll work:

select datetime, property_A, property_B
, first_value(datetime)::time over run as swap_time
, dense_rank() over (order by property_A, property_B)
, value
from process_data
window run as (partition by property_A, property_B order by datetime)
;

You can't do it with a window function over the "run" window because
no window function ever looks outside the current partition.  But
that's easy to fix by using a different window definition.  The
planner is smart enough to see that these windows are compatible
and only need one sort to be performed.

                        regards, tom lane

Thanks Tom,

That gets us close, but it ignores the order of the runs over time. I think it also reassigns the same number to later runs at the same 'day' that happen to have the same values for property_A and _B. That's some crucial information that I forgot to include.

To expand on my original example:

datetime | property_A | property_B | swap_time | run_nr | value
================================================
2020-09-03 15:06 | tea | earl grey | 15:06 | 1 | 0.23
2020-09-03 15:07 | tea | earl grey | 15:06 | 1 | 0.22
2020-09-03 15:08 | tea | ceylon | 15:08 | 2 | 0.34
2020-09-03 15:09 | coffee | cappucino | 15:09 | 3 | 0.45
2020-09-03 15:10 | coffee | cappucino | 15:09 | 3 | 0.43
2020-09-03 15:11 | tea | earl grey | 15:11 | 4 | 0.23
etc.

Where the last row has the same characteristic properties as the first 2 rows (from run 1), but is in run 4 due to it having started after run 3.

The runs normally start at 1 hour before midnight, with run 1, and continue 24h from there (it's a shifted day-schedule). The above example starting at 15:06 is unlikely to occur in reality, although possible (with long downtime). That's mostly to clarify how the run numbers should function, it would require to at least partition run_nr by a date shifted 1 hour back, as long as they number their runs correctly along the time axis.

Regards,
Alban.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

В списке pgsql-general по дате отправления:

Предыдущее
От: Jehan-Guillaume de Rorthais
Дата:
Сообщение: Re: Tuchanka
Следующее
От: Lawrence Layhee
Дата:
Сообщение: Dependency problem using community repo on Redhat 7