Обсуждение: an difficult SQL

Поиск
Список
Период
Сортировка

an difficult SQL

От
Rafal Pietrak
Дата:
Hi Everybody,

I was wondering if anybody here could help me cook up a query:

1. against a list of events (like an activity log in the database). The 
list is a single table: create table events (tm timestamp, user int, 
description text).

2. of which the output would be sorted in such a way, that:
2.1 most recent event would "select" most recent events of that same 
user, and displayed in a group (of say 10) of them (in "tm" order).

2.2 going through the events back in time, first event of ANOTHER user 
selects next group, where (say 10) most recent events of that OTHER user 
is presented.

2.3 next most recent event of yet another user selects yet another group 
to display and this selection process goes on, up to a maximum of (say 
20) users/groups-of-their-events.

2.4 after that, all other events are selected in tm order.

This is to present most recent telephone activities grouped by most 
recent subscribers so that the dashboard doesn't get cluttered with 
information but allows for an overview of other activity of most recent 
users.

I tend to think, that it's a problem for a window function ... but I've 
stumbled on the problem how to limit the window "frame" to just a few 
(say 10) events within the "window" and have all the rest returned as 
"tail" of the query.

BTW: the eventlog table is big. (and partitioned).

Any help appreciated.

-R



Re: an difficult SQL

От
Thiemo Kellner
Дата:
Hi Rafal

You first could select the three users with the most recent entries with 
a windowing function 
(https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS) 
putting it into a with query 
(https://www.postgresql.org/docs/15/sql-select.html), in following with 
queries I would select 2.1 to 2.3 with each a constant column with each 
a different value you later sort by. In a next with query you can select 
all the rest (except all 
https://www.postgresql.org/docs/15/sql-select.html#SQL-EXCEPT) the 
results of 2.1 to 2.3 for 2.4 also with the notorious sort column. In a 
last with query you can put together the partial results for 2.1 to 2.4 
with a union all 
(https://www.postgresql.org/docs/15/sql-select.html#SQL-UNION) and 
selecting sort by the sort column and the timestamp in the final select.

I do not know your background, however, sql is about data sets end it is 
not always easy to get ones head around thinking in sets. I hope you 
could follow my suggestions. It might not be the most efficient way but 
should work.

Kind regards

Thiemo


Am 05.11.22 um 16:10 schrieb Rafal Pietrak:
> Hi Everybody,
>
> I was wondering if anybody here could help me cook up a query:
>
> 1. against a list of events (like an activity log in the database). 
> The list is a single table: create table events (tm timestamp, user 
> int, description text).
>
> 2. of which the output would be sorted in such a way, that:
> 2.1 most recent event would "select" most recent events of that same 
> user, and displayed in a group (of say 10) of them (in "tm" order).
>
> 2.2 going through the events back in time, first event of ANOTHER user 
> selects next group, where (say 10) most recent events of that OTHER 
> user is presented.
>
> 2.3 next most recent event of yet another user selects yet another 
> group to display and this selection process goes on, up to a maximum 
> of (say 20) users/groups-of-their-events.
>
> 2.4 after that, all other events are selected in tm order.
>
> This is to present most recent telephone activities grouped by most 
> recent subscribers so that the dashboard doesn't get cluttered with 
> information but allows for an overview of other activity of most 
> recent users.
>
> I tend to think, that it's a problem for a window function ... but 
> I've stumbled on the problem how to limit the window "frame" to just a 
> few (say 10) events within the "window" and have all the rest returned 
> as "tail" of the query.
>
> BTW: the eventlog table is big. (and partitioned).
>
> Any help appreciated.
>
> -R
>
>
-- 
Signal (Safer than WhatsApp): +49 1578 7723737
Threema (Safer than WhatsApp): A76MKH3J
Handy: +49 1578 772 37 37




Re: an difficult SQL

От
Rafal Pietrak
Дата:
Hi Thiemo,

Thank you for suggestions.

W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze:
> Hi Rafal
> 
> You first could select the three users with the most recent entries with 
> a windowing function 
> (https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS) 

surely I'm missing something crucial here:
select row_number() over w,* from eventlog where row_number() over w < 5 
  window w as (partition by user);
ERROR:  window functions are not allowed in WHERE

So I'm unable to pick a limited number of rows within the user 
"group-window" ranges.

Without that, I cannot proceed.

Any suggestions?

-R
PS: regarding "my sets background", yes I do "think in sets" ... as 
opposed to thinking "in functions" (like iterating procedures). I do 
prefer solutions based on set definitions.

putting it into a with query 
(https://www.postgresql.org/docs/15/sql-select.html), in following with 
queries I would select 2.1 to 2.3 with each a constant column with each 
a different value you later sort by. In a next with query you can select 
all the rest (except all 
https://www.postgresql.org/docs/15/sql-select.html#SQL-EXCEPT) the 
results of 2.1 to 2.3 for 2.4 also with the notorious sort column. In a 
last with query you can put together the partial results for 2.1 to 2.4 
with a union all 
(https://www.postgresql.org/docs/15/sql-select.html#SQL-UNION) and 
selecting sort by the sort column and the timestamp in the final select.
> 
> I do not know your background, however, sql is about data sets end it is 
> not always easy to get ones head around thinking in sets. I hope you 
> could follow my suggestions. It might not be the most efficient way but 
> should work.
> 
> Kind regards
> 
> Thiemo
> 
> 
> Am 05.11.22 um 16:10 schrieb Rafal Pietrak:
>> Hi Everybody,
>>
>> I was wondering if anybody here could help me cook up a query:
>>
>> 1. against a list of events (like an activity log in the database). 
>> The list is a single table: create table events (tm timestamp, user 
>> int, description text).
>>
>> 2. of which the output would be sorted in such a way, that:
>> 2.1 most recent event would "select" most recent events of that same 
>> user, and displayed in a group (of say 10) of them (in "tm" order).
>>
>> 2.2 going through the events back in time, first event of ANOTHER user 
>> selects next group, where (say 10) most recent events of that OTHER 
>> user is presented.
>>
>> 2.3 next most recent event of yet another user selects yet another 
>> group to display and this selection process goes on, up to a maximum 
>> of (say 20) users/groups-of-their-events.
>>
>> 2.4 after that, all other events are selected in tm order.
>>
>> This is to present most recent telephone activities grouped by most 
>> recent subscribers so that the dashboard doesn't get cluttered with 
>> information but allows for an overview of other activity of most 
>> recent users.
>>
>> I tend to think, that it's a problem for a window function ... but 
>> I've stumbled on the problem how to limit the window "frame" to just a 
>> few (say 10) events within the "window" and have all the rest returned 
>> as "tail" of the query.
>>
>> BTW: the eventlog table is big. (and partitioned).
>>
>> Any help appreciated.
>>
>> -R
>>
>>



Re: an difficult SQL

От
Erik Wienhold
Дата:
> On 06/11/2022 13:48 CET Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>
> W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze:
> >
> > You first could select the three users with the most recent entries with
> > a windowing function
> > (https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS)
>
> surely I'm missing something crucial here:
> select row_number() over w,* from eventlog where row_number() over w < 5
>   window w as (partition by user);
> ERROR:  window functions are not allowed in WHERE
>
> So I'm unable to pick a limited number of rows within the user
> "group-window" ranges.
>
> Without that, I cannot proceed.
>
> Any suggestions?

Windows functions are only permitted in SELECT and ORDER BY because they are
executed after WHERE, GROUP BY, and HAVING[1].

You need a derived table to filter on row_number:

    with
    ranked as (
        select *, row_number() over w
        from eventlog
        window w as (partition by user)
    )
    select *
    from ranked
    where row_number < 5;

[1] https://www.postgresql.org/docs/15/tutorial-window.html

--
Erik



Re: an difficult SQL

От
Rafal Pietrak
Дата:
Great, with a little tweaking (to get the remaining rows ordered 
correctly), this did the job.

Thank you Erik.

BR

-R

W dniu 6.11.2022 o 15:23, Erik Wienhold pisze:
>> On 06/11/2022 13:48 CET Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>>
>> W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze:
>>>
>>> You first could select the three users with the most recent entries with
>>> a windowing function
>>> (https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS)
>>
>> surely I'm missing something crucial here:
>> select row_number() over w,* from eventlog where row_number() over w < 5
>>    window w as (partition by user);
>> ERROR:  window functions are not allowed in WHERE
>>
>> So I'm unable to pick a limited number of rows within the user
>> "group-window" ranges.
>>
>> Without that, I cannot proceed.
>>
>> Any suggestions?
> 
> Windows functions are only permitted in SELECT and ORDER BY because they are
> executed after WHERE, GROUP BY, and HAVING[1].
> 
> You need a derived table to filter on row_number:
> 
>      with
>      ranked as (
>          select *, row_number() over w
>          from eventlog
>          window w as (partition by user)
>      )
>      select *
>      from ranked
>      where row_number < 5;
> 
> [1] https://www.postgresql.org/docs/15/tutorial-window.html
> 
> --
> Erik
> 
>