Обсуждение: Enhancement request: enable FIRST/LAST_value() also as a regularaggregate (not only as windowing function)

Поиск
Список
Период
Сортировка
Hi All,

I'd like to throw here an enhancement proposal to discuss/consider.
The FIRST/LAST_value() functions offer powerful lookup capabilities, eg.
here

SELECT t.* ,FIRST_value(v1)OVER(PARTITION BY gid ORDER BY v2) fv ,LAST_value(v1)OVER(PARTITION BY gid ORDER BY v2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv FROM( VALUES (1, 'b', 3),(1, 'd', 1),(1, 'a', 2) ,(2, 'x', 7),(2, 'y', 9),(2, 'z', 8),(2, 'v', 9)) t (gid, v1, v2);




but, given those values are repeating - why can't I simply use this functions as regular aggregates?

Or can I? It doesn't seem to be possible while I find this use case actually more common than in windowing context…
Am I missing some workaround here?

Anyway, Oracle is an example where both contexts are possible and I think this could provide great reference as to what I have in mind.
2) Demonstrating those functions in both contexts there:

a) as Windowing Aggregate:

 WITH t(gid, v1, v2) AS(

SELECT 1, 'b', 3 FROM dual UNION ALL

SELECT 1, 'd', 1 FROM dual UNION ALL

SELECT 1, 'a', 2 FROM dual UNION ALL

SELECT 2, 'x', 7 FROM dual UNION ALL

SELECT 2, 'y', 9 FROM dual UNION ALL

SELECT 2, 'z', 8 FROM dual UNION ALL

SELECT 2, 'v', 9 FROM dual --UNION ALL

)

SELECT t.*

     ,MIN(v1)KEEP(DENSE_RANK FIRST ORDER BY v2) OVER(PARTITION BY gid) lkp_first

     ,MIN(v1)KEEP(DENSE_RANK LAST ORDER BY v2) OVER(PARTITION BY gid) lkp_lastMin

     ,MAX(v1)KEEP(DENSE_RANK LAST ORDER BY v2) OVER(PARTITION BY gid) lkp_lastMax

 FROM t;



b) as a regular aggregate (cf. GROUP BY and two rows only in the result)

 WITH t(gid, v1, v2) AS(

SELECT 1, 'b', 3 FROM dual UNION ALL

SELECT 1, 'd', 1 FROM dual UNION ALL

SELECT 1, 'a', 2 FROM dual UNION ALL

SELECT 2, 'x', 7 FROM dual UNION ALL

SELECT 2, 'y', 9 FROM dual UNION ALL

SELECT 2, 'z', 8 FROM dual UNION ALL

SELECT 2, 'v', 9 FROM dual --UNION ALL

)

SELECT t.gid

     ,MIN(v1)KEEP(DENSE_RANK FIRST ORDER BY v2)lkp_first

     ,MIN(v1)KEEP(DENSE_RANK LAST ORDER BY v2)lkp_lastMin

     ,MAX(v1)KEEP(DENSE_RANK LAST ORDER BY v2)lkp_lastMax

 FROM t

GROUP BY t.gid;


Any chances of implementing that?

Thank you,
Kind Regards 
~Maciek
matshyeq <matshyeq@gmail.com> writes:
> I'd like to throw here an enhancement proposal to discuss/consider.
> The FIRST/LAST_value()
> functions offer powerful lookup capabilities, ...
> but, given those values are repeating - why can't I simply use this
> functions as regular aggregates?
> Or can I? It doesn't seem to be possible while I find this use case
> actually more common than in windowing context…

The reason the SQL standard doesn't allow that usage, or similar ones
for other window functions, is that the function results are undefined
unless you specify a particular ordering of the inputs ... which is
exactly what the window syntax provides.

Yeah, in most cases there are other ways where you could get the same
result, but all of them require extra syntax too.  So you might as
well use window syntax and be standards-conforming.

> Anyway, Oracle is an example where both contexts are possible

Oracle is not exactly a shining example of either SQL spec compliance
or well-thought-out features ...

            regards, tom lane


I was hoping the proposal would be judged on its merits - primarily usefulness, not necessarily conforming to the standards
I would expect postgres overlap with the SQL standard (eg. SQL:2016) doesn't differ much from other big vendors and has lots of own deviations/extensions itself.
Not a fan of Oracle myself but have to give it to them the mentioned versatility is really nice.
In terms of performance it also carries a significant difference:
being forced to work on an unaggregated result and process it further (eg. with extra subselect filtering on row_number)
vs. simple GROUP BY

Yes it would require particular ordering of the input

Currently I can do workaround with something like:

SELECT gid
      ,RIGHT(MIN(lpad(v2::varchar,10,'0')||v1),-10) pseudo_lkp_first
      ,RIGHT(MAX(lpad(v2::varchar,10,'0')||v1),-10) pseudo_lkp_last
  FROM(
VALUES (1, 'b', 3),(1, 'd', 1),(1, 'a', 2)
      ,(2, 'x', 7),(2, 'y', 9),(2, 'z', 8),(2, 'v', 9)) t (gid, v1, v2)
  GROUP BY gid;

but it's neither universal nor readable.
It's fast though.

I was hoping those FIRST/LAST lookup functions could be made available as some new Ordered-Set Aggregate Functions too…

On Tue, 24 Jul 2018 at 23:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
matshyeq <matshyeq@gmail.com> writes:
> I'd like to throw here an enhancement proposal to discuss/consider.
> The FIRST/LAST_value()
> functions offer powerful lookup capabilities, ...
> but, given those values are repeating - why can't I simply use this
> functions as regular aggregates?
> Or can I? It doesn't seem to be possible while I find this use case
> actually more common than in windowing context…

The reason the SQL standard doesn't allow that usage, or similar ones
for other window functions, is that the function results are undefined
unless you specify a particular ordering of the inputs ... which is
exactly what the window syntax provides.

Yeah, in most cases there are other ways where you could get the same
result, but all of them require extra syntax too.  So you might as
well use window syntax and be standards-conforming.

> Anyway, Oracle is an example where both contexts are possible

Oracle is not exactly a shining example of either SQL spec compliance
or well-thought-out features ...

                        regards, tom lane
On Tue, Jul 24, 2018 at 4:16 PM, matshyeq <matshyeq@gmail.com> wrote:
I'd like to throw here an enhancement proposal to discuss/consider.
The FIRST/LAST_value() functions offer powerful lookup capabilities, eg.
here

SELECT t.* ,FIRST_value(v1)OVER(PARTITION BY gid ORDER BY v2) fv ,LAST_value(v1)OVER(PARTITION BY gid ORDER BY v2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv FROM( VALUES (1, 'b', 3),(1, 'd', 1),(1, 'a', 2) ,(2, 'x', 7),(2, 'y', 9),(2, 'z', 8),(2, 'v', 9)) t (gid, v1, v2);




but, given those values are repeating - why can't I simply use this functions as regular aggregates?

Or can I? It doesn't seem to be possible while I find this use case actually more common than in windowing context…
Am I missing some workaround here?

Why not just define a custom aggregate function that does whatever you need?  I don't think it would be too hard.  e.g. for something like LAST_VALUE() just make the transition type equal to the output type and save the last value you've seen thus far as the transition value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks Robert!
I thought this could be nice functionality available out of the box
but yea, sure, that would work for me.
Can I make custom aggregate function that accepts WITHIN GROUP syntax?
Which language would that need to be implemented in? Would you have examples (url?)

Thank you,
Kind Regards 
~Maciek
On Thu, 26 Jul 2018 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jul 24, 2018 at 4:16 PM, matshyeq <matshyeq@gmail.com> wrote:
I'd like to throw here an enhancement proposal to discuss/consider.
The FIRST/LAST_value() functions offer powerful lookup capabilities, eg.
here

SELECT t.* ,FIRST_value(v1)OVER(PARTITION BY gid ORDER BY v2) fv ,LAST_value(v1)OVER(PARTITION BY gid ORDER BY v2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv FROM( VALUES (1, 'b', 3),(1, 'd', 1),(1, 'a', 2) ,(2, 'x', 7),(2, 'y', 9),(2, 'z', 8),(2, 'v', 9)) t (gid, v1, v2);




but, given those values are repeating - why can't I simply use this functions as regular aggregates?

Or can I? It doesn't seem to be possible while I find this use case actually more common than in windowing context…
Am I missing some workaround here?

Why not just define a custom aggregate function that does whatever you need?  I don't think it would be too hard.  e.g. for something like LAST_VALUE() just make the transition type equal to the output type and save the last value you've seen thus far as the transition value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
matshyeq <matshyeq@gmail.com> writes:
> Can I make custom aggregate function that accepts WITHIN GROUP syntax?

It's possible, but it would have to be written in C, I think; see
src/backend/utils/adt/orderedsetaggs.c for prototype code.  You'll
find it a lot easier to write a plain aggregate function and control
its input ordering with "last(foo order by foo)".

            regards, tom lane


On 26/07/18 20:31, Tom Lane wrote:
> matshyeq <matshyeq@gmail.com> writes:
>> Can I make custom aggregate function that accepts WITHIN GROUP syntax?
> 
> It's possible, but it would have to be written in C, I think; see
> src/backend/utils/adt/orderedsetaggs.c for prototype code.  You'll
> find it a lot easier to write a plain aggregate function and control
> its input ordering with "last(foo order by foo)".

We have examples on the wiki.
https://wiki.postgresql.org/wiki/First/last_(aggregate)
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support