Обсуждение: Subquery to select max(date) value

Поиск
Список
Период
Сортировка

Subquery to select max(date) value

От
Rich Shepard
Дата:
The query is to return the latest next_contact date for each person. Using
the max() aggregate function and modeling the example of lo_temp on page 13
of the rel. 10 manual I wrote this statement:

select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, A.next_contact)
from People as P, Organizations as O, Activities as A
where P.org_id = O.org_id and P.person_id = A.person_id and
       A.next_contact = select (max(A.next_contact) from A)
group by A.next_contact, P.person_id;

The syntax error returned by psql is:

psql:next_contact_date.sql:7: ERROR:  syntax error at or near "select"
LINE 4:       A.next_contact = select (max(A.next_contact) from A)
                                ^
and I fail to see what I've done incorrectly.

Do I need to insert DISTINCT ON in the main or sub query? If so, what is the
correct syntax to extract all desired columns from each selected row?

If this is covered in the manual please point me to the proper section; if
not, please educate me on the appropriate syntax to produce the desired
output.

TIA,

Rich


Re: Subquery to select max(date) value

От
"David G. Johnston"
Дата:
On Tue, Feb 12, 2019 at 3:24 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
> psql:next_contact_date.sql:7: ERROR:  syntax error at or near "select"
> LINE 4:       A.next_contact = select (max(A.next_contact) from A)
>                                 ^
> and I fail to see what I've done incorrectly.

You put the open parenthesis after the word select instead of before.

A.next_contact = (SELECT max(A.next_contact) FROM A)

David J.


Re: Subquery to select max(date) value [RESOLVED]

От
Rich Shepard
Дата:
On Tue, 12 Feb 2019, David G. Johnston wrote:

> You put the open parenthesis after the word select instead of before.
> A.next_contact = (SELECT max(A.next_contact) FROM A)

David.

Color me suitably embarrassed.

Thank you,

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Tue, 12 Feb 2019, Jeff Ross wrote:

> Try (select (max(A.next_contact) from A)

Thanks, Jeff.

The syntax accepted by psql is
A.next_contact = (select (max(A.next_contact)) from Activities as A)
but the date is not included in the output.

The revised statement is now:
select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, A.next_contact)
from People as P, Organizations as O, Activities as A
where P.org_id = O.org_id and P.person_id = A.person_id and
       A.next_contact = (select (max(A.next_contact)) from Activities as A)
group by A.next_contact, O.org_id, P.person_id;

This produces the first 5 colums in the outer select but no next_contact
date. When I move A.next_contact to the head of the select list each row
begins with 'infinity'.

I'm still missing the requisite knowledge.

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Tue, 12 Feb 2019, Rich Shepard wrote:

>      A.next_contact = (select (max(A.next_contact)) from Activities as A)

Errata:

The parentheses around the max aggregate are not necessary.

A.next_contact now displays at the end of each returned row as 'infinity'.

Rich


Re: Subquery to select max(date) value

От
Matt Zagrabelny
Дата:
Hey Rich,

On Tue, Feb 12, 2019 at 4:24 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
The query is to return the latest next_contact date for each person. Using
the max() aggregate function and modeling the example of lo_temp on page 13
of the rel. 10 manual I wrote this statement:

I use DISTINCT ON and ORDER BY to get the single latest value from a table with multiple date entries:


HTH,

-m

Re: Subquery to select max(date) value

От
Adrian Klaver
Дата:
On 2/12/19 2:48 PM, Rich Shepard wrote:
> On Tue, 12 Feb 2019, Rich Shepard wrote:
> 
>>      A.next_contact = (select (max(A.next_contact)) from Activities as A)
> 
> Errata:
> 
> The parentheses around the max aggregate are not necessary.
> 
> A.next_contact now displays at the end of each returned row as 'infinity'.

'infinity' is the max date, so this is what you want?

> 
> Rich
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Subquery to select max(date) value

От
Ken Tanzer
Дата:


On Tue, Feb 12, 2019 at 2:48 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Tue, 12 Feb 2019, Rich Shepard wrote:

>      A.next_contact = (select (max(A.next_contact)) from Activities as A)

Errata:

The parentheses around the max aggregate are not necessary.

A.next_contact now displays at the end of each returned row as 'infinity'.

Your subquery isn't doing anything to match on person_id, so it's going to match all the records with the highest next_contact in activities.

I think you want something more like:

A.next_contact = (select (max(A.next_contact)) from Activities as A2 WHERE A2.person_id=A.person_id)

Or, for that matter, since next_contact is all that you're drawing from activities, you can also just put it in the select:

select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name, 
(select max(A.next_contact) from Activities as A WHERE p.person_id=A.person_id)
FROM ...


Cheers,
Ken
--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Tue, 12 Feb 2019, Adrian Klaver wrote:

> 'infinity' is the max date, so this is what you want?

Adrian,

Nope. When I went to make a cup of coffee I realized that I need the other
date constraints (such as IS NOT NULL), too. I'm re-wording the statement to
put everything in the correct order. Will probably try DISTINCT ON, too, if
that makes it simpler or faster.

Thanks,

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Tue, 12 Feb 2019, Ken Tanzer wrote:

> select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name,
> (select max(A.next_contact) from Activities as A WHERE
> p.person_id=A.person_id)
> FROM ...


Ken,

Yes, cheers indeed. A bit of thinking and re-organizing resulted in a
working statement that's close to what I want:

select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name,
        (select max(A.next_contact)))
from People as P, Organizations as O, Activities as A
where P.org_id = O.org_id and P.person_id = A.person_id and
       /*A.next_contact = (select max(A.next_contact) from Activities as A) and */
       A.next_contact <= 'today' and A.next_contact > '2018-12-31' and
       A.next_contact is not null
group by A.next_contact, O.org_id, P.person_id;

The two issues I now focus on resolving are the multiple rows per person
rather than only the most recent and the date displayed at the end of each
output row. DISTINCT ON will eliminate the first issue.

Thanks,

Rich




Re: Subquery to select max(date) value

От
Ken Tanzer
Дата:

Ken,

Yes, cheers indeed. A bit of thinking and re-organizing resulted in a
working statement that's close to what I want:

select (P.person_id, P.lname, P.fname, P.direct_phone, O.org_name,
        (select max(A.next_contact)))
from People as P, Organizations as O, Activities as A
where P.org_id = O.org_id and P.person_id = A.person_id and
       /*A.next_contact = (select max(A.next_contact) from Activities as A) and */
       A.next_contact <= 'today' and A.next_contact > '2018-12-31' and
       A.next_contact is not null
group by A.next_contact, O.org_id, P.person_id;

The two issues I now focus on resolving are the multiple rows per person
rather than only the most recent and the date displayed at the end of each
output row. DISTINCT ON will eliminate the first issue.


If that's getting you what you want, then great and more power to you.  It looks like you'll only get people who have a next_contact in your target window there.  You might also consider something like this...

select 
    p.person_id,
    p.lname,
    p.fname,
    p.direct_phone,
    o.org_name,
    a.next_contact
from 
    people as p
    LEFT JOIN organizations o USING (person_id)
    LEFT JOIN (
        SELECT
            DISTINCT ON (person_id)
            person_id,
            next_contact
        FROM activities a
        -- WHERE ???
        ORDER BY person_id,next_contact DESC
    ) a USING (person_id)
;

Cheers,
Ken



--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Tue, 12 Feb 2019, Ken Tanzer wrote:

> If that's getting you what you want, then great and more power to you. It
> looks like you'll only get people who have a next_contact in your target
> window there. You might also consider something like this...

   <clip>

Ken,

I'll work with your example. This looks most promising.

What I want is a report of folks I need to contact. Some have next_contact
dates in the past when I did not act so I need to do so as long as there is
a next_contact date (no NULLs) and the prospect is active.

When I have a fully working statement I'll post it to the list so others can
read the accepted working solution.

Thanks again,

Rich


Re: Subquery to select max(date) value

От
Andrew Gierth
Дата:
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes:

 Rich> Will probably try DISTINCT ON, too, if that makes it simpler or
 Rich> faster.

You want LATERAL.

-- 
Andrew (irc:RhodiumToad)


Re: Subquery to select max(date) value

От
Jan Kohnert
Дата:
Am Dienstag, 12. Februar 2019, 23:23:49 CET schrieb Rich Shepard:
> The query is to return the latest next_contact date for each person. Using
> the max() aggregate function and modeling the example of lo_temp on page 13
> of the rel. 10 manual I wrote this statement:

You don't even need a subselect to do so:

select 
    P.person_id, P.lname, P.fname, P.direct_phone, 
    O.org_name, max(A.next_contact) as next_contact
from People as P
join Organizations as O
    on P.org_id = O.org_id
join Activities as A
    on P.person_id = A.person_id
group by 
    P.person_id, P.lname, P.fname, 
    P.direct_phone, O.org_name;

-- 
Kind regards Jan





Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Andrew Gierth wrote:

> Rich> Will probably try DISTINCT ON, too, if that makes it simpler or
> Rich> faster.
>
> You want LATERAL.

Andrew,

That's new to me so I'll read about it.

Thanks,

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Tue, 12 Feb 2019, Ken Tanzer wrote:

> If that's getting you what you want, then great and more power to you.  It
> looks like you'll only get people who have a next_contact in your target
> window there.  You might also consider something like this...
>
> select
>    p.person_id,
>    p.lname,
>    p.fname,
>    p.direct_phone,
>    o.org_name,
>    a.next_contact
> from
>    people as p
>    LEFT JOIN organizations o USING (person_id)
>    LEFT JOIN (
>        SELECT
>            DISTINCT ON (person_id)
>            person_id,
>            next_contact
>        FROM activities a
>        -- WHERE ???
>        ORDER BY person_id,next_contact DESC
>    ) a USING (person_id)
> ;

I modified this to restrict the time and ignore nulls by replacing the
question marks:

WHERE a.next_contact <= 'today' and a.next_contact > '2018-12-31' and
       a.next_contact is not null

For a reason I've not yet found, the last condition is not observed; i.e.,
those rows with null next_contact dates appear in the results. Position in
the sequence makes no difference. What might cause this?

Regards,

Rich




Re: Subquery to select max(date) value

От
Adrian Klaver
Дата:
On 2/13/19 6:28 AM, Rich Shepard wrote:
> On Tue, 12 Feb 2019, Ken Tanzer wrote:
> 
>> If that's getting you what you want, then great and more power to 
>> you.  It
>> looks like you'll only get people who have a next_contact in your target
>> window there.  You might also consider something like this...
>>
>> select
>>    p.person_id,
>>    p.lname,
>>    p.fname,
>>    p.direct_phone,
>>    o.org_name,
>>    a.next_contact
>> from
>>    people as p
>>    LEFT JOIN organizations o USING (person_id)
>>    LEFT JOIN (
>>        SELECT
>>            DISTINCT ON (person_id)
>>            person_id,
>>            next_contact
>>        FROM activities a
>>        -- WHERE ???
>>        ORDER BY person_id,next_contact DESC
>>    ) a USING (person_id)
>> ;
> 
> I modified this to restrict the time and ignore nulls by replacing the
> question marks:
> 
> WHERE a.next_contact <= 'today' and a.next_contact > '2018-12-31' and
>        a.next_contact is not null
> 
> For a reason I've not yet found, the last condition is not observed; i.e.,
> those rows with null next_contact dates appear in the results. Position in
> the sequence makes no difference. What might cause this?

The LEFT JOIN. There are rows in people for which there no records 
coming from the sub-select on activities, so the row is 'padded' with 
NULL values for the missing data.

> 
> Regards,
> 
> Rich
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Adrian Klaver wrote:

> The LEFT JOIN. There are rows in people for which there no records coming
> from the sub-select on activities, so the row is 'padded' with NULL values
> for the missing data.

Adrian,

I assume it's the inner left join. I'll trace what's happening at each step
and learn where to specify no nulls.

Thanks,

Rich


Re: Subquery to select max(date) value

От
Adrian Klaver
Дата:
On 2/13/19 7:37 AM, Rich Shepard wrote:
> On Wed, 13 Feb 2019, Adrian Klaver wrote:
> 
>> The LEFT JOIN. There are rows in people for which there no records coming
>> from the sub-select on activities, so the row is 'padded' with NULL 
>> values
>> for the missing data.
> 
> Adrian,
> 
> I assume it's the inner left join. I'll trace what's happening at each step

AFAIK there is no inner left join:
https://www.postgresql.org/docs/10/sql-select.html#SQL-FROM
"
join_type

     One of

         [ INNER ] JOIN

         LEFT [ OUTER ] JOIN

         RIGHT [ OUTER ] JOIN

         FULL [ OUTER ] JOIN

         CROSS JOIN
"

> and learn where to specify no nulls.

You can't it is the nature of the join:

"LEFT OUTER JOIN returns all rows in the qualified Cartesian product 
(i.e., all combined rows that pass its join condition), plus one copy of 
each row in the left-hand table for which there was no right-hand row 
that passed the join condition. This left-hand row is extended to the 
full width of the joined table by inserting null values for the 
right-hand columns. Note that only the JOIN clause's own condition is 
considered while deciding which rows have matches. Outer conditions are 
applied afterwards."


What you are seeing are records for which there is a person in the 
people table that either:

1) Do not have a record under that person_id in the activities table

or

2) Do not have a record that meets the date filtering you are doing on 
next_contact.

> 
> Thanks,
> 
> Rich
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Subquery to select max(date) value

От
Adrian Klaver
Дата:
On 2/13/19 7:37 AM, Rich Shepard wrote:
> On Wed, 13 Feb 2019, Adrian Klaver wrote:
> 
>> The LEFT JOIN. There are rows in people for which there no records coming
>> from the sub-select on activities, so the row is 'padded' with NULL 
>> values
>> for the missing data.
> 
> Adrian,
> 
> I assume it's the inner left join. I'll trace what's happening at each step
> and learn where to specify no nulls.

Should have been clearer in my previous post, you can get rid of the 
nulls by filtering out the entire row.

> 
> Thanks,
> 
> Rich
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Adrian Klaver wrote:

> AFAIK there is no inner left join:
> https://www.postgresql.org/docs/10/sql-select.html#SQL-FROM

Sigh. There are two LEFT JOINS in the statement. I referred to the SECOND
one as INNER. I correct my message to refer to the second of the two left
joins.

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Adrian Klaver wrote:

> Should have been clearer in my previous post, you can get rid of the nulls
> by filtering out the entire row.

Adrian,

Thank you. I'm rebuilding the statement from the inside out (which helps me
learn more SQL in the process). For example,

select * from activities where next_contact is not null and
          next_contact <= 'today' and next_contact > '2018-12-31';

works as written. This suggests that it's the placement within the two joins
that needs correcting. Now I'll add the two joined tables one at a time and
learn how to structure the whole statement.

Best regards,

Rich



Re: Subquery to select max(date) value

От
"Adrian Klaver"
Дата:
On Wed, Feb 13, 2019, at 9:06 AM, Rich Shepard wrote:
> On Wed, 13 Feb 2019, Adrian Klaver wrote:
> 
> > Should have been clearer in my previous post, you can get rid of the nulls
> > by filtering out the entire row.
> 
> Adrian,
> 
> Thank you. I'm rebuilding the statement from the inside out (which helps me
> learn more SQL in the process). For example,
> 
> select * from activities where next_contact is not null and
>           next_contact <= 'today' and next_contact > '2018-12-31';
> 
> works as written. This suggests that it's the placement within the two joins
> that needs correcting. Now I'll add the two joined tables one at a time and
> learn how to structure the whole statement.


It will work if you use it to filter after the joins are done.

> 
> Best regards,
> 
> Rich
> 
> 
>


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Adrian Klaver wrote:

> It will work if you use it to filter after the joins are done.

I'll work on finding the proper syntax to do this. Need to do more reading
and trial-and-error testing.

Regards,

Rich


Re: Subquery to select max(date) value

От
Adrian Klaver
Дата:
On 2/13/19 7:37 AM, Rich Shepard wrote:
> On Wed, 13 Feb 2019, Adrian Klaver wrote:
> 
>> The LEFT JOIN. There are rows in people for which there no records coming
>> from the sub-select on activities, so the row is 'padded' with NULL 
>> values
>> for the missing data.
> 
> Adrian,
> 
> I assume it's the inner left join. I'll trace what's happening at each step
> and learn where to specify no nulls.

create table people(person_id integer, desc_fld varchar);
create table activities(person_id integer, next_contact date);

insert into people values (1, 'contacted'), (2, 'never contacted'), (3, 
'out of range'), (4, 'contacted');

insert into activities values (1, '01/31/19'), (3, '11/01/18'), (4, 
'02/02/19');

No activities record for person_id =2, activities data set to NULL:

SELECT  * FROM people AS p LEFT JOIN activities AS a ON p.person_id = 
a.person_id;
  person_id |    desc_fld     | person_id | next_contact
-----------+-----------------+-----------+--------------
          1 | contacted       |         1 | 2019-01-31
          2 | never contacted |      NULL | NULL
          3 | out of range    |         3 | 2018-11-01
          4 | contacted       |         4 | 2019-02-02

Close to your last posted query. person_id 2 and 3 have NULL values for
activities data as there is no record for 2 and 3 is out of the date range.:

  select
    p.person_id,
    p.desc_fld,
    a.next_contact
from
    people as p
    LEFT JOIN (
        SELECT
            DISTINCT ON (person_id)
            person_id,
            next_contact
        FROM
             activities a
        WHERE
            a.next_contact <= 'today' and a.next_contact > '2018-12-31' and
            a.next_contact is not null
    ) a USING (person_id)
;

  person_id |    desc_fld     | next_contact
-----------+-----------------+--------------
          1 | contacted       | 2019-01-31
          2 | never contacted | NULL
          3 | out of range    | NULL
          4 | contacted       | 2019-02-02
(4 rows)

What I think you want:

select
    p.person_id,
    p.desc_fld,
    a.next_contact
from
    people as p
    LEFT JOIN (
        SELECT
            DISTINCT ON (person_id)
            person_id,
            next_contact
        FROM
             activities a
        WHERE
            a.next_contact <= 'today' and a.next_contact > '2018-12-31' and
            a.next_contact is not null
    ) a USING (person_id)
WHERE next_contact is not null;

  person_id | desc_fld  | next_contact
-----------+-----------+--------------
          1 | contacted | 2019-01-31
          4 | contacted | 2019-02-02





> 
> Thanks,
> 
> Rich
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Subquery to select max(date) value

От
Andrew Gierth
Дата:
>>>>> "Adrian" == Adrian Klaver <adrian.klaver@aklaver.com> writes:

 Adrian> Close to your last posted query. person_id 2 and 3 have NULL
 Adrian> values for activities data as there is no record for 2 and 3 is
 Adrian> out of the date range.:

 Adrian>  select
 Adrian>    p.person_id,
 Adrian>    p.desc_fld,
 Adrian>    a.next_contact
 Adrian> from
 Adrian>    people as p
 Adrian>    LEFT JOIN (
 Adrian>        SELECT
 Adrian>            DISTINCT ON (person_id)
 [...]
 Adrian>    ) a USING (person_id)
 Adrian> ;

DISTINCT ON with no matching ORDER BY at the _same_ query level is
non-deterministic.

Also DISTINCT ON isn't efficient. Consider instead something along the
lines of:

select p.*,
       a.*     -- for illustration
  from people p
       join lateral (select *
                       from activities a1
                      where a1.person_id = p.person_id
                        and a1.next_contact > '2018-12-31'
                        and a1.next_contact <= 'today'
                      order by a1.next_contact desc
                      limit 1) a
         on true;

(make sure to have an index on activities(person_id,next_contact))

-- 
Andrew (irc:RhodiumToad)


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Andrew Gierth wrote:

> Adrian> Close to your last posted query. person_id 2 and 3 have NULL
> Adrian> values for activities data as there is no record for 2 and 3 is
> Adrian> out of the date range.:

> DISTINCT ON with no matching ORDER BY at the _same_ query level is
> non-deterministic.
>
> Also DISTINCT ON isn't efficient. Consider instead something along the
> lines of:

Andrew/Adrian,

I again read about DISTINCT and DISTINCT ON and fully understand them. I've
also again read about JOINs; I understand them in terms of sets and _think_
that in this query the people table is the LEFT (many) while the
organizations and activities tables are the RIGHT (one) in the many-to-one
relationships. That is, for each person_id there is only one org_id and only
one next_contact that meets the three constraints.

I'm now working on understanding how the syntax in the examples you two,
Ken, and others have provided expresses the many-to-one relationships of
organization and activities to people. I have the syntax that returns the
next_date meeting the WHERE constraints to each person_id and am now
focusing on adding the additional people and organization columns to the
results. Might not be until tomorrow or Friday but I'll let you and the list
subscribes know when I have understood all your suggestions and get the
results I want from the query.

Thanks again,

Rich



Re: Subquery to select max(date) value

От
Adrian Klaver
Дата:
On 2/13/19 2:24 PM, Rich Shepard wrote:
> On Wed, 13 Feb 2019, Andrew Gierth wrote:
> 
>> Adrian> Close to your last posted query. person_id 2 and 3 have NULL
>> Adrian> values for activities data as there is no record for 2 and 3 is
>> Adrian> out of the date range.:
> 
>> DISTINCT ON with no matching ORDER BY at the _same_ query level is
>> non-deterministic.
>>
>> Also DISTINCT ON isn't efficient. Consider instead something along the
>> lines of:
> 
> Andrew/Adrian,
> 
> I again read about DISTINCT and DISTINCT ON and fully understand them. I've
> also again read about JOINs; I understand them in terms of sets and _think_
> that in this query the people table is the LEFT (many) while the
> organizations and activities tables are the RIGHT (one) in the many-to-one
> relationships. That is, for each person_id there is only one org_id and 
> only
> one next_contact that meets the three constraints.

Given a sufficiently large date range that may not be true as you may 
have contacted a given person multiple times during that range and 
generated multiple activities records.

> 
> I'm now working on understanding how the syntax in the examples you two,
> Ken, and others have provided expresses the many-to-one relationships of
> organization and activities to people. I have the syntax that returns the
> next_date meeting the WHERE constraints to each person_id and am now
> focusing on adding the additional people and organization columns to the
> results. Might not be until tomorrow or Friday but I'll let you and the 
> list
> subscribes know when I have understood all your suggestions and get the
> results I want from the query.
> 
> Thanks again,
> 
> Rich
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Adrian Klaver wrote:

> Given a sufficiently large date range that may not be true as you may have
> contacted a given person multiple times during that range and generated
> multiple activities records.

Adrian,

This is true as it has occurred. I want only the most recent activity row
associated with that person_id. (NB: while I'm really comfortable with DDL
statements my DML experience is seriously lacking and that's what I need to
improve now.)

I've just read a couple of blog posts on the LATERAL join added in 9.3 and
understand it in theory. Properly applying it to my application is now my
focus (and I need to re-read Andrew's example very closely.)

Best regards,

Rich


Re: Subquery to select max(date) value

От
Brent Wood
Дата:

I have not really followed this thread but would not a query along the lines of

select * from activity where person_id = n and timestamp = (select max(timestamp) from activity where person_id = n);

give the required answer ie, always return the latest result for the specified person_id??


Brent Wood

Programme leader: Environmental Information Delivery
NIWA
DDI:  +64 (4) 3860529





Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
T +64-4-386-0529 
National Institute of Water & Atmospheric Research Ltd (NIWA)
301 Evans Bay Parade, Greta Point, Wellington
Connect with NIWA: niwa.co.nz Facebook Twitter LinkedIn Instagram
To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems.

________________________________________
From: Rich Shepard <rshepard@appl-ecosys.com>
Sent: Thursday, February 14, 2019 12:13
To: pgsql-generallists.postgresql.org
Subject: Re: Subquery to select max(date) value

On Wed, 13 Feb 2019, Adrian Klaver wrote:

> Given a sufficiently large date range that may not be true as you may have
> contacted a given person multiple times during that range and generated
> multiple activities records.

Adrian,

This is true as it has occurred. I want only the most recent activity row
associated with that person_id. (NB: while I'm really comfortable with DDL
statements my DML experience is seriously lacking and that's what I need to
improve now.)

I've just read a couple of blog posts on the LATERAL join added in 9.3 and
understand it in theory. Properly applying it to my application is now my
focus (and I need to re-read Andrew's example very closely.)

Best regards,

Rich




Вложения

Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Brent Wood wrote:

> I have not really followed this thread but would not a query along the lines of
> select * from activity where person_id = n and timestamp = (select
> max(timestamp) from activity where person_id = n);
> give the required answer ie, always return the latest result for the specified person_id??

Brent,

I don't know. What does work is this statement:

SELECT
     DISTINCT ON (person_id) person_id,
     next_contact
     FROM activities AS a
         WHERE a.next_contact is not null and a.next_contact <= 'today' and
               a.next_contact > '2018-12-31'
         ORDER BY person_id,next_contact;

which returns these results:

  person_id | next_contact 
-----------+--------------
          1 | 2019-01-14
          4 | 2019-01-14
         22 | 2019-01-14
         36 | 2019-01-03
         37 | 2019-01-14
         38 | 2019-01-21
         40 | 2019-02-11
         41 | 2019-02-11
         42 | 2019-02-11
         43 | 2019-02-11
         44 | 2019-02-11
         45 | 2019-02-11
         46 | 2019-02-11
(13 rows)

Now I'm learning how to join the people and organization table using LATERAL
join(s) so the results include names and phone numbers.

Thanks for the suggestion,

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Wed, 13 Feb 2019, Andrew Gierth wrote:

> You want LATERAL.

Andrew, et al,:

I found a couple of web pages describing the lateral join yet have not
correctly applied them. The manual's page did not help me get the correct
syntax, either. Think I'm close, however:

select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, a.next_contact
from people as p, organizations as o
     lateral
         (select a.next_contact
         from activities as a
         where a.next_contact is not null and a.next_contact <= 'today' and
               a.next_contact > '2018-12-31'
         order by person_id,next_contact);

When run I get this syntax error:

$ psql -f get_next_contact_dates.sql -d bustrac 
psql:get_next_contact_dates.sql:10: ERROR:  syntax error at or near "lateral"
LINE 3:     lateral

What am I still missing?

Regards,

Rich


Re: Subquery to select max(date) value

От
Andrew Gierth
Дата:
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes:

 Rich> I found a couple of web pages describing the lateral join yet
 Rich> have not correctly applied them. The manual's page did not help
 Rich> me get the correct syntax, either. Think I'm close, however:

 Rich> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, a.next_contact
 Rich> from people as p, organizations as o
 Rich>     lateral
 Rich>         (select a.next_contact

LATERAL (SELECT ...)   is syntactically like (SELECT ...) in that it
comes _after_ a "," in the from-clause or after a [LEFT] JOIN keyword.
Don't think of LATERAL as being a type of join, think of it as
qualifying the (SELECT ...) that follows.

 Rich>         from activities as a
 Rich>         where a.next_contact is not null and a.next_contact <= 'today' and
 Rich>               a.next_contact > '2018-12-31'

You'd want a condition here that references the "people" table; the
whole point of LATERAL is that it opens up the scope of column
references in the subquery to include those tables which are to its left
in the from-clause.

 Rich>         order by person_id,next_contact);

and I'm guessing you want that ordered by next_contact alone, possibly
with LIMIT 1 to get just the nearest following next_contact time.

-- 
Andrew (irc:RhodiumToad)


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Fri, 15 Feb 2019, Andrew Gierth wrote:

> LATERAL (SELECT ...)   is syntactically like (SELECT ...) in that it comes
> _after_ a "," in the from-clause or after a [LEFT] JOIN keyword. Don't
> think of LATERAL as being a type of join, think of it as qualifying the
> (SELECT ...) that follows.

Andrew,

Thank you. Now I understand the difference.

> Rich>         from activities as a
> Rich>         where a.next_contact is not null and a.next_contact <= 'today' and
> Rich>               a.next_contact > '2018-12-31'
>
> You'd want a condition here that references the "people" table; the whole
> point of LATERAL is that it opens up the scope of column references in the
> subquery to include those tables which are to its left in the from-clause.

And here I got it backwards, thinking the subquery could reference the
columns in the people table from the initial select.

> Rich>         order by person_id,next_contact);
>
> and I'm guessing you want that ordered by next_contact alone, possibly
> with LIMIT 1 to get just the nearest following next_contact time.

That's true. With 'discrete on' only the most recent next_contact date is
returned.

More work over the weekend on this now I have a better understanding of
lateral.

Thanks again,

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Fri, 15 Feb 2019, Andrew Gierth wrote:

> LATERAL (SELECT ...)   is syntactically like (SELECT ...) in that it
> comes _after_ a "," in the from-clause or after a [LEFT] JOIN keyword.

Andrew,

Yes, the missing ',' made a big difference.

> You'd want a condition here that references the "people" table;

Got it.

> and I'm guessing you want that ordered by next_contact alone, possibly
> with LIMIT 1 to get just the nearest following next_contact time.

Using LIMIT 1 produces only the first returned row. This statement (using
max() for next_contact) produces no error message, but also no results so I
killed the process after 30 seconds. Without a syntax error for guidance I
don't know how to proceed. I've not before run 'explain' on a query. Would
that be appropriate here?

select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name
from people as p, organizations as o,
     lateral
         (select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name,
                 max(a.next_contact)
         from people as p, organizations as o, activities as a
         where a.next_contact > '2018-12-31' and
               a.next_contact <= 'today' and
               a.next_contact is not null
         group by p.person_id, o.org_name, a.next_contact
         order by p.person_id, o.org_name, a.next_contact) sq;

Regards,

Rich


Re: Subquery to select max(date) value

От
Andrew Gierth
Дата:
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes:

 Rich> Using LIMIT 1 produces only the first returned row. This
 Rich> statement (using max() for next_contact) produces no error
 Rich> message, but also no results so I killed the process after 30
 Rich> seconds. Without a syntax error for guidance I don't know how to
 Rich> proceed. I've not before run 'explain' on a query. Would that be
 Rich> appropriate here?

Yes.

 Rich> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name
 Rich> from people as p, organizations as o,
 Rich>     lateral
 Rich>         (select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name,
 Rich>                 max(a.next_contact)
 Rich>         from people as p, organizations as o, activities as a
 Rich>         where a.next_contact > '2018-12-31' and
 Rich>               a.next_contact <= 'today' and
 Rich>               a.next_contact is not null
 Rich>         group by p.person_id, o.org_name, a.next_contact
 Rich>         order by p.person_id, o.org_name, a.next_contact) sq;

The problem here is that you have no join conditions at all, so the
result set of this query is massive. And you've duplicated many tables
inside the subquery which is not necessary or appropriate.

select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, sq.*
  from people as p
       join organizations as o on p.organization_id=o.id   -- OR WHATEVER
       cross join
         lateral (select a.next_contact
                    from activities as a
                   where a.person_id=p.person_id         --VERY IMPORTANT
                     and a.next_contact > '2018-12-31'
                     and a.next_contact <= 'today'
                     and a.next_contact is not null
                   order by a.next_contact DESC
                   limit 1) sq;

Ordering by DESC with a limit 1 is used to get the max next_contact
value rather than the smallest; this is similar to max(), but makes it
trivial to also access the other columns of the _same_ activities row
which is being selected.

-- 
Andrew (irc:RhodiumToad)


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Fri, 15 Feb 2019, Andrew Gierth wrote:

> Rich> I've not before run 'explain' on a query. Would that be
> Rich> appropriate here?
>
> Yes.

Andrew,

I'll learn how to use it.

> The problem here is that you have no join conditions at all, so the
> result set of this query is massive. And you've duplicated many tables
> inside the subquery which is not necessary or appropriate.

Got it now.

> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, sq.*
>  from people as p
>       join organizations as o on p.organization_id=o.id   -- OR WHATEVER
>       cross join
>         lateral (select a.next_contact
>                    from activities as a
>                   where a.person_id=p.person_id         --VERY IMPORTANT
>                     and a.next_contact > '2018-12-31'
>                     and a.next_contact <= 'today'
>                     and a.next_contact is not null
>                   order by a.next_contact DESC
>                   limit 1) sq;
>
> Ordering by DESC with a limit 1 is used to get the max next_contact
> value rather than the smallest; this is similar to max(), but makes it
> trivial to also access the other columns of the _same_ activities row
> which is being selected.

This puts everything in perspective and is a very valuable lesson for me as
this application has many queries of this type. You've put together all that
I've read in the manual, on this mail list thread, and on web pages. I
really appreciate your patient guidance.

Best regards,

Rich


Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Fri, 15 Feb 2019, Andrew Gierth wrote:

> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, sq.*
>  from people as p
>       join organizations as o on p.organization_id=o.id   -- OR WHATEVER
>       cross join
>         lateral (select a.next_contact
>                    from activities as a
>                   where a.person_id=p.person_id         --VERY IMPORTANT
>                     and a.next_contact > '2018-12-31'
>                     and a.next_contact <= 'today'
>                     and a.next_contact is not null
>                   order by a.next_contact DESC
>                   limit 1) sq;

After working with this query I modified it slightly to return only the
next_contact date:

select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
from people as p
      join organizations as o on p.org_id = o.org_id
      cross join
          lateral
          (select a.next_contact
          from activities as a
              where a.person_id = p.person_id and
              p.active='True' and
              a.next_contact is not null
          order by a.next_contact DESC
          limit 1) sq;

It works wellm, but the row order is not that of a.next_contact. In fact,
there seems to be no order in the returned set. The next_contact column is
in the lateral sub-query. Does this make a difference? I've no idea how to
modify the query so that returned rows are in decreasing next_contact order.

Best regards,

Rich




Re: Subquery to select max(date) value

От
Ken Tanzer
Дата:


On Thu, Mar 28, 2019 at 3:59 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:

After working with this query I modified it slightly to return only the
next_contact date:

select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
from people as p
      join organizations as o on p.org_id = o.org_id
      cross join
          lateral
          (select a.next_contact
          from activities as a
              where a.person_id = p.person_id and
              p.active='True' and
              a.next_contact is not null
          order by a.next_contact DESC
          limit 1) sq;

It works wellm, but the row order is not that of a.next_contact. In fact,
there seems to be no order in the returned set. The next_contact column is
in the lateral sub-query. Does this make a difference? I've no idea how to
modify the query so that returned rows are in decreasing next_contact order.

Best regards,

Rich

You need the ORDER BY in the outer join.  (And I don't think the one inside the lateral join is doing you any good).  Try:

...
              a.next_contact is not null
             limit 1) sq
            order by sq.next_contact DESC;
 
Cheers,
Ken



--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Subquery to select max(date) value

От
"David G. Johnston"
Дата:
On Thu, Mar 28, 2019 at 3:59 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
from people as p
      join organizations as o on p.org_id = o.org_id
      cross join
          lateral
          (select a.next_contact
          from activities as a
              where a.person_id = p.person_id and
              p.active='True' and
              a.next_contact is not null
          order by a.next_contact DESC
          limit 1) sq;

It works wellm, but the row order is not that of a.next_contact. In fact,
there seems to be no order in the returned set.
The next_contact column is
in the lateral sub-query. Does this make a difference?

Yes, if you join the result on an ordered subquery to anything you no longer have a guaranteed order for the combined relation.

select ...
from ...
join ...
cross join lateral ...
-- now add an order by for the top-level query
order by

David J.

Re: Subquery to select max(date) value

От
"David G. Johnston"
Дата:
On Thu, Mar 28, 2019 at 4:05 PM Ken Tanzer <ken.tanzer@gmail.com> wrote:

 (And I don't think the one inside the lateral join is doing you any good).  Try:

...
              a.next_contact is not null
             limit 1) sq
            order by sq.next_contact DESC;


The fact that the subquery has a LIMT 1 clause leads me to assume the ORDER BY there is quite necessary.

David J.

Re: Subquery to select max(date) value

От
Ken Tanzer
Дата:


On Thu, Mar 28, 2019 at 4:07 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Mar 28, 2019 at 4:05 PM Ken Tanzer <ken.tanzer@gmail.com> wrote:

 (And I don't think the one inside the lateral join is doing you any good).  Try:

...
              a.next_contact is not null
             limit 1) sq
            order by sq.next_contact DESC;


The fact that the subquery has a LIMT 1 clause leads me to assume the ORDER BY there is quite necessary.

David J.

Have to agree with you there.  Sorry for the brain fart!

Ken

--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Thu, 28 Mar 2019, Ken Tanzer wrote:

> You need the ORDER BY in the outer join.

Ken,

I thought so. But, ...

> (And I don't think the one inside the lateral join is doing you any good).
> Try:
> ...
>              a.next_contact is not null
>             limit 1) sq
>            order by sq.next_contact DESC;

This re-orders the returned set, but still not in chronological order.

Thanks,

Rich



Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Thu, 28 Mar 2019, David G. Johnston wrote:

> Yes, if you join the result on an ordered subquery to anything you no
> longer have a guaranteed order for the combined relation.

David,

This makes sense to me.

> select ...
> from ...
> join ...
> cross join lateral ...
> -- now add an order by for the top-level query
> order by

Tried this and did not do it correctly. Should there be two 'order by', one
in the sub-query, the other in the top-level query? This does not return the
desired order:

select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
from people as p
      join organizations as o on p.org_id = o.org_id
      cross join
          lateral
          (select a.next_contact
          from activities as a
              where a.person_id = p.person_id and
              p.active='True' and
              a.next_contact is not null
          order by a.next_contact DESC
          limit 1) sq
          order by sq.next_contact DESC;

Obviously, I'm still missing the implementation of your response.

Best regards,

Rich



Re: Subquery to select max(date) value

От
Ken Tanzer
Дата:


On Thu, Mar 28, 2019 at 4:14 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Thu, 28 Mar 2019, Ken Tanzer wrote:

> You need the ORDER BY in the outer join.

Ken,

I thought so. But, ...

> (And I don't think the one inside the lateral join is doing you any good).
> Try:
> ...
>              a.next_contact is not null
>             limit 1) sq
>            order by sq.next_contact DESC;

This re-orders the returned set, but still not in chronological order.

Really?  Is your next_contact field a date field and not a text field?  What order does it come out in?

(And I assume you saw David J's correction to the misinformation I offered.)

Cheers,
Ken
 

--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Subquery to select max(date) value

От
"David G. Johnston"
Дата:
On Thu, Mar 28, 2019 at 4:21 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
This does not return the
desired order:

It returns something at least.  If you put the output you get into a spreadsheet are you able to manually sort it the way you desire?

David J.
 

Re: Subquery to select max(date) value

От
Andrew Gierth
Дата:
>>>>> "Rich" == Rich Shepard <rshepard@appl-ecosys.com> writes:

 Rich> Tried this and did not do it correctly. Should there be two
 Rich> 'order by', one in the sub-query, the other in the top-level
 Rich> query?

Yes.

 Rich> This does not return the desired order:

 Rich> select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
 Rich> from people as p
 Rich>      join organizations as o on p.org_id = o.org_id
 Rich>      cross join
 Rich>          lateral
 Rich>          (select a.next_contact
 Rich>          from activities as a
 Rich>              where a.person_id = p.person_id and
 Rich>              p.active='True' and
 Rich>              a.next_contact is not null
 Rich>          order by a.next_contact DESC
 Rich>          limit 1) sq
 Rich>          order by sq.next_contact DESC;

That query seems correct assuming you want the result in descending
order of next_contact. How did the actual result differ from your
expectation?

-- 
Andrew (irc:RhodiumToad)



Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Thu, 28 Mar 2019, David G. Johnston wrote:

> It returns something at least. If you put the output you get into a
> spreadsheet are you able to manually sort it the way you desire?

David,

Probably, but that's not practical for the application.

Thanks,

Rich



Re: Subquery to select max(date) value

От
Rich Shepard
Дата:
On Thu, 28 Mar 2019, Ken Tanzer wrote:

> Really? Is your next_contact field a date field and not a text field? What
> order does it come out in?
>
> (And I assume you saw David J's correction to the misinformation I offered.)

Ken,

Mea culpa. the next_contact column is a date, I did apply David's
correction, and I had a couple of mistaken dates (I had entered 2018 rather
than 2019; the middleware code will check for this error and have the user
correct entries other than the current year).

Regards,

Rich



Re: Subquery to select max(date) value [RESOLVED]

От
Rich Shepard
Дата:
On Fri, 29 Mar 2019, Andrew Gierth wrote:

> That query seems correct assuming you want the result in descending order
> of next_contact. How did the actual result differ from your expectation?

Andrew,

User error: I had a couple of date typos (2018 rather than 2019) and was
thinking of generating the display in ascending, rather than decending,
order.

Thinking again about what I need in terms of query results I made a couple
of changes. Now the query selects the most recent next_contact date >= a
week ago today and into the future:

select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
from people as p
      join organizations as o on p.org_id = o.org_id
      cross join
          lateral
          (select a.next_contact
          from activities as a
              where a.person_id = p.person_id and
              p.active='True' and
              a.next_contact >= current_date - interval '7' day and
              a.next_contact is not null
          order by a.next_contact ASC
          limit 1) sq
          order by sq.next_contact ASC;

The results are what I need and allow me to not miss a date as long as I run
the query at least once a week.

My thanks to you, David J., and Ken for the valuable lessons.

Best regards,

Rich