Обсуждение: limiting join results
I am doing a query with a 3-way join. The join and select are working
fine. However, what I want is to select only the row with the
smallest timediff for each distinct t1.date.
This is the query (simplified):
select t1.date, t1.parent,
t1.id, t2.id, t3.id, t3.date,
(t3.date - t1.date) as timediff
from (t1 LEFT JOIN t2
ON t1.parent = t2.id)
LEFT JOIN t3 ON t2.page = t3.page
where
t3.date < t1.date and
t3.event_type = 'page' and
t1.user_id = '61516' and
order by t1.date, timediff;
Here are my results:
t1.date | parent | t1.id | t2.id | t3.id |
t3.date | timediff
------------------------+--------+--------+--------+--------+---------
---------------+----------
2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 |
2002-11-14 14:46:11-08 | 00:00:22
2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 |
2002-11-14 15:33:50-08 | 00:00:11
2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 |
2002-11-14 15:33:40-08 | 00:00:21
2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 |
2002-11-14 14:46:35-08 | 00:47:26
2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 |
2002-11-14 14:46:11-08 | 00:47:50
(5 rows)
What I want are only the first two rows. However, I don't know how
many distinct t1.date values I will have. Using DISTINCT doesn't seem
to change the output, and I'm not convinced it would keep the correct
row if it did.
My normal habit is to clean up the results in perl, but it seems to
me that I should be able to do it in the SQL query and be a bit
cleaner.
Thank you for your assistance.
Elaine Lindelef
On Friday 15 November 2002 12:14 am, Elaine Lindelef wrote: > I am doing a query with a 3-way join. The join and select are working > fine. However, what I want is to select only the row with the > smallest timediff for each distinct t1.date. > > This is the query (simplified): > > select t1.date, t1.parent, > t1.id, t2.id, t3.id, t3.date, > (t3.date - t1.date) as timediff > from (t1 LEFT JOIN t2 > ON t1.parent = t2.id) > LEFT JOIN t3 ON t2.page = t3.page > where > t3.date < t1.date and > t3.event_type = 'page' and > t1.user_id = '61516' and > order by t1.date, timediff; > > Here are my results: > > t1.date | parent | t1.id | t2.id | t3.id | > t3.date | timediff > ------------------------+--------+--------+--------+--------+--------- > ---------------+---------- > 2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 | > 2002-11-14 14:46:11-08 | 00:00:22 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 | > 2002-11-14 15:33:50-08 | 00:00:11 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 | > 2002-11-14 15:33:40-08 | 00:00:21 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 | > 2002-11-14 14:46:35-08 | 00:47:26 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 | > 2002-11-14 14:46:11-08 | 00:47:50 > (5 rows) > > What I want are only the first two rows. However, I don't know how > many distinct t1.date values I will have. Using DISTINCT doesn't seem > to change the output, and I'm not convinced it would keep the correct > row if it did. > > My normal habit is to clean up the results in perl, but it seems to > me that I should be able to do it in the SQL query and be a bit > cleaner. > > Thank you for your assistance. > > Elaine Lindelef You try cursor begin; declare cursor c1 for select ....; fetch forward 2 from c1; commit; regards Haris Peco
On Thu, 14 Nov 2002, Elaine Lindelef wrote: > I am doing a query with a 3-way join. The join and select are working > fine. However, what I want is to select only the row with the > smallest timediff for each distinct t1.date. > > This is the query (simplified): > > select t1.date, t1.parent, > t1.id, t2.id, t3.id, t3.date, > (t3.date - t1.date) as timediff > from (t1 LEFT JOIN t2 > ON t1.parent = t2.id) > LEFT JOIN t3 ON t2.page = t3.page > where > t3.date < t1.date and > t3.event_type = 'page' and > t1.user_id = '61516' and > order by t1.date, timediff; If you don't mind a postgres specific solution, I think select distinct on (t1.date) t1.date, ... may give you what you want. A real SQL solution is a bit more involved, I think you need to do a subselect with a group by.
Elaine Lindelef wrote:
> I am doing a query with a 3-way join. The join and select are working
> fine. However, what I want is to select only the row with the smallest
> timediff for each distinct t1.date.
First of all, you are using left joins. I don't think that's what you
want. A left join says to include all matching rows from t1, even if
there's no matching row in t2; it makes a fake t2 with all nulls if
necessary to match the t1 with. (Not a terribly good explanation. If you
need better, do a web search or I can try again.) Likewise for t2 and
t3. But you are discarding those with the t3.date comparison, because
"t3.date < t1.date" will always be false if t3.date is null, as it would
be for the extra stuff from the left join. You probably want an inner
join (the normal kind), which is a lot less expensive also.
And the way I typically say the smallest/largest/whateverest something
is with a not exists clause. Something that says "there is no record
with a smaller timediff than this one and matching it otherwise."
I'll use a simpler example (only one table in the not exists) to
demonstrate that with. Let's just say I'm interested in finding the
oldest employee in each department of some business.
create table department (
department_id serial primary key,
name varchar(50) not null
);
create table employee (
employee_id serial primary key,
department_id integer references department not null,
name varchar(50) not null,
when_born date
);
select department.name as department_name,
employee.name as employee_name,
when_born
from employee natural join department
where not exists
(select 'x'
from employee as older_employee
where employee.department_id
= older_employee.department_id
and older_employee.when_born < employee.when_born);
In other words, show me the department name, employee name, and birth
date of every employee for whom there is no older employee in the same
department.
Does that help?
Scott
Elaine, Are you sure of the where clause? (t3.date - t1.date) should be negative if t3.date < t1.date! JLL Elaine Lindelef wrote: > > I am doing a query with a 3-way join. The join and select are working > fine. However, what I want is to select only the row with the > smallest timediff for each distinct t1.date. > > This is the query (simplified): > > select t1.date, t1.parent, > t1.id, t2.id, t3.id, t3.date, > (t3.date - t1.date) as timediff > from (t1 LEFT JOIN t2 > ON t1.parent = t2.id) > LEFT JOIN t3 ON t2.page = t3.page > where > t3.date < t1.date and > t3.event_type = 'page' and > t1.user_id = '61516' and > order by t1.date, timediff; > > Here are my results: > > t1.date | parent | t1.id | t2.id | t3.id | > t3.date | timediff > ------------------------+--------+--------+--------+--------+--------- > ---------------+---------- > 2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 | > 2002-11-14 14:46:11-08 | 00:00:22 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 | > 2002-11-14 15:33:50-08 | 00:00:11 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 | > 2002-11-14 15:33:40-08 | 00:00:21 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 | > 2002-11-14 14:46:35-08 | 00:47:26 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 | > 2002-11-14 14:46:11-08 | 00:47:50 > (5 rows) > > What I want are only the first two rows. However, I don't know how > many distinct t1.date values I will have. Using DISTINCT doesn't seem > to change the output, and I'm not convinced it would keep the correct > row if it did. > > My normal habit is to clean up the results in perl, but it seems to > me that I should be able to do it in the SQL query and be a bit > cleaner. > > Thank you for your assistance. > > Elaine Lindelef > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Thanks all! Those suggestions worked, and your help is deeply appreciated. Elaine