Обсуждение: Retrieve most recent 1 record from joined table

Поиск

Список

Период

Сортировка

Retrieve most recent 1 record from joined table

От

agharta

Дата:

22 августа 2014 г., 11:04:52

Hi all,
This is my first question, don't hate me please if it is the wrong place.

Mi question is relative seimple, but goes me crazy.

First, create some example tables:

----
create table table1(    t1_ID SERIAL primary key,    t1_value text
);

create table table2(    t2_ID SERIAL primary key,    t2_value text,    t1_id integer
);


create table table3(    t3_ID SERIAL primary key,    t3_value text,    t3_date timestamp,    t2_id integer
);

ALTER TABLE table2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON 
DELETE CASCADE;
ALTER TABLE table3 ADD FOREIGN KEY (t2_id) REFERENCES table2 (t2_ID) ON 
DELETE CASCADE;

CREATE INDEX IDX_TABLE1_T1_value ON table1 (t1_value);
CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);
CREATE INDEX IDX_TABLE3_T3_value ON table3 (t3_value);
CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);
-----

As you can see, table3 is referenced to table2 and table2 is referenced 
to table1

Next, insert some data on tables:

--one millon records into table1
insert into table1(t1_value) select md5(random()::text) from (select 
generate_series(1,1000000) as a) as c;

--100.000 records on table2 where t1_id is a random number between 1 and 
10.001. This guarantees many records pointing to same table1 t1_id
insert into table2(t2_value, t1_id) select md5(random()::text), 
trunc((random()+1)*10000)::integer from (select 
generate_series(1,100000) as a) as c;

--again, 1.022.401 records on table3 where t2_id is a random number 
between 1 and 10.001. This guarantee many records pointing to same 
table2 t2_id.
-- random numbers (1.022.401) are generated by generated_series function 
with timestamp (1 minute) interval
insert into table3(t3_value, t2_id, t3_date) select md5(random()::text), 
trunc((random()+1)*10000)::integer, c.date_val from (select 
generate_series(timestamp '2013-01-01',timestamp '2014-12-12', interval 
'1 minute') as date_val) as c;


So, now we should have table3 with many rows per table2(t2_id) and many 
rows in table2 with same t1_id.


Now, the question:


Joining the tables, how to get ONLY most recent record per table3(t3_date)??

Query example:

select * from table1 as t1
inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp 
'2014-08-20')
order by t3.t2_id, t3.t3_date desc

produces to me this dataset (extract)


t1_id       t1_value      t2_id      t2_value           t1_id      
t3_id         t3_value                             t3_date                            t2_id
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    857683 
0fb6cb380522dd83b6ac4beba2c6e98f    2014-08-19 14:42:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    857088 
9c3481bfc0bdea51e62b338a1777cde6    2014-08-19 04:47:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    853208 
253dc2083e70dd9e276867f43889973f    2014-08-16 12:07:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    851237 
970acf901c4232c178b5dbeda4e44ac8    2014-08-15 03:16:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    847436 
cb8d52255eaa850f9d7f203092a2ce13    2014-08-12 11:55:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    819339 
d9e36ad622b1db499b3f623cdd46a811    2014-07-23 23:38:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    818022 
b9d13239f522e3a22f20d36ea6dab8ad    2014-07-23 01:41:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    803046 
c36164f3077894a1986c4922dfb632ec    2014-07-12 16:05:00.0    10020
17098    74127bc80ca759678892c957b6a34fc7    10020 
9182c1f48cd008e31d781abc45723a10    17098    788129 
32c966feab2212a29f86bebbaa6dfec9    2014-07-02 07:28:00.0    10020


As you can see, there are many t3_id  per single t2_id.

I need the same dataset, but i want only the most recent (one) record 
per table3, and, of course, directly in the join rule (in production i 
have a long and complex query).

My personal solution (very slow, i can't use it):

select * from table1 as t1
inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
inner join table3 t3 on (
t3.t3_id = (select t3_id from table3 where t2_id = t2.t2_id and t3_date 
<= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only)
and t3.t2_id = t2.t2_id
)
order by t3.t2_id, t3.t3_date desc


it gives me the right result, but performances are poor........

Same bad performance result with a function that performs the table3 query.

Suggestions?

Thanks to anyone who can answer to me!!!!

Best regards,

Agharta

Re: Retrieve most recent 1 record from joined table

От

Brice André

Дата:

22 августа 2014 г., 11:13:19

<div dir="ltr">Try to use a "GROUP BY " clause in your select, coupled with an aggregate function (MAX in your case ?).
Notethat you should not need any subquery, so a single select with inner join should be OK.<br /></div><div
class="gmail_extra"><br/><br /><div class="gmail_quote">2014-08-22 10:05 GMT+02:00 agharta <span dir="ltr"><<a
href="mailto:agharta82@gmail.com"target="_blank">agharta82@gmail.com</a>></span>:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi all,<br /> This is my
firstquestion, don't hate me please if it is the wrong place.<br /><br /> Mi question is relative seimple, but goes me
crazy.<br/><br /> First, create some example tables:<br /><br /> ----<br /> create table table1(<br />     t1_ID SERIAL
primarykey,<br />     t1_value text<br /> );<br /><br /> create table table2(<br />     t2_ID SERIAL primary key,<br />
   t2_value text,<br />     t1_id integer<br /> );<br /><br /><br /> create table table3(<br />     t3_ID SERIAL
primarykey,<br />     t3_value text,<br />     t3_date timestamp,<br />     t2_id integer<br /> );<br /><br /> ALTER
TABLEtable2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON DELETE CASCADE;<br /> ALTER TABLE table3 ADD FOREIGN
KEY(t2_id) REFERENCES table2 (t2_ID) ON DELETE CASCADE;<br /><br /> CREATE INDEX IDX_TABLE1_T1_value ON table1
(t1_value);<br/> CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);<br /> CREATE INDEX IDX_TABLE3_T3_value ON
table3(t3_value);<br /> CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);<br /> -----<br /><br /> As you can see,
table3is referenced to table2 and table2 is referenced to table1<br /><br /> Next, insert some data on tables:<br /><br
/>--one millon records into table1<br /> insert into table1(t1_value) select md5(random()::text) from (select
generate_series(1,1000000)as a) as c;<br /><br /> --100.000 records on table2 where t1_id is a random number between 1
and10.001. This guarantees many records pointing to same table1 t1_id<br /> insert into table2(t2_value, t1_id) select
md5(random()::text),trunc((random()+1)*10000)::<u></u>integer from (select generate_series(1,100000) as a) as c;<br
/><br/> --again, 1.022.401 records on table3 where t2_id is a random number between 1 and 10.001. This guarantee many
recordspointing to same table2 t2_id.<br /> -- random numbers (1.022.401) are generated by generated_series function
withtimestamp (1 minute) interval<br /> insert into table3(t3_value, t2_id, t3_date) select md5(random()::text),
trunc((random()+1)*10000)::<u></u>integer,c.date_val from (select generate_series(timestamp '2013-01-01',timestamp
'2014-12-12',interval '1 minute') as date_val) as c;<br /><br /><br /> So, now we should have table3 with many rows per
table2(t2_id)and many rows in table2 with same t1_id.<br /><br /><br /> Now, the question:<br /><br /><br /> Joining
thetables, how to get ONLY most recent record per table3(t3_date)??<br /><br /> Query example:<br /><br /> select *
fromtable1 as t1<br /> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )<br /> inner join
table3t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp '2014-08-20')<br /> order by t3.t2_id, t3.t3_date
desc<br/><br /> produces to me this dataset (extract)<br /><br /><br /> t1_id       t1_value      t2_id      t2_value 
        t1_id      t3_id         t3_value                             t3_date                            t2_id<br />
17098   74127bc80ca759678892c957b6a34f<u></u>c7    10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    857683
0fb6cb380522dd83b6ac4beba2c6e9<u></u>8f   2014-08-19 14:42:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    857088
9c3481bfc0bdea51e62b338a1777cd<u></u>e6   2014-08-19 04:47:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    853208
253dc2083e70dd9e276867f4388997<u></u>3f   2014-08-16 12:07:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    851237
970acf901c4232c178b5dbeda4e44a<u></u>c8   2014-08-15 03:16:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    847436
cb8d52255eaa850f9d7f203092a2ce<u></u>13   2014-08-12 11:55:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    819339
d9e36ad622b1db499b3f623cdd46a8<u></u>11   2014-07-23 23:38:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    818022
b9d13239f522e3a22f20d36ea6dab8<u></u>ad   2014-07-23 01:41:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    803046
c36164f3077894a1986c4922dfb632<u></u>ec   2014-07-12 16:05:00.0    10020<br /> 17098   
74127bc80ca759678892c957b6a34f<u></u>c7   10020 9182c1f48cd008e31d781abc45723a<u></u>10    17098    788129
32c966feab2212a29f86bebbaa6dfe<u></u>c9   2014-07-02 07:28:00.0    10020<br /><br /><br /> As you can see, there are
manyt3_id  per single t2_id.<br /><br /> I need the same dataset, but i want only the most recent (one) record per
table3,and, of course, directly in the join rule (in production i have a long and complex query).<br /><br /> My
personalsolution (very slow, i can't use it):<br /><br /> select * from table1 as t1<br /> inner join table2 t2 on
(t1.t1_id= t2.t1_id and t2.t2_value like('%ab%') )<br /> inner join table3 t3 on (<br /> t3.t3_id = (select t3_id from
table3where t2_id = t2.t2_id and t3_date <= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only)<br
/>and t3.t2_id = t2.t2_id<br /> )<br /> order by t3.t2_id, t3.t3_date desc<br /><br /><br /> it gives me the right
result,but performances are poor........<br /><br /> Same bad performance result with a function that performs the
table3query.<br /><br /> Suggestions?<br /><br /> Thanks to anyone who can answer to me!!!!<br /><br /> Best
regards,<br/><br /> Agharta<span class="HOEnZb"><font color="#888888"><br /><br /><br /><br /><br /><br /><br /><br
/><br/><br /><br /><br /> -- <br /> Sent via pgsql-sql mailing list (<a href="mailto:pgsql-sql@postgresql.org"
target="_blank">pgsql-sql@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a
href="http://www.postgresql.org/mailpref/pgsql-sql"
target="_blank">http://www.postgresql.org/<u></u>mailpref/pgsql-sql</a><br/></font></span></blockquote></div><br
/></div>

Re: Retrieve most recent 1 record from joined table

От

agharta

Дата:

22 августа 2014 г., 11:47:24

<div class="moz-cite-prefix">Hi André,<br /> Thanks a lot for you quickly reply!<br /><br /> Your solution is right,
butI'm sorry because i cannot apply a group by with max in my production query: it does other group by and sums.<br
/><br/> I've try another solution that gives me a better performances:<br /><br /> WITH filtered_data AS (<br />     
   select * from (<br />             SELECT *,  rank() OVER (PARTITION BY t2_id ORDER BY t3_date DESC) as rank FROM
table3<br/>             where t3_date <= timestamp '2014-08-20'<br />          ) xx where xx.rank =1<br /> )<br />
select* from table1 as t1<br /> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )<br /> inner
joinfiltered_data t3 on (t3.t2_id = t2.t2_id)<br /> order by t3.t2_id, t3.t3_date desc<br /><br /><br /> So, can it be
theright way to solve my problem? There are some side-effects? Could exist a better solution, aka  "1.000.000$
question";-)  ? <br /><br /><br /> Thanks again for your support!<br /><br /> Best regards,<br /><br /> Agharta<br
/><br/><br /><br /> On 08/22/2014 10:13 AM, Brice André wrote:<br /></div><blockquote
cite="mid:CAOBG12kfHhS=F-8p1SmpYd+2efDPBY2zUO3TDDEPQrbR-YRQ7w@mail.gmail.com"type="cite"><div dir="ltr">Try to use a
"GROUPBY " clause in your select, coupled with an aggregate function (MAX in your case ?). Note that you should not
needany subquery, so a single select with inner join should be OK.<br /></div><div class="gmail_extra"><br /><br /><div
class="gmail_quote">2014-08-2210:05 GMT+02:00 agharta <span dir="ltr"><<a href="mailto:agharta82@gmail.com"
moz-do-not-send="true"target="_blank">agharta82@gmail.com</a>></span>:<br /><blockquote class="gmail_quote"
style="margin:00 0           .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi all,<br /> This is my first
question,don't hate me please if it is the wrong place.<br /><br /> Mi question is relative seimple, but goes me
crazy.<br/><br /> First, create some example tables:<br /><br /> ----<br /> create table table1(<br />     t1_ID SERIAL
primarykey,<br />     t1_value text<br /> );<br /><br /> create table table2(<br />     t2_ID SERIAL primary key,<br />
   t2_value text,<br />     t1_id integer<br /> );<br /><br /><br /> create table table3(<br />     t3_ID SERIAL
primarykey,<br />     t3_value text,<br />     t3_date timestamp,<br />     t2_id integer<br /> );<br /><br /> ALTER
TABLEtable2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON DELETE CASCADE;<br /> ALTER TABLE table3 ADD FOREIGN
KEY(t2_id) REFERENCES table2 (t2_ID) ON DELETE CASCADE;<br /><br /> CREATE INDEX IDX_TABLE1_T1_value ON table1
(t1_value);<br/> CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);<br /> CREATE INDEX IDX_TABLE3_T3_value ON
table3(t3_value);<br /> CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);<br /> -----<br /><br /> As you can see,
table3is referenced to table2 and table2 is referenced to table1<br /><br /> Next, insert some data on tables:<br /><br
/>--one millon records into table1<br /> insert into table1(t1_value) select md5(random()::text) from (select
generate_series(1,1000000)as a) as c;<br /><br /> --100.000 records on table2 where t1_id is a random number between 1
and10.001. This guarantees many records pointing to same table1 t1_id<br /> insert into table2(t2_value, t1_id) select
md5(random()::text),trunc((random()+1)*10000)::integer from (select generate_series(1,100000) as a) as c;<br /><br />
--again,1.022.401 records on table3 where t2_id is a random number between 1 and 10.001. This guarantee many records
pointingto same table2 t2_id.<br /> -- random numbers (1.022.401) are generated by generated_series function with
timestamp(1 minute) interval<br /> insert into table3(t3_value, t2_id, t3_date) select md5(random()::text),
trunc((random()+1)*10000)::integer,c.date_val from (select generate_series(timestamp '2013-01-01',timestamp
'2014-12-12',interval '1 minute') as date_val) as c;<br /><br /><br /> So, now we should have table3 with many rows per
table2(t2_id)and many rows in table2 with same t1_id.<br /><br /><br /> Now, the question:<br /><br /><br /> Joining
thetables, how to get ONLY most recent record per table3(t3_date)??<br /><br /> Query example:<br /><br /> select *
fromtable1 as t1<br /> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )<br /> inner join
table3t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp '2014-08-20')<br /> order by t3.t2_id, t3.t3_date
desc<br/><br /> produces to me this dataset (extract)<br /><br /><br /> t1_id       t1_value      t2_id      t2_value 
        t1_id      t3_id         t3_value                             t3_date                            t2_id<br />
17098   74127bc80ca759678892c957b6a34fc7    10020 9182c1f48cd008e31d781abc45723a10    17098    857683
0fb6cb380522dd83b6ac4beba2c6e98f   2014-08-19 14:42:00.0    10020<br /> 17098    74127bc80ca759678892c957b6a34fc7   
100209182c1f48cd008e31d781abc45723a10    17098    857088 9c3481bfc0bdea51e62b338a1777cde6    2014-08-19 04:47:00.0   
10020<br/> 17098    74127bc80ca759678892c957b6a34fc7    10020 9182c1f48cd008e31d781abc45723a10    17098    853208
253dc2083e70dd9e276867f43889973f   2014-08-16 12:07:00.0    10020<br /> 17098    74127bc80ca759678892c957b6a34fc7   
100209182c1f48cd008e31d781abc45723a10    17098    851237 970acf901c4232c178b5dbeda4e44ac8    2014-08-15 03:16:00.0   
10020<br/> 17098    74127bc80ca759678892c957b6a34fc7    10020 9182c1f48cd008e31d781abc45723a10    17098    847436
cb8d52255eaa850f9d7f203092a2ce13   2014-08-12 11:55:00.0    10020<br /> 17098    74127bc80ca759678892c957b6a34fc7   
100209182c1f48cd008e31d781abc45723a10    17098    819339 d9e36ad622b1db499b3f623cdd46a811    2014-07-23 23:38:00.0   
10020<br/> 17098    74127bc80ca759678892c957b6a34fc7    10020 9182c1f48cd008e31d781abc45723a10    17098    818022
b9d13239f522e3a22f20d36ea6dab8ad   2014-07-23 01:41:00.0    10020<br /> 17098    74127bc80ca759678892c957b6a34fc7   
100209182c1f48cd008e31d781abc45723a10    17098    803046 c36164f3077894a1986c4922dfb632ec    2014-07-12 16:05:00.0   
10020<br/> 17098    74127bc80ca759678892c957b6a34fc7    10020 9182c1f48cd008e31d781abc45723a10    17098    788129
32c966feab2212a29f86bebbaa6dfec9   2014-07-02 07:28:00.0    10020<br /><br /><br /> As you can see, there are many
t3_id per single t2_id.<br /><br /> I need the same dataset, but i want only the most recent (one) record per table3,
and,of course, directly in the join rule (in production i have a long and complex query).<br /><br /> My personal
solution(very slow, i can't use it):<br /><br /> select * from table1 as t1<br /> inner join table2 t2 on (t1.t1_id =
t2.t1_idand t2.t2_value like('%ab%') )<br /> inner join table3 t3 on (<br /> t3.t3_id = (select t3_id from table3 where
t2_id= t2.t2_id and t3_date <= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only)<br /> and
t3.t2_id= t2.t2_id<br /> )<br /> order by t3.t2_id, t3.t3_date desc<br /><br /><br /> it gives me the right result, but
performancesare poor........<br /><br /> Same bad performance result with a function that performs the table3 query.<br
/><br/> Suggestions?<br /><br /> Thanks to anyone who can answer to me!!!!<br /><br /> Best regards,<br /><br />
Agharta<spanclass="HOEnZb"><font color="#888888"><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br
/>-- <br /> Sent via pgsql-sql mailing list (<a href="mailto:pgsql-sql@postgresql.org" moz-do-not-send="true"
target="_blank">pgsql-sql@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a
href="http://www.postgresql.org/mailpref/pgsql-sql"moz-do-not-send="true"
target="_blank">http://www.postgresql.org/mailpref/pgsql-sql</a><br/></font></span></blockquote></div><br
/></div></blockquote><br/>

Re: Retrieve most recent 1 record from joined table

От

Vik Fearing

Дата:

23 августа 2014 г., 15:17:06

On 08/22/2014 10:05 AM, agharta wrote:
> Hi all,
> This is my first question, don't hate me please if it is the wrong place.

This is the right place.  See below for my answer.


> Mi question is relative seimple, but goes me crazy.
> 
> First, create some example tables:
> 
> ----
> create table table1(
>     t1_ID SERIAL primary key,
>     t1_value text
> );
> 
> create table table2(
>     t2_ID SERIAL primary key,
>     t2_value text,
>     t1_id integer
> );
> 
> 
> create table table3(
>     t3_ID SERIAL primary key,
>     t3_value text,
>     t3_date timestamp,
>     t2_id integer
> );
> 
> ALTER TABLE table2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON
> DELETE CASCADE;
> ALTER TABLE table3 ADD FOREIGN KEY (t2_id) REFERENCES table2 (t2_ID) ON
> DELETE CASCADE;
> 
> CREATE INDEX IDX_TABLE1_T1_value ON table1 (t1_value);
> CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);
> CREATE INDEX IDX_TABLE3_T3_value ON table3 (t3_value);
> CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);
> -----
> 
> As you can see, table3 is referenced to table2 and table2 is referenced
> to table1
> 
> Next, insert some data on tables:
> 
> --one millon records into table1
> insert into table1(t1_value) select md5(random()::text) from (select
> generate_series(1,1000000) as a) as c;
> 
> --100.000 records on table2 where t1_id is a random number between 1 and
> 10.001. This guarantees many records pointing to same table1 t1_id
> insert into table2(t2_value, t1_id) select md5(random()::text),
> trunc((random()+1)*10000)::integer from (select
> generate_series(1,100000) as a) as c;
> 
> --again, 1.022.401 records on table3 where t2_id is a random number
> between 1 and 10.001. This guarantee many records pointing to same
> table2 t2_id.
> -- random numbers (1.022.401) are generated by generated_series function
> with timestamp (1 minute) interval
> insert into table3(t3_value, t2_id, t3_date) select md5(random()::text),
> trunc((random()+1)*10000)::integer, c.date_val from (select
> generate_series(timestamp '2013-01-01',timestamp '2014-12-12', interval
> '1 minute') as date_val) as c;
> 
> 
> So, now we should have table3 with many rows per table2(t2_id) and many
> rows in table2 with same t1_id.
> 
> 
> Now, the question:
> 
> 
> Joining the tables, how to get ONLY most recent record per
> table3(t3_date)??
> 
> Query example:
> 
> select * from table1 as t1
> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
> inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp
> '2014-08-20')
> order by t3.t2_id, t3.t3_date desc
> 
> produces to me this dataset (extract)
> 
> 
> t1_id       t1_value      t2_id      t2_value           t1_id     
> t3_id         t3_value                             t3_date             
>               t2_id
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    857683
> 0fb6cb380522dd83b6ac4beba2c6e98f    2014-08-19 14:42:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    857088
> 9c3481bfc0bdea51e62b338a1777cde6    2014-08-19 04:47:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    853208
> 253dc2083e70dd9e276867f43889973f    2014-08-16 12:07:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    851237
> 970acf901c4232c178b5dbeda4e44ac8    2014-08-15 03:16:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    847436
> cb8d52255eaa850f9d7f203092a2ce13    2014-08-12 11:55:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    819339
> d9e36ad622b1db499b3f623cdd46a811    2014-07-23 23:38:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    818022
> b9d13239f522e3a22f20d36ea6dab8ad    2014-07-23 01:41:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    803046
> c36164f3077894a1986c4922dfb632ec    2014-07-12 16:05:00.0    10020
> 17098    74127bc80ca759678892c957b6a34fc7    10020
> 9182c1f48cd008e31d781abc45723a10    17098    788129
> 32c966feab2212a29f86bebbaa6dfec9    2014-07-02 07:28:00.0    10020
> 
> 
> As you can see, there are many t3_id  per single t2_id.
> 
> I need the same dataset, but i want only the most recent (one) record
> per table3, and, of course, directly in the join rule (in production i
> have a long and complex query).
> 
> My personal solution (very slow, i can't use it):
> 
> select * from table1 as t1
> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
> inner join table3 t3 on (
> t3.t3_id = (select t3_id from table3 where t2_id = t2.t2_id and t3_date
> <= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only)
> and t3.t2_id = t2.t2_id
> )
> order by t3.t2_id, t3.t3_date desc
> 
> 
> it gives me the right result, but performances are poor........
> 
> Same bad performance result with a function that performs the table3 query.
> 
> Suggestions?
> 
> Thanks to anyone who can answer to me!!!!

You didn't say what version of PostgreSQL you're using.  The following
solution works with 9.3.

select *
from table1 as t1
join table2 as t2 on t2.t1_id = t1.t1_id and t2.t2_value like '%ab%'
join lateral (select *             from table3             where t2_id = t2.t2_id               and t3_date <=
'2014-08-20'            order by t3_date desc             limit 1) as t3 on true
 
order by t3.t2_id, t3.t3_date desc;

You will need the following indexes for it to give good performance:

create index on t2 using gist (t2_value gist_trgm_ops);
create index on t3 using btree (t2_id, t3_date desc);

That first index will require the pg_trgm extension that you can get with:

create extension pg_trgm;


If you are not using 9.3, I can give you a (much) more complicated
version that will perform well, but ideally you'll want to use the above.
-- 
Vik

Re: Retrieve most recent 1 record from joined table

От

Thomas Kellerer

Дата:

25 августа 2014 г., 08:59:42

agharta schrieb am 22.08.2014 um 10:05:
> Joining the tables, how to get ONLY most recent record per table3(t3_date)??
> 
> Query example:
> 
> select * from table1 as t1
> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
> inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp '2014-08-20')
> order by t3.t2_id, t3.t3_date desc
> 

This seems to be slightly faster, especially with the following index:
 create index idx_t3_combined on table3 (t2_id, t3_date desc, t3_id);

select *
from table1 as t1 join table2 t2 on t1.t1_id = t2.t1_id and t2.t2_value like '%ab%' join (    select distinct on
(t2_id)t3_id,            t3_date,           t2_id    from table3    order by t2_id, t3_date desc ) t3 on t3.t2_id =
t2.t2_id
 
order by t3.t2_id, t3.t3_date desc
;

I also had to increase the work_mem in order to avoid disk based sorting for the joins

Re: Retrieve most recent 1 record from joined table

От

agharta

Дата:

27 августа 2014 г., 12:03:29

On 08/23/2014 02:15 PM, Vik Fearing wrote:
> On 08/22/2014 10:05 AM, agharta wrote:
>> Hi all,
>> This is my first question, don't hate me please if it is the wrong place.
> This is the right place.  See below for my answer.
>
>
>> Mi question is relative seimple, but goes me crazy.
>>
>> First, create some example tables:
>>
>> ----
>> create table table1(
>>      t1_ID SERIAL primary key,
>>      t1_value text
>> );
>>
>> create table table2(
>>      t2_ID SERIAL primary key,
>>      t2_value text,
>>      t1_id integer
>> );
>>
>>
>> create table table3(
>>      t3_ID SERIAL primary key,
>>      t3_value text,
>>      t3_date timestamp,
>>      t2_id integer
>> );
>>
>> ALTER TABLE table2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON
>> DELETE CASCADE;
>> ALTER TABLE table3 ADD FOREIGN KEY (t2_id) REFERENCES table2 (t2_ID) ON
>> DELETE CASCADE;
>>
>> CREATE INDEX IDX_TABLE1_T1_value ON table1 (t1_value);
>> CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);
>> CREATE INDEX IDX_TABLE3_T3_value ON table3 (t3_value);
>> CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);
>> -----
>>
>> As you can see, table3 is referenced to table2 and table2 is referenced
>> to table1
>>
>> Next, insert some data on tables:
>>
>> --one millon records into table1
>> insert into table1(t1_value) select md5(random()::text) from (select
>> generate_series(1,1000000) as a) as c;
>>
>> --100.000 records on table2 where t1_id is a random number between 1 and
>> 10.001. This guarantees many records pointing to same table1 t1_id
>> insert into table2(t2_value, t1_id) select md5(random()::text),
>> trunc((random()+1)*10000)::integer from (select
>> generate_series(1,100000) as a) as c;
>>
>> --again, 1.022.401 records on table3 where t2_id is a random number
>> between 1 and 10.001. This guarantee many records pointing to same
>> table2 t2_id.
>> -- random numbers (1.022.401) are generated by generated_series function
>> with timestamp (1 minute) interval
>> insert into table3(t3_value, t2_id, t3_date) select md5(random()::text),
>> trunc((random()+1)*10000)::integer, c.date_val from (select
>> generate_series(timestamp '2013-01-01',timestamp '2014-12-12', interval
>> '1 minute') as date_val) as c;
>>
>>
>> So, now we should have table3 with many rows per table2(t2_id) and many
>> rows in table2 with same t1_id.
>>
>>
>> Now, the question:
>>
>>
>> Joining the tables, how to get ONLY most recent record per
>> table3(t3_date)??
>>
>> Query example:
>>
>> select * from table1 as t1
>> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
>> inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp
>> '2014-08-20')
>> order by t3.t2_id, t3.t3_date desc
>>
>> produces to me this dataset (extract)
>>
>>
>> t1_id       t1_value      t2_id      t2_value           t1_id
>> t3_id         t3_value                             t3_date
>>                t2_id
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    857683
>> 0fb6cb380522dd83b6ac4beba2c6e98f    2014-08-19 14:42:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    857088
>> 9c3481bfc0bdea51e62b338a1777cde6    2014-08-19 04:47:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    853208
>> 253dc2083e70dd9e276867f43889973f    2014-08-16 12:07:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    851237
>> 970acf901c4232c178b5dbeda4e44ac8    2014-08-15 03:16:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    847436
>> cb8d52255eaa850f9d7f203092a2ce13    2014-08-12 11:55:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    819339
>> d9e36ad622b1db499b3f623cdd46a811    2014-07-23 23:38:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    818022
>> b9d13239f522e3a22f20d36ea6dab8ad    2014-07-23 01:41:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    803046
>> c36164f3077894a1986c4922dfb632ec    2014-07-12 16:05:00.0    10020
>> 17098    74127bc80ca759678892c957b6a34fc7    10020
>> 9182c1f48cd008e31d781abc45723a10    17098    788129
>> 32c966feab2212a29f86bebbaa6dfec9    2014-07-02 07:28:00.0    10020
>>
>>
>> As you can see, there are many t3_id  per single t2_id.
>>
>> I need the same dataset, but i want only the most recent (one) record
>> per table3, and, of course, directly in the join rule (in production i
>> have a long and complex query).
>>
>> My personal solution (very slow, i can't use it):
>>
>> select * from table1 as t1
>> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
>> inner join table3 t3 on (
>> t3.t3_id = (select t3_id from table3 where t2_id = t2.t2_id and t3_date
>> <= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only)
>> and t3.t2_id = t2.t2_id
>> )
>> order by t3.t2_id, t3.t3_date desc
>>
>>
>> it gives me the right result, but performances are poor........
>>
>> Same bad performance result with a function that performs the table3 query.
>>
>> Suggestions?
>>
>> Thanks to anyone who can answer to me!!!!
> You didn't say what version of PostgreSQL you're using.  The following
> solution works with 9.3.
>
> select *
> from table1 as t1
> join table2 as t2 on t2.t1_id = t1.t1_id and t2.t2_value like '%ab%'
> join lateral (select *
>                from table3
>                where t2_id = t2.t2_id
>                  and t3_date <= '2014-08-20'
>                order by t3_date desc
>                limit 1) as t3 on true
> order by t3.t2_id, t3.t3_date desc;
>
> You will need the following indexes for it to give good performance:
>
> create index on t2 using gist (t2_value gist_trgm_ops);
> create index on t3 using btree (t2_id, t3_date desc);
>
> That first index will require the pg_trgm extension that you can get with:
>
> create extension pg_trgm;
>
>
> If you are not using 9.3, I can give you a (much) more complicated
> version that will perform well, but ideally you'll want to use the above.


Hi Vik,

Sorry for late, i was testing your solution and.....what that performance!!!


I'm using 9.3, i've don't say it, my mistake.


Using join lateral with a lot of analize command to see where put the 
right index, performance becomes amazing!!


Thanks again for the priceless suggestion!!!


.....join lateral.... what a beautiful keywords :-)

Cheers,

Agharta

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Retrieve most recent 1 record from joined table

Retrieve most recent 1 record from joined table

Re: Retrieve most recent 1 record from joined table

Re: Retrieve most recent 1 record from joined table

Re: Retrieve most recent 1 record from joined table

Re: Retrieve most recent 1 record from joined table

Re: Retrieve most recent 1 record from joined table