Обсуждение: Window function sort order help
<div class="WordSection1"><p class="MsoNormal">Having trouble getting a window function to sort correctly. <p class="MsoNormal"> <pclass="MsoNormal">Given this data<p class="MsoNormal"> <p class="MsoNormal"> consumer_id | move_date | history_timestamp<p class="MsoNormal">-------------+------------+----------------------------<p class="MsoNormal"> 12345| 2008-01-05 | 2007-12-11 06:02:26.842171<p class="MsoNormal"> 12345| 2008-02-29 | 2008-02-0507:22:38.04067<p class="MsoNormal"> 12345| 2008-02-29 | 2008-07-11 09:03:42.44044<p class="MsoNormal"> 23456| 2009-01-01 | 2008-11-12 07:33:32.656658 <--<p class="MsoNormal"> 23456| 2009-01-28| 2008-11-14 01:57:40.264335<p class="MsoNormal"> 23456| 2009-01-01 | 2008-12-04 17:14:20.279999 <--<pclass="MsoNormal"> 23456| 2009-01-01 | 2008-12-31 00:33:37.204968<p class="MsoNormal"> 23456| 2009-01-01| 2011-06-08 04:16:41.646521<p class="MsoNormal"> 34567| 2010-05-07 | 2010-06-08 05:14:43.842172<p class="MsoNormal"> <pclass="MsoNormal">I’m trying to get the timestamp when the consumer last changed their move_date. (Noticeconsumer_id 23456 set their move_date to 2009-01-01 then changed and then changed it back. In the end, I want thetimestamp from when they changed it to 2009-01-01 the second time.)<p class="MsoNormal"> <p class="MsoNormal">My thoughtwas to do an intermediary step to find the timestamp for each time it switched. From there I can grab the max(timestamp)for each consumer.<p class="MsoNormal"> <p class="MsoNormal">Here’s my query. <p class="MsoNormal">selectch.consumer_id, ch.move_date,ch.history_timestamp, min(ch.history_timestamp) over w as start_time<pclass="MsoNormal"> from consumer_hist ch<p class="MsoNormal">window w as (partition by ch.consumer_id,ch.move_date order by ch.consumer_id,ch.history_timestamp asc)<p class="MsoNormal"> order by ch.consumer_id, ch.history_timestamp asc<p class="MsoNormal"> <p class="MsoNormal"> <p class="MsoNormal">I expect the followingresult<p class="MsoNormal">consumer_id | move_date | history_timestamp | start_time<p class="MsoNormal">-------------+------------+----------------------------+----------------------------<p class="MsoNormal"> 12345 | 2008-01-05 | 2007-12-11 06:02:26.842171 | 2007-12-11 06:02:26.842171<p class="MsoNormal"> 12345 | 2008-02-29 | 2008-02-05 07:22:38.04067 | 2008-02-05 07:22:38.04067<p class="MsoNormal"> 12345 | 2008-02-29 | 2008-07-11 09:03:42.44044 | 2008-02-05 07:22:38.04067<p class="MsoNormal"> 23456 | 2009-01-01 | 2008-11-12 07:33:32.656658 | 2008-11-12 07:33:32.656658 <--<p class="MsoNormal"> 23456 | 2009-01-28 | 2008-11-14 01:57:40.264335 | 2008-11-14 01:57:40.264335<p class="MsoNormal"> 23456 | 2009-01-01 | 2008-12-04 17:14:20.279999 | 2008-12-04 17:14:20.279999 <--<p class="MsoNormal"> 23456 | 2009-01-01 | 2008-12-31 00:33:37.204968 | 2008-12-04 17:14:20.279999<p class="MsoNormal"> 23456 | 2009-01-01 | 2011-06-08 04:16:41.646521 | 2008-12-04 17:14:20.279999<p class="MsoNormal"> 34567 | 2010-05-07 | 2010-06-08 05:14:43.842172 | 2010-06-08 05:14:43.842172<p class="MsoNormal"> <pclass="MsoNormal">But the windowing function appears to be doing the partitioning first then the orderby. <p class="MsoNormal"> <p class="MsoNormal">So instead, I’m getting<p class="MsoNormal"> <p class="MsoNormal">consumer_id| move_date | history_timestamp | start_time<p class="MsoNormal">-------------+------------+----------------------------+----------------------------<p class="MsoNormal"> 12345 | 2008-01-05 | 2007-12-11 06:02:26.842171 | 2007-12-11 06:02:26.842171<p class="MsoNormal"> 12345 | 2008-02-29 | 2008-02-05 07:22:38.04067 | 2008-02-05 07:22:38.04067<p class="MsoNormal"> 12345 | 2008-02-29 | 2008-07-11 09:03:42.44044 | 2008-02-05 07:22:38.04067<p class="MsoNormal"> 23456 | 2009-01-01 | 2008-11-12 07:33:32.656658 | 2008-11-12 07:33:32.656658 <--<p class="MsoNormal"> 23456 | 2009-01-28 | 2008-11-14 01:57:40.264335 | 2008-11-14 01:57:40.264335<p class="MsoNormal"> 23456 | 2009-01-01 | 2008-12-04 17:14:20.279999 | 2008-11-12 07:33:32.656658 <--<p class="MsoNormal"> 23456 | 2009-01-01 | 2008-12-31 00:33:37.204968 | 2008-11-12 07:33:32.656658<p class="MsoNormal"> 23456 | 2009-01-01 | 2011-06-08 04:16:41.646521 | 2008-11-12 07:33:32.656658<p class="MsoNormal"> 34567 | 2010-05-07 | 2010-06-08 05:14:43.842172 | 2010-06-08 05:14:43.842172<p class="MsoNormal"> <pclass="MsoNormal">Any suggestions to get the order by to occur first then the partition by or maybethere another approach that I could use?<p class="MsoNormal"> <p class="MsoNormal">Dianna</div>
2011/9/13 Dianna Harter <dharter@mynewplace.com>: Hi, > [snip] > Any suggestions to get the order by to occur first then the partition by or > maybe there another approach that I could use? I tried to write the query without using the window: SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , MIN( `Y`.`history_timestamp` ) AS `start_time` FROM `Table` AS `X` JOIN `Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id` AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >= `Y`.`history_timestamp` LEFT JOIN `Table` AS `Z` ON `X`.`consumer_id` = `Z`.`consumer_id` AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >= `Z`.`history_timestamp` AND `Y`.`history_timestamp` <= `Z`.`history_timestamp` WHERE `Z`.`consumer_id` IS NULL GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC To avoid the LEFT JOIN, you can move the control in the sub-query: SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , MIN( `Y`.`history_timestamp` ) AS `start_time` FROM `Table` AS `X` JOIN `Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id` AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >= `Y`.`history_timestamp` WHERE NOT EXISTS ( SELECT * FROM `Table` AS `Z` WHERE `X`.`consumer_id` = `Z`.`consumer_id` AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >= `Z`.`history_timestamp` AND `Y`.`history_timestamp` <= `Z`.`history_timestamp` ) GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC With Y I select history_timestamp preceding the current row with the same move_date. With Z I verify that no changes have occurred to move_date between X.history_timestamp and Y.history_timestamp. Sorry, I responded with the wrong email address.. Nicoletta
Nicoletta Maia, 14.09.2011 10:30: > SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , > MIN( `Y`.`history_timestamp` ) AS `start_time` > FROM > `Table` AS `X` > JOIN > `Table` AS `Y` > ON `X`.`consumer_id` = `Y`.`consumer_id` > AND `X`.`move_date` = `Y`.`move_date` > AND `X`.`history_timestamp`>= `Y`.`history_timestamp` > WHERE NOT EXISTS ( > SELECT * > FROM `Table` AS `Z` > WHERE `X`.`consumer_id` = `Z`.`consumer_id` > AND `X`.`move_date`<> `Z`.`move_date` > AND `X`.`history_timestamp`>= `Z`.`history_timestamp` > AND `Y`.`history_timestamp`<= `Z`.`history_timestamp` > ) > GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` > ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC > > With Y I select history_timestamp preceding the current row with the > same move_date. > With Z I verify that no changes have occurred to move_date between > X.history_timestamp and Y.history_timestamp. > That is not a valid PostgreSQL SQL statement. Postgres does not use "backticks" for quoting, it uses the standard double quotes. Thomas
Thank you. Interesting solution. Unfortunately, it's performance is not very good, since it involves joining a large table3 times. I do have a solution that uses a temp table, but I was trying to rework it into a single query to improveperformance. Thank you again for your help. Dianna 2011/9/13 nicoletta maia : [snip] > I tried to write the query without using the window: > > SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , > MIN( `Y`.`history_timestamp` ) AS `start_time` > FROM > `Table` AS `X` > JOIN > `Table` AS `Y` > ON `X`.`consumer_id` = `Y`.`consumer_id` > AND `X`.`move_date` = `Y`.`move_date` > AND `X`.`history_timestamp` >= `Y`.`history_timestamp` > LEFT JOIN > `Table` AS `Z` > ON `X`.`consumer_id` = `Z`.`consumer_id` > AND `X`.`move_date` <> `Z`.`move_date` > AND `X`.`history_timestamp` >= `Z`.`history_timestamp` > AND `Y`.`history_timestamp` <= `Z`.`history_timestamp` > WHERE `Z`.`consumer_id` IS NULL > GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` > ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC
2011/9/14 Dianna Harter <dharter@mynewplace.com>: > [snip] > Thank you again for your help. You're welcome. Sorry about the syntax, but I could only try on MySQL... and I was curious about the problem :) Nicoletta
Dianna Harter <dharter@mynewplace.com> wrote: > Having trouble getting a window function to sort correctly. > Given this data > consumer_id | move_date | history_timestamp > -------------+------------+---------------------------- > 12345| 2008-01-05 | 2007-12-11 06:02:26.842171 > 12345| 2008-02-29 | 2008-02-05 07:22:38.04067 > 12345| 2008-02-29 | 2008-07-11 09:03:42.44044 > 23456| 2009-01-01 | 2008-11-12 07:33:32.656658 <-- > 23456| 2009-01-28 | 2008-11-14 01:57:40.264335 > 23456| 2009-01-01 | 2008-12-04 17:14:20.279999 <-- > 23456| 2009-01-01 | 2008-12-31 00:33:37.204968 > 23456| 2009-01-01 | 2011-06-08 04:16:41.646521 > 34567| 2010-05-07 | 2010-06-08 05:14:43.842172 > I'm trying to get the timestamp when the consumer last changed their move_date. (Notice consumer_id 23456 set their move_dateto 2009-01-01 then changed and then changed it back. In the end, I want the timestamp from when they changed itto 2009-01-01 the second time.) > My thought was to do an intermediary step to find the timestamp for each time it switched. From there I can grab the max(timestamp)for each consumer. > [...] > Any suggestions to get the order by to occur first then the partition by or maybe there another approach that I could use? If I understand the question correctly, try: | SELECT DISTINCT ON (consumer_id) consumer_id, move_date, history_timestamp | FROM (SELECT consumer_id, move_date, history_timestamp, | LAG(move_date) OVER (PARTITION BY consumer_id ORDER BY consumer_id, history_timestamp) AS previous_move_date | FROM consumer_hist) AS SubQuery | WHERE move_date IS DISTINCT FROM previous_move_date | ORDER BY consumer_id, history_timestamp DESC; Tim
2011/9/13 Dianna Harter <dharter@mynewplace.com>: Hi, > [snip] > Any suggestions to get the order by to occur first then the partition by or > maybe there another approach that I could use? I tried to write the query without using the window: SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , MIN( `Y`.`history_timestamp` ) AS `start_time` FROM`Table` AS `X` JOIN`Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id` AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >= `Y`.`history_timestamp` LEFT JOIN`Table` AS `Z` ON `X`.`consumer_id`= `Z`.`consumer_id` AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >= `Z`.`history_timestamp` AND `Y`.`history_timestamp` <= `Z`.`history_timestamp` WHERE `Z`.`consumer_id` IS NULL GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC To avoid the LEFT JOIN, you can move the control in the sub-query: SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ,MIN( `Y`.`history_timestamp` ) AS `start_time` FROM`Table` AS `X` JOIN`Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id` AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >= `Y`.`history_timestamp` WHERE NOT EXISTS (SELECT *FROM `Table` AS `Z`WHERE `X`.`consumer_id` = `Z`.`consumer_id` AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >= `Z`.`history_timestamp` AND `Y`.`history_timestamp` <= `Z`.`history_timestamp`) GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC With Y I select history_timestamp preceding the current row with the same move_date. With Z I verify that no changes have occurred to move_date between X.history_timestamp and Y.history_timestamp. Ciao! Nicoletta
Thank you. It works perfect. Dianna Tim Landscheidt < tim@tim-landscheidt.de > wrote: > If I understand the question correctly, try: > > | SELECT DISTINCT ON (consumer_id) consumer_id, move_date, history_timestamp > | FROM (SELECT consumer_id, move_date, history_timestamp, > | LAG(move_date) OVER (PARTITION BY consumer_id ORDER BY consumer_id, history_timestamp) AS previous_move_date > | FROM consumer_hist) AS SubQuery > | WHERE move_date IS DISTINCT FROM previous_move_date > | ORDER BY consumer_id, history_timestamp DESC; > > Tim