Обсуждение: Window function sort order help
Having trouble getting a window function to sort correctly.
Given this data
consumer_id | move_date | history_timestamp
-------------+------------+----------------------------
12345| 2008-01-05 | 2007-12-11 06:02:26.842171
12345| 2008-02-29 | 2008-02-05 07:22:38.04067
12345| 2008-02-29 | 2008-07-11 09:03:42.44044
23456| 2009-01-01 | 2008-11-12 07:33:32.656658 <--
23456| 2009-01-28 | 2008-11-14 01:57:40.264335
23456| 2009-01-01 | 2008-12-04 17:14:20.279999 <--
23456| 2009-01-01 | 2008-12-31 00:33:37.204968
23456| 2009-01-01 | 2011-06-08 04:16:41.646521
34567| 2010-05-07 | 2010-06-08 05:14:43.842172
I’m trying to get the timestamp when the consumer last changed their move_date. (Notice consumer_id 23456 set their move_date to 2009-01-01 then changed and then changed it back. In the end, I want the timestamp from when they changed it to 2009-01-01 the second time.)
My thought was to do an intermediary step to find the timestamp for each time it switched. From there I can grab the max(timestamp) for each consumer.
Here’s my query.
select ch.consumer_id, ch.move_date,ch.history_timestamp, min(ch.history_timestamp) over w as start_time
from consumer_hist ch
window w as (partition by ch.consumer_id, ch.move_date order by ch.consumer_id,ch.history_timestamp asc)
order by ch.consumer_id , ch.history_timestamp asc
I expect the following result
consumer_id | move_date | history_timestamp | start_time
-------------+------------+----------------------------+----------------------------
12345 | 2008-01-05 | 2007-12-11 06:02:26.842171 | 2007-12-11 06:02:26.842171
12345 | 2008-02-29 | 2008-02-05 07:22:38.04067 | 2008-02-05 07:22:38.04067
12345 | 2008-02-29 | 2008-07-11 09:03:42.44044 | 2008-02-05 07:22:38.04067
23456 | 2009-01-01 | 2008-11-12 07:33:32.656658 | 2008-11-12 07:33:32.656658 <--
23456 | 2009-01-28 | 2008-11-14 01:57:40.264335 | 2008-11-14 01:57:40.264335
23456 | 2009-01-01 | 2008-12-04 17:14:20.279999 | 2008-12-04 17:14:20.279999 <--
23456 | 2009-01-01 | 2008-12-31 00:33:37.204968 | 2008-12-04 17:14:20.279999
23456 | 2009-01-01 | 2011-06-08 04:16:41.646521 | 2008-12-04 17:14:20.279999
34567 | 2010-05-07 | 2010-06-08 05:14:43.842172 | 2010-06-08 05:14:43.842172
But the windowing function appears to be doing the partitioning first then the order by.
So instead, I’m getting
consumer_id | move_date | history_timestamp | start_time
-------------+------------+----------------------------+----------------------------
12345 | 2008-01-05 | 2007-12-11 06:02:26.842171 | 2007-12-11 06:02:26.842171
12345 | 2008-02-29 | 2008-02-05 07:22:38.04067 | 2008-02-05 07:22:38.04067
12345 | 2008-02-29 | 2008-07-11 09:03:42.44044 | 2008-02-05 07:22:38.04067
23456 | 2009-01-01 | 2008-11-12 07:33:32.656658 | 2008-11-12 07:33:32.656658 <--
23456 | 2009-01-28 | 2008-11-14 01:57:40.264335 | 2008-11-14 01:57:40.264335
23456 | 2009-01-01 | 2008-12-04 17:14:20.279999 | 2008-11-12 07:33:32.656658 <--
23456 | 2009-01-01 | 2008-12-31 00:33:37.204968 | 2008-11-12 07:33:32.656658
23456 | 2009-01-01 | 2011-06-08 04:16:41.646521 | 2008-11-12 07:33:32.656658
34567 | 2010-05-07 | 2010-06-08 05:14:43.842172 | 2010-06-08 05:14:43.842172
Any suggestions to get the order by to occur first then the partition by or maybe there another approach that I could use?
Dianna
2011/9/13 Dianna Harter <dharter@mynewplace.com>:
Hi,
> [snip]
> Any suggestions to get the order by to occur first then the partition by or
> maybe there another approach that I could use?
I tried to write the query without using the window:
SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , MIN( `Y`.`history_timestamp` ) AS
`start_time`
FROM `Table` AS `X` JOIN `Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id`
AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >=
`Y`.`history_timestamp` LEFT JOIN `Table` AS `Z` ON `X`.`consumer_id` =
`Z`.`consumer_id` AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >=
`Z`.`history_timestamp` AND `Y`.`history_timestamp` <= `Z`.`history_timestamp`
WHERE `Z`.`consumer_id` IS NULL
GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp`
ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC
To avoid the LEFT JOIN, you can move the control in the sub-query:
SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , MIN( `Y`.`history_timestamp` ) AS
`start_time`
FROM `Table` AS `X` JOIN `Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id`
AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >=
`Y`.`history_timestamp`
WHERE NOT EXISTS ( SELECT * FROM `Table` AS `Z` WHERE `X`.`consumer_id` = `Z`.`consumer_id`
AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >= `Z`.`history_timestamp`
AND `Y`.`history_timestamp` <= `Z`.`history_timestamp` )
GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp`
ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC
With Y I select history_timestamp preceding the current row with the
same move_date.
With Z I verify that no changes have occurred to move_date between
X.history_timestamp and Y.history_timestamp.
Sorry, I responded with the wrong email address..
Nicoletta
Nicoletta Maia, 14.09.2011 10:30: > SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , > MIN( `Y`.`history_timestamp` ) AS `start_time` > FROM > `Table` AS `X` > JOIN > `Table` AS `Y` > ON `X`.`consumer_id` = `Y`.`consumer_id` > AND `X`.`move_date` = `Y`.`move_date` > AND `X`.`history_timestamp`>= `Y`.`history_timestamp` > WHERE NOT EXISTS ( > SELECT * > FROM `Table` AS `Z` > WHERE `X`.`consumer_id` = `Z`.`consumer_id` > AND `X`.`move_date`<> `Z`.`move_date` > AND `X`.`history_timestamp`>= `Z`.`history_timestamp` > AND `Y`.`history_timestamp`<= `Z`.`history_timestamp` > ) > GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` > ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC > > With Y I select history_timestamp preceding the current row with the > same move_date. > With Z I verify that no changes have occurred to move_date between > X.history_timestamp and Y.history_timestamp. > That is not a valid PostgreSQL SQL statement. Postgres does not use "backticks" for quoting, it uses the standard double quotes. Thomas
Thank you. Interesting solution. Unfortunately, it's performance is not very good, since it involves joining a large table3 times. I do have a solution that uses a temp table, but I was trying to rework it into a single query to improveperformance. Thank you again for your help. Dianna 2011/9/13 nicoletta maia : [snip] > I tried to write the query without using the window: > > SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , > MIN( `Y`.`history_timestamp` ) AS `start_time` > FROM > `Table` AS `X` > JOIN > `Table` AS `Y` > ON `X`.`consumer_id` = `Y`.`consumer_id` > AND `X`.`move_date` = `Y`.`move_date` > AND `X`.`history_timestamp` >= `Y`.`history_timestamp` > LEFT JOIN > `Table` AS `Z` > ON `X`.`consumer_id` = `Z`.`consumer_id` > AND `X`.`move_date` <> `Z`.`move_date` > AND `X`.`history_timestamp` >= `Z`.`history_timestamp` > AND `Y`.`history_timestamp` <= `Z`.`history_timestamp` > WHERE `Z`.`consumer_id` IS NULL > GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` > ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC
2011/9/14 Dianna Harter <dharter@mynewplace.com>: > [snip] > Thank you again for your help. You're welcome. Sorry about the syntax, but I could only try on MySQL... and I was curious about the problem :) Nicoletta
Dianna Harter <dharter@mynewplace.com> wrote: > Having trouble getting a window function to sort correctly. > Given this data > consumer_id | move_date | history_timestamp > -------------+------------+---------------------------- > 12345| 2008-01-05 | 2007-12-11 06:02:26.842171 > 12345| 2008-02-29 | 2008-02-05 07:22:38.04067 > 12345| 2008-02-29 | 2008-07-11 09:03:42.44044 > 23456| 2009-01-01 | 2008-11-12 07:33:32.656658 <-- > 23456| 2009-01-28 | 2008-11-14 01:57:40.264335 > 23456| 2009-01-01 | 2008-12-04 17:14:20.279999 <-- > 23456| 2009-01-01 | 2008-12-31 00:33:37.204968 > 23456| 2009-01-01 | 2011-06-08 04:16:41.646521 > 34567| 2010-05-07 | 2010-06-08 05:14:43.842172 > I'm trying to get the timestamp when the consumer last changed their move_date. (Notice consumer_id 23456 set their move_dateto 2009-01-01 then changed and then changed it back. In the end, I want the timestamp from when they changed itto 2009-01-01 the second time.) > My thought was to do an intermediary step to find the timestamp for each time it switched. From there I can grab the max(timestamp)for each consumer. > [...] > Any suggestions to get the order by to occur first then the partition by or maybe there another approach that I could use? If I understand the question correctly, try: | SELECT DISTINCT ON (consumer_id) consumer_id, move_date, history_timestamp | FROM (SELECT consumer_id, move_date, history_timestamp, | LAG(move_date) OVER (PARTITION BY consumer_id ORDER BY consumer_id, history_timestamp) AS previous_move_date | FROM consumer_hist) AS SubQuery | WHERE move_date IS DISTINCT FROM previous_move_date | ORDER BY consumer_id, history_timestamp DESC; Tim
2011/9/13 Dianna Harter <dharter@mynewplace.com>: Hi, > [snip] > Any suggestions to get the order by to occur first then the partition by or > maybe there another approach that I could use? I tried to write the query without using the window: SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` , MIN( `Y`.`history_timestamp` ) AS `start_time` FROM`Table` AS `X` JOIN`Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id` AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >= `Y`.`history_timestamp` LEFT JOIN`Table` AS `Z` ON `X`.`consumer_id`= `Z`.`consumer_id` AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >= `Z`.`history_timestamp` AND `Y`.`history_timestamp` <= `Z`.`history_timestamp` WHERE `Z`.`consumer_id` IS NULL GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC To avoid the LEFT JOIN, you can move the control in the sub-query: SELECT `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ,MIN( `Y`.`history_timestamp` ) AS `start_time` FROM`Table` AS `X` JOIN`Table` AS `Y` ON `X`.`consumer_id` = `Y`.`consumer_id` AND `X`.`move_date` = `Y`.`move_date` AND `X`.`history_timestamp` >= `Y`.`history_timestamp` WHERE NOT EXISTS (SELECT *FROM `Table` AS `Z`WHERE `X`.`consumer_id` = `Z`.`consumer_id` AND `X`.`move_date` <> `Z`.`move_date` AND `X`.`history_timestamp` >= `Z`.`history_timestamp` AND `Y`.`history_timestamp` <= `Z`.`history_timestamp`) GROUP BY `X`.`consumer_id`, `X`.`move_date` , `X`.`history_timestamp` ORDER BY `X`.`consumer_id`, `X`.`history_timestamp` ASC With Y I select history_timestamp preceding the current row with the same move_date. With Z I verify that no changes have occurred to move_date between X.history_timestamp and Y.history_timestamp. Ciao! Nicoletta
Thank you. It works perfect. Dianna Tim Landscheidt < tim@tim-landscheidt.de > wrote: > If I understand the question correctly, try: > > | SELECT DISTINCT ON (consumer_id) consumer_id, move_date, history_timestamp > | FROM (SELECT consumer_id, move_date, history_timestamp, > | LAG(move_date) OVER (PARTITION BY consumer_id ORDER BY consumer_id, history_timestamp) AS previous_move_date > | FROM consumer_hist) AS SubQuery > | WHERE move_date IS DISTINCT FROM previous_move_date > | ORDER BY consumer_id, history_timestamp DESC; > > Tim