Large Table - Slow Window Functions (Better Approach?)

Поиск
Список
Период
Сортировка
От Jeff Adams - NOAA Affiliate
Тема Large Table - Slow Window Functions (Better Approach?)
Дата
Msg-id CA+BdxK9ko0L=_2aEimOsOUiOadxG3wE8TyzT7dOms9FF0ORyOA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Large Table - Slow Window Functions (Better Approach?)
Re: Large Table - Slow Window Functions (Better Approach?)
Список pgsql-performance

Greetings,

 

I have a large table (~90 million rows) containing vessel positions. In addition to a column that contains the location information (the_geom), the table also contains two columns that are used to uniquely identify the vessel (mmsi and name) and a column containing the Unix time (epoch) at which the position information was logged. I frequently need to assign records to vessel transits. To do this, I currently create a CTE that uses a Window function (partitioning the data by mmsi and name ordered by epoch) to examine the time that has elapsed between successive position reports for individual vessels. For every position record for a vessel (as identified using mmsi and name), if the time elapsed between the current position record and the previous record (using the lag function) is less than or equal to 2 hours, I assign the record a value of 0 to a CTE column named tr_index. If the time elapsed is greater than 2 hours, I assign the record a value of 1 to the tr_index column. I then use the CTE to generate transit numbers by summing the values in the tr_index field across a Window that also partitions the data by mmsi and name and is ordered by epoch. This works, but is very slow (hours). The table is indexed (multi-column index on mmsi, name and index on epoch). Does anyone see a way to get what I am after in a more efficient manner. What I am after is an assignment of transit number to vessels' position records based on whether the records were within two hours of each other. The SQL that I used is provided below. Any advice would be greatly appreciated...

 

WITH

cte_01 AS

(

SELECT

a.id,

a.mmsi,

a.name,

a.epoch,

a.the_geom

CASE

  WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1

  ELSE 0

END AS tr_index 

FROM table a

WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)

)

 

 

SELECT

a.id,

a.mmsi,

a.name,

a.epoch,

a.the_geom,

1 + sum(a.tr_index) OVER w AS transit,

a.active

FROM cte_01 a

WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)

--
Jeff

В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: sniff test on some PG 8.4 numbers
Следующее
От: Jon Nelson
Дата:
Сообщение: Re: sniff test on some PG 8.4 numbers