Re: Large Table - Slow Window Functions (Better Approach?)

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: Large Table - Slow Window Functions (Better Approach?)
Дата
Msg-id CAFj8pRCfZFVD6yE1D_4U1mEfmn9v34E6j=315b1g-9d5Pvcs=w@mail.gmail.com
обсуждение исходный текст
Ответ на Large Table - Slow Window Functions (Better Approach?)  (Jeff Adams - NOAA Affiliate <jeff.adams@noaa.gov>)
Ответы Re: Large Table - Slow Window Functions (Better Approach?)
Список pgsql-performance
Hello

you can try procedural solution - use a cursor over ordered data in
plpgsql and returns table

Regards

Pavel Stehule

2013/3/11 Jeff Adams - NOAA Affiliate <jeff.adams@noaa.gov>:
> Greetings,
>
>
>
> I have a large table (~90 million rows) containing vessel positions. In
> addition to a column that contains the location information (the_geom), the
> table also contains two columns that are used to uniquely identify the
> vessel (mmsi and name) and a column containing the Unix time (epoch) at
> which the position information was logged. I frequently need to assign
> records to vessel transits. To do this, I currently create a CTE that uses a
> Window function (partitioning the data by mmsi and name ordered by epoch) to
> examine the time that has elapsed between successive position reports for
> individual vessels. For every position record for a vessel (as identified
> using mmsi and name), if the time elapsed between the current position
> record and the previous record (using the lag function) is less than or
> equal to 2 hours, I assign the record a value of 0 to a CTE column named
> tr_index. If the time elapsed is greater than 2 hours, I assign the record a
> value of 1 to the tr_index column. I then use the CTE to generate transit
> numbers by summing the values in the tr_index field across a Window that
> also partitions the data by mmsi and name and is ordered by epoch. This
> works, but is very slow (hours). The table is indexed (multi-column index on
> mmsi, name and index on epoch). Does anyone see a way to get what I am after
> in a more efficient manner. What I am after is an assignment of transit
> number to vessels' position records based on whether the records were within
> two hours of each other. The SQL that I used is provided below. Any advice
> would be greatly appreciated...
>
>
>
> WITH
>
> cte_01 AS
>
> (
>
> SELECT
>
> a.id,
>
> a.mmsi,
>
> a.name,
>
> a.epoch,
>
> a.the_geom
>
> CASE
>
>   WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
>
>   ELSE 0
>
> END AS tr_index
>
> FROM table a
>
> WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>
> )
>
>
>
>
>
> SELECT
>
> a.id,
>
> a.mmsi,
>
> a.name,
>
> a.epoch,
>
> a.the_geom,
>
> 1 + sum(a.tr_index) OVER w AS transit,
>
> a.active
>
> FROM cte_01 a
>
> WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>
>
>
> --
> Jeff


В списке pgsql-performance по дате отправления:

Предыдущее
От: Jon Nelson
Дата:
Сообщение: Re: sniff test on some PG 8.4 numbers
Следующее
От: Jeff Adams - NOAA Affiliate
Дата:
Сообщение: Re: Large Table - Slow Window Functions (Better Approach?)