Обсуждение: triple self-join crawling

Поиск
Список
Период
Сортировка

triple self-join crawling

От
T E Schmitz
Дата:
The following self join of a table containing 5800 records is crawling:

CREATE TABLE history
(    stock VARCHAR(30) NOT NULL,    day date NOT NULL,    open NUMERIC (6,1) NOT NULL,    high NUMERIC (6,1) NOT NULL,
 low NUMERIC (6,1) NOT NULL,    close NUMERIC (6,1) NOT NULL,    volume NUMERIC (12) NOT NULL,    PRIMARY KEY
(stock,day)
);


SELECT    history.stock, history.day, history.high, history.low,    MAX(past_week.high) AS week_high,
MAX(past_month.high)AS month_high
 
FROM history
INNER JOIN history AS past_month ON (past_month.stock = history.stock 
AND past_month.day < history.day AND past_month.day >= (history.day - 30))
INNER JOIN history AS past_week  ON (past_week.stock   = 
past_month.stock AND past_week.day < history.day AND past_week.day >= 
(history.day - 7))
GROUP BY  history.stock, history.day, history.high, history.low
ORDER BY history.stock, history.day DESC


How can I speed this up?


-- 


Regards,

Tarlika Elisabeth Schmitz


Re: triple self-join crawling

От
Andrew Sullivan
Дата:
Define "crawling".  Also, please post EXPLAIN and, if feasible,
EXPLAIN ANALYSE output for your case.

A

On Sun, Mar 18, 2007 at 07:51:28PM +0000, T E Schmitz wrote:
> The following self join of a table containing 5800 records is crawling:
> 
> CREATE TABLE history
> (
>     stock VARCHAR(30) NOT NULL,
>     day date NOT NULL,
>     open NUMERIC (6,1) NOT NULL,
>     high NUMERIC (6,1) NOT NULL,
>     low NUMERIC (6,1) NOT NULL,
>     close NUMERIC (6,1) NOT NULL,
>     volume NUMERIC (12) NOT NULL,
>     PRIMARY KEY (stock,day)
> );
> 
> 
> SELECT
>     history.stock, history.day, history.high, history.low,
>     MAX(past_week.high) AS week_high,
>     MAX(past_month.high) AS month_high
> FROM history
> INNER JOIN history AS past_month ON (past_month.stock = history.stock 
> AND past_month.day < history.day AND past_month.day >= (history.day - 30))
> INNER JOIN history AS past_week  ON (past_week.stock   = 
> past_month.stock AND past_week.day < history.day AND past_week.day >= 
> (history.day - 7))
> GROUP BY  history.stock, history.day, history.high, history.low
> ORDER BY history.stock, history.day DESC
> 
> 
> How can I speed this up?
> 
> 
> -- 
> 
> 
> Regards,
> 
> Tarlika Elisabeth Schmitz
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>               http://www.postgresql.org/docs/faq

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Unfortunately reformatting the Internet is a little more painful 
than reformatting your hard drive when it gets out of whack.    --Scott Morris


Re: triple self-join crawling

От
T E Schmitz
Дата:
T E Schmitz wrote:
> The following self join of a table containing 5800 records is crawling:
> 
> SELECT
>     history.stock, history.day, history.high, history.low,
>     MAX(past_week.high) AS week_high,
>     MAX(past_month.high) AS month_high
> FROM history
> INNER JOIN history AS past_month ON (past_month.stock = history.stock 
> AND past_month.day < history.day AND past_month.day >= (history.day - 30))
> INNER JOIN history AS past_week  ON (past_week.stock   = 
> past_month.stock AND past_week.day < history.day AND past_week.day >= 
> (history.day - 7))
> GROUP BY  history.stock, history.day, history.high, history.low
> ORDER BY history.stock, history.day DESC

Things improved hugely when I changed the JOIN clauses:

LEFT OUTER JOIN history AS past_month ON (past_month.stock = 
history.stock        AND past_month.day >= (history.day - 30) AND 
past_month.day < history.day)
LEFT OUTER JOIN history AS past_week  ON (past_week.stock   = 
past_month.stock AND past_week.day  =past_month.day AND past_week.day >= (history.day - 7))


-- 


Regards,

Tarlika Elisabeth Schmitz


Re: triple self-join crawling

От
T E Schmitz
Дата:
T E Schmitz wrote:
> T E Schmitz wrote:
> 
> Things improved hugely when I changed the JOIN clauses:

see explain analyze below - can this be improved further?

> 
> LEFT OUTER JOIN history AS past_month ON (past_month.stock = 
> history.stock        AND past_month.day >= (history.day - 30) AND 
> past_month.day < history.day)
> LEFT OUTER JOIN history AS past_week  ON (past_week.stock   = 
> past_month.stock AND past_week.day  =past_month.day AND past_week.day >= 
> (history.day - 7))

QUERY PLAN
GroupAggregate  (cost=1370368.19..1435888.88 rows=5801 width=56) (actual 
time=11945.030..13163.156 rows=5801 loops=1)  ->  Sort  (cost=1370368.19..1379715.86 rows=3739067 width=56) (actual 
time=11944.753..12462.623 rows=120117 loops=1)        Sort Key: history.stock, history."day", history.high, history.low
      ->  Hash Left Join  (cost=160.02..391554.63 rows=3739067 
 
width=56) (actual time=52.746..3778.409 rows=120117 loops=1)              Hash Cond: ((("outer".stock)::text = 
("inner".stock)::text) AND ("outer"."day" = "inner"."day"))              Join Filter: ("inner"."day" >= ("outer"."day"
-7))              ->  Nested Loop Left Join  (cost=0.00..204441.26 
 
rows=3739067 width=57) (actual time=0.077..2313.375 rows=120117 loops=1)                    Join Filter:
(("inner".stock)::text= 
 
("outer".stock)::text)                    ->  Seq Scan on history  (cost=0.00..131.01 
rows=5801 width=34) (actual time=0.016..34.845 rows=5801 loops=1)                    ->  Index Scan using
idx_history_dayon history 
 
past_month  (cost=0.00..22.32 rows=645 width=23) (actual 
time=0.020..0.185 rows=21 loops=5801)                          Index Cond: ((past_month."day" >= 
("outer"."day" - 30)) AND (past_month."day" < "outer"."day"))              ->  Hash  (cost=131.01..131.01 rows=5801
width=23)
 
(actual time=52.608..52.608 rows=5801 loops=1)                    ->  Seq Scan on history past_week 
(cost=0.00..131.01 rows=5801 width=23) (actual time=0.010..25.110 
rows=5801 loops=1)

Total runtime: 13187.729 ms
-- 


Regards,

Tarlika Elisabeth Schmitz


Re: triple self-join crawling

От
T E Schmitz
Дата:
Andrew Sullivan wrote:
> Define "crawling".  Also, please post EXPLAIN and, if feasible,

Total runtime: 191430.537 ms


> EXPLAIN ANALYSE output for your case.
> 
> On Sun, Mar 18, 2007 at 07:51:28PM +0000, T E Schmitz wrote:
> 
>>The following self join of a table containing 5800 records is crawling:
>>
>>
>>SELECT
>>    history.stock, history.day, history.high, history.low,
>>    MAX(past_week.high) AS week_high,
>>    MAX(past_month.high) AS month_high
>>FROM history
>>INNER JOIN history AS past_month ON (past_month.stock = history.stock 
>>AND past_month.day < history.day AND past_month.day >= (history.day - 30))
>>INNER JOIN history AS past_week  ON (past_week.stock   = 
>>past_month.stock AND past_week.day < history.day AND past_week.day >= 
>>(history.day - 7))
>>GROUP BY  history.stock, history.day, history.high, history.low
>>ORDER BY history.stock, history.day DESC

QUERY PLAN
GroupAggregate  (cost=1372725715.71..1414901439.85 rows=5801 width=56) 
(actual time=185248.481..191393.056 rows=5800 loops=1)  ->  Sort  (cost=1372725715.71..1378750806.73 rows=2410036407 
width=56) (actual time=185247.212..187763.008 rows=584203 loops=1)        Sort Key: history.stock, history."day",
history.high,history.low        ->  Nested Loop  (cost=0.00..132245248.41 rows=2410036407 
 
width=56) (actual time=0.124..148136.464 rows=584203 loops=1)              Join Filter: (("inner".stock)::text =
("outer".stock)::text)             ->  Nested Loop  (cost=0.00..555931.84 rows=3739067 
 
width=53) (actual time=0.087..137531.941 rows=28147 loops=1)                    Join Filter: (("outer"."day" >=
("inner"."day"- 
 
7)) AND (("outer".stock)::text = ("inner".stock)::text))                    ->  Index Scan using history_pkey on
history
 
past_week  (cost=0.00..266.62 rows=5801 width=23) (actual 
time=0.034..43.147 rows=5801 loops=1)                    ->  Index Scan using idx_history_day on history 
(cost=0.00..57.11 rows=1934 width=34) (actual time=0.021..12.709 
rows=2900 loops=5801)                          Index Cond: ("outer"."day" < history."day")              ->  Index Scan
usingidx_history_day on history 
 
past_month  (cost=0.00..22.32 rows=645 width=23) (actual 
time=0.019..0.170 rows=21 loops=28147)                    Index Cond: ((past_month."day" < "outer"."day") AND 
(past_month."day" >= ("outer"."day" - 30)))
Total runtime: 191430.537 ms

-- 


Regards,

Tarlika Elisabeth Schmitz


Re: triple self-join crawling

От
Martin Marques
Дата:
T E Schmitz escribió:
> 
> QUERY PLAN
> GroupAggregate  (cost=1370368.19..1435888.88 rows=5801 width=56) (actual 
> time=11945.030..13163.156 rows=5801 loops=1)
>   ->  Sort  (cost=1370368.19..1379715.86 rows=3739067 width=56) (actual 
> time=11944.753..12462.623 rows=120117 loops=1)
>         Sort Key: history.stock, history."day", history.high, history.low
>         ->  Hash Left Join  (cost=160.02..391554.63 rows=3739067 
> width=56) (actual time=52.746..3778.409 rows=120117 loops=1)
>               Hash Cond: ((("outer".stock)::text = 
> ("inner".stock)::text) AND ("outer"."day" = "inner"."day"))
>               Join Filter: ("inner"."day" >= ("outer"."day" - 7))
>               ->  Nested Loop Left Join  (cost=0.00..204441.26 
> rows=3739067 width=57) (actual time=0.077..2313.375 rows=120117 loops=1)

It's estimating >3M, but it finds 120K rows.

>                     Join Filter: (("inner".stock)::text = 
> ("outer".stock)::text)
>                     ->  Seq Scan on history  (cost=0.00..131.01 
> rows=5801 width=34) (actual time=0.016..34.845 rows=5801 loops=1)
>                     ->  Index Scan using idx_history_day on history 
> past_month  (cost=0.00..22.32 rows=645 width=23) (actual 
> time=0.020..0.185 rows=21 loops=5801)
>                           Index Cond: ((past_month."day" >= 
> ("outer"."day" - 30)) AND (past_month."day" < "outer"."day"))
>               ->  Hash  (cost=131.01..131.01 rows=5801 width=23) (actual 
> time=52.608..52.608 rows=5801 loops=1)
>                     ->  Seq Scan on history past_week (cost=0.00..131.01 
> rows=5801 width=23) (actual time=0.010..25.110 rows=5801 loops=1)
> 
> Total runtime: 13187.729 ms

Try running a vacuum analyze on the database (or at least the tables 
which differ in rows estimated and actual (history for example))

-- 
select 'mmarques' || '@' || 'unl.edu.ar' AS email;
---------------------------------------------------------
Martín Marqués          |   Programador, DBA
Centro de Telemática    |     Administrador               Universidad Nacional                    del Litoral
---------------------------------------------------------