Re: performance help

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: performance help
Дата
Msg-id 581.964808242@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: performance help  (Ernie <ernie.cline@ipgdirect.com>)
Список pgsql-general
Ernie <ernie.cline@ipgdirect.com> writes:
> This query is very fast.
>
> cw=# SELECT distinct n.news_id, headline, link, to_char(created,
> 'mm-dd-yyyyhh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id =
> nc.news_id AND
> cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
> cw-#
> in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5
> 462,
> cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;
>
> Unique    (cost=60322.14..60559.66 rows=1900 width=48)
>   ->  Sort  (cost=60322.14..60322.14 rows=19001 width=48)
>     ->  Nested Loop  (cost=0.00..58651.80 rows=19001 width=48)
>           ->  Seq Scan on news_article_summary n  (cost=0.00..416.14
> rows=1898 width=36)
>           ->  Index Scan using news_cat_news_id on news_cat nc
> (cost=0.00..30.53 rows=10 width=12)
>
> And here is the query I'm having problems with (slow)
> cw=# SELECT distinct n.news_id, headline, link,
> cw-# to_char(created, 'mm-dd-yyyy hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc
> cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
> cw-# nc.code_id in(4261,4182) ORDER BY created desc;
>
> Unique    (cost=35162.58..35181.27 rows=150 width=48)
>   ->  Sort  (cost=35162.58..35162.58 rows=1496 width=48)
>     ->  Nested Loop  (cost=0.00..35083.71 rows=1496 width=48)
>           ->  Index Scan using news_cat_code_id, news_cat_code_id on
> news_cat nc  (cost=0.00..55.31 rows=79 width=12)
>           ->  Seq Scan on news_article_summary n  (cost=0.00..416.14
> rows=1898 width=36)

The difference evidently is which table is scanned as the outside of the
nested loop.  The first plan says "scan news_article_summary
sequentially, and for each row that passes the WHERE clauses that
mention only that table, probe into news_cat_news_id for the row(s)
that match by news_id; then check the remaining WHERE clauses on the
combined row(s)."  The second plan says "probe into news_cat_news_id
for the row(s) that have the requested code_id values, and for each
one scan news_article_summary sequentially to locate matching rows."

If there are a lot of matches for nc.code_id in(4261,4182) then the
second plan would scan news_article_summary many times, which'd account
for it being slow.

Are news_id and code_id unique columns?  I am guessing from context
that at least news_id might be.  The planner does not seem to know
that, judging from its row-count guesses.  Perhaps all you need is
a VACUUM ANALYZE so that the planner becomes aware the column is
unique.

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Erich
Дата:
Сообщение: PG vs. Oracle for larger databases
Следующее
От: Bruce Momjian
Дата:
Сообщение: Manual pages added to book