Re: hashed subplan 5000x slower than two sequential operations

От: Marc Mamin
Тема: Re: hashed subplan 5000x slower than two sequential operations
Дата: ,
Msg-id: C4DAC901169B624F933534A26ED7DF3103E915FC@JENMAIL01.ad.intershop.net
(см: обсуждение, исходный текст)
Ответ на: hashed subplan 5000x slower than two sequential operations  (Bryce Nesbitt)
Ответы: Re: hashed subplan 5000x slower than two sequential operations  (Bryce Nesbitt)
Список: pgsql-performance

Скрыть дерево обсуждения

hashed subplan 5000x slower than two sequential operations  (Bryce Nesbitt, )
 Re: hashed subplan 5000x slower than two sequential operations  (Shrirang Chitnis, )
  Re: hashed subplan 5000x slower than two sequential operations  (Bryce Nesbitt, )
  Re: hashed subplan 5000x slower than two sequential operations  (Tom Lane, )
   Re: hashed subplan 5000x slower than two sequential operations  (Pavel Stehule, )
 Re: hashed subplan 5000x slower than two sequential operations  ("Marc Mamin", )
  Re: hashed subplan 5000x slower than two sequential operations  (Bryce Nesbitt, )
 Re: hashed subplan 5000x slower than two sequential operations  ("Marc Mamin", )
  Re: hashed subplan 5000x slower than two sequential operations  (Bryce Nesbitt, )
   Re: hashed subplan 5000x slower than two sequential operations  (masterchief, )
    Re: hashed subplan 5000x slower than two sequential operations  (Віталій Тимчишин, )


Another point: would a conditionl index help ?

on articles (context_key) where indexed

regards,

-----Ursprüngliche Nachricht-----
Von: im Auftrag von Marc Mamin
Gesendet: Mi 12/8/2010 9:06
An: Shrirang Chitnis; Bryce Nesbitt;
Betreff: Re: [PERFORM] hashed subplan 5000x slower than two sequential operations



Hello,

are the table freshly analyzed, with a sufficient default_statistics_target ?

You may try to get a better plan while rewriting the query as an UNION to get rid of the OR clause.
Something like (not tested):

SELECT contexts.context_key
FROM contexts
     JOIN articles
     ON (articles.context_key=contexts.context_key)
WHERE contexts.parent_key = 392210
AND articles.indexed


UNION
SELECT context_key
FROM
(
  SELECT contexts.context_key
  FROM contexts JOIN collection_data ON ( contexts.context_key = collection_data .context_key)
  WHERE collection_data.collection_context_key = 392210)
) foo JOIN articles ON (foo.context_key=contexts.context_key)
WHERE articles.indexed
;


I've had one similar problem where there was no way for the planner to notice that the query would systematically return very few rows. Here, my last resort was to disable some planner methods within the given transaction.

regards,

Marc Mamin

-----Ursprüngliche Nachricht-----
Von: im Auftrag von Shrirang Chitnis
Gesendet: Mi 12/8/2010 8:05
An: Bryce Nesbitt;
Betreff: Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

Bryce,

The two queries are different:

You are looking for contexts.context_key in first query

WHERE (contexts.parent_key = 392210
      OR contexts.context_key IN
         (SELECT collection_data.context_key
         FROM collection_data
          WHERE collection_data.collection_context_key = 392210)


but second query has context.parent_key

WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))

Is the contexts.context_key an indexed field? contexts.parent_key certainly seems to be.


HTH,


Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services


Office: (866) 808-0935 Ext: 39210

www.hovservices.com

The information contained in this message, including any attachments, is attorney privileged and/or confidential information intended only for the use of the individual or entity named as addressee.  The review, dissemination, distribution or copying of this communication by or to anyone other than the intended addressee is strictly prohibited.  If you have received this communication in error, please immediately notify the sender by replying to the message and destroy all copies of the original message.


-----Original Message-----
From: [mailto:] On Behalf Of Bryce Nesbitt
Sent: Thursday, December 09, 2010 12:24 AM
To:
Subject: [PERFORM] hashed subplan 5000x slower than two sequential operations

Can you help me understand how to optimize the following.  There's a
subplan which in this case returns 3 rows,
but it is really expensive:



=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
     JOIN articles
     ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
      OR contexts.context_key IN
         (SELECT collection_data.context_key
         FROM collection_data
          WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
                                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual
time=4362.143..6002.808 rows=28 loops=1)
    Hash Cond: (articles.context_key = contexts.context_key)
    ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4)
(actual time=0.558..3953.002 rows=517356 loops=1)
          Filter: indexed
    ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual
time=829.501..829.501 rows=31 loops=1)
          ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493
width=4) (actual time=1.641..829.339 rows=31 loops=1)
                Filter: ((parent_key = 392210) OR (hashed subplan))
                SubPlan
                  ->  Index Scan using collection_data_context_key_index
on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual
time=0.018..0.023 rows=3 loops=1)
                        Index Cond: (collection_context_key = 392210)
  Total runtime: 6002.976 ms
(11 rows)


=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
     JOIN articles
     ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))
AND articles.indexed
;
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=14.35..1863.85 rows=94 width=4) (actual
time=0.098..1.038 rows=57 loops=1)
    ->  Bitmap Heap Scan on contexts  (cost=14.35..572.57 rows=288
width=4) (actual time=0.079..0.274 rows=59 loops=1)
          Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[])))
          ->  BitmapOr  (cost=14.35..14.35 rows=288 width=0) (actual
time=0.066..0.066 rows=0 loops=1)
                ->  Bitmap Index Scan on parent_key_idx
(cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)
                      Index Cond: (parent_key = 392210)
                ->  Bitmap Index Scan on parent_key_idx
(cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87
loops=1)
                      Index Cond: (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[]))
    ->  Index Scan using article_key_idx on articles  (cost=0.00..4.47
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)
          Index Cond: (articles.context_key = contexts.context_key)
          Filter: articles.indexed
  Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM contexts
     JOIN articles
     ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
      OR contexts.context_key IN
         (SELECT collection_data.context_key
         FROM collection_data
          WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;


=========================================================================
# select version();
PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)


--
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

--
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



В списке pgsql-performance по дате сообщения:

От: Robert Haas
Дата:
Сообщение: Re: Performance under contention
От: Scott Marlowe
Дата:
Сообщение: Re: Hardware recommendations