Very inefficient query plan with disjunction in WHERE clause

Поиск
Список
Период
Сортировка
Hi all,

I've been staring at this for hours (if not days). Any hints are appreciated! At first I thought "there must be a way
tomake postgresql perform on this thing", but i've lost hope that pgsql actually can deal with it.. 

The query is:

SELECT DISTINCT
  posrel.pos,
  questions.number,
  questions.title,
  answers.number,
  answers.title,
  answers.body
FROM
  bar_portals portals,
  ONLY bar_insrel insrel,
  bar_faq faq,
  ONLY bar_posrel posrel,
  bar_questions questions,
  bar_insrel insrel0,
  bar_answers answers
WHERE
    portals.number=2534202
  AND
    (portals.number=insrel.snumber AND faq.number=insrel.dnumber)
  AND
    (faq.number=posrel.snumber AND questions.number=posrel.dnumber AND posrel.rnumber=780)
  AND
    (
        (questions.number=insrel0.dnumber AND answers.number=insrel0.snumber AND insrel0.dir<>1)
      OR
        (questions.number=insrel0.snumber AND answers.number=insrel0.dnumber)
    )
  AND
    (
          portals.owner NOT IN ('notinuse','productie')
      AND insrel.owner NOT IN ('notinuse','productie')
      AND faq.owner NOT IN ('notinuse','productie')
      AND posrel.owner NOT IN ('notinuse','productie')
      AND questions.owner NOT IN ('notinuse','productie')
      AND insrel0.owner NOT IN ('notinuse','productie')
      AND answers.owner NOT IN ('notinuse','productie')
    )
ORDER BY posrel.pos ASC;

A whole mouth full. Basically, we have portals, which are linked through insrel with faq's, which are linked through
posrelto questions and finally these questions are linked to answers by insrel again. 

Insrel is a horrible table that links together every object in this system, not just faq's and such. Posrel is similar
(actuallycontains a subset of insrel). 

Now, the explain analyze output:

                                                                                                     QUERY PLAN
                                                                                            

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=784671.21..784674.86 rows=209 width=252) (actual time=46493.975..46494.052 rows=19 loops=1)
   ->  Sort  (cost=784671.21..784671.73 rows=209 width=252) (actual time=46493.969..46493.987 rows=19 loops=1)
         Sort Key: posrel.pos, questions.number, questions.title, answers.number, answers.title, answers.body
         ->  Nested Loop  (cost=1.74..784663.15 rows=209 width=252) (actual time=3467.099..46493.621 rows=19 loops=1)
               Join Filter: (((questions.number = insrel0.dnumber) AND (answers.number = insrel0.snumber) AND
(insrel0.dir<> 1)) OR ((questions.number = insrel0.snumber) AND (answers.number = insrel0.dnumber))) 
               ->  Nested Loop  (cost=0.00..6769.08 rows=181978 width=252) (actual time=1.962..6548.477 rows=1732040
loops=1)
                     ->  Nested Loop  (cost=0.00..282.30 rows=2 width=33) (actual time=1.287..7.785 rows=19 loops=1)
                           ->  Nested Loop  (cost=0.00..277.16 rows=18 width=8) (actual time=1.030..5.504 rows=19
loops=1)
                                 ->  Nested Loop  (cost=0.00..182.63 rows=1 width=8) (actual time=0.921..4.445 rows=1
loops=1)
                                       ->  Nested Loop  (cost=0.00..147.05 rows=64 width=4) (actual time=0.060..3.222
rows=269loops=1) 
                                             ->  Seq Scan on foo_portals portals  (cost=0.00..1.27 rows=1 width=4)
(actualtime=0.017..0.027 rows=1 loops=1) 
                                                   Filter: ((number = 2534202) AND ("owner" <> ALL
('{notinuse,productie}'::text[])))
                                             ->  Index Scan using foo_insrel_snumber_idx on foo_insrel insrel
(cost=0.00..145.14rows=64 width=8) (actual time=0.038..2.673 rows=269 loops=1) 
                                                   Index Cond: (snumber = 2534202)
                                                   Filter: ("owner" <> ALL ('{notinuse,productie}'::text[]))
                                       ->  Index Scan using foo_faq_main_idx on foo_faq faq  (cost=0.00..0.54 rows=1
width=4)(actual time=0.003..0.003 rows=0 loops=269) 
                                             Index Cond: (faq.number = insrel.dnumber)
                                             Filter: ("owner" <> ALL ('{notinuse,productie}'::text[]))
                                 ->  Index Scan using foo_posrel_snumber_owner_idx on foo_posrel posrel
(cost=0.00..93.97rows=45 width=12) (actual time=0.105..0.984 rows=19 loops=1) 
                                       Index Cond: (faq.number = posrel.snumber)
                                       Filter: (rnumber = 780)
                           ->  Index Scan using foo_questions_main_idx on foo_questions questions  (cost=0.00..0.27
rows=1width=29) (actual time=0.106..0.112 rows=1 loops=19) 
                                 Index Cond: (questions.number = posrel.dnumber)
                                 Filter: ("owner" <> ALL ('{notinuse,productie}'::text[]))
                     ->  Seq Scan on foo_answers answers  (cost=0.00..2333.50 rows=90989 width=219) (actual
time=0.061..162.481rows=91160 loops=19) 
                           Filter: ("owner" <> ALL ('{notinuse,productie}'::text[]))
               ->  Bitmap Heap Scan on foo_insrel insrel0  (cost=1.74..4.25 rows=1 width=12) (actual time=0.020..0.020
rows=0loops=1732040) 
                     Recheck Cond: (((questions.number = insrel0.dnumber) AND (answers.number = insrel0.snumber)) OR
((answers.number= insrel0.dnumber) AND (questions.number = insrel0.snumber))) 
                     Filter: ("owner" <> ALL ('{notinuse,productie}'::text[]))
                     ->  BitmapOr  (cost=1.74..1.74 rows=1 width=0) (actual time=0.018..0.018 rows=0 loops=1732040)
                           ->  Bitmap Index Scan on foo_inresl_dnumber_snumber_idx  (cost=0.00..0.87 rows=1 width=0)
(actualtime=0.007..0.007 rows=0 loops=1732040) 
                                 Index Cond: ((questions.number = insrel0.dnumber) AND (answers.number =
insrel0.snumber))
                           ->  Bitmap Index Scan on foo_inresl_dnumber_snumber_idx  (cost=0.00..0.87 rows=1 width=0)
(actualtime=0.008..0.008 rows=0 loops=1732040) 
                                 Index Cond: ((answers.number = insrel0.dnumber) AND (questions.number =
insrel0.snumber))
 Total runtime: 46494.271 ms
(35 rows)

As you can see, 46 seconds. Not quite impressive.

Now, when I split up the OR in two distinct queries, everything is nice and fast. Both queries run in sub-second time.

Anyway, back to the query plan. The plan nicely starts with portals, insrel, faq, posrel and questions. But then it
proposessomething totally stupid, it seems it does a cartesian product of those 19-odd questions it got so far and the
91000answers (the seq scan on foo_answers), and then goes to check for each of the pairs in that big set whether the
disjunctionholds. 

Why would it do this? And more importantly: how do i get it to do it in a smarter way; eg. by retrieving for each of
thequestions the corresponding answers by traversing the insrel disjunction using indices etc.. 

My guess (and i'd like to know whether i'm 'warm' or 'cold' here) is that insrel is such a messed up table, that
postgresfails to extract representative statistics from a small sample. The most important correlation is probably in
snumber/dnumber,but the characteristics of this correlation vary wildly: there might be one snumber with a million
dnumbersand a million snumbers with only one dnumber. 

I've tried bigger statistics targets on the insrel dnumber/snumber columns, even maxed it out to 1000, but that did not
help.

Anyway, any hints on getting this beast perform (without rewriting the query, that's not possible in this case due to
thequery being generated by an ORM) are welcome. I'm starting to think it is impossible, and that postgresql just
doesn'twork for this particular query+dataset. 


Thanks,

Koen

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Peter Sheats"
Дата:
Сообщение: Best way to load test a postgresql server
Следующее
От: Richard Huxton
Дата:
Сообщение: Re: Using index for bitwise operations?