Обсуждение: severe performance issue with planner

Поиск
Список
Период
Сортировка

severe performance issue with planner

От
"Eric Brown"
Дата:
First let me explain the situation:

I came into the #postgresql irc channel talking about this problem, and
someone advised me to use this mailing list (so i'm not just wasting your
time i hope).  I'm not sure how to describe the problem fully, so I'll start
by explaining what my database does, a little about the table structure,
then an example of a problematic query, and some other information that
might be relevant.

My database is a Chinese-English/English-Chinese dictionary.  It lets users
search Chinese words by any character in the word, or any sequence of
characters (starting anywhere the user wants).  Characters are often
searched by their pinyin values (a romanization of the sounds).  The
dictionary has an average word length of 2 (but it sucks), but there are
also many words of length 4 and 6 characters.  So it wouldn't be uncommon to
search for something like "a1 la1 bo yu" (arabic).  There are also some very
long words with 12 or more characters (where my problem becomes more
pronounced).

That being said, the most important table here is the words table:
              Table "public.words"
    Column    |         Type         | Modifiers
--------------+----------------------+-----------
wid          | integer              | not null
sequence     | smallint             | not null
variant      | smallint             | not null
char_count   | smallint             | not null
unicode      | character varying(5) | not null
pinyin       | character varying(8) | not null
simpvar      | character varying(5) |
zvar         | character varying(5) |
compatvar    | character varying(5) |
def_exists   | boolean              | not null
num_variants | smallint             |
page_order   | integer              |
pinyins      | character varying    |
unicodes     | character varying    |
Indexes:
    "words2_pkey" primary key, btree (wid, variant, "sequence")
    "page_index" btree (page_order)
    "pinyin_index" btree (pinyin)
    "unicode_index" btree (unicode)


The best example of the problem I have when using this table is this query:

SELECT
    w8.wid,
    w8.variant,
    w8.num_variants,
    sum_text(w8.unicode) as unicodes,
    sum_text(w8.pinyin) as pinyins
FROM
    words as w0,     words as w1,
    words as w2,     words as w3,
    words as w4,     words as w5,
    words as w6,     words as w7,
    words as w8
WHERE
    w0.wid > 0 AND
    w0.pinyin = 'zheng4' AND
    w0.def_exists = 't' AND
    w0.sequence = 0 AND
    w1.wid = w0.wid AND
    w1.pinyin LIKE 'fu_' AND
    w1.variant = w0.variant AND
    w1.sequence = (w0.sequence + 1) AND
    w2.wid = w1.wid AND
    w2.pinyin LIKE 'ji_' AND
    w2.variant = w1.variant AND
    w2.sequence = (w1.sequence + 1) AND
    w3.wid = w2.wid AND
    w3.pinyin LIKE 'guan_' AND
    w3.variant = w2.variant AND
    w3.sequence = (w2.sequence + 1) AND
    w4.wid = w3.wid AND
    w4.pinyin LIKE 'kai_' AND
    w4.variant = w3.variant AND
    w4.sequence = (w3.sequence + 1) AND
    w5.wid = w4.wid AND
    w5.pinyin LIKE 'fang_' AND
    w5.variant = w4.variant AND
    w5.sequence = (w4.sequence + 1) AND
    w6.wid = w5.wid AND
    w6.pinyin LIKE 'xi_' AND
    w6.variant = w5.variant AND
    w6.sequence = (w5.sequence + 1) AND
    w7.wid = w6.wid AND
    w7.pinyin LIKE 'tong_' AND
    w7.variant = w6.variant AND
    w7.sequence = (w6.sequence + 1) AND
    w8.wid = w7.wid AND
    w8.variant = w7.variant
GROUP BY
    w8.wid,
    w8.variant,
    w8.num_variants,
    w8.page_order ,
    w0.sequence ,
    w1.sequence ,
    w2.sequence ,
    w3.sequence ,
    w4.sequence ,
    w5.sequence ,
    w6.sequence ,
    w7.sequence
ORDER BY
    w8.page_order;

(phew!)

with the default geqo_threshold of 11, this query takes 3155ms on my machine
(a 1ghz athlon with 384 megs of pc133 ram).  This is very very long.

if i first do prepare blah as SELECT ....., then run execute blah, the time
goes down to about 275ms  (i had been running this query a lot, and did a
vacuum update before all this).

the ouput from EXPLAIN ANALYZE :


                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=54.13..54.14 rows=1 width=43) (actual time=315.357..315.357
rows=1 loops=1)
   Sort Key: w8.page_order
   ->  HashAggregate  (cost=54.12..54.12 rows=1 width=43) (actual
time=315.328..315.330 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..54.08 rows=1 width=43) (actual
time=6.229..314.566 rows=12 loops=1)
               Join Filter: (("outer".wid = "inner".wid) AND
("outer".variant = "inner".variant) AND ("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
               ->  Nested Loop  (cost=0.00..48.07 rows=1 width=83) (actual
time=6.088..279.745 rows=12 loops=1)
                     Join Filter: (("inner"."sequence" = ("outer"."sequence"
+ 1)) AND ("outer"."sequence" = ("inner"."sequence" + 1)))
                     ->  Nested Loop  (cost=0.00..42.05 rows=1 width=75)
(actual time=5.980..278.602 rows=12 loops=1)
                           ->  Nested Loop  (cost=0.00..36.04 rows=1
width=48) (actual time=5.910..278.280 rows=1 loops=1)
                                 Join Filter: (("inner".variant =
"outer".variant) AND ("inner".wid = "outer".wid))
                                 ->  Nested Loop  (cost=0.00..30.04 rows=1
width=40) (actual time=3.465..275.137 rows=1 loops=1)
                                       Join Filter: ("inner"."sequence" =
("outer"."sequence" + 1))
                                       ->  Nested Loop  (cost=0.00..24.03
rows=1 width=32) (actual time=3.408..275.045 rows=1 loops=1)
                                             Join Filter:
("outer"."sequence" = ("inner"."sequence" + 1))
                                             ->  Nested Loop
(cost=0.00..18.00 rows=1 width=24) (actual time=3.350..274.948 rows=1
loops=1)
                                                   ->  Nested Loop
(cost=0.00..11.99 rows=1 width=16) (actual time=3.295..274.678 rows=6
loops=1)
                                                         Join Filter:
(("inner".wid = "outer".wid) AND ("inner".variant = "outer".variant) AND
("inner"."sequence" = ("outer"."sequence" + 1)))
                                                         ->  Index Scan
using pinyin_index on words w4  (cost=0.00..5.98 rows=1 width=8) (actual
time=0.090..1.222 rows=165 loops=1)
                                                               Index Cond:
(((pinyin)::text >= 'kai'::character varying) AND ((pinyin)::text <
'kaj'::character varying))
                                                               Filter:
((pinyin)::text ~~ 'kai_'::text)
                                                         ->  Index Scan
using pinyin_index on words w5  (cost=0.00..5.98 rows=1 width=8) (actual
time=0.017..1.380 rows=259 loops=165)
                                                               Index Cond:
(((pinyin)::text >= 'fang'::character varying) AND ((pinyin)::text <
'fanh'::character varying))
                                                               Filter:
((pinyin)::text ~~ 'fang_'::text)
                                                   ->  Index Scan using
words2_pkey on words w1  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.032..0.037 rows=0 loops=6)
                                                         Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
                                                         Filter:
((pinyin)::text ~~ 'fu_'::text)
                                             ->  Index Scan using
words2_pkey on words w0  (cost=0.00..6.01 rows=1 width=8) (actual
time=0.033..0.068 rows=1 loops=1)
                                                   Index Cond: (("outer".wid
= w0.wid) AND (w0.wid > 0) AND ("outer".variant = w0.variant))
                                                   Filter: (((pinyin)::text
= 'zheng4'::text) AND (def_exists = true) AND ("sequence" = 0))
                                       ->  Index Scan using words2_pkey on
words w2  (cost=0.00..6.00 rows=1 width=8) (actual time=0.029..0.060 rows=1
loops=1)
                                             Index Cond: ((w2.wid =
"outer".wid) AND (w2.variant = "outer".variant))
                                             Filter: ((pinyin)::text ~~
'ji_'::text)
                                 ->  Index Scan using pinyin_index on words
w7  (cost=0.00..5.98 rows=1 width=8) (actual time=0.030..2.573 rows=338
loops=1)
                                       Index Cond: (((pinyin)::text >=
'tong'::character varying) AND ((pinyin)::text < 'tonh'::character varying))
                                       Filter: ((pinyin)::text ~~
'tong_'::text)
                           ->  Index Scan using words2_pkey on words w8
(cost=0.00..5.99 rows=1 width=27) (actual time=0.029..0.130 rows=12 loops=1)
                                 Index Cond: ((w8.wid = "outer".wid) AND
(w8.variant = "outer".variant))
                     ->  Index Scan using words2_pkey on words w6
(cost=0.00..6.00 rows=1 width=8) (actual time=0.040..0.060 rows=1 loops=12)
                           Index Cond: ((w6.wid = "outer".wid) AND
(w6.variant = "outer".variant))
                           Filter: ((pinyin)::text ~~ 'xi_'::text)
               ->  Index Scan using pinyin_index on words w3
(cost=0.00..5.98 rows=1 width=8) (actual time=0.023..2.312 rows=304
loops=12)
                     Index Cond: (((pinyin)::text >= 'guan'::character
varying) AND ((pinyin)::text < 'guao'::character varying))
                     Filter: ((pinyin)::text ~~ 'guan_'::text)
Total runtime: 316.493 ms
(44 rows)

Time: 3167.853 ms



As you can see, the two run times there are quite different... The person I
spoke to in the irc channel said this all indicated a poor planning time,
and I think I agree.  Yesterday I tried setting geqo_threshold to 7 instead
of the default of 11, and it seemed to help a little, but the running times
were still extremely high.

I guess I do have a question in addition to just wanting to notify the right
people of this problem:  Since a lot of my queries are similar to this one
(but not similar enough to allow me to use one or two of them over and over
with different parameters), is there any way for me to reorganize or rewrite
the queries so that the planner doesn't take so long?  (I would hate to have
to take all of this out of the db's hands and iterate in code myself...)

If you guys are optimistic about someone being able to fix this problem in
pgsql, I will just wait for the bug fix.


Thanks for listening :)  let me know if you need any more information (oh
yea, this is on Linux, version 7.4.1)

_________________________________________________________________
Create a Job Alert on MSN Careers and enter for a chance to win $1000!
http://msn.careerbuilder.com/promo/kaday.htm?siteid=CBMSN_1K&sc_extcmp=JS_JASweep_MSNHotm2


Re: severe performance issue with planner

От
Christopher Kings-Lynne
Дата:
> if i first do prepare blah as SELECT ....., then run execute blah, the
> time goes down to about 275ms  (i had been running this query a lot, and
> did a vacuum update before all this).

If you make it an SQL stored procedure, you get the speed up of the
PREPARE command, without having to prepare manually all the time.

Chris


Re: severe performance issue with planner

От
Tom Lane
Дата:
"Eric Brown" <bigwhitecow@hotmail.com> writes:
> [ planning a 9-table query takes too long ]

See http://www.postgresql.org/docs/7.4/static/explicit-joins.html
for some useful tips.

            regards, tom lane

Re: severe performance issue with planner

От
Greg Stark
Дата:
The other posts about using explicit joins and using stored procedures are
both good points. But I have a few other comments to make:

"Eric Brown" <bigwhitecow@hotmail.com> writes:

> WHERE
>     w0.wid > 0 AND
>     w0.pinyin = 'zheng4' AND
>     w0.def_exists = 't' AND
>     w0.sequence = 0 AND
>     w1.wid = w0.wid AND
>     w1.pinyin LIKE 'fu_' AND
>     w1.variant = w0.variant AND
>     w1.sequence = (w0.sequence + 1) AND

I'm not sure it'll help the planner, but w0.sequence+1 is always just going to
be 1, and so on with the others. I think the planner might be able to figure
that out but the plan doesn't seem to show it doing so. I'm not sure it would
help the plan though.

Similarly you have w1.wid=w0.wid and w2.wid=w1.wid and w3.wid=w2.wid etc. And
also with the "variant" column. You might be able to get this planned better
by writing it as a join from w0 to all the others rather than a chain of
w0->w1->w2->... Again I'm not sure; you would have to experiment.


But I wonder if there isn't a way to do this in a single pass using an
aggregate. I'm not sure I understand the schema exactly, but perhaps something
like this?

select w8.wid,
       w8.variant,
       w8.num_variants,
       sum_text(w8.unicode) as unicodes,
       sum_text(w8.pinyin) as pinyins
  from (
        select wid,variant,
          from words
         where (sequence = 0 and pinyin = 'zheng4')
            OR (sequence = 1 and pinyin like 'ji_')
            OR (sequence = 2 and pinyin like 'guan_')
            OR (sequence = 3 and pinyin like 'kai_')
            OR (sequence = 4 and pinyin like 'fang_')
            OR (sequence = 5 and pinyin like 'xi_')
            OR (sequence = 6 and pinyin like 'tong_')
            OR (sequence = 7 and pinyin like 'fu_')
        group by wid,variant
        having count(*) = 8
       ) as w
  join words as w8 using (wid,variant)

This might be helped by having an index on <sequence,pinyin> but it might not
even need it.


--
greg

Re: severe performance issue with planner

От
Greg Stark
Дата:
Sorry, I forgot a key clause there:

Greg Stark <gsstark@MIT.EDU> writes:

> select w8.wid,
>        w8.variant,
>        w8.num_variants,
>        sum_text(w8.unicode) as unicodes,
>        sum_text(w8.pinyin) as pinyins
>   from (
>         select wid,variant,
>           from words
>          where (sequence = 0 and pinyin = 'zheng4')
>             OR (sequence = 1 and pinyin like 'ji_')
>             OR (sequence = 2 and pinyin like 'guan_')
>             OR (sequence = 3 and pinyin like 'kai_')
>             OR (sequence = 4 and pinyin like 'fang_')
>             OR (sequence = 5 and pinyin like 'xi_')
>             OR (sequence = 6 and pinyin like 'tong_')
>             OR (sequence = 7 and pinyin like 'fu_')
>         group by wid,variant
>         having count(*) = 8
>        ) as w
>   join words as w8 using (wid,variant)

   where w8.sequence = 8

Or perhaps that ought to be

   join words as w8 on (    w8.wid=w.wid
                        and w8.variant=w.variant
                        and w8.sequence = 8)

or even

   join (select * from words where sequence = 8) as w8 using (wid,variant)


I think they should all be equivalent though.




--
greg

Re: severe performance issue with planner

От
Kris Jurka
Дата:

On Thu, 11 Mar 2004, Tom Lane wrote:

> "Eric Brown" <bigwhitecow@hotmail.com> writes:
> > [ planning a 9-table query takes too long ]
>
> See http://www.postgresql.org/docs/7.4/static/explicit-joins.html
> for some useful tips.
>

Is this the best answer we've got?  For me with an empty table this query
takes 4 seconds to plan, is that the expected planning time?  I know I've
got nine table queries that don't take that long.

Setting geqo_threshold less than 9, it takes 1 second to plan.  Does this
indicate that geqo_threshold is set too high, or is it a tradeoff between
planning time and plan quality?  If the planning time is so high because
the are a large number of possible join orders, should geqo_threhold be
based on the number of possible plans somehow instead of the number of
tables involved?

Kris Jurka


Re: severe performance issue with planner

От
"Eric Brown"
Дата:
ok, at the advice of jurka, I will post new results:

Here's the query as I have changed it now:
SELECT
w8.wid,
w8.variant,
w8.num_variants,
sum_text(w8.unicode) as unicodes,
sum_text(w8.pinyin) as pinyins
FROM
(words as w8 JOIN
(words as w7 JOIN
(words as w6 JOIN
(words as w5 JOIN
(words as w4 JOIN
(words as w3 JOIN
(words as w2 JOIN
(words as w0 JOIN words as w1

ON(w1.wid = w0.wid AND w1.variant = w0.variant AND w1.sequence = w0.sequence
+ 1 AND w1.pinyin LIKE 'fu_'))
ON(w2.wid = w1.wid AND w2.variant = w1.variant AND w2.sequence = w1.sequence
+ 1 AND w2.pinyin LIKE 'ji_'))
ON(w3.wid = w2.wid AND w3.variant = w2.variant AND w3.sequence = w2.sequence
+ 1 AND w3.pinyin LIKE 'guan_'))
ON(w4.wid = w3.wid AND w4.variant = w3.variant AND w4.sequence = w3.sequence
+ 1 AND w4.pinyin LIKE 'kai_'))
ON(w5.wid = w4.wid AND w5.variant = w4.variant AND w5.sequence = w4.sequence
+ 1 AND w5.pinyin LIKE 'fang_'))
ON(w6.wid = w5.wid AND w6.variant = w5.variant AND w6.sequence = w5.sequence
+ 1 AND w6.pinyin LIKE 'xi_'))
ON(w7.wid = w6.wid AND w7.variant = w6.variant AND w7.sequence = w6.sequence
+ 1 AND w7.pinyin LIKE 'tong_'))
ON(w8.wid = w7.wid AND w8.variant = w7.variant))


WHERE
w0.wid > 0 AND
w0.pinyin = 'zheng4' AND
w0.def_exists = 't' AND
w0.sequence = 0
GROUP BY
w8.wid,
w8.variant,
w8.num_variants,
w8.page_order,
w0.sequence ,
w1.sequence ,
w2.sequence ,
w3.sequence ,
w4.sequence ,
w5.sequence ,
w6.sequence ,
w7.sequence
ORDER BY
w8.page_order;


And here's the output of explain analyze:



                          QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=54.26..54.26 rows=1 width=43) (actual time=14.916..14.917 rows=1
loops=1)
   Sort Key: w8.page_order
   ->  HashAggregate  (cost=54.24..54.25 rows=1 width=43) (actual
time=14.891..14.892 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..54.10 rows=4 width=43) (actual
time=3.676..14.446 rows=12 loops=1)
               ->  Nested Loop  (cost=0.00..48.09 rows=1 width=64) (actual
time=3.638..14.269 rows=1 loops=1)
                     Join Filter: ("outer"."sequence" = ("inner"."sequence"
+ 1))
                     ->  Nested Loop  (cost=0.00..42.06 rows=1 width=56)
(actual time=3.581..14.181 rows=1 loops=1)
                           Join Filter: (("inner".wid = "outer".wid) AND
("inner".variant = "outer".variant) AND ("inner"."sequence" =
("outer"."sequence" + 1)) AND ("outer"."sequence" = ("inner"."sequence" +
1)))
                           ->  Nested Loop  (cost=0.00..36.05 rows=1
width=48) (actual time=2.152..12.443 rows=1 loops=1)
                                 Join Filter: (("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
                                 ->  Nested Loop  (cost=0.00..30.03 rows=1
width=40) (actual time=2.104..12.368 rows=1 loops=1)
                                       Join Filter: (("outer".variant =
"inner".variant) AND ("outer".wid = "inner".wid) AND ("inner"."sequence" =
("outer"."sequence" + 1)))
                                       ->  Nested Loop  (cost=0.00..24.02
rows=1 width=32) (actual time=2.040..11.226 rows=1 loops=1)
                                             Join Filter:
("outer"."sequence" = ("inner"."sequence" + 1))
                                             ->  Nested Loop
(cost=0.00..18.00 rows=1 width=24) (actual time=1.979..11.147 rows=1
loops=1)
                                                   Join Filter:
(("inner".variant = "outer".variant) AND ("inner".wid = "outer".wid))
                                                   ->  Nested Loop
(cost=0.00..12.00 rows=1 width=16) (actual time=0.258..8.765 rows=1 loops=1)
                                                         ->  Index Scan
using pinyin_index on words w3  (cost=0.00..5.99 rows=1 width=8) (actual
time=0.084..2.399 rows=304 loops=1)
                                                               Index Cond:
(((pinyin)::text >= 'guan'::character varying) AND ((pinyin)::text <
'guao'::character varying))
                                                               Filter:
((pinyin)::text ~~ 'guan_'::text)
                                                         ->  Index Scan
using words2_pkey on words w1  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.018..0.018 rows=0 loops=304)
                                                               Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
                                                               Filter:
((pinyin)::text ~~ 'fu_'::text)
                                                   ->  Index Scan using
pinyin_index on words w7  (cost=0.00..5.99 rows=1 width=8) (actual
time=0.025..1.863 rows=338 loops=1)
                                                         Index Cond:
(((pinyin)::text >= 'tong'::character varying) AND ((pinyin)::text <
'tonh'::character varying))
                                                         Filter:
((pinyin)::text ~~ 'tong_'::text)
                                             ->  Index Scan using
words2_pkey on words w6  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.037..0.052 rows=1 loops=1)
                                                   Index Cond: ((w6.wid =
"outer".wid) AND (w6.variant = "outer".variant))
                                                   Filter: ((pinyin)::text
~~ 'xi_'::text)
                                       ->  Index Scan using pinyin_index on
words w4  (cost=0.00..5.99 rows=1 width=8) (actual time=0.028..0.874
rows=165 loops=1)
                                             Index Cond: (((pinyin)::text >=
'kai'::character varying) AND ((pinyin)::text < 'kaj'::character varying))
                                             Filter: ((pinyin)::text ~~
'kai_'::text)
                                 ->  Index Scan using words2_pkey on words
w2  (cost=0.00..6.00 rows=1 width=8) (actual time=0.023..0.047 rows=1
loops=1)
                                       Index Cond: (("outer".wid = w2.wid)
AND ("outer".variant = w2.variant))
                                       Filter: ((pinyin)::text ~~
'ji_'::text)
                           ->  Index Scan using pinyin_index on words w5
(cost=0.00..5.99 rows=1 width=8) (actual time=0.025..1.436 rows=259 loops=1)
                                 Index Cond: (((pinyin)::text >=
'fang'::character varying) AND ((pinyin)::text < 'fanh'::character varying))
                                 Filter: ((pinyin)::text ~~ 'fang_'::text)
                     ->  Index Scan using words2_pkey on words w0
(cost=0.00..6.01 rows=1 width=8) (actual time=0.030..0.058 rows=1 loops=1)
                           Index Cond: (("outer".wid = w0.wid) AND (w0.wid >
0) AND ("outer".variant = w0.variant))
                           Filter: (((pinyin)::text = 'zheng4'::text) AND
(def_exists = true) AND ("sequence" = 0))
               ->  Index Scan using words2_pkey on words w8
(cost=0.00..6.00 rows=1 width=27) (actual time=0.019..0.103 rows=12 loops=1)
                     Index Cond: ((w8.wid = "outer".wid) AND (w8.variant =
"outer".variant))
Total runtime: 15.987 ms
(44 rows)

Time: 838.446 ms


As you can see, there still appears to be some issue with planning taking
~820ms, while the actual query should only take 16ms (and as you can see i
used the "folding" technique which takes some load off of the planner.


Here is the old query/analysis as run right now:
SELECT
   w8.wid,
   w8.variant,
   w8.num_variants,
   sum_text(w8.unicode) as unicodes,
   sum_text(w8.pinyin) as pinyins
FROM
   words as w0,     words as w1,
   words as w2,     words as w3,
   words as w4,     words as w5,
   words as w6,     words as w7,
   words as w8
WHERE
   w0.wid > 0 AND
   w0.pinyin = 'zheng4' AND
   w0.def_exists = 't' AND
   w0.sequence = 0 AND
   w1.wid = w0.wid AND
   w1.pinyin LIKE 'fu_' AND
   w1.variant = w0.variant AND
   w1.sequence = (w0.sequence + 1) AND
   w2.wid = w1.wid AND
   w2.pinyin LIKE 'ji_' AND
   w2.variant = w1.variant AND
   w2.sequence = (w1.sequence + 1) AND
   w3.wid = w2.wid AND
   w3.pinyin LIKE 'guan_' AND
   w3.variant = w2.variant AND
   w3.sequence = (w2.sequence + 1) AND
   w4.wid = w3.wid AND
   w4.pinyin LIKE 'kai_' AND
   w4.variant = w3.variant AND
   w4.sequence = (w3.sequence + 1) AND
   w5.wid = w4.wid AND
   w5.pinyin LIKE 'fang_' AND
   w5.variant = w4.variant AND
   w5.sequence = (w4.sequence + 1) AND
   w6.wid = w5.wid AND
   w6.pinyin LIKE 'xi_' AND
   w6.variant = w5.variant AND
   w6.sequence = (w5.sequence + 1) AND
   w7.wid = w6.wid AND
   w7.pinyin LIKE 'tong_' AND
   w7.variant = w6.variant AND
   w7.sequence = (w6.sequence + 1) AND
   w8.wid = w7.wid AND
   w8.variant = w7.variant
GROUP BY
   w8.wid,
   w8.variant,
   w8.num_variants,
   w8.page_order ,
   w0.sequence ,
   w1.sequence ,
   w2.sequence ,
   w3.sequence ,
   w4.sequence ,
   w5.sequence ,
   w6.sequence ,
   w7.sequence
ORDER BY
   w8.page_order;

wepy=> EXPLAIN ANALYZE wepy=> explain ANALYZE
                                                                     QUERY
PLAN
wepy-> wepy->

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
wepy-> wepy->  Sort  (cost=54.15..54.16 rows=1 width=43) (actual
time=1043.148..1043.149 rows=1 loops=1)
wepy-> wepy->    Sort Key: w8.page_order
wepy-> wepy->    ->  HashAggregate  (cost=54.14..54.14 rows=1 width=43)
(actual time=1043.121..1043.122 rows=1 loops=1)
wepy-> wepy->          ->  Nested Loop  (cost=0.00..54.10 rows=1 width=43)
(actual time=9.627..1042.565 rows=12 loops=1)
wepy-> wepy->                Join Filter: (("inner"."sequence" =
("outer"."sequence" + 1)) AND ("outer"."sequence" = ("inner"."sequence" +
1)))
wepy-> wepy->                ->  Nested Loop  (cost=0.00..48.08 rows=1
width=83) (actual time=9.557..1041.784 rows=12 loops=1)
wepy-> wepy->                      Join Filter: (("outer".wid = "inner".wid)
AND ("outer".variant = "inner".variant) AND ("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
wepy-> wepy->                      ->  Nested Loop  (cost=0.00..11.99 rows=1
width=16) (actual time=3.152..290.176 rows=6 loops=1)
wepy-> wepy->                            Join Filter: (("inner".wid =
"outer".wid) AND ("inner".variant = "outer".variant) AND ("inner"."sequence"
= ("outer"."sequence" + 1)))
wepy-> wepy->                            ->  Index Scan using pinyin_index
on words w4  (cost=0.00..5.99 rows=1 width=8) (actual time=0.084..1.233
rows=165 loops=1)
wepy-> wepy->                                  Index Cond: (((pinyin)::text
 >= 'kai'::character varying) AND ((pinyin)::text < 'kaj'::character
varying))
wepy-> wepy->                                  Filter: ((pinyin)::text ~~
'kai_'::text)
wepy-> wepy->                            ->  Index Scan using pinyin_index
on words w5  (cost=0.00..5.99 rows=1 width=8) (actual time=0.018..1.411
rows=259 loops=165)
wepy-> wepy->                                  Index Cond: (((pinyin)::text
 >= 'fang'::character varying) AND ((pinyin)::text < 'fanh'::character
varying))
wepy-> wepy->                                  Filter: ((pinyin)::text ~~
'fang_'::text)
wepy-> wepy->                      ->  Nested Loop  (cost=0.00..36.06 rows=1
width=67) (actual time=4.500..125.184 rows=12 loops=6)
wepy-> wepy->                            ->  Nested Loop  (cost=0.00..30.05
rows=1 width=40) (actual time=4.446..125.003 rows=1 loops=6)
wepy-> wepy->                                  Join Filter:
("inner"."sequence" = ("outer"."sequence" + 1))
wepy-> wepy->                                  ->  Nested Loop
(cost=0.00..24.03 rows=1 width=32) (actual time=4.391..124.920 rows=1
loops=6)
wepy-> wepy->                                        ->  Nested Loop
(cost=0.00..18.01 rows=1 width=24) (actual time=4.339..124.781 rows=3
loops=6)
wepy-> wepy->                                              Join Filter:
(("inner".variant = "outer".variant) AND ("inner".wid = "outer".wid) AND
("inner"."sequence" = ("outer"."sequence" + 1)))
wepy-> wepy->                                              ->  Nested Loop
(cost=0.00..12.00 rows=1 width=16) (actual time=0.154..10.898 rows=18
loops=6)
wepy-> wepy->                                                    ->  Index
Scan using pinyin_index on words w3  (cost=0.00..5.99 rows=1 width=8)
(actual time=0.027..2.358 rows=304 loops=6)
wepy-> wepy->                                                          Index
Cond: (((pinyin)::text >= 'guan'::character varying) AND ((pinyin)::text <
'guao'::character varying))
wepy-> wepy->
Filter: ((pinyin)::text ~~ 'guan_'::text)
wepy-> wepy->                                                    ->  Index
Scan using words2_pkey on words w6  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.025..0.025 rows=0 loops=1824)
wepy-> wepy->                                                          Index
Cond: ((w6.wid = "outer".wid) AND (w6.variant = "outer".variant))
wepy-> wepy->
Filter: ((pinyin)::text ~~ 'xi_'::text)
wepy-> wepy->                                              ->  Index Scan
using pinyin_index on words w7  (cost=0.00..5.99 rows=1 width=8) (actual
time=0.017..5.788 rows=338 loops=108)
wepy-> wepy->                                                    Index Cond:
(((pinyin)::text >= 'tong'::character varying) AND ((pinyin)::text <
'tonh'::character varying))
wepy-> wepy->                                                    Filter:
((pinyin)::text ~~ 'tong_'::text)
wepy-> wepy->                                        ->  Index Scan using
words2_pkey on words w0  (cost=0.00..6.01 rows=1 width=8) (actual
time=0.026..0.035 rows=0 loops=18)
wepy-> wepy->                                              Index Cond:
(("outer".wid = w0.wid) AND (w0.wid > 0) AND ("outer".variant = w0.variant))
wepy-> wepy->                                              Filter:
(((pinyin)::text = 'zheng4'::text) AND (def_exists = true) AND ("sequence" =
0))
wepy-> wepy->                                  ->  Index Scan using
words2_pkey on words w1  (cost=0.00..6.00 rows=1 width=8) (actual
time=0.021..0.047 rows=1 loops=6)
wepy-> wepy->                                        Index Cond:
(("outer".wid = w1.wid) AND ("outer".variant = w1.variant))
wepy-> wepy->                                        Filter: ((pinyin)::text
~~ 'fu_'::text)
wepy-> wepy->                            ->  Index Scan using words2_pkey on
words w8  (cost=0.00..6.00 rows=1 width=27) (actual time=0.021..0.085
rows=12 loops=6)
wepy-> wepy->                                  Index Cond: ((w8.wid =
"outer".wid) AND (w8.variant = "outer".variant))
wepy-> wepy->                ->  Index Scan using words2_pkey on words w2
(cost=0.00..6.00 rows=1 width=8) (actual time=0.021..0.046 rows=1 loops=12)
wepy-> wepy->                      Index Cond: ((w2.wid = "outer".wid) AND
(w2.variant = "outer".variant))
wepy-> wepy->                      Filter: ((pinyin)::text ~~ 'ji_'::text)
wepy-> wepy->  Total runtime: 1044.239 ms
wepy-> wepy-> (43 rows)
wepy-> wepy->
wepy-> wepy-> Time: 3939.938 ms



Even the total runtime has improved dramatically with the use of explicit
joins and folding.  However, I still need to cut the query time down much
more... My last resort is to dig through the source code and see if I can
understand how the planner works, but I suspect it's thousands of lines of
code :\

BTW, I have since upgraded to 7.4.2 (from 7.4.1), and you can verify these
planner times even with an empty table.  The table spec is below:

             Table "public.words"
   Column    |         Type         | Modifiers
--------------+----------------------+-----------
wid          | integer              | not null
sequence     | smallint             | not null
variant      | smallint             | not null
char_count   | smallint             | not null
unicode      | character varying(5) | not null
pinyin       | character varying(8) | not null
simpvar      | character varying(5) |
zvar         | character varying(5) |
compatvar    | character varying(5) |
def_exists   | boolean              | not null
num_variants | smallint             |
page_order   | integer              |
pinyins      | character varying    |
unicodes     | character varying    |
Indexes:
   "words2_pkey" primary key, btree (wid, variant, "sequence")
   "page_index" btree (page_order)
   "pinyin_index" btree (pinyin)
   "unicode_index" btree (unicode)

_________________________________________________________________
Get rid of annoying pop-up ads with the new MSN Toolbar � FREE!
http://clk.atdmt.com/AVE/go/onm00200414ave/direct/01/


Re: severe performance issue with planner

От
Tom Lane
Дата:
"Eric Brown" <bigwhitecow@hotmail.com> writes:
> Here's the query as I have changed it now:

Now that you've switched to JOIN syntax, you can cut the planning time
to nil by setting join_collapse_limit to 1.  See
http://www.postgresql.org/docs/7.4/static/explicit-joins.html

            regards, tom lane