Discussion: Query Optimization with Partitioned Tables


Query Optimization with Partitioned Tables

From
Kong Mansatiansin
Date:
I recently shrunk two huge tables on our production database by over
90%: I partitioned the most recent 3 months of data into 3 separate
partitions each and got rid of the rest of the data.  Despite the
smaller sizes, our queries now run slower.  The following complex
query/view with 8 joined tables, especially, takes over 3 times longer
than it used to.

Questions:
 * What is the best approach to hint the query planner toward an
optimal plan?
 * With join_collapse_limit=1, I was able to rearrange the join order in
the way I think is optimal, but I am still unable to make the plan apply
my only predicate as a filter for selective rows first.  How can I make
the query planner use the last filter of the plan below as the first
action, which is what I believe would be the optimal plan?

Note: I just learned from this change that, with the 3x2 additional
tables from the 2 new partitioned tables, the query of 8 joins becomes
one with 14 joined tables and crosses the default geqo_threshold of 12.
This causes the query to crawl for over an hour, instead of a few
seconds.  When I set geqo_threshold to 14, it finishes in 30 seconds,
which is still over 3 times longer than it used to take.

Release: PostgreSQL Server 8.2.5
Partitioned Tables:
  xjob_message
  xjob_message_nv

Query:
SET geqo_threshold = 16;
SET
SET join_collapse_limit = 1;
SET
EXPLAIN ANALYZE
 SELECT xtx.xid_seqid, xtx.xid, xtx.xid_date
  , tpl.name AS triplet, att.name AS attribute
  , mnv.value, xjb.xjob_seqid, xjb.xid_ref
  , xjb.start_time, xjb.end_time
  , xjb.acct_seqid, xjb.stacker
  , xjb.dest_phone_ref, xjb.src_phone_ref
  , src.phone_number AS src_phone, dst.phone_number AS dst_phone
 FROM xids xtx JOIN
  (xjob  xjb JOIN
    ((xjob_message msg JOIN
       xjob_message_nv mnv
        ON (msg.xjob_message_seqid = mnv.xjob_message_ref)
     ) JOIN lvalue_lu tpl ON (msg.triplet_ref = tpl.name_id)
      JOIN lvalue_lu att ON (mnv.name_id = att.name_id)
    ) ON (xjb.xjob_seqid = msg.xjob_ref)
    JOIN phonenumbers src ON (xjb.src_phone_ref = src.phone_seqid)
   JOIN phonenumbers dst ON (xjb.dest_phone_ref = dst.phone_seqid)
  ) ON (xtx.xid_seqid = xjb.xid_ref)
 WHERE xid = '2010.08.24.06.59.46-3232238101-010';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=315359340.99..511370148619.48 rows=191260 width=750)
(actual time=2577633.855..2757154.912 rows=36 loops=1)
   Hash Cond: (xjb.xid_ref = xtx.xid_seqid)
   ->  Hash Join  (cost=315359329.18..504126347706.90 rows=1931679730182
width=696) (actual time=1880998.113..2726039.838 rows=26788491 loops=1)
         Hash Cond: (xjb.dest_phone_ref = dst.phone_seqid)
         ->  Hash Join  (cost=315324005.11..467907317441.91
rows=1931679730182 width=681) (actual time=1878605.989..2650974.533
rows=26788491 loops=1)
               Hash Cond: (xjb.src_phone_ref = src.phone_seqid)
               ->  Hash Join  (cost=315288681.04..431688287176.93
rows=1931679730182 width=666) (actual time=1876430.631..2574354.447
rows=26889080 loops=1)
                     Hash Cond: (msg.xjob_ref = xjb.xjob_seqid)
                     ->  Hash Join  (cost=308498699.12..104201231123.46
rows=1931679730182 width=560) (actual time=695399.794..1180175.001
rows=26889080 loops=1)
                           Hash Cond: (mnv.name_id = att.name_id)
                           ->  Hash Join
(cost=308498681.31..76156965143.36 rows=2327324976123 width=550) (actual
time=695399.139..1112463.270 rows=26889080 loops=1)
                                 Hash Cond: (msg.triplet_ref =
tpl.name_id)
                                 ->  Merge Join
(cost=308498663.49..42368692881.83 rows=2804005995329 width=540) (actual
time=695398.629..1045250.141 rows=26889080 loops=1)
                                       Merge Cond:
(msg.xjob_message_seqid = mnv.xjob_message_ref)
                                       ->  Sort
(cost=112341184.27..112393328.48 rows=20857681 width=24) (actual
time=155624.502..231644.023 rows=20993357 loops=1)
                                             Sort Key:
msg.xjob_message_seqid
                                             ->  Append
(cost=100000000.00..109306471.13 rows=20857681 width=24) (actual
time=25.252..94521.657 rows=20993357 loops=1)
                                                   ->  Seq Scan on
xjob_message msg  (cost=100000000.00..100000022.30 rows=1230 width=24)
(actual time=0.003..0.003 rows=0 loops=1)
                                                   ->  Index Scan using
xjob_message_p201007_pkey on xjob_message_p201007 msg
(cost=0.00..327643.28 rows=9551599 width=24) (actual
time=25.244..20871.816 rows=9551599 loops=1)
                                                   ->  Index Scan using
xjob_message_p201008_pkey on xjob_message_p201008 msg
(cost=0.00..2135648.37 rows=9549512 width=24) (actual
time=19.564..22177.935 rows=9686418 loops=1)
                                                   ->  Index Scan using
ix_xjob_message_p201006_2 on xjob_message_p201006 msg
(cost=0.00..6843157.18 rows=1755340 width=24) (actual
time=19.432..5899.709 rows=1755340 loops=1)
                                       ->  Sort
(cost=196157479.22..196224696.81 rows=26887035 width=532) (actual
time=539774.100..721084.551 rows=26889080 loops=1)
                                             Sort Key:
mnv.xjob_message_ref
                                             ->  Append
(cost=100000000.00..186314731.34 rows=26887035 width=532) (actual
time=48.524..418540.359 rows=26889080 loops=1)
                                                   ->  Seq Scan on
xjob_message_nv mnv  (cost=100000000.00..100000011.40 rows=140
width=532) (actual time=0.005..0.005 rows=0 loops=1)
                                                   ->  Index Scan using
ix_xjob_message_nv_p201006_1 on xjob_message_nv_p201006 mnv
(cost=0.00..12091979.65 rows=3100747 width=51) (actual
time=48.515..105184.555 rows=3100747 loops=1)
                                                   ->  Index Scan using
ix_xjob_message_nv_p201007_1 on xjob_message_nv_p201007 mnv
(cost=0.00..46018341.15 rows=11793590 width=99) (actual
time=54.513..122517.400 rows=11793590 loops=1)
                                                   ->  Index Scan using
ix_xjob_message_nv_p201008_1 on xjob_message_nv_p201008 mnv
(cost=0.00..28204399.14 rows=11992558 width=101) (actual
time=9.237..132125.953 rows=11994743 loops=1)
                                 ->  Hash  (cost=15.74..15.74 rows=166
width=26) (actual time=0.479..0.479 rows=166 loops=1)
                                       ->  Index Scan using
uq_lvalue_lu_2 on lvalue_lu tpl  (cost=0.00..15.74 rows=166 width=26)
(actual time=0.008..0.224 rows=166 loops=1)
                           ->  Hash  (cost=15.74..15.74 rows=166
width=26) (actual time=0.597..0.597 rows=166 loops=1)
                                 ->  Index Scan using uq_lvalue_lu_2 on
lvalue_lu att  (cost=0.00..15.74 rows=166 width=26) (actual
time=0.093..0.332 rows=166 loops=1)
                     ->  Hash  (cost=6477316.38..6477316.38
rows=10068203 width=114) (actual time=1181023.780..1181023.780
rows=10074305 loops=1)
                           ->  Index Scan using xjob_pkey on xjob xjb
(cost=0.00..6477316.38 rows=10068203 width=114) (actual
time=39.068..1158862.644 rows=10074305 loops=1)
               ->  Hash  (cost=26512.52..26512.52 rows=704924 width=23)
(actual time=2173.384..2173.384 rows=705106 loops=1)
                     ->  Index Scan using ix_phonenumbers_2 on
phonenumbers src  (cost=0.00..26512.52 rows=704924 width=23) (actual
time=0.021..1132.041 rows=705106 loops=1)
         ->  Hash  (cost=26512.52..26512.52 rows=704924 width=23)
(actual time=2390.358..2390.358 rows=705106 loops=1)
               ->  Index Scan using ix_phonenumbers_2 on phonenumbers
dst  (cost=0.00..26512.52 rows=704924 width=23) (actual
time=0.096..1351.941 rows=705106 loops=1)
   ->  Hash  (cost=11.79..11.79 rows=1 width=54) (actual
time=31.754..31.754 rows=1 loops=1)
         ->  Index Scan using ix_xids_1 on xids xtx  (cost=0.00..11.79
rows=1 width=54) (actual time=31.730..31.734 rows=1 loops=1)
               Index Cond: ((xid)::text =
'2010.08.24.06.59.46-3232238101-010'::text)
 Total runtime: 2761143.201 ms
(42 rows)




Re: Query Optimization with Partitioned Tables

From
Willy-Bas Loos
Date:
> Note: I just learned from this change that with the 3x2 additional
> tables from the 2 new partitioned table the query of 8 becomes one with
> 14 joined tables

It seems, then, that you did not partition your tables in such a way
that your query only needs to scan one of them?
That is where partitioning becomes worthwhile: when you have to scan
fewer and smaller tables.
You need to have a check constraint that enforces the partitioning
attribute, and you need to set constraint_exclusion to true in
postgresql.conf (or to "partition" if you use 8.4).
Then, if you use a predicate in your query that corresponds to the
check constraint, the planner will skip the other tables, since it
knows that the desired data cannot be in there.

So, for example, if you make one table per month, you partition by the
field "the_date" (date): make one empty parent table and inheriting
child tables.
Each child table has a check constraint such as (the_date >=
'20100101'::date AND the_date < '20100201'::date).
Then, when you query for data from January 2010, only that table would
be scanned. So partitioning should not add any more tables to your
query plan.
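Concretely, the setup described above might look something like this
(the table and column names here are illustrative, not from your
schema):

```sql
-- Empty parent table; queries are run against this one.
CREATE TABLE measurements (
    the_date  date NOT NULL,
    payload   text
);

-- One child per month, each with a CHECK constraint on the
-- partitioning attribute so constraint exclusion can prune it.
CREATE TABLE measurements_p201001 (
    CHECK (the_date >= '20100101'::date AND the_date < '20100201'::date)
) INHERITS (measurements);

CREATE TABLE measurements_p201002 (
    CHECK (the_date >= '20100201'::date AND the_date < '20100301'::date)
) INHERITS (measurements);

-- Enable pruning (in postgresql.conf, or per session):
SET constraint_exclusion = on;

-- This WHERE clause matches the January constraint, so the planner
-- skips measurements_p201002 entirely.
EXPLAIN SELECT * FROM measurements
 WHERE the_date >= '20100101'::date AND the_date < '20100201'::date;
```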

Maybe a bit basic, but I hope it helps.

cheers,

WBL

--
"Patriotism is the conviction that your country is superior to all
others because you were born in it." -- George Bernard Shaw

Re: Query Optimization with Partitioned Tables

From
Kong Mansatiansin
Date:
WBL,
Unfortunately, with the way it was originally designed, there is no
column that our existing queries and views use as a predicate to take
advantage of constraint exclusion.  That said, it might imply that these
tables are not good candidates for partitioning in the first place.  My
only main purpose here was to facilitate data pruning without having to
deal with slow DELETEs and blocking VACUUM processes.
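For what it is worth, that pruning goal does still work even without
constraint exclusion: retiring a month of data becomes a quick metadata
operation instead of a huge DELETE.  A sketch, using one of the
partition names from the plan above:

```sql
-- Detach the oldest partition from the parent, then drop it.
-- Both steps are fast catalog operations; no row-by-row DELETE
-- and no VACUUM of a huge table afterwards.
ALTER TABLE xjob_message_p201006 NO INHERIT xjob_message;
DROP TABLE xjob_message_p201006;
```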

As a last resort, I was trying to see if I could direct the query plan
to follow what I think is the most optimal path.  With
join_collapse_limit=1, I was able to rearrange the join order, but the
query planner would still not use the only predicate (xid='...') to
filter rows as the first, most selective step, despite the fact/stats
that it could reduce xids (xtx) from 26M rows to only one row (using
ix_xids_1).  Would you be able to advise me on my 2nd question, about
how to make the query plan more selective?
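(One trick that sometimes works on 8.2 is to fence the selective lookup
inside a subquery with OFFSET 0, which prevents the planner from
flattening it, and then drive the rest of the joins from its one-row
result.  A sketch against the same query, untested against this schema:

```sql
SET join_collapse_limit = 1;  -- preserve the written join order

EXPLAIN ANALYZE
 SELECT xtx.xid_seqid, xtx.xid, xtx.xid_date
  , tpl.name AS triplet, att.name AS attribute
  , mnv.value, xjb.xjob_seqid, xjb.xid_ref
  , xjb.start_time, xjb.end_time
  , xjb.acct_seqid, xjb.stacker
  , xjb.dest_phone_ref, xjb.src_phone_ref
  , src.phone_number AS src_phone, dst.phone_number AS dst_phone
 FROM (SELECT * FROM xids
        WHERE xid = '2010.08.24.06.59.46-3232238101-010'
        OFFSET 0) xtx  -- OFFSET 0 keeps this subquery from being
                       -- flattened, so the one-row index lookup on
                       -- ix_xids_1 is performed first
  JOIN xjob xjb ON (xtx.xid_seqid = xjb.xid_ref)
  JOIN xjob_message msg ON (xjb.xjob_seqid = msg.xjob_ref)
  JOIN xjob_message_nv mnv
       ON (msg.xjob_message_seqid = mnv.xjob_message_ref)
  JOIN lvalue_lu tpl ON (msg.triplet_ref = tpl.name_id)
  JOIN lvalue_lu att ON (mnv.name_id = att.name_id)
  JOIN phonenumbers src ON (xjb.src_phone_ref = src.phone_seqid)
  JOIN phonenumbers dst ON (xjb.dest_phone_ref = dst.phone_seqid);
```

With join_collapse_limit=1 the planner keeps this written order, so the
remaining joins can be driven by index lookups from the single matching
xid row instead of building hashes over the full tables.)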

(I'm not sure whether you can easily view my original message with the
lengthy query plan.  I just don't want to clutter this email with
repeated cryptic text, but please let me know if I should include it in
my next email.)

Thank you much,
-Kong

On Thu, 2010-09-02 at 10:15 +0200, Willy-Bas Loos wrote:
> > Note: I just learned from this change that with the 3x2 additional
> > tables from the 2 new partitioned table the query of 8 becomes one with
> > 14 joined tables
>
> It seems then that you did not partition your tables in such a way
> that your query only needs to scan one of them?
> That is where partitioning becomes worthwhile: when you have to scan
> fewer and smaller tables.
> You need to have a check constraint that enforces the partitioning
> attribute, and you need to set constraint_exclusion to true in
> postgresql.conf (or to "partition" if you use 8.4).
> Then in your query if you use a predicate that corresponds to the
> check constraint, the planner will skip the other tables, since it
> knows that the desired data cannot be in there.
>
> So for example if you make one table per month, you partition by the
> field "the_date" (date). (make one empty parent table and inheriting
> child tables)
> You have a check constraint that says (the_date >= '20100101'::date
> AND the_date < '20100201'::date)
> Then, when you query for data in January of 2010 only this table would
> be scanned. So that should not add any more tables to your query plan.
>
> Maybe a bit basic, but I hope it helps.
>
> cheers,
>
> WBL
>