Thread: Can somebody help me to optimize this huge query?


Can somebody help me to optimize this huge query?

From: Dirk Lutzebaeck
Date:
Hi,

here is a query on two tables with lots of self-joins which just
takes hours to complete on 7.2.1. I use multicolumn indexes, which
you can see in the EXPLAIN output below. My question is: how can I
use explicit join syntax to let the planner do better? I think GEQO
does not kick in yet because there are only 2 tables. The schema
behind this models abstract document objects.
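
(In case it matters, I understand the GEQO cut-over can be checked and
lowered per session roughly like this -- not sure I have the setting
name exactly right on 7.2:)

    SHOW geqo_threshold;
    -- lower it for this session so the genetic optimizer plans the
    -- 12-item FROM list:
    SET geqo_threshold = 10;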

Here is the query:

SELECT DISTINCT t_sek.docindex, t_sek.envelope, bt.oid, bt.time
FROM boxinfo bt, boxinfo bd, boxinfo bo,
     docobj t_sek, docobj t_pgr, docobj t_sta, docobj t_sol,
     docobj d_pnr, docobj d_sta,
     docobj o_sek, docobj o_pgr, docobj o_pnr
WHERE t_sek.docspec=124999684 and t_pgr.docspec=124999684 and
      t_sol.docspec=124999684 and t_sta.docspec=124999684 and
      d_pnr.docspec=15378692 and d_sta.docspec=15378692 and
      o_sek.docspec=125075754 and o_pgr.docspec=125075754 and o_pnr.docspec=125075754 and

      bt.community=15042052 and bd.community=15042052 and bo.community=15042052 and
      bt.member=111459733 and bd.member=111459733 and bo.member=111459733 and
      bt.hide=FALSE and bd.hide=FALSE and bo.hide=FALSE and

      o_sek.attrid=1 and o_pgr.attrid=4 and
      t_sek.attrid=0 and t_pgr.attrid=2 and t_sta.attrid=9 and t_sol.attrid=4 and
      d_pnr.attrid=6 and d_sta.attrid=16 and
      abstime(bd.time)::date > t_sol.val_date and t_sol.val_date <= now()::date and

      o_sek.val_str=t_sek.val_str and o_pgr.val_str=t_pgr.val_str and
      o_pnr.val_str=d_pnr.val_str and t_sta.val_str=d_sta.val_str and

      o_sek.envelope=o_pgr.envelope and o_sek.envelope=o_pnr.envelope and
      o_sek.docindex=o_pgr.docindex and o_sek.docindex=o_pnr.docindex and

      t_sek.envelope=t_pgr.envelope and t_sek.envelope=t_sta.envelope and t_sek.envelope=t_sol.envelope and
      t_sek.docindex=t_pgr.docindex and t_sek.docindex=t_sta.docindex and t_sek.docindex=t_sol.docindex and

      d_pnr.envelope=d_sta.envelope and d_pnr.docindex=d_sta.docindex and

      bt.envelope=t_sek.envelope and bd.envelope=d_pnr.envelope and bo.envelope=o_sek.envelope


Here is what explain says:
Unique  (cost=3395.39..3395.40 rows=1 width=212)
  ->  Sort  (cost=3395.39..3395.39 rows=1 width=212)
        ->  Nested Loop  (cost=0.00..3395.38 rows=1 width=212)
              ->  Nested Loop  (cost=0.00..3389.37 rows=1 width=190)
                    ->  Nested Loop  (cost=0.00..3383.35 rows=1 width=168)
                          ->  Nested Loop  (cost=0.00..3369.99 rows=1 width=146)
                                ->  Nested Loop  (cost=0.00..3363.98 rows=1 width=124)
                                      ->  Nested Loop  (cost=0.00..3149.05 rows=36 width=102)
                                            ->  Nested Loop  (cost=0.00..2727.76 rows=1 width=94)
                                                  ->  Nested Loop  (cost=0.00..2719.21 rows=1 width=82)
                                                        ->  Nested Loop  (cost=0.00..1813.58 rows=107 width=60)
                                                              ->  Nested Loop  (cost=0.00..1392.83 rows=1 width=48)
                                                                    ->  Nested Loop  (cost=0.00..1325.31 rows=11 width=26)
                                                                          ->  Index Scan using boxinfo_j_index on boxinfo bo  (cost=0.00..419.68 rows=107 width=4)
                                                                          ->  Index Scan using docobj_j_index on docobj o_sek  (cost=0.00..8.44 rows=1 width=22)
                                                                    ->  Index Scan using docobj_j_index on docobj o_pgr  (cost=0.00..6.00 rows=1 width=22)
                                                              ->  Index Scan using boxinfo_j_index on boxinfo bt  (cost=0.00..419.68 rows=107 width=12)
                                                        ->  Index Scan using docobj_j_index on docobj t_sta  (cost=0.00..8.44 rows=1 width=22)
                                                  ->  Index Scan using docobj_j_index on docobj t_sol  (cost=0.00..6.01 rows=1 width=12)
                                            ->  Index Scan using boxinfo_j_index on boxinfo bd  (cost=0.00..419.68 rows=107 width=8)
                                      ->  Index Scan using docobj_j_index on docobj t_pgr  (cost=0.00..6.00 rows=1 width=22)
                                ->  Index Scan using docobj_j_index on docobj o_pnr  (cost=0.00..5.99 rows=1 width=22)
                          ->  Index Scan using docobj_env_index on docobj d_pnr  (cost=0.00..13.34 rows=2 width=22)
                    ->  Index Scan using docobj_j_index on docobj t_sek  (cost=0.00..6.00 rows=1 width=22)
              ->  Index Scan using docobj_j_index on docobj d_sta  (cost=0.00..6.00 rows=1 width=22)
 


Maybe there are just too many joins :/

Dirk


Re: Can somebody help me to optimize this huge query?

From: Josh Berkus
Date:
Dirk,

> SELECT DISTINCT
>   t_sek.docindex,
>   t_sek.envelope,
>   bt.oid,
>   bt.time
> FROM
>   boxinfo bt, boxinfo bd, boxinfo bo,
>   docobj t_sek, docobj t_pgr, docobj t_sta, docobj t_sol,
>   docobj d_pnr, docobj d_sta,
>   docobj o_sek, docobj o_pgr, docobj o_pnr
> WHERE
>   t_sek.docspec=124999684 and
>   t_pgr.docspec=124999684 and
>   t_sol.docspec=124999684 and
>   t_sta.docspec=124999684 and
<etc ...>

Well, from the look of things, you have no problem with indexing ... the
planner is using your indexes for everything.   How long is it taking to
return a response?
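
(On 7.2 you can also use EXPLAIN ANALYZE to see the actual times and
row counts rather than just the estimates -- just put it in front of
the query as-is:)

    EXPLAIN ANALYZE
    SELECT DISTINCT t_sek.docindex, t_sek.envelope, bt.oid, bt.time
    FROM ...   -- rest of your query unchanged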

All of those nested loops do give me the impression that you *might* be able
to improve performance by forcing the planner's hand with explicit JOIN
syntax, or even subselects.  This is quite an art form; I can't really give
you specifics on it, but the idea is to use your knowledge of the database to
reduce the size of each join before it's formed.  The basic approach is to
join the tables in smallest-to-largest order.
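
Just to illustrate the shape of it, here is a sketch of the t_* branch
rewritten with explicit JOINs.  I'm guessing at which combinations are most
selective -- you'd order them based on your data, and the d_* / o_* aliases
and the other two boxinfo rows would be chained on in the same style:

    SELECT DISTINCT t_sek.docindex, t_sek.envelope, bt.oid, bt.time
    FROM docobj t_sek
      JOIN docobj t_sol ON (t_sol.envelope = t_sek.envelope AND t_sol.docindex = t_sek.docindex)
      JOIN docobj t_pgr ON (t_pgr.envelope = t_sek.envelope AND t_pgr.docindex = t_sek.docindex)
      JOIN docobj t_sta ON (t_sta.envelope = t_sek.envelope AND t_sta.docindex = t_sek.docindex)
      JOIN boxinfo bt   ON (bt.envelope = t_sek.envelope)
      -- ... join the d_* / o_* docobj rows and boxinfo bd/bo here the same way ...
    WHERE t_sek.docspec = 124999684 AND t_sek.attrid = 0
      AND t_sol.docspec = 124999684 AND t_sol.attrid = 4
      AND t_pgr.docspec = 124999684 AND t_pgr.attrid = 2
      AND t_sta.docspec = 124999684 AND t_sta.attrid = 9
      AND bt.community = 15042052 AND bt.member = 111459733 AND bt.hide = FALSE;

The point is that on 7.2 the planner joins explicit JOIN clauses in exactly
the order you write them, so putting the most restrictive combinations first
is where the win comes from.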

However, with an average 3-column primary key, I'm not certain that this is
practical.  I'm also not certain that the query planner is your bottleneck;
that EXPLAIN plan looks pretty good to me.

Also, have a look at your computer's memory, disk I/O and swap
activity.  If your machine is being forced into swap for query
storage, that's going to slow you down a lot.  A good utility for this on
Linux is memstat.  Then you can play with Postgres' sort_mem and buffer
settings to try to make the whole thing happen in RAM.
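
For example (just a sketch -- the right numbers depend on how much RAM the
box has):

    -- per session; the value is in kilobytes
    SET sort_mem = 16384;          -- 16 MB for sorts/hashes

    -- or in postgresql.conf (shared_buffers is in 8 KB pages and
    -- needs a postmaster restart):
    --   shared_buffers = 4096     -- 32 MB
    --   sort_mem = 16384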

Finally, how big are these tables?  If we're talking 200 MB of data, and you're
using IDE drives, you'll need a hardware upgrade -- like a UW SCSI RAID
array.  Disk I/O has a *big* impact on database efficiency.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses      fax 621-2533
   and non-profit organizations.       San Francisco