Discussion: Can somebody help me to optimize this huge query?
Hi,

here is a query on two tables with lots of self joins which just takes hours to complete on 7.2.1. I use multi-dimensional indices, which are shown in the EXPLAIN output below. My question is: how can I use explicit JOIN syntax to let the planner do better? I think GEQO does not kick in yet because there are only 2 tables. The schema behind models abstract document objects.

Here is the query:

SELECT DISTINCT
    t_sek.docindex,
    t_sek.envelope,
    bt.oid,
    bt.time
FROM
    boxinfo bt, boxinfo bd, boxinfo bo,
    docobj t_sek, docobj t_pgr, docobj t_sta, docobj t_sol,
    docobj d_pnr, docobj d_sta,
    docobj o_sek, docobj o_pgr, docobj o_pnr
WHERE
    t_sek.docspec=124999684 and
    t_pgr.docspec=124999684 and
    t_sol.docspec=124999684 and
    t_sta.docspec=124999684 and
    d_pnr.docspec=15378692 and
    d_sta.docspec=15378692 and
    o_sek.docspec=125075754 and
    o_pgr.docspec=125075754 and
    o_pnr.docspec=125075754 and
    bt.community=15042052 and
    bd.community=15042052 and
    bo.community=15042052 and
    bt.member=111459733 and
    bd.member=111459733 and
    bo.member=111459733 and
    bt.hide=FALSE and
    bd.hide=FALSE and
    bo.hide=FALSE and
    o_sek.attrid=1 and
    o_pgr.attrid=4 and
    t_sek.attrid=0 and
    t_pgr.attrid=2 and
    t_sta.attrid=9 and
    t_sol.attrid=4 and
    d_pnr.attrid=6 and
    d_sta.attrid=16 and
    abstime(bd.time)::date > t_sol.val_date and
    t_sol.val_date <= now()::date and
    o_sek.val_str=t_sek.val_str and
    o_pgr.val_str=t_pgr.val_str and
    o_pnr.val_str=d_pnr.val_str and
    t_sta.val_str=d_sta.val_str and
    o_sek.envelope=o_pgr.envelope and
    o_sek.envelope=o_pnr.envelope and
    o_sek.docindex=o_pgr.docindex and
    o_sek.docindex=o_pnr.docindex and
    t_sek.envelope=t_pgr.envelope and
    t_sek.envelope=t_sta.envelope and
    t_sek.envelope=t_sol.envelope and
    t_sek.docindex=t_pgr.docindex and
    t_sek.docindex=t_sta.docindex and
    t_sek.docindex=t_sol.docindex and
    d_pnr.envelope=d_sta.envelope and
    d_pnr.docindex=d_sta.docindex and
    bt.envelope=t_sek.envelope and
    bd.envelope=d_pnr.envelope and
    bo.envelope=o_sek.envelope

Here is what EXPLAIN says:

Unique (cost=3395.39..3395.40 rows=1 width=212)
  -> Sort (cost=3395.39..3395.39 rows=1 width=212)
    -> Nested Loop (cost=0.00..3395.38 rows=1 width=212)
      -> Nested Loop (cost=0.00..3389.37 rows=1 width=190)
        -> Nested Loop (cost=0.00..3383.35 rows=1 width=168)
          -> Nested Loop (cost=0.00..3369.99 rows=1 width=146)
            -> Nested Loop (cost=0.00..3363.98 rows=1 width=124)
              -> Nested Loop (cost=0.00..3149.05 rows=36 width=102)
                -> Nested Loop (cost=0.00..2727.76 rows=1 width=94)
                  -> Nested Loop (cost=0.00..2719.21 rows=1 width=82)
                    -> Nested Loop (cost=0.00..1813.58 rows=107 width=60)
                      -> Nested Loop (cost=0.00..1392.83 rows=1 width=48)
                        -> Nested Loop (cost=0.00..1325.31 rows=11 width=26)
                          -> Index Scan using boxinfo_j_index on boxinfo bo (cost=0.00..419.68 rows=107 width=4)
                          -> Index Scan using docobj_j_index on docobj o_sek (cost=0.00..8.44 rows=1 width=22)
                        -> Index Scan using docobj_j_index on docobj o_pgr (cost=0.00..6.00 rows=1 width=22)
                      -> Index Scan using boxinfo_j_index on boxinfo bt (cost=0.00..419.68 rows=107 width=12)
                    -> Index Scan using docobj_j_index on docobj t_sta (cost=0.00..8.44 rows=1 width=22)
                  -> Index Scan using docobj_j_index on docobj t_sol (cost=0.00..6.01 rows=1 width=12)
                -> Index Scan using boxinfo_j_index on boxinfo bd (cost=0.00..419.68 rows=107 width=8)
              -> Index Scan using docobj_j_index on docobj t_pgr (cost=0.00..6.00 rows=1 width=22)
            -> Index Scan using docobj_j_index on docobj o_pnr (cost=0.00..5.99 rows=1 width=22)
          -> Index Scan using docobj_env_index on docobj d_pnr (cost=0.00..13.34 rows=2 width=22)
        -> Index Scan using docobj_j_index on docobj t_sek (cost=0.00..6.00 rows=1 width=22)
      -> Index Scan using docobj_j_index on docobj d_sta (cost=0.00..6.00 rows=1 width=22)

Maybe there are just too many joins :/

Dirk
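[Editor's note: to make the "explicit join syntax" idea concrete for readers of the archive, here is a hedged sketch of how a fragment of the query above could be rewritten. In PostgreSQL, explicit JOIN syntax constrains the join order the planner considers, so the order written below would be followed as written (in 7.2 this behavior is unconditional). The join order chosen here is purely illustrative, not a recommendation; table and column names come from the original query.]

```sql
-- Sketch only: one possible explicit-join rewrite of a fragment of the
-- query.  With explicit JOINs the planner joins in the written order,
-- so put the most selective combinations first.
SELECT DISTINCT t_sek.docindex, t_sek.envelope, bt.oid, bt.time
FROM docobj t_sek
  JOIN docobj t_pgr ON (t_sek.envelope = t_pgr.envelope
                    AND t_sek.docindex = t_pgr.docindex)
  JOIN boxinfo bt   ON (bt.envelope = t_sek.envelope)
  -- ... the remaining docobj/boxinfo aliases would be joined the same way ...
WHERE t_sek.docspec = 124999684 AND t_sek.attrid = 0
  AND t_pgr.docspec = 124999684 AND t_pgr.attrid = 2
  AND bt.community = 15042052
  AND bt.member = 111459733
  AND bt.hide = FALSE;
```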
Dirk,

> SELECT DISTINCT
>     t_sek.docindex,
>     t_sek.envelope,
>     bt.oid,
>     bt.time
> FROM
>     boxinfo bt, boxinfo bd, boxinfo bo,
>     docobj t_sek, docobj t_pgr, docobj t_sta, docobj t_sol,
>     docobj d_pnr, docobj d_sta,
>     docobj o_sek, docobj o_pgr, docobj o_pnr
> WHERE
>     t_sek.docspec=124999684 and
>     t_pgr.docspec=124999684 and
>     t_sol.docspec=124999684 and
>     t_sta.docspec=124999684 and
<etc ...>

Well, from the look of things, you have no problem with indexing ... the planner is using your indexes for everything. How long is it taking to return a response?

All of those nested loops do give me the impression that you *might* be able to improve performance by forcing the planner's join order with explicit JOIN syntax, and even subselects. This is quite an art form; I can't really give you specifics on it, but the idea is to use your knowledge of the database to reduce the size of each join before it's formed. The basic approach is to join tables in small-to-large order. However, with an average 3-column primary key, I'm not certain that this is practical. I'm also not certain that the query planner is your bottleneck; that EXPLAIN plan looks pretty good to me.

Also, have a look at your computer's memory, disk I/O, and swap activity. If your machine is being forced to use swap for query storage, that's going to slow you down a lot. A good utility for this on Linux is memstat. Then you can play with Postgres' sort_mem and buffer settings to try to make the whole thing happen in RAM.

Finally, how big are these tables? If we're talking 200 MB of data and you're using IDE drives, you'll need a hardware upgrade -- like a UW SCSI RAID array. Disk I/O has a *big* impact on database efficiency.

--
-Josh Berkus
______AGLIO DATABASE SOLUTIONS___________________________
 Josh Berkus                   Complete information technology
 josh@agliodbs.com             and data management solutions
 (415) 565-7293                for law firms, small businesses
 fax 621-2533                  and non-profit organizations.
 San Francisco