Re: Project proposal/comments please - query optimization

Поиск
Список
Период
Сортировка
От Kim Bisgaard
Тема Re: Project proposal/comments please - query optimization
Дата
Msg-id 42FC59B3.1040902@dmi.dk
обсуждение исходный текст
Ответ на Re: Project proposal/comments please - query optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Tom Lane wrote: <blockquote cite="mid22543.1123773443@sss.pgh.pa.us" type="cite"><pre wrap="">Kim Bisgaard <a
class="moz-txt-link-rfc2396E"href="mailto:kib+pg@dmi.dk"><kib+pg@dmi.dk></a> writes: </pre><blockquote
type="cite"><prewrap="">I have noticed a deficiency in the current query optimizer related to 
 
"full outer joins". Tom Lane has confirmed to me that it will not be 8.1 
material.   </pre></blockquote><pre wrap="">
... There
are related issues involving N-way joins that we're still not very
good at.
...

Consider this version of Kim Bisgaard's example:SELECT FROM a join (b full join c using (id)) using (id)
If A is small and B,C have indexes on ID then it is interesting to consider
a plan likeNest Loop    Scan A    Merge Full Join        Indexscan B using id = outer.id        Indexscan C using id =
outer.id
We are fairly far from being able to do this. generate_outer_join_implications
could easily be modified to generate derived equalities (I think it works to
allow a deduction against any clause not overlapping the outerjoin itself)
but the planner would want to evaluate them at the wrong level, and the
executor doesn't have support for passing the outer variable down more than
one level of join.  This is why the existing hack works only for equalities
to pseudoconstants.  We could maybe mark join RestrictInfos as "valid only
below xxx" and ignore them when processing a join that includes all of the
indicated rels?  Still not clear how you get the planner to recognize the
above as an inner indexscan situation though.</pre></blockquote> The query samples I gave was the smallest test I could
findto provoke the behavior. Tom is right in that the full case I am ideally want solved is of the form above with lots
(below20) of full outer joined tables. <br /><br /> I am still a little intrigued by the effect of substituting "full"
with"left" in my examples; maybe an alternative to Toms idea could be to work in the direction of treating "full" more
like"left/right"<br /><br /> There are some examples of my problems (on nightly builds of yesterday) at the bottom of
themail.<br /><br /> Regards,<br /> Kim.<br /><br /> obsdb=> explain analyse select
wmo_id,timeobs,temp_dry_at_2m,temp_grass<br/> from station<br />          join (temp_grass<br />                  full
jointemp_dry_at_2m using (station_id, timeobs)<br />          ) using (station_id)<br /> where wmo_id=6065 and
'2004-1-26:0' between startdate and enddate<br /> and timeobs = '2004-1-2 06:0:0';<br /><br />
                                                                                     QUERY PLAN<br />
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br
/> Hash Join  (cost=5.04..372928.92 rows=1349 width=28) (actual time=23986.480..46301.966 rows=1 loops=1)<br />    Hash
Cond:(COALESCE("outer".station_id, "outer".station_id) = "inner".station_id)<br />    ->  Merge Full Join 
(cost=0.00..338124.90rows=6957100 width=32) (actual time=23965.761..46281.380 rows=76 loops=1)<br />          Merge
Cond:(("outer".timeobs = "inner".timeobs) AND ("outer".station_id = "inner".station_id))<br />          Filter:
(COALESCE("outer".timeobs,"inner".timeobs) = '2004-01-02 06:00:00'::timestamp without time zone)<br />          -> 
IndexScan using temp_grass_idx on temp_grass  (cost=0.00..75312.59 rows=2406292 width=16) (actual time=12.436..4916.043
rows=2406292loops=1)<br />          ->  Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m  (cost=0.00..210390.85
rows=6957100width=16) (actual time=13.696..21363.054 rows=6956994 loops=1)<br />    ->  Hash  (cost=5.03..5.03
rows=1width=8) (actual time=19.612..19.612 rows=0 loops=1)<br />          ->  Index Scan using wmo_idx on station 
(cost=0.00..5.03rows=1 width=8) (actual time=19.586..19.591 rows=1 loops=1)<br />                Index Cond: ((wmo_id =
6065)AND ('2004-01-02 06:00:00'::timestamp without time zone >= startdate) AND ('2004-01-02 06:00:00'::timestamp
withouttime zone <= enddate))<br />  Total runtime: 46302.208 ms<br /> (11 rows)<br /><br /> obsdb=> explain
analyseselect wmo_id,timeobs,wind_dir_10m,temp_dry_at_2m,temp_grass<br /> from station<br /> join (temp_grass<br />
fulljoin temp_dry_at_2m using (station_id, timeobs)<br /> full join wind_dir_10m using (station_id, timeobs)<br /> )
using(station_id)<br /> where wmo_id=6065 and '2004-1-2 6:0' between startdate and enddate<br /> and timeobs =
'2004-1-206:0:0';<br /><br />                                                                                      
QUERYPLAN<br />
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br
/> Hash Join  (cost=1249443.54..1700082.70 rows=1392 width=40) (actual time=331437.803..384389.174 rows=1 loops=1)<br
/>   Hash Cond: (COALESCE(COALESCE("outer".station_id, "outer".station_id), "outer".station_id) =
"inner".station_id)<br/>    ->  Merge Full Join  (cost=1249438.51..1664170.44 rows=7178660 width=48) (actual
time=291406.613..384359.322rows=82 loops=1)<br />          Merge Cond: (("outer".station_id = "inner"."?column7?") AND
("outer".timeobs= "inner"."?column8?"))<br />          Filter: (COALESCE("inner"."?column8?", "outer".timeobs) =
'2004-01-0206:00:00'::timestamp without time zone)<br />          ->  Index Scan using wind_dir_10m_idx on
wind_dir_10m (cost=0.00..299534.38 rows=7178660 width=16) (actual time=15.194..52605.785 rows=7178639 loops=1)<br />
        ->  Sort  (cost=1249438.51..1266831.26 rows=6957100 width=32) (actual time=290255.163..302211.520
rows=8743745loops=1)<br />                Sort Key: COALESCE(temp_grass.station_id, temp_dry_at_2m.station_id),
COALESCE(temp_grass.timeobs,temp_dry_at_2m.timeobs)<br />                ->  Merge Full Join  (cost=0.00..337004.00
rows=6957100width=32) (actual time=28.909..73512.533 rows=8743745 loops=1)<br />                      Merge Cond:
(("outer".timeobs= "inner".timeobs) AND ("outer".station_id = "inner".station_id))<br />                      -> 
IndexScan using temp_grass_idx on temp_grass  (cost=0.00..75312.59 rows=2406292 width=16) (actual time=15.178..6088.732
rows=2406292loops=1)<br />                      ->  Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m 
(cost=0.00..210390.85rows=6957100 width=16) (actual time=13.694..25573.509 rows=6956994 loops=1)<br />    ->  Hash 
(cost=5.03..5.03rows=1 width=8) (actual time=28.609..28.609 rows=0 loops=1)<br />          ->  Index Scan using
wmo_idxon station  (cost=0.00..5.03 rows=1 width=8) (actual time=28.581..28.585 rows=1 loops=1)<br />               
IndexCond: ((wmo_id = 6065) AND ('2004-01-02 06:00:00'::timestamp without time zone >= startdate) AND ('2004-01-02
06:00:00'::timestampwithout time zone <= enddate))<br />  Total runtime: 384979.287 ms<br /> (16 rows)<br /><br
/><br/> 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: Determining return type of polymorphic function
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: obtaining row locking information