Re: join selectivity
От | Mark Cave-Ayland |
---|---|
Тема | Re: join selectivity |
Дата | |
Msg-id | 9EB50F1A91413F4FA63019487FCD251DAD1F@WEBBASEDDC.webbasedltd.local обсуждение исходный текст |
Ответ на | Re: join selectivity (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: join selectivity
|
Список | pgsql-hackers |
Hi Tom, > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 13 December 2004 17:16 > To: Mark Cave-Ayland > Cc: strk@refractions.net; pgsql-hackers@postgresql.org; > postgis-devel@postgis.refractions.net > Subject: Re: [HACKERS] join selectivity > > > "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > > For a query like this: > > > > SELECT id FROM table1, table2 > > WHERE table1.geom && table2.geom; > > > > RESTRICT selectivity is invoked twice and > > JOIN selectivity is invoked once. > > Hm, are you testing in a context where both tables have > indexes that are relevant to the && operator? > > The estimated join result size is computed from the join > selectivity estimate for the && operator. I was about to say > that restriction selectivity wouldn't be used at all, but on > second thought I believe that it would be invoked while > considering nestloop with inner indexscan plans. That is, > we'd consider > > NestLoop > Seq Scan on table2 > Indexscan on table1 > IndexCond: table1.geom && outer.geom > > and to determine the estimated cost of each indexscan, we > would invoke restriction selectivity for &&, with varRelid > referencing table1. Given this call you are supposed to treat > table2.geom as a constant of uncertain value, so the thing is > semantically sensible as a restriction clause for table1 > (whether you can produce a really good estimate is another > question :-(). > > Similarly, we'd consider the reverse plan with table1 as > outer, and that would give rise to another restriction > selectivity check with varRelid = table2. Just to clarify, here are the explain results from strk's query: strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom; NOTICE: LWGEOM_gist_joinsel called (returning 0.000005) QUERY PLAN ---------------------------------------------------------------------------- ----------------------------------Nested Loop (cost=3.27..105.84 rows=1 width=64) (actual time=0.217..39.305 rows=2700 loops=1) Join Filter: ("inner".geom && "outer".geom) -> Seq Scan on test2 (cost=0.00..28.32 rows=132 width=32)(actual time=0.081..1.111 rows=108 loops=1) -> Materialize (cost=3.27..3.52 rows=25 width=32) (actual time=0.001..0.011 rows=25 loops=108) -> Seq Scan on test1 (cost=0.00..3.25 rows=25 width=32) (actual time=0.043..0.129 rows=25 loops=1) Total runtime: 40.471 ms (6 rows) .... so with no indices the JOIN function is called once, RESTRICT never. I can understand this :) strk=# create index test2_gist on test2 using gist (geom gist_geometry_ops); CREATE INDEX strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom; NOTICE: LWGEOM_gist_joinsel called (returning 0.000005) NOTICE: LWGEOM_gist_sel called NOTICE: no constant arguments - returning default selectivity NOTICE: LWGEOM_gist_sel called NOTICE: no constant arguments - returning default selectivity QUERY PLAN ---------------------------------------------------------------------------- ----------------------------------Nested Loop (cost=3.27..92.11 rows=1 width=64) (actual time=0.046..39.219 rows=2700 loops=1) Join Filter: ("inner".geom && "outer".geom) -> Seq Scan on test2 (cost=0.00..28.08 rows=108 width=32)(actual time=0.009..0.198 rows=108 loops=1) -> Materialize (cost=3.27..3.52 rows=25 width=32) (actual time=0.000..0.013 rows=25 loops=108) -> Seq Scan on test1 (cost=0.00..3.25 rows=25 width=32) (actual time=0.002..0.052 rows=25 loops=1) Total runtime: 40.307 ms (6 rows) ...with one index RESTRICT is called twice..... strk=# create index test1_gist on test1 using gist (geom gist_geometry_ops); CREATE INDEX strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom; NOTICE: LWGEOM_gist_joinsel called (returning 0.000005) NOTICE: LWGEOM_gist_sel called NOTICE: no constant arguments - returning default selectivity NOTICE: LWGEOM_gist_sel called NOTICE: no constant arguments - returning default selectivity NOTICE: LWGEOM_gist_sel called NOTICE: no constant arguments - returning default selectivity NOTICE: LWGEOM_gist_sel called NOTICE: no constant arguments - returning default selectivity QUERY PLAN ---------------------------------------------------------------------------- ----------------------------------Nested Loop (cost=3.27..92.11 rows=1 width=64) (actual time=0.052..38.867 rows=2700 loops=1) Join Filter: ("inner".geom && "outer".geom) -> Seq Scan on test2 (cost=0.00..28.08 rows=108 width=32)(actual time=0.012..0.181 rows=108 loops=1) -> Materialize (cost=3.27..3.52 rows=25 width=32) (actual time=0.000..0.010 rows=25 loops=108) -> Seq Scan on test1 (cost=0.00..3.25 rows=25 width=32) (actual time=0.002..0.032 rows=25 loops=1) Total runtime: 40.027 ms (6 rows) ...and with two indices RESTRICT is called four times. The part I find confusing is why with one index that RESTRICT is called twice. Surely for a nested loop plan of the form you gave before which was: NestLoop Seq Scan on table2 Indexscan on table1 IndexCond: table1.geom && outer.geom Then if we just have an index on table1.geom this is the only plan that can be considered - surely we cannot begin to consider the reverse plan because there is no index on table2.geom. Based upon what you have suggested, I would have expected that with one index RESTRICT would be called once, and with two indices RESTRICT would be called twice. I was also thinking whether calling RESTRICT when comparing with an unknown value is worth doing at all, however I did think that perhaps if you are using a cast to perform an operation on two datatypes, then you may be able to imply something from the index, such as its physical size, and hint that the planner should use a particular index in preference for the other. Would it be correct to assume that if returning the same value for RESTRICT for both means that the planner will choose one at random? Many thanks, Mark. ------------------------ WebBased Ltd South West Technology Centre Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 791021 F: +44 (0)1752 791023 W: http://www.webbased.co.uk
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Jim Buttafuoco"Дата:
Сообщение: Re: [Fwd: Re: race condition for drop schema cascade?]