Re: join selectivity
От | Mark Cave-Ayland |
---|---|
Тема | Re: join selectivity |
Дата | |
Msg-id | 9EB50F1A91413F4FA63019487FCD251DAD23@WEBBASEDDC.webbasedltd.local обсуждение исходный текст |
Ответ на | Re: join selectivity (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: join selectivity
Re: join selectivity |
Список | pgsql-hackers |
Hi Tom, > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 16 December 2004 17:56 > To: Mark Cave-Ayland > Cc: strk@refractions.net; pgsql-hackers@postgresql.org; > postgis-devel@postgis.refractions.net > Subject: Re: [HACKERS] join selectivity > > > "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > > OK I think I've misunderstood something more fundamental > than that; I > > understood from what you said that the RESTRICT clause is used to > > evaluate the cost of table1.geom && table2.geom against > table2.geom && > > table1.geom (i.e. it is used to help decide which one should be seq > > scanned and which should be index scanned in a nested loop > node). So > > is the trick here for a commutative operator to simply > return the same > > value for both cases, as other factors such as index size costs are > > considered elsewhere? > > If the operator is commutative then the result should be too. > Really you should not be thinking about costs at all when > coding a selectivity > estimator: its charter is to estimate how many rows will > match the condition, not to estimate costs per se. > > Note however that these aren't really the "same case", as > you'd be referencing two different columns with presumably > different statistics. Well at the moment PostGIS has a RESTRICT function that takes an expression of the form <column> <op> <constant> where column is a column consisting of geometries and constant is a bounding box. This is based upon histogram statistics and works well. The surprise came when writing the JOIN function and finding that the RESTRICT clause was being called. Now I understand that this is part of the nested loop and not the JOIN so that helps. But in the case of <column> <op> <unknown constant>, if we're estimating the number of rows to return then that becomes harder - I'm thinking pick a rectangle half the area of the statistical rectangle for the column and return the number of rows within that area. > You should probably read the existing selectivity estimators > in utils/adt/selfuncs.c. There's a fair amount of > infrastructure code in that file that you could borrow. > (It's not currently exported because it tends to change from > version to version, but maybe we could think about making > some of the routines global.) OK will try and find some inspiration within. Many thanks, Mark. ------------------------ WebBased Ltd South West Technology Centre Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 791021 F: +44 (0)1752 791023 W: http://www.webbased.co.uk
В списке pgsql-hackers по дате отправления: