Обсуждение: incorrect index behaviour with rtree on box values
Testcase: create table boxtest (a box); create index boxtest_idx on boxtest using rtree (a); create function gen_data() returns void as ' begin for i in 1..200 loop insert into boxtest values (box(point((i*2-1)::float,0),point((i*2)::float,1))); end loop; return; end;' language plpgsql; select gen_data(); analyze boxtest; set enable_seqscan = true; select * from boxtest where a << '(3,0),(3,1)'::box; set enable_seqscan = false; select * from boxtest where a << '(3,0),(3,1)'::box; Those two selects at the end should clearly return the same result, a single row. In fact, what happens is that the second returns no rows at all; I tested this on 7.4.6, but others have confirmed this on everything from 7.3 to latest. The problem is that the semantics of the &< and &> operators for the box type are not what rtree needs for the "OverLeft" and "OverRight" slots of the operator class. Specifically, what rtree needs is this: if X << K or X &< K then for all A where A is a union of values including X, then A &< K (the designation "&<" is of course arbitrary, what matters is what operator is placed in the applicable slot of the opclass. Same goes for >> and &>.) This is because rtree converts (see rtstrat.c) the original "Left" operator to an "OverLeft" when comparing against internal nodes of the index, which contain values which are the union of all values in their subtree. In the testcase, the top node of the tree contains as its first entry a union value of the form (184,1),(1,0), which the scan then rejects since (184,1),(1,0) &< (3,0),(3,1) is false. I can see three possible approaches to fixing this: 1) change the semantics of &< and &> to match rtree's expectations 2) replace &< and &> in the opclass with operators that behave as rtree expects (this will have the side effect of rendering &< and &> un-indexable) 3) change rtree's behaviour in some way. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes: > The problem is that the semantics of the &< and &> operators for the box > type are not what rtree needs for the "OverLeft" and "OverRight" slots of > the operator class. This was observed nearly a year ago, see this thread: http://archives.postgresql.org/pgsql-general/2004-03/msg01135.php but apparently no one cares enough to fix it. Are you volunteering? regards, tom lane
On 2005-01-25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: >> The problem is that the semantics of the &< and &> operators for the box >> type are not what rtree needs for the "OverLeft" and "OverRight" slots of >> the operator class. > > This was observed nearly a year ago, see this thread: > http://archives.postgresql.org/pgsql-general/2004-03/msg01135.php > > but apparently no one cares enough to fix it. Are you volunteering? Possibly. I don't feel comfortable with changing anything specific to the geometric operators, since (a) I don't actually use them (I discovered this issue when adding rtree support to a type of my own) and (b) the compatibility implications are obvious. But I think there is a solution that involves only changes to the rtree strategy code. Looking at the earlier discussion: it seems to have ended with the conclusion that &< should mean "does not extend to the right of", which matches the current implementation for box, but not for some other types. So for box values, we seem (and someone please correct me if I'm wrong) to have the following semantics: a << b - a is strictly left of b, i.e. a.right < b.left a &< b - a is no further right than b, i.e. a.right <= b.right a &> b - a is no further left than b, i.e. a.left >= b.left a >> b - a is strictly right of b, i.e. a.left > b.right For rtree to work as apparently intended, it needs four more operators, to use for inner nodes when the scan operator is one of the above four. However, a small modification to the way that the internal scan key is initialised should eliminate the requirement to explicitly specify these operators, which strikes me as the solution which preserves maximum compatibility. The four operators required are: NOT (a &> b) (used when the scan operator is (a << b)) NOT (a >> b) (used when the scan operator is (a &< b)) NOT (a << b) (used when the scan operator is (a &> b)) NOT (a &< b) (used when the scan operator is (a >> b)) (This won't fix rtree on contrib/seg or contrib/cube, but those appear to be broken already since they have different, and equally incorrect, definitions of &> and &<. Fixing those would require slightly more complex operators, such as NOT (a &> b OR a >> b) and so on. The more complex operators would work for box too, so it might be worth using them anyway, but I don't yet understand the scan key handling well enough to know if these can be constructed rather than supplied in the opclass.) Proof: Let V be the scan key, i.e. the value we are searching for in the index. Let U be a union over a set of values. Let X be some value for which X OP V holds. Consider an internal node entry with union U. We require that the following holds: if U contains some value X where X OP V holds, then U OP' V must be true. (But not the converse; U OP' V may be true even if no such X exists in U. However, we wish it to be false as much as possible for efficiency.) When OP is << : X << V, therefore X.right < V.left, therefore X.left < V.left therefore NOT (X &> V) If U contains X, then U &> V is true iff U.left >= V.left U.left <= min(E.left) for all elements E of U, and therefore for X if X in U So if X in U, then U.left <= X.left < V.left, and therefore NOT (U &> V) When OP is &< : X &< V, therefore X.right <= V.right, therefore X.left <= V.right therefore NOT (X >> V), and similar reasoning for U containing X as above. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
This has been saved for the 8.1 release: http://momjian.postgresql.org/cgi-bin/pgpatches2 --------------------------------------------------------------------------- Andrew - Supernews wrote: > On 2005-01-25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andrew - Supernews <andrew+nonews@supernews.com> writes: > >> The problem is that the semantics of the &< and &> operators for the box > >> type are not what rtree needs for the "OverLeft" and "OverRight" slots of > >> the operator class. > > > > This was observed nearly a year ago, see this thread: > > http://archives.postgresql.org/pgsql-general/2004-03/msg01135.php > > > > but apparently no one cares enough to fix it. Are you volunteering? > > Possibly. I don't feel comfortable with changing anything specific to the > geometric operators, since (a) I don't actually use them (I discovered > this issue when adding rtree support to a type of my own) and (b) the > compatibility implications are obvious. But I think there is a solution > that involves only changes to the rtree strategy code. > > Looking at the earlier discussion: it seems to have ended with the > conclusion that &< should mean "does not extend to the right of", which > matches the current implementation for box, but not for some other types. > > So for box values, we seem (and someone please correct me if I'm wrong) to > have the following semantics: > > a << b - a is strictly left of b, i.e. a.right < b.left > a &< b - a is no further right than b, i.e. a.right <= b.right > a &> b - a is no further left than b, i.e. a.left >= b.left > a >> b - a is strictly right of b, i.e. a.left > b.right > > For rtree to work as apparently intended, it needs four more operators, > to use for inner nodes when the scan operator is one of the above four. > However, a small modification to the way that the internal scan key is > initialised should eliminate the requirement to explicitly specify these > operators, which strikes me as the solution which preserves maximum > compatibility. The four operators required are: > > NOT (a &> b) (used when the scan operator is (a << b)) > NOT (a >> b) (used when the scan operator is (a &< b)) > NOT (a << b) (used when the scan operator is (a &> b)) > NOT (a &< b) (used when the scan operator is (a >> b)) > > (This won't fix rtree on contrib/seg or contrib/cube, but those appear to be > broken already since they have different, and equally incorrect, definitions > of &> and &<. Fixing those would require slightly more complex operators, > such as NOT (a &> b OR a >> b) and so on. The more complex operators would > work for box too, so it might be worth using them anyway, but I don't yet > understand the scan key handling well enough to know if these can be > constructed rather than supplied in the opclass.) > > Proof: > > Let V be the scan key, i.e. the value we are searching for in the index. > Let U be a union over a set of values. > Let X be some value for which X OP V holds. > > Consider an internal node entry with union U. We require that the following > holds: if U contains some value X where X OP V holds, then U OP' V must be > true. (But not the converse; U OP' V may be true even if no such X exists in > U. However, we wish it to be false as much as possible for efficiency.) > > When OP is << : > > X << V, therefore X.right < V.left, therefore X.left < V.left > therefore NOT (X &> V) > > If U contains X, then U &> V is true iff U.left >= V.left > > U.left <= min(E.left) for all elements E of U, and therefore for X if X in U > > So if X in U, then U.left <= X.left < V.left, and therefore NOT (U &> V) > > When OP is &< : > > X &< V, therefore X.right <= V.right, therefore X.left <= V.right > therefore NOT (X >> V), and similar reasoning for U containing X as above. > > -- > Andrew, Supernews > http://www.supernews.com - individual and corporate NNTP services > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073