Обсуждение: Index not used with IS NULL
If I have a query like SELECT * FROM table WHERE key IS NULL and an index on column "key", a sequential scan is used. A query like SELECT * FROM table WHERE key=123 uses an index scan. This is slowing down some queries in my current project. Why aren't the NULL columns indexed? Is there a work-around? Nick
Nick Wellnhofer <wellnhofer@aevum.de> writes: > If I have a query like > SELECT * FROM table WHERE key IS NULL > and an index on column "key", a sequential scan is used. IS NULL is not an indexable operator. I suggest reconsidering your data representation, as this is unlikely to change soon... regards, tom lane
What is the problem with indexing nulls? I had the similar problem some time ago, and created a custom set of operators as a work around (that do the same thing as <=> for numbers, but treat null as infinity and '=' returns true if both operand are null, and false if only one is)... It seems to work fine. The only problem is, that it is kinda cumbersome to create custom opclasses in postgres, and also, that I don't want to create the same wrappers for all possible types (int2,int4,int8,float etc)... It would be a lot nicer if the default operators could handle that... Why can it not be done? Thanks! Dima Tom Lane wrote: > Nick Wellnhofer <wellnhofer@aevum.de> writes: > >>If I have a query like >>SELECT * FROM table WHERE key IS NULL >>and an index on column "key", a sequential scan is used. > > > IS NULL is not an indexable operator. > > I suggest reconsidering your data representation, as this is unlikely to > change soon... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Dima Tkach <dmitry@openratings.com> writes: > What is the problem with indexing nulls? We do index nulls (at least in btree indexes). What I said was >> IS NULL is not an indexable operator. IS NULL is not an operator at all (it's a special syntactic construct). It has no entry in pg_operator. Therefore it doesn't fit into the operator class structure that describes which operators can be used with indexes. There are a bunch of internal structures (ScanKeys, etc) that it wouldn't fit into, either. > I had the similar problem some time ago, and created a custom set of > operators as a work around (that do the same thing as <=> for numbers, > but treat null as infinity and '=' returns true if both operand are > null, and false if only one is)... > It seems to work fine. Non-strict = operators wil be a real bad idea starting in PG 7.4, as they prevent usage of a number of hashed-aggregation optimizations. I suggest rethinking your schema: whatever you are using NULL to represent does not fit very well with SQL's idea of NULL semantics. In particular, the notion that "NULL = NULL" should yield true is going to get you in all kinds of trouble. regards, tom lane
> > >I suggest rethinking your schema: whatever you are using NULL to >represent does not fit very well with SQL's idea of NULL semantics. >In particular, the notion that "NULL = NULL" should yield true is >going to get you in all kinds of trouble. > Oh, no, it is not really a notion of "NULL=NULL", as I said, I only use it as a workaround for postgres inability to use index with null keys. Tom, as you said in your message "we do index nulls" - why do you index them, if there is no way to use those index values? :-) I mean, if they were not in the index at all, I could understand that, but they are already there, and not used, just because of some syntactical difference between 'is null' and other operators??? That looks very weird to me. Of course, I would not want to use the 'notion of null=null' if "is null" worked the same way, but, as you said yourself, it doesn't... So what do I do? As for "rethinking my schema"... I would appreciate any suggestions... There are many instances where I need to have nulls in the indexes, here is the simplest one: create table trees ( id serial primary key, parent int references trees on delete cascade on update cascade data text ); create unique index trees_idx on trees (parent, id); A row in the table is a tree node. A node can have one parent, ot no parent at all. About the only way to do this I know (aside from hacking around and inserting "dummy" rows into the table) is to use null as parent values for the nodes with no parents, but then a query like select * from trees where parent is null will take forever if the table is any large... What do you recommend? Predicate indexes? Waste of space... What else? And what exactly is being able to just say something like 'select * from trees where parent == null' to work around the syntactical problem of is null not being an operator? My only real problem with this is it being so complicated to set up. And I don't really understand what's wrong with it conceptually. To me, it looks like mereley a wrokaround for a problem with postgres parser (or planner?) not being able to treat is null as an operator for indexing purposes. Dima
Dima Tkach <dmitry@openratings.com> writes: > Tom, as you said in your message "we do index nulls" - why do you index > them, if there is no way to use those index values? :-) So that an indexscan can be a substitute for seqscan + sort. Also, in a multi-column index you must be prepared to index nulls, or you won't correctly answer questions that look at only some of the columns. Index types that don't support ordered scans don't have to store nulls (at least in their first column) and indeed rtree and gist do not. I forget whether hash does. > A row in the table is a tree node. A node can have one parent, ot no > parent at all. You're better off making the root node link to itself (compare handling of /.. in a Unix filesystem). NULL parent link does not mean "has no parent", it means "parent is unknown". regards, tom lane
On Sat, 15 Feb 2003, Dima Tkach wrote: > It would be a lot nicer if the default operators could handle that... > Why can it not be done? Jumping in... I usually use a partial index as a workaround. Postgresql will look at a partial index whose condition is IS NULL for queries of col IS NULL.
>>A row in the table is a tree node. A node can have one parent, ot no >>parent at all. > > > You're better off making the root node link to itself (compare handling > of /.. in a Unix filesystem). NULL parent link does not mean "has no > parent", it means "parent is unknown". > Great idea! I'll do that. Thanks! What about another example: create table user ( id serial primary key, login text not null unique ); create table tag_set ( id serial primay key, tag text not null unique, data text not null, userid int references users on delete cascade on update cascade ); The idea is that 'tags' may be user-specific or user-independent - so that to get a set of tags for a given user, I would do select tag,data from tag_set where userid is null or userid=? with my 'workaround' solution I do select tag,data from tag_set where userid==null or userid=? (where '==' is my special non-strict operator) to force both parts of the criteria to use the index Any ideas how to do this better (again, other than creating a dummy user with id 0)? I'll apppreciate any suggestions... Thanks a lot! Dima
>>A row in the table is a tree node. A node can have one parent, ot no >>parent at all. > > > You're better off making the root node link to itself (compare handling > of /.. in a Unix filesystem). NULL parent link does not mean "has no > parent", it means "parent is unknown". > Actually, I am afraid, I have to take back my previous message (where I said, this was a good idea) - after giving it some thought, I don't see how it will make things better (if anything, it will make them worse). For example, how would I get the list of the "top-level" (no parent) nodes given your suggestion? select * from trees where parent=id is hardly a good idea, because it just has to be a seq. scan, right? Right now, I am, at least, able to do select * from trees where parent == null; (where '==' is my custom non-strict equivalence operator), that uses an index scan and works just fine. Of course, it would be nicer to be able to get away with the standard sql set of operators, but, I guess, I have to do what I have to do :-( Dima P.S. Frankly, I still don't understand what is the big problem with making 'is null' indexable - as far as I can see, this is purely syntactical problem, because the btree code itself seems to be able to handle nulls just fine - it is at the level of the planner the index option just gets cut off, because it doesn't know what to do with 'is null'... I may be missing something of course, but so far, this looks to me like a very useful feature, that would be very easy to implement too...
Stephan Szabo wrote: > On Sat, 15 Feb 2003, Dima Tkach wrote: > > >>It would be a lot nicer if the default operators could handle that... >>Why can it not be done? > > > Jumping in... I usually use a partial index as a workaround. Postgresql > will look at a partial index whose condition is IS NULL for queries of col > IS NULL. > Yeah... I thought about it... But, one problem was that in 7.2 partial indexes do not really work (more precisely, do not get used) if your query has more than one table (Tom has given me a patch to fix that a while back, but I never got to installing it) :-( More importantly, if you need to make queries of both kinds (for 'is null' and for = something), ther are two options, and both of them are not very good: - create two indexes, one with predicate, and one without predicate - is a waste of space, because all the rows with nulls get indexed twice. The space may not be such an important consideration by itself, but, when the table is huge and heavily being updated, the overhead of having to keep both indexes in synch becomes significant. - create two indexes with complimetary predicates (one of IS NULL, one for IS NOT NULL)... Well, this seems to be better, at least from the space and performance standpoint, but I don't know how to even begin to explain to my users that they have to write queries like ... where parent is not null and parent=1 looks pretty reidiculous to me :-) .. and the planner does not seem to be smart enough to know to use the index unless you mention the predicate *explicitly* in the query. Dima
Dima Tkach <dmitry@openratings.com> writes: > But, one problem was that in 7.2 partial indexes do not really work > (more precisely, do not get used) if your query has more than one table > (Tom has given me a patch to fix that a while back, but I never got to > installing it) :-( That patch is in 7.2.2 and later. If you're still running 7.2 or 7.2.1 I have *zero* sympathy for any problems you may have. Update before you get bit by one of the data-loss bugs we've fixed. regards, tom lane
Dima Tkach <dmitry@openratings.com> writes: > For example, how would I get the list of the "top-level" (no parent) > nodes given your suggestion? > select * from trees where parent=id Exactly. > is hardly a good idea, because it just has to be a seq. scan, right? Make a partial index if you need it to be fast. regression=# create table trees (id int, parent int); CREATE TABLE regression=# explain select * from trees where parent=id; QUERY PLAN ------------------------------------------------------ Seq Scan on trees (cost=0.00..22.50 rows=5 width=8) Filter: (parent = id) (2 rows) regression=# create index foo on trees(id) where parent=id; CREATE INDEX regression=# explain select * from trees where parent=id; QUERY PLAN ------------------------------------------------------------------ Index Scan using foo on trees (cost=0.00..17.07 rows=5 width=8) Filter: (parent = id) (2 rows) > I may be missing something of course, but so far, this looks to me like > a very useful feature, that would be very easy to implement too... Criticism in the form of patches is more useful than unsubstantiated opinions that something is easy. regards, tom lane
On Sun, 16 Feb 2003, Dima Tkach wrote: > Stephan Szabo wrote: > > On Sat, 15 Feb 2003, Dima Tkach wrote: > > > > > >>It would be a lot nicer if the default operators could handle that... > >>Why can it not be done? > > > > > > Jumping in... I usually use a partial index as a workaround. Postgresql > > will look at a partial index whose condition is IS NULL for queries of col > > IS NULL. > > > > - create two indexes, one with predicate, and one without predicate - is > a waste of space, because all the rows with nulls get indexed twice. The > space may not be such an important consideration by itself, but, when > the table is huge and heavily being updated, the overhead of having to > keep both indexes in synch becomes significant. Yes, this solution does double index the NULLs, but if you have alot of NULLs you probably should be doing a seqscan to find them anyway and don't need the index. High update frequency costs you the NULL check, which is a little annoying, and if you've got a small number of NULL rows or the data is clustered in some fashion (so that the index is a win) that have lots of updates this may become significant.
Hello, sorry for barging in... I use a similar structure for keeping some some text pages categorized. CREATE TABLE pages ( id SERIAL NOT NULL PRIMARY KEY, categ INTEGER, CONSTRAINT categ_fk FOREIGN KEY(categ) REFERENCES categs(id) ON DELETE CASCADE ); All the pages that are not contained in a category are marked by categ IS NULL ( this is like the files in / in a filesystem). If I use other values than NULL for marking this kind of pages, then the constraint would complain, but then I can't use an index to find these pages. Do you have a better solution for this ? Thanks. On Mon, 17 Feb 2003, Tom Lane wrote: > Dima Tkach <dmitry@openratings.com> writes: > > For example, how would I get the list of the "top-level" (no parent) > > nodes given your suggestion? > > select * from trees where parent=id > > Exactly. > > > is hardly a good idea, because it just has to be a seq. scan, right? > > Make a partial index if you need it to be fast. > > regression=# create table trees (id int, parent int); > CREATE TABLE > regression=# explain select * from trees where parent=id; > QUERY PLAN > ------------------------------------------------------ > Seq Scan on trees (cost=0.00..22.50 rows=5 width=8) > Filter: (parent = id) > (2 rows) > > regression=# create index foo on trees(id) where parent=id; > CREATE INDEX > regression=# explain select * from trees where parent=id; > QUERY PLAN > ------------------------------------------------------------------ > Index Scan using foo on trees (cost=0.00..17.07 rows=5 width=8) > Filter: (parent = id) > (2 rows) > > > > I may be missing something of course, but so far, this looks to me like > > a very useful feature, that would be very easy to implement too... > > Criticism in the form of patches is more useful than unsubstantiated > opinions that something is easy. > > regards, tom lane
> > >Criticism in the form of patches is more useful than unsubstantiated >opinions that something is easy. > > > I'd be happy to come up with a patch... It just was my understanding that you would not accept such a patc hanyway, because your opinion is that it is unnecessary and dangerous... Did I misunderstand you here? Thanks! Dima
> > >Yes, this solution does double index the NULLs, but if you have alot of >NULLs you probably should be doing a seqscan to find them anyway and don't >need the index. High update frequency costs you the NULL check, which is >a little annoying, and if you've got a small number of NULL rows or the >data is clustered in some fashion (so that the index is a win) that have >lots of updates this may become significant. > > Yeah... But imagine the real-life condition, when I have a *moderate* number of nulls (not enough to justify a seq.scan, but still a lot) in *several* different columns - so that, instead of creating a single multi-column index, I would have to create a whole bunch of them with different predcates - where this is null and that is not null, where this is null and tha is null etc, etc... That's what annoys me a lot here :-(
Dima Tkach <dmitry@openratings.com> writes: > I'd be happy to come up with a patch... It just was my understanding > that you would not accept such a patc hanyway, because your opinion is > that it is unnecessary and dangerous... Did I misunderstand you here? I don't see anything dangerous about it --- except perhaps to readability and mantainability of the code. The problem is that IS NULL doesn't fit into the operator-and-opclass model of what indexes can do. If you can find a solution to that problem that's not a complete kluge, I'm all ears. regards, tom lane
Tom Lane wrote: >Dima Tkach <dmitry@openratings.com> writes: > > >>I'd be happy to come up with a patch... It just was my understanding >>that you would not accept such a patc hanyway, because your opinion is >>that it is unnecessary and dangerous... Did I misunderstand you here? >> >> > >I don't see anything dangerous about it --- except perhaps to >readability and mantainability of the code. The problem is that IS NULL >doesn't fit into the operator-and-opclass model of what indexes can do. >If you can find a solution to that problem that's not a complete kluge, >I'm all ears. > > regards, tom lane > > Well... At first glance, it seems that what needs to be done is: - add a special case in is_indexable_operator() to return true for IS_NULL - modify _bt_checkkeys () to return isNull from inside if (key->sk_flags & SK_ISNULL) clause instead of just false. - remove sk_flags & SK_ISNULL checks from _bt_orderkeys Obviously, I haven't tested this, and I may very well have missed something (I will, of course, inspect it and test thoroughly if you ok the change in principle - just don't want to spend much time right now on something, that may end up not being needed to anybody), but this looks to me like pretty much it. Dima
Dmitry Tkach <dmitry@openratings.com> writes: > Tom Lane wrote: >> I don't see anything dangerous about it --- except perhaps to >> readability and mantainability of the code. The problem is that IS NULL >> doesn't fit into the operator-and-opclass model of what indexes can do. >> If you can find a solution to that problem that's not a complete kluge, >> I'm all ears. > Well... At first glance, it seems that what needs to be done is: > - add a special case in is_indexable_operator() to return true for IS_NULL And is_indexable_operator() will know that this is safe how? Or do you plan to fix the other three index types to support NULLs too? > - modify _bt_checkkeys () to return isNull from inside if > (key->sk_flags & SK_ISNULL) clause instead of just false. > - remove sk_flags & SK_ISNULL checks from _bt_orderkeys IIRC, SK_ISNULL marks that the value being compared against is null --- not that the scan operator is ISNULL. An approach as above would cause "WHERE x = something" indexscans to start returning nulls if the "something" is null, no? You need a representation that preserves the difference between "x = NULL" and "x IS NULL". The ScanKey structure can't do this at the moment, mainly because it assumes that the scan operator *is* an operator. Which IS NULL is not. regards, tom lane
On Mon, Feb 17, 2003 at 10:52:54AM -0500, Tom Lane wrote: > Dmitry Tkach <dmitry@openratings.com> writes: > > Tom Lane wrote: > >> I don't see anything dangerous about it --- except perhaps to > >> readability and mantainability of the code. The problem is that IS NULL > >> doesn't fit into the operator-and-opclass model of what indexes can do. > >> If you can find a solution to that problem that's not a complete kluge, > >> I'm all ears. > > > Well... At first glance, it seems that what needs to be done is: > > > - add a special case in is_indexable_operator() to return true for IS_NULL > > And is_indexable_operator() will know that this is safe how? Or do you > plan to fix the other three index types to support NULLs too? I would have thought that the other index type supported null anyway, for the purposes of uniqueness checks. > > - modify _bt_checkkeys () to return isNull from inside if > > (key->sk_flags & SK_ISNULL) clause instead of just false. > > - remove sk_flags & SK_ISNULL checks from _bt_orderkeys > > IIRC, SK_ISNULL marks that the value being compared against is null > --- not that the scan operator is ISNULL. An approach as above would > cause "WHERE x = something" indexscans to start returning nulls if the > "something" is null, no? You need a representation that preserves the > difference between "x = NULL" and "x IS NULL". The ScanKey structure > can't do this at the moment, mainly because it assumes that the scan > operator *is* an operator. Which IS NULL is not. I remember looking into this a while ago. My solution to that problem was that x = NULL is always NULL and so doesn't need to go through the scan anyway (index or sequential). Once you've taken care of the x = NULL case elsewhere, you can use the available state for x IS NULL. I don't remember if I actually found the place to fix that though. I would really like it if this was finally made to work. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Support bacteria! They're the only culture some people have.
Вложения
Martijn van Oosterhout <kleptog@svana.org> writes: >> And is_indexable_operator() will know that this is safe how? Or do you >> plan to fix the other three index types to support NULLs too? > I would have thought that the other index type supported null anyway, for > the purposes of uniqueness checks. Well, (a) the other index types don't support uniqueness checks, and (b) it wouldn't be relevant anyway, because multiple nulls don't violate a unique constraint. GIST does support nulls in second and subsequent columns of a multi-column index, because it *has* to do so, but not in the first column --- and hash and rtree don't store nulls at all. > I remember looking into this a while ago. My solution to that problem was > that x =3D NULL is always NULL and so doesn't need to go through the scan > anyway (index or sequential). Once you've taken care of the x =3D NULL case > elsewhere, you can use the available state for x IS NULL. But how do you get from point A to point B? You need to represent both cases in ScanKeys further upstream than where that conclusion can be drawn (namely _bt_orderkeys()) --- or else do some very substantial restructuring work, which is exactly the point. Also, this would amount to hard-wiring the assumption that indexable operators are always strict. Which is rather a curious assumption to be putting in, if your goal is to support the obviously-not-strict construct IS NULL as an indexable operator. (Now I believe we make that assumption anyway in the index access methods ... but wiring it into ScanKeys, which is a very widespread data structure, would be the death knell for any hope of removing it someday.) regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > > I would have thought that the other index type supported null anyway, for > the purposes of uniqueness checks. Well, rtree indexes are used for boxes, which don't even have a useful = operators anyways... Unique indexes on them wouldn't work very well. -- greg
On Mon, Feb 17, 2003 at 08:46:17PM -0500, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I would have thought that the other index type supported null anyway, for > > the purposes of uniqueness checks. > > Well, (a) the other index types don't support uniqueness checks, and (b) > it wouldn't be relevant anyway, because multiple nulls don't violate > a unique constraint. GIST does support nulls in second and subsequent > columns of a multi-column index, because it *has* to do so, but not in > the first column --- and hash and rtree don't store nulls at all. I stand corrected. I just tested it here and multiple nulls in a unique column indeed do work. > > I remember looking into this a while ago. My solution to that problem was > > that x =3D NULL is always NULL and so doesn't need to go through the scan > > anyway (index or sequential). Once you've taken care of the x =3D NULL case > > elsewhere, you can use the available state for x IS NULL. > > But how do you get from point A to point B? You need to represent both > cases in ScanKeys further upstream than where that conclusion can be > drawn (namely _bt_orderkeys()) --- or else do some very substantial > restructuring work, which is exactly the point. > > Also, this would amount to hard-wiring the assumption that indexable > operators are always strict. Which is rather a curious assumption > to be putting in, if your goal is to support the obviously-not-strict > construct IS NULL as an indexable operator. (Now I believe we make > that assumption anyway in the index access methods ... but wiring it > into ScanKeys, which is a very widespread data structure, would be the > death knell for any hope of removing it someday.) I hadn't thought of that. While I can't think of a situation of a non-strict indexable operator, I wouldn't want to rule it out. My Plan B was to create a operator IS (and its inverse ISNOT) which is then binary operator. It would be identical to = and <> except that it would be defined where either argument is NULL. Fiddle the parser to use this operator instead of the unary ISNULL. The disadvantage is that (unless you restrict it in the parser) you could say things like: SELECT * FROM x, y WHERE x.field IS y.field Allowing you to join on NULL fields. This is not allowed by the spec. Do you think this would be a better approach? Or is there something special about the ISNULL in SQL does means this cannot work? It does seem a bit wasteful to have an operator whose second argument is always NULL (unless you allow the extra syntax). As a bonus, if this could be made to work, you *know* your index operators don't need to be strict. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Support bacteria! They're the only culture some people have.
Вложения
> > > > >>I remember looking into this a while ago. My solution to that problem was >>that x =3D NULL is always NULL and so doesn't need to go through the scan >>anyway (index or sequential). Once you've taken care of the x =3D NULL case >>elsewhere, you can use the available state for x IS NULL. >> >> > >But how do you get from point A to point B? You need to represent both >cases in ScanKeys further upstream than where that conclusion can be >drawn (namely _bt_orderkeys()) --- or else do some very substantial >restructuring work, which is exactly the point. > > I think what he meant was that a = null is actually a CONSTANT (null) and can be reduced at the parsing/planning stage, so that the indexing code never has to deal with such condition at all, and the only way for it to ever see SK_NULL would be the case of 'is null'... As for the hard-wired assumption that indexable operators are always strict, it seems to me that it is already there - _bt_checkkeys() always retruns false if it ever sees a null key. Doesn't it mean that it assumes that all the operators it will ever see are strict? Dima
so NULLs **DON'T** count in a unique index? You can have more than one NULL in a single column UNIQUE constraint? I guess I am showing my ignorance, I thought you could only have one. I was planning to do some interesting default configuration for a column value to ensure uniqueness, but flag an unknown value. Tom Lane wrote: > > Martijn van Oosterhout <kleptog@svana.org> writes: > >> And is_indexable_operator() will know that this is safe how? Or do you > >> plan to fix the other three index types to support NULLs too? > > > I would have thought that the other index type supported null anyway, for > > the purposes of uniqueness checks. > > Well, (a) the other index types don't support uniqueness checks, and (b) > it wouldn't be relevant anyway, because multiple nulls don't violate > a unique constraint. GIST does support nulls in second and subsequent > columns of a multi-column index, because it *has* to do so, but not in > the first column --- and hash and rtree don't store nulls at all. > > > I remember looking into this a while ago. My solution to that problem was > > that x =3D NULL is always NULL and so doesn't need to go through the scan > > anyway (index or sequential). Once you've taken care of the x =3D NULL case > > elsewhere, you can use the available state for x IS NULL. > > But how do you get from point A to point B? You need to represent both > cases in ScanKeys further upstream than where that conclusion can be > drawn (namely _bt_orderkeys()) --- or else do some very substantial > restructuring work, which is exactly the point. > > Also, this would amount to hard-wiring the assumption that indexable > operators are always strict. Which is rather a curious assumption > to be putting in, if your goal is to support the obviously-not-strict > construct IS NULL as an indexable operator. (Now I believe we make > that assumption anyway in the index access methods ... but wiring it > into ScanKeys, which is a very widespread data structure, would be the > death knell for any hope of removing it someday.) > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Carpe Dancem ;-) ----------------------------------------------------------------- Remember your friends while they are alive ----------------------------------------------------------------- Sincerely, Dennis Gearon
Dima Tkach <dmitry@openratings.com> writes: > I think what he meant was that a = null is actually a CONSTANT (null) > and can be > reduced at the parsing/planning stage, so that the indexing code never > has to deal with such condition at all, If the = operator is strict, yes it will be. But that just begs the question. > As for the hard-wired assumption that indexable operators are always > strict, it seems to me that it is already there - > _bt_checkkeys() always retruns false if it ever sees a null key. Doesn't > it mean that it assumes that all the operators it will ever see are strict? That assumption does currently exist in _bt_checkkeys() and some other localized places. But it could possibly be removed from them --- or a new index access method could be written that makes no such assumption. If you propagate the assumption upwards into ScanKeys and the planner, then it'll never be possible to get rid of it, not even with an entirely new access method. These sorts of considerations are why I said that I thought a non-kluge solution was difficult. Yes, we could hack slash and burn our way to a patch that works (most of the time anyway) in short order. Developing something that doesn't cripple future progress is another story. regards, tom lane
Dennis Gearon <gearond@cvc.net> writes: > so NULLs **DON'T** count in a unique index? You can have more than one > NULL in a single column UNIQUE constraint? I guess I am showing my > ignorance, I thought you could only have one. Unique indexes enforce SQL92 unique constraints, which are defined by the spec thusly (sec 4.10): A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns. If this doesn't seem right to you, you have not grokked the concept of NULL. Two nulls are never "equal" per spec. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > My Plan B was to create a operator IS (and its inverse ISNOT) which is then > binary operator. It would be identical to =3D and <> except that it would be > defined where either argument is NULL. Fiddle the parser to use this > operator instead of the unary ISNULL. I don't think there's anything fundamental that assumes that indexable operators are binary, so you might as well make the operator unary. The problem with this approach isn't that --- it's the tedium of making an ISNULL operator for every datatype, adding it to every opclass, etc. Maybe there's no non-kluge answer that doesn't make us buy into that, but it sure seems like the hard way. It's definitely not going to be a short and sweet patch :-( regards, tom lane
On Mon, Feb 17, 2003 at 11:54:59PM -0500, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > My Plan B was to create a operator IS (and its inverse ISNOT) which is then > > binary operator. It would be identical to =3D and <> except that it would be > > defined where either argument is NULL. Fiddle the parser to use this > > operator instead of the unary ISNULL. > > I don't think there's anything fundamental that assumes that indexable > operators are binary, so you might as well make the operator unary. The > problem with this approach isn't that --- it's the tedium of making an > ISNULL operator for every datatype, adding it to every opclass, etc. I'll have to look again. I thought there was this whole section dealing with how < related to <= and such which kind of implied it had to be binary. Similarly, with an index scan on a unary operator would require a new entry point, because it requires *no* parameters. An index scan relies on there being an order, how can a unary operator have an order? That said, how is this different from having an isnull() function and building a functional index on that? An operator is just a function with different syntax. Are there any optimisations involving IS NULL that preclude us from simply replacing it with a function call globally. I guess the issue is that it's a new index rather than using the existing one. As for all the different versions, the actual implementation doesn't need to know about the datatype, it only needs the NULL-ness of the argument so it should be generic. Can an operator/function have a argument type that matches anything? > Maybe there's no non-kluge answer that doesn't make us buy into that, > but it sure seems like the hard way. It's definitely not going to be > a short and sweet patch :-( I'm still hoping. I'm sure we just need to think about it the right way... -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Support bacteria! They're the only culture some people have.
Вложения
This is a resend, don't know if the first time it got to the list... sorry if it did. Hello, sorry for barging in... I use a similar structure for keeping some some text pages categorized. CREATE TABLE pages ( id SERIAL NOT NULL PRIMARY KEY, categ INTEGER, CONSTRAINT categ_fk FOREIGN KEY(categ) REFERENCES categs(id) ON DELETE CASCADE ); All the pages that are not contained in a category are marked by categ IS NULL ( this is like the files in / in a filesystem). If I use other values than NULL for marking this kind of pages, then the constraint would complain, but then I can't use an index to find these pages. Do you have a better solution for this ? Thanks. On Mon, 17 Feb 2003, Tom Lane wrote: > Dima Tkach <dmitry@openratings.com> writes: > > For example, how would I get the list of the "top-level" (no parent) > > nodes given your suggestion? > > select * from trees where parent=id > > Exactly. > > > is hardly a good idea, because it just has to be a seq. scan, right? > > Make a partial index if you need it to be fast. > > regression=# create table trees (id int, parent int); > CREATE TABLE > regression=# explain select * from trees where parent=id; > QUERY PLAN > ------------------------------------------------------ > Seq Scan on trees (cost=0.00..22.50 rows=5 width=8) > Filter: (parent = id) > (2 rows) > > regression=# create index foo on trees(id) where parent=id; > CREATE INDEX > regression=# explain select * from trees where parent=id; > QUERY PLAN > ------------------------------------------------------------------ > Index Scan using foo on trees (cost=0.00..17.07 rows=5 width=8) > Filter: (parent = id) > (2 rows) > > > > I may be missing something of course, but so far, this looks to me like > > a very useful feature, that would be very easy to implement too... > > Criticism in the form of patches is more useful than unsubstantiated > opinions that something is easy. > > regards, tom lane
From: "Andrei Ivanov" <andrei.ivanov@ines.ro> > > This is a resend, don't know if the first time it got to the list... sorry > if it did. > > Hello, sorry for barging in... > I use a similar structure for keeping some some text pages categorized. > > CREATE TABLE pages ( > id SERIAL NOT NULL PRIMARY KEY, > categ INTEGER, > CONSTRAINT categ_fk FOREIGN KEY(categ) REFERENCES categs(id) ON DELETE CASCADE > ); > > All the pages that are not contained in a category are marked by categ IS > NULL ( this is like the files in / in a filesystem). If I use other values > than NULL for marking this kind of pages, then the constraint would > complain, but then I can't use an index to find these pages. > > Do you have a better solution for this ? If some pages aren't associated with a category, shouldn't you have three relations? categories ( categ PRIMARY KEY ... ); pages ( id PRIMARY KEY ... ); category_pages ( categ INTEGER NOT NULL, id INTEGER NOT NULL ); Similarly, with previous posts regarding hierarchies, the model should look like: employees ( employeeid PRIMARY KEY ... ) employee_manager ( employeeid INTEGER NOT NULL, manager INTEGER NOT NULL ) *not*: employees ( employeeid PRIMARY KEY, manager INTEGER ); NULLs are evil. ;-) Mike Mascari mascarm@mascari.com
Tom Lane wrote: >Dmitry Tkach <dmitry@openratings.com> writes: > > >>Tom Lane wrote: >> >> >>>I don't see anything dangerous about it --- except perhaps to >>>readability and mantainability of the code. The problem is that IS NULL >>>doesn't fit into the operator-and-opclass model of what indexes can do. >>>If you can find a solution to that problem that's not a complete kluge, >>>I'm all ears. >>> >>> > > > >>Well... At first glance, it seems that what needs to be done is: >> >> > > > >>- add a special case in is_indexable_operator() to return true for IS_NULL >> >> > >And is_indexable_operator() will know that this is safe how? Or do you >plan to fix the other three index types to support NULLs too? > Well, yeah... Generally, I think, the best thing to do is to fix them all - if it can handle '=', there is no reason it cannot be made to handle 'is null' as well... I haven't seen much of the code for rtree and hash, but I doubt it (null handling) is going to be that much different from btree... But if we do not want to fix them all, just btree, you are right - indexable_operator() needs to know about that... In that case, we'd either need to add a param to it to pass the index info, or leave it alone, and just handle the is null case separately, at the higher level... Either way, we'd probably want to add something like idxisstrict to pg_index - checking for btree explicitly would look like a kludge even to me :-) > > > >>- modify _bt_checkkeys () to return isNull from inside if >>(key->sk_flags & SK_ISNULL) clause instead of just false. >>- remove sk_flags & SK_ISNULL checks from _bt_orderkeys >> >> > >IIRC, SK_ISNULL marks that the value being compared against is null >--- not that the scan operator is ISNULL. An approach as above would >cause "WHERE x = something" indexscans to start returning nulls if the >"something" is null, no? You need a representation that preserves the >difference between "x = NULL" and "x IS NULL". The ScanKey structure >can't do this at the moment, mainly because it assumes that the scan >operator *is* an operator. Which IS NULL is not. > > This can be resolved by just putting null (0) as the operator in the ScanKey to denote is null. I know, this looks ugly, but the ugliness is due to 'is null' not being an operator in the first place... :-) And another possibility is to create isnull () operator... but that would have to wait until postgres allows functions with unknown argument types (this can be done - just always pass pointers, and let the function figure out what it wants to do with them)... Otherwise, as you said earlier, creating isnull() for every possible type would be quite a job.... and besides, there are also user-defined types too... Dima
Dima Tkach <dmitry@openratings.com> writes: > And another possibility is to create isnull () operator... but that > would have to wait until postgres allows functions with unknown argument > types Actually, we do have that now --- it'd be reasonable to implement such a function and operator as taking type ANY. Hm, maybe this is more practical than I thought. If we replace the special-purpose NullTest expression node by two operators (IS NULL, IS NOT NULL) taking type ANY, then you wouldn't have to do any violence to the ScanKeys representation to handle these operators as index quals. Rather than adding them to pg_opclass for every btree opclass, I'd be inclined to special-case them in the planner (they could be a case that special_indexable_operator handles) --- with only two to deal with, that doesn't seem impractical. Hm, probably only IS NULL need be indexable. We don't index != ... regards, tom lane
I like this solution because it uses the existing infrastructure of functional indexes and the ANY capability. It makes perfect sense to me. So, did this make it to the todo list? It didn't see it there. Dima, Tom, Stephan, Martin? Is this likely in 7.4? elein On Wednesday 19 February 2003 06:44, Tom Lane wrote: > Dima Tkach <dmitry@openratings.com> writes: > > And another possibility is to create isnull () operator... but that > > would have to wait until postgres allows functions with unknown argument > > types > > Actually, we do have that now --- it'd be reasonable to implement such > a function and operator as taking type ANY. Hm, maybe this is more > practical than I thought. If we replace the special-purpose NullTest > expression node by two operators (IS NULL, IS NOT NULL) taking type ANY, > then you wouldn't have to do any violence to the ScanKeys representation > to handle these operators as index quals. Rather than adding them to > pg_opclass for every btree opclass, I'd be inclined to special-case them > in the planner (they could be a case that special_indexable_operator > handles) --- with only two to deal with, that doesn't seem impractical. > Hm, probably only IS NULL need be indexable. We don't index != ... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- ---------------------------------------------------------------------------------------- elein@varlena.com Database Consulting www.varlena.com I have always depended on the [QA] of strangers.