Обсуждение: Optimizer & boolean syntax
Are those two syntaxes eqivalent ? select * from users where monitored; select * from users where monitored=true; If the answer is yes, the optimimer probably doesn't agree with you :) Tested on RC1: template1=# create table a (a boolean, b text); CREATE TABLE .... inserted ~18000 rows with just one true (just to make an index scan meaningful).... template1=# vacuum analyze a; VACUUM template1=# explain select * from a where a; QUERY PLAN ---------------------------------------------------- Seq Scan on a (cost=0.00..802.64 rows=1 width=11) Filter: a (2 rows) template1=# explain select * from a where a=true; QUERY PLAN -------------------------------------------------------------- Index Scan using a_a on a (cost=0.00..2.01 rows=1 width=11) Index Cond: (a = true) (2 rows) Bye! -- Daniele Orlandi Planet Srl
Using the famous WAG tech, in your first query the optimizer has to evaluate monitored for each record to determine its value. Robert Treat On Thu, 2002-11-21 at 13:39, Daniele Orlandi wrote: > > Are those two syntaxes eqivalent ? > > select * from users where monitored; > select * from users where monitored=true; > > If the answer is yes, the optimimer probably doesn't agree with you :) > > Tested on RC1: > > template1=# create table a (a boolean, b text); > CREATE TABLE > > > .... inserted ~18000 rows with just one true (just to make an index scan > meaningful).... > > template1=# vacuum analyze a; > VACUUM > template1=# explain select * from a where a; > QUERY PLAN > ---------------------------------------------------- > Seq Scan on a (cost=0.00..802.64 rows=1 width=11) > Filter: a > (2 rows) > > template1=# explain select * from a where a=true; > QUERY PLAN > -------------------------------------------------------------- > Index Scan using a_a on a (cost=0.00..2.01 rows=1 width=11) > Index Cond: (a = true) > (2 rows) > > Bye! > > -- > Daniele Orlandi > Planet Srl > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html
On Thu, 21 Nov 2002, Daniele Orlandi wrote: > Are those two syntaxes eqivalent ? > > select * from users where monitored; > select * from users where monitored=true; > > If the answer is yes, the optimimer probably doesn't agree with you :) That depends on the definition of equivalent. They presumably give the same answer (I'm assuming monitored is a boolean), but the latter has something that's considered an indexable condition and I believe the former does not (even with enable_seqscan=off the former syntax appears to give a sequence scan, usually a good sign it's not considered indexable).
> > Are those two syntaxes eqivalent ? > > > > select * from users where monitored; > > select * from users where monitored=true; > > > > If the answer is yes, the optimimer probably doesn't agree with you :) > > That depends on the definition of equivalent. They presumably give the > same answer (I'm assuming monitored is a boolean), but the latter has > something that's considered an indexable condition and I believe the > former does not (even with enable_seqscan=off the former syntax > appears to give a sequence scan, usually a good sign it's not considered > indexable). I think his point is that they _should_ be equivalent. Surely there's something in the optimiser that discards '=true' stuff, like 'a=a' should be discarded? Chris
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote: > > > Are those two syntaxes eqivalent ? > > > > > > select * from users where monitored; > > > select * from users where monitored=true; > > > > > > If the answer is yes, the optimimer probably doesn't agree with you :) > > > > That depends on the definition of equivalent. They presumably give the > > same answer (I'm assuming monitored is a boolean), but the latter has > > something that's considered an indexable condition and I believe the > > former does not (even with enable_seqscan=off the former syntax > > appears to give a sequence scan, usually a good sign it's not considered > > indexable). > > I think his point is that they _should_ be equivalent. Surely there's > something in the optimiser that discards '=true' stuff, like 'a=a' should be > discarded? I figure that's what he meant, but it isn't what was said. ;) "col" isn't of the general form "indexkey op constant" or "constant op indexkey" which I presume it's looking for given the comments in indxpath.c. I'm not sure what the best way to make it work would be given that presumably we'd want to make col IS TRUE/FALSE use an index at the same time (since that appears to not do so as well).
> > I think his point is that they _should_ be equivalent. Surely there's > > something in the optimiser that discards '=true' stuff, like 'a=a' should be > > discarded? > > I figure that's what he meant, but it isn't what was said. ;) > > "col" isn't of the general form "indexkey op constant" or "constant op > indexkey" which I presume it's looking for given the comments in > indxpath.c. I'm not sure what the best way to make it work would be given > that presumably we'd want to make col IS TRUE/FALSE use an index at the > same time (since that appears to not do so as well). Not that I see the point of indexing booleans, but hey :) Chris
On Thu, Nov 21, 2002 at 02:45:34PM -0800, Christopher Kings-Lynne wrote: > > > I think his point is that they _should_ be equivalent. Surely there's > > > something in the optimiser that discards '=true' stuff, like 'a=a' > should be > > > discarded? > Not that I see the point of indexing booleans, but hey :) If one of the values is much more infrequent than the other, you can probably get a substantial win using a partial index, can't you? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Everybody understands Mickey Mouse. Few understand Hermann Hesse. Hardly anybody understands Einstein. And nobody understands Emperor Norton."
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote: > > > I think his point is that they _should_ be equivalent. Surely there's > > > something in the optimiser that discards '=true' stuff, like 'a=a' > should be > > > discarded? > > > > I figure that's what he meant, but it isn't what was said. ;) > > > > "col" isn't of the general form "indexkey op constant" or "constant op > > indexkey" which I presume it's looking for given the comments in > > indxpath.c. I'm not sure what the best way to make it work would be given > > that presumably we'd want to make col IS TRUE/FALSE use an index at the > > same time (since that appears to not do so as well). > > Not that I see the point of indexing booleans, but hey :) While full indexes do seem futile, partial indexes can be quite useful. select articles from forum where approved is false if 99.9% of all articles are approved would be quite common.
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote: > > > I think his point is that they _should_ be equivalent. Surely there's > > > something in the optimiser that discards '=true' stuff, like 'a=a' > should be > > > discarded? > > > > I figure that's what he meant, but it isn't what was said. ;) > > > > "col" isn't of the general form "indexkey op constant" or "constant op > > indexkey" which I presume it's looking for given the comments in > > indxpath.c. I'm not sure what the best way to make it work would be given > > that presumably we'd want to make col IS TRUE/FALSE use an index at the > > same time (since that appears to not do so as well). > > Not that I see the point of indexing booleans, but hey :) also, in reference to my last message, even if the % was 50/50, if the table was such that the bool was in a table next to a text field with 20k or text in it, an index on the bool would be much faster to go through than to seq scan the table.
> > Not that I see the point of indexing booleans, but hey :) > > If one of the values is much more infrequent than the other, you can > probably get a substantial win using a partial index, can't you? Yes, I thought of the partial index after I wrote that email :) Chris
> > > "col" isn't of the general form "indexkey op constant" or "constant op > > > indexkey" which I presume it's looking for given the comments in > > > indxpath.c. I'm not sure what the best way to make it work would be given > > > that presumably we'd want to make col IS TRUE/FALSE use an index at the > > > same time (since that appears to not do so as well). > > > > Not that I see the point of indexing booleans, but hey :) > > also, in reference to my last message, even if the % was 50/50, if the > table was such that the bool was in a table next to a text field with 20k > or text in it, an index on the bool would be much faster to go through > than to seq scan the table. Hmmm...I'm not sure about that. Postgres's storage strategry with text will be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and it will only be retrieved if it's in the select parameters. Chris
On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote: > > > > "col" isn't of the general form "indexkey op constant" or "constant op > > > > indexkey" which I presume it's looking for given the comments in > > > > indxpath.c. I'm not sure what the best way to make it work would be > given > > > > that presumably we'd want to make col IS TRUE/FALSE use an index at > the > > > > same time (since that appears to not do so as well). > > > > > > Not that I see the point of indexing booleans, but hey :) > > > > also, in reference to my last message, even if the % was 50/50, if the > > table was such that the bool was in a table next to a text field with 20k > > or text in it, an index on the bool would be much faster to go through > > than to seq scan the table. > > Hmmm...I'm not sure about that. Postgres's storage strategry with text will > be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and > it will only be retrieved if it's in the select parameters. True, but replace that text with 1500 integers. :) The only problem with the partial index solution is that it seems to still only work for the same method of asking for the result, so if you make an index where col=true, using col IS TRUE or col in a query doesn't seem to use it.
Stephan Szabo wrote: > On Thu, 21 Nov 2002, Daniele Orlandi wrote: > > >>Are those two syntaxes eqivalent ? >> >>select * from users where monitored; >>select * from users where monitored=true; >> >>If the answer is yes, the optimimer probably doesn't agree with you :) > > > That depends on the definition of equivalent. By equivalent I mean "means the same thing so, behaves in the same way". I consider the former syntax to be cleaner and I would tend to use it most of times. For what concerns partial indexes, I agree, it's a better approach for this kind of indexing and I did some test: ------------------------- ctonet=# create index users_monitored on users (monitored) where monitored; CREATE ctonet=# explain select * from users where monitored; NOTICE: QUERY PLAN: Index Scan using users_monitored on users (cost=0.00..9.44 rows=6 width=186) EXPLAIN Nice, it appears to use the index, but: ctonet=# explain select * from users where monitored=true; NOTICE: QUERY PLAN: Seq Scan on users (cost=0.00..8298.84 rows=59 width=186) EXPLAIN ------------------------- The problem is the opposite... so, effectively, seems that the optimizer considers "monitored" and "monitored=true" as two different expressions... The viceversa is analog and we also can see that the syntax "monitored is true" is considered different from the other two syntaxes: ----------------------- ctonet=# drop index users_monitored; DROP ctonet=# create index users_monitored on users (monitored) where monitored=true; CREATE ctonet=# explain select * from users where monitored=true; NOTICE: QUERY PLAN: Index Scan using users_monitored on users (cost=0.00..9.45 rows=6 width=186) EXPLAIN ctonet=# explain select * from users where monitored; NOTICE: QUERY PLAN: Seq Scan on users (cost=0.00..8077.07 rows=59 width=186) EXPLAIN ctonet=# create index users_monitored on users (monitored) where monitored=true; CREATE ctonet=# explain select * from users where monitored is true; NOTICE: QUERY PLAN: Seq Scan on users (cost=0.00..8077.07 rows=59 width=186) EXPLAIN ------------------------- What I propose is that all those syntaxes are made equivalent (by, for example, rewriting boolean comparisons to a common form) in order to have a more consistent index usage. Bye! -- Daniele Orlandi Planet Srl
Daniele Orlandi <daniele@orlandi.com> writes:
> The problem is the opposite... so, effectively, seems that the optimizer 
> considers "monitored" and "monitored=true" as two different expressions...
Check.
> The viceversa is analog and we also can see that the syntax "monitored 
> is true" is considered different from the other two syntaxes:
As it should be.
> What I propose is that all those syntaxes are made equivalent
Only two of them are logically equivalent.  Consider NULL.
Even for the first two, assuming equivalence requires hard-wiring an
assumption about the behavior of the "bool = bool" operator; which is
a user-redefinable operator.  I'm not totally comfortable with the idea.
        regards, tom lane
			
		On Thu, 21 Nov 2002, Stephan Szabo wrote: > > On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote: > > > > > > "col" isn't of the general form "indexkey op constant" or "constant op > > > > > indexkey" which I presume it's looking for given the comments in > > > > > indxpath.c. I'm not sure what the best way to make it work would be > > given > > > > > that presumably we'd want to make col IS TRUE/FALSE use an index at > > the > > > > > same time (since that appears to not do so as well). > > > > > > > > Not that I see the point of indexing booleans, but hey :) > > > > > > also, in reference to my last message, even if the % was 50/50, if the > > > table was such that the bool was in a table next to a text field with 20k > > > or text in it, an index on the bool would be much faster to go through > > > than to seq scan the table. > > > > Hmmm...I'm not sure about that. Postgres's storage strategry with text will > > be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and > > it will only be retrieved if it's in the select parameters. > > True, but replace that text with 1500 integers. :) > > The only problem with the partial index solution is that it seems to still > only work for the same method of asking for the result, so if you make an > index where col=true, using col IS TRUE or col in a query doesn't seem to > use it. True. I always use the syntax: select * from table where field IS TRUE OR IS FALSE for consistency.
Tom Lane wrote: > > Only two of them are logically equivalent. Consider NULL. Ohhh IS NOT TRUE or IS NOT FALSE also match NULL, I never knew this :) > Even for the first two, assuming equivalence requires hard-wiring an > assumption about the behavior of the "bool = bool" operator; which is > a user-redefinable operator.> I'm not totally comfortable with the idea. Ok, I see your point and the problems that may arise, but I hope wou will agree with me that from the point of view of the user, both clauses have the same meaning and the index usage should be consistant with it. Unfortunatelly I don't know very well PostgreSQL internals, so I may be saying a load of bullshits, but wouldn't be possible to consider any evaluation of a bool expression in the form of bool=bool with true as the second 'bool'[1] ? At least as a TODO item ? Thanks! Bye! [1] Eventually including the "var IS TRUE" and "var IS FALSE" (not var IS NOT ...) which already are special syntax cases if I am not wrong. -- Daniele Orlandi Planet Srl