Обсуждение: Query planner isn't using my indices
To cut a long story short, my largish development database was running the query I was tinkering with very slowly. Looking a little deeper, I found that it was always doing a full table scan. Which is odd, seeing as we're selecting on a uniquely indexed field... frontwire=# \d stakeholder_pk Index "stakeholder_pk" Attribute | Type -----------+-------- id | bigint unique btree frontwire=# explain select * from stakeholder where id = 1; NOTICE: QUERY PLAN: Seq Scan on stakeholder (cost=0.00..602.81 rows=1 width=336) EXPLAIN frontwire=# select count(*) from stakeholder; count ------- 9170 (1 row) ...why is this happening? It... shouldn't! ABS -- Alaric B. Snell, Developer abs@frontwire.com
"Alaric B. Snell" <abs@frontwire.com> writes: > frontwire=# \d stakeholder_pk > Index "stakeholder_pk" > Attribute | Type > -----------+-------- > id | bigint > unique btree > frontwire=# explain select * from stakeholder where id = 1; > NOTICE: QUERY PLAN: > Seq Scan on stakeholder (cost=0.00..602.81 rows=1 width=336) Try "where id = 1::bigint" or "where id = '1'". Unadorned 1 is an int4, not an int8 ... see archives for previous discussions. regards, tom lane
On Tue, 8 Jan 2002, Alaric B. Snell wrote: > frontwire=# \d stakeholder_pk > Index "stakeholder_pk" > Attribute | Type > -----------+-------- > id | bigint > unique btree > > frontwire=# explain select * from stakeholder where id = 1; > NOTICE: QUERY PLAN: > > Seq Scan on stakeholder (cost=0.00..602.81 rows=1 width=336) > See past discussions, you'll need to either explicitly cast the constant to bigint or quote it, because otherwise it prematurely casts the constant into an int4.
"Alaric B. Snell" <abs@frontwire.com> writes: > To cut a long story short, my largish development database was running the > query I was tinkering with very slowly. > > Looking a little deeper, I found that it was always doing a full table > scan. > > Which is odd, seeing as we're selecting on a uniquely indexed field... > > frontwire=# \d stakeholder_pk > Index "stakeholder_pk" > Attribute | Type > -----------+-------- > id | bigint > unique btree > > frontwire=# explain select * from stakeholder where id = 1; > NOTICE: QUERY PLAN: > > Seq Scan on stakeholder (cost=0.00..602.81 rows=1 width=336) The query planner isn't *quite* smart enough currently to realize that it can use the index in this case, because an unadorned "1" is an int (not bigint) constant. Try: EXPLAIN SELECT * FROM stakeholder WHERE id = 1::bigint; or EXPLAIN SELECT * FROM stakeholder WHERE id = '1'; [I think either will work] -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
If you cast the 1 to type bigint then PostgreSQL will use the index, otherwise it won't. You can do one of four things: 1) An explicit cast (btw what's the SQL92 way of doing this): SELECT * FROM stakeholder WHERE id = 1::bigint; 2) Put the constant in "'" and let PostgreSQL work out what to do: SELECT * FROM stakeholder WHERE id = '1'; 3) Turn off sequential scans (see the manual). 4) Live with sequential scans :). Jason "Alaric B. Snell" <abs@frontwire.com> writes: > To cut a long story short, my largish development database was running the > query I was tinkering with very slowly. > > Looking a little deeper, I found that it was always doing a full table > scan. > > Which is odd, seeing as we're selecting on a uniquely indexed field... > > frontwire=# \d stakeholder_pk > Index "stakeholder_pk" > Attribute | Type > -----------+-------- > id | bigint > unique btree > > frontwire=# explain select * from stakeholder where id = 1; > NOTICE: QUERY PLAN: > > Seq Scan on stakeholder (cost=0.00..602.81 rows=1 width=336) > > EXPLAIN > frontwire=# select count(*) from stakeholder; > count > ------- > 9170 > (1 row) > > ...why is this happening? It... shouldn't! > > ABS > > -- > Alaric B. Snell, Developer > abs@frontwire.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
On 8 Jan 2002, Jason Earl wrote: > If you cast the 1 to type bigint then PostgreSQL will use the index, > otherwise it won't. You can do one of four things: Ahhh, suddenly it all makes sense! Thanks, people. Now I need to figure out how to make JBoss write its queries differently :-) ABS -- Alaric B. Snell, Developer abs@frontwire.com
On Tue, 8 Jan 2002, Stephan Szabo wrote: > See past discussions, you'll need to either explicitly cast the > constant to bigint or quote it, because otherwise it prematurely > casts the constant into an int4. But isn't that the point? The fact that it casts it to int4 isn't the problem. The fact that int4 and int8 *should* be compatible fields *is*. You'd think the query optimizer would do this: "Hmmm, I have a field composed of numbers. Are there any indexes corresponding to this field? Are the types compatible? They're pretty close, let me just recast this... there." Oracle does this, Mysql does this, Sybase does this... why must Postgres's query optimizer assume all types are spot-on identical? Similarly, why do I have to cast my char column to text when doing uncasted quoted-string concatenation? Quoted strings are character data, and char columns are character data, right? Wouldn't it logically follow that the two types are compatible? Unless for some reason the database engine is mapping the database column types directly to C types, there is no reason for this. Postgres's type-checking abilities are sorely lacking to such a degree that it not only hurts database performance unless the DBA writes every damn application, but it forces any application to be postgres specific due to all of the unnecessary casting. Doesn't this bother anyone? Isn't the query optimizer important enough to be made as "smart" as possible, since it affects the performance of the entire database? Should a company hosting postgresql services really be held ransom by customers who write "bad" queries because they don't know every little quirk of the database? Bueller? McFly? Can anyone explain this to me? Anyone at all? I mean, what logical reason is there to leave the database so crippled? -- +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ | Shaun M. Thomas INN Database Programmer | | Phone: (309) 743-0812 Fax : (309) 743-0830 | | Email: sthomas@townnews.com AIM : trifthen | | Web : www.townnews.com | | | | "Most of our lives are about proving something, either to | | ourselves or to someone else." | | -- Anonymous | +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
Shaun Thomas <sthomas@townnews.com> writes: > On Tue, 8 Jan 2002, Stephan Szabo wrote: > > > See past discussions, you'll need to either explicitly cast the > > constant to bigint or quote it, because otherwise it prematurely > > casts the constant into an int4. > > But isn't that the point? The fact that it casts it to int4 isn't the > problem. The fact that int4 and int8 *should* be compatible fields *is*. [rant...] > Can anyone explain this to me? Anyone at all? I mean, what logical > reason is there to leave the database so crippled? Finite developer resources? Feel free to pitch in once the 7.3 cycle starts. FWIW, the following is on Bruce's TODO list at http://developer.postgresql.org/todo.php: * Allow better handling of numeric constants, type conversion There is a link there to a discussion about ways to improve the situation. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Shaun Thomas <sthomas@townnews.com> writes: > Can anyone explain this to me? Anyone at all? I mean, what logical > reason is there to leave the database so crippled? Feel free to step right up and fix it. This is an open-source project. Whining aside, the reason why these restrictions still exist is that we have an extensible type system and so we don't really want to put hard-wired knowledge about different types into the parser or optimizer. The difficult part is to design a general-purpose, extensible framework in which these desired conversions can be specified. There have been prior discussions of how to do this (see the pghackers archives); but no one's put forward a really satisfactory proposal yet. If anyone had, it'd probably have been implemented by now --- there isn't anyone here who likes the current behavior any more than you do. regards, tom lane
On Wed, 9 Jan 2002, Shaun Thomas wrote: > On Tue, 8 Jan 2002, Stephan Szabo wrote: > > > See past discussions, you'll need to either explicitly cast the > > constant to bigint or quote it, because otherwise it prematurely > > casts the constant into an int4. > > Can anyone explain this to me? Anyone at all? I mean, what logical > reason is there to leave the database so crippled? Perhaps you should see the past discussions (particularly on hackers) of this topic.
On 9 Jan 2002, Doug McNaught wrote: > Finite developer resources? Feel free to pitch in once the 7.3 cycle > starts. Smile, everyone! I'm going to pitch to my employer to let me fix it on company time, since it's causing problems for us with JBoss (which generates most of the queries for us, out of our control). ABS -- Alaric B. Snell, Developer abs@frontwire.com
On Thu, Jan 10, 2002 at 12:18:33PM +0000, Alaric B. Snell wrote: > On 9 Jan 2002, Doug McNaught wrote: > > > Finite developer resources? Feel free to pitch in once the 7.3 cycle > > starts. > > Smile, everyone! > > I'm going to pitch to my employer to let me fix it on company time, since > it's causing problems for us with JBoss (which generates most of the > queries for us, out of our control). Interestingly, I find the easiest fix is: never use unquoted constants. If it didn't break other peoples queries I'd suggest modifying the parser so that numbers are parsed as strings of type unknown instead of numbers. Then let the normal type matching system go to work on it. That said, a good system for type promotion would be cool. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Terrorists can only take my life. Only my government can take my freedom.
Martijn van Oosterhout <kleptog@svana.org> writes: > Interestingly, I find the easiest fix is: never use unquoted constants. > If it didn't break other peoples queries I'd suggest modifying the parser so > that numbers are parsed as strings of type unknown instead of numbers. Well, we don't want to lose the information that the input looked like a number and not a string; that's a reasonable thing for the type heuristics to use. There was a proposal on the table to initially treat numeric literals as type UNKNOWNNUMERIC and then have a resolution mechanism for that, much like UNKNOWN but with the possible target types restricted to the numeric category. It still had some problems though. One that I recall was that OIDs weren't accounted for; another was that it's not real clear how to let the size of the value affect resolution (eg, it's reasonable to resolve 123456 as int4 or int8, but not int2). I think we had also talked about having the scanner initially resolve a literal as the "smallest" possible containing type and then make sure there are useful automatic up-conversions in place. The trouble here is that the numeric hierarchy isn't strictly ordered (think about float8 vs numeric; OID is also a fly in the ointment) so "smallest" isn't very well defined. Something I have been thinking about is that we may actually have too many entries in pg_proc and pg_operator. Earlier in this thread, someone complained about this: regression=# select 'foo'::char(3) || 'bar'::varchar; ERROR: Unable to identify an operator '||' for types 'character' and 'character varying' You will have to retype this query using an explicit cast But it works fine if one of the operands is text: regression=# select 'foo'::char(3) || 'bar'::text; ?column? ---------- foobar (1 row) Why is that? It's because we have too many || operators: regression=# \do || List of operators Name | Left arg type | Right arg type | Result type | Description ------+-------------------+-------------------+-------------------+------------- ---------- || | bit | bit | bit | bitwise concatenation || | bytea | bytea | bytea | concatenate || | character | character | character | concatenate || | character varying | character varying | character varying | concatenate || | text | text | text | concatenate (5 rows) There's no exact match for char || varchar, and the resolver can't figure out whether to prefer char || char or varchar || varchar, so it punts. OTOH text is the "preferred type" in the string category, so it will select text || text if either operand is text. IMHO the correct fix is to remove the entries for char || char and varchar || varchar, so that there's only one textual || operator, namely text || text. This would mean the result was always considered text and never char or varchar, but so what? Those types are all assignment-compatible anyway. A related thought is that our numeric-hierarchy problems might be greatly simplified if we eliminated most or all of the cross-datatype numeric operators (eg, int4 plus int2) and insisted that these be handled as a type coercion step plus a single-datatype operator. In that way the resolver would be much less likely to be faced with multiple alternatives that it couldn't choose among. See the pghackers archives for past discussions if you want to pursue this. regards, tom lane