Thread: Bad query plan when the wrong data type is used

Bad query plan when the wrong data type is used

From:
Laszlo Nagy
Date:
This query:

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4

runs in 43 msec. The "visa.id" column has the int4 data type. The query plan uses an index condition:

"Nested Loop  (cost=0.00..26.19 rows=1 width=28)"
"  ->  Nested Loop  (cost=0.00..17.75 rows=1 width=24)"
"        ->  Index Scan using variation_item_sellingsite_asin_pkey on variation_item_sellingsite_asin visa  (cost=0.00..8.58 rows=1 width=16)"
"              Index Cond: (id = 4)"
"        ->  Index Scan using pk_product_id on product p  (cost=0.00..9.16 rows=1 width=16)"
"              Index Cond: (p.id = visa.product_id)"
"  ->  Index Scan using pk_variation_item_id on variation_item vi  (cost=0.00..8.43 rows=1 width=12)"
"        Index Cond: (vi.id = visa.variation_item_id)"


This query:

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4.0

runs for 1144 msec! The query plan uses a seq scan + filter:

"Nested Loop  (cost=33957.27..226162.68 rows=14374 width=28)"
"  ->  Hash Join  (cost=33957.27..106190.76 rows=14374 width=20)"
"        Hash Cond: (visa.variation_item_id = vi.id)"
"        ->  Seq Scan on variation_item_sellingsite_asin visa  (cost=0.00..71928.04 rows=14374 width=16)"
"              Filter: ((id)::numeric = 4.0)"
"        ->  Hash  (cost=22026.01..22026.01 rows=954501 width=12)"
"              ->  Seq Scan on variation_item vi  (cost=0.00..22026.01 rows=954501 width=12)"
"  ->  Index Scan using pk_product_id on product p  (cost=0.00..8.33 rows=1 width=16)"
"        Index Cond: (p.id = visa.product_id)"


Which is silly. I think that PostgreSQL converts the int side to a float, and then compares them.

It would be better to do this, for each item in the loop:
  • evaluate the right side (which is float)
  • tell if it is an integer or not
  • if not an integer, then discard the row immediately
  • otherwise use its integer value for the index scan
The result is identical, but it makes it possible to use the index scan. Of course, I know that the query itself is wrong, because I should not use a float where an int is expected. But this CAN be optimized, so I think it should be! My idea for the query optimizer is not to use the "wider" data type, but to use the data type that has an index on it instead.
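
Written out by hand against the query above, the proposed rewrite would look roughly like this (only a sketch: with a literal the integer-ness test is folded away at plan time, and the ::int4 cast is safe here only because that test has already passed):

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where 4.0 = trunc(4.0)        -- "is the right-hand side an integer?"
  and visa.id = 4.0::int4;    -- if so, compare as integers so the PK index is usable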

(I spent an hour figuring out what was wrong with my program. In some cases it was slow, in other cases it was really fast, and I never got an error message.)

What do you think?

   Laszlo

Re: Bad query plan when the wrong data type is used

From:
Josh Berkus
Date:
Laszlo,

> Which is silly. I think that PostgreSQL converts the int side to a
> float, and then compares them.
>
> It would be better to do this, for each item in the loop:
>
>     * evaluate the right side (which is float)
>     * tell if it is an integer or not
>     * if not an integer, then discard the row immediately
>     * otherwise use its integer value for the index scan

Not terribly likely, I'm afraid.  Data type coercion is *way* more
complex than you realize (consider the number of data types we have, and
the ability to add UDTs, and then square it).  And the functionality you
propose would break backwards compatibility; many people currently use
".0" in order to force a coercion to Float or Numeric.

I'm not saying that PostgreSQL couldn't do better on this kind of case,
but that doing better is a major project, not a minor one.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Bad query plan when the wrong data type is used

From:
Dave Crooke
Date:
You will get the same behaviour from any database product where the query as written requires type coercion - the coercion has to go in the direction of the "wider" type. I have seen the exact same scenario with Oracle, and I view it as a problem with the way the query is written, not with the database server.

Whoever coded the application which is making this query presumably knows that the visa.id field is an integer type in the schema they designed, so why are they passing a float? Convert the 4.0 to 4 on the application side instead, it's one function call or cast.
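
Spelled out as SQL, that is just one possible form of the cast (binding an integer parameter from the application works the same way):

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = cast(4.0 as int4);   -- int4 = int4, so the index scan comes back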

It's not reasonable to expect the query compiler to pick up the slack for poorly written SQL.

Cheers
Dave

On Tue, Feb 8, 2011 at 4:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
Laszlo,

> Which is silly. I think that PostgreSQL converts the int side to a
> float, and then compares them.
>
> It would be better to do this, for each item in the loop:
>
>     * evaluate the right side (which is float)
>     * tell if it is an integer or not
>     * if not an integer, then discard the row immediately
>     * otherwise use its integer value for the index scan

Not terribly likely, I'm afraid.  Data type coercion is *way* more
complex than you realize (consider the number of data types we have, and
the ability to add UDTs, and then square it).  And the functionality you
propose would break backwards compatibility; many people currently use
".0" currently in order to force a coercion to Float or Numeric.

I'm not saying that PostgreSQL couldn't do better on this kind of case,
but that doing better is a major project, not a minor one.

--
                                 -- Josh Berkus
                                    PostgreSQL Experts Inc.
                                    http://www.pgexperts.com

Re: Bad query plan when the wrong data type is used

From:
Vitalii Tymchyshyn
Date:
On 09.02.11 01:14, Dave Crooke wrote:
> You will get the same behaviour from any database product where the query as written requires type coercion - the coercion has to go in the direction of the "wider" type. I have seen the exact same scenario with Oracle, and I view it as a problem with the way the query is written, not with the database server.
>
> Whoever coded the application which is making this query presumably knows that the visa.id field is an integer type in the schema they designed, so why are they passing a float? Convert the 4.0 to 4 on the application side instead, it's one function call or cast.
Actually the problem may be hidden somewhere in the layers, and it may not even be noticed until it is too late. As far as I remember from this list, there are problems with a column being integer and a parameter prepared as bigint or numeric. The same goes for numeric vs. double vs. float.
In my opinion it would be great for the optimizer to consider rewrites like the following:
1) val1::narrow = val2::wide as (val1::narrow = val2::narrow and val2::narrow = val2::wide)
2) val1::narrow < val2::wide as (val1::narrow < val2::narrow and val1::wide < val2::wide)
3) val1::narrow > val2::wide as (val1::narrow + 1 > val2::narrow and val1::wide > val2::wide)
Of course, it should only add the extra check when this allows an index to be used.
Surely this is not an easy thing to implement, but similar questions are raised quite often on this list.
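
For example, rewrite (1) applied by hand to the query from the start of this thread would look something like this (just a sketch; the second conjunct is the round-trip check that preserves the semantics of the original wide comparison):

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4.0::int4     -- narrow = narrow: the primary key index is usable
  and 4.0::int4 = 4.0;        -- narrow = wide: rejects values that do not round-trip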

Best regards, Vitalii Tymchyshyn

Re: Bad query plan when the wrong data type is used

From:
Robert Haas
Date:
On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Laszlo,
>
>> Which is silly. I think that PostgreSQL converts the int side to a
>> float, and then compares them.
>>
>> It would be better to do this, for each item in the loop:
>>
>>     * evaluate the right side (which is float)
>>     * tell if it is an integer or not
>>     * if not an integer, then discard the row immediately
>>     * otherwise use its integer value for the index scan
>
> Not terribly likely, I'm afraid.  Data type coercion is *way* more
> complex than you realize (consider the number of data types we have, and
> the ability to add UDTs, and then square it).  And the functionality you
> propose would break backwards compatibility; many people currently use
> ".0" currently in order to force a coercion to Float or Numeric.
>
> I'm not saying that PostgreSQL couldn't do better on this kind of case,
> but that doing better is a major project, not a minor one.

Specifically, the problem is that x = 4.0, where x is an integer, is
defined to mean x::numeric = 4.0, not x = 4.0::integer.  If it meant
the latter, then testing x = 3.5 would throw an error, whereas what
actually happens is it just returns false.
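
This is easy to check with bare literals in psql: the integer side is the one that gets promoted, and a non-integral value simply compares unequal.

select 4 = 4.0;   -- true: the integer is coerced to numeric
select 4 = 3.5;   -- false, no error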

Now, in this particular case, we all know that the only way x::numeric
= 4.0 can be true is if x = 4::int.  But that's a property of the
numeric and integer data types that doesn't hold in general.  Consider
t = 'foo'::citext, where t has type text.  That could be true if t =
'Foo' or t = 'foO' or t = 'FOO', etc.

We could fix this by adding some special case logic that understands
properties of integers and numeric values and optimizes x =
4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
That would be cool, in a way, but I'm not sure it's really worth the
code it would take, unless it falls naturally out of some larger
project in that area.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Bad query plan when the wrong data type is used

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
>> but that doing better is a major project, not a minor one.

> Specifically, the problem is that x = 4.0, where x is an integer, is
> defined to mean x::numeric = 4.0, not x = 4.0::integer.  If it meant
> the latter, then testing x = 3.5 would throw an error, whereas what
> actually happens is it just returns false.

> We could fix this by adding some special case logic that understands
> properties of integers and numeric values and optimizes x =
> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
> That would be cool, in a way, but I'm not sure it's really worth the
> code it would take, unless it falls naturally out of some larger
> project in that area.

I think that most of the practical problems around this case could be
solved without such a hack.  What we should do instead is invent
cross-type operators "int = numeric" etc and make them members of both
the integer and numeric index opclasses.  There are reasons why that
wouldn't work for integer versus float (read the last section of
src/backend/access/nbtree/README) but right offhand it seems like it
ought to be safe enough for numeric.  Now, it wouldn't be quite as fast
as if we somehow downconverted numeric to integer beforehand, but at
least you'd only be talking about a slow comparison operator and not a
fundamentally stupider plan.  That's close enough for me, for what is
in the end a stupidly written query.

Of course, the above is still not exactly a small project, since you'd
be talking about something like 36 new operators to cover all of int2,
int4, int8.  But it's a straightforward extension.
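
A very rough sketch of what one of those operators might look like (the function name is made up, and this alone does not make the index usable; the real work is adding the operator, with proper cross-type comparison support, to the integer and numeric btree operator families):

create function int4_eq_numeric(int4, numeric) returns boolean
    language sql immutable strict
    as 'select $1::numeric = $2';

create operator = (
    leftarg = int4,
    rightarg = numeric,
    procedure = int4_eq_numeric,
    commutator = =,
    restrict = eqsel,
    join = eqjoinsel
);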

            regards, tom lane

Re: Bad query plan when the wrong data type is used

From:
Robert Haas
Date:
On Sun, Feb 27, 2011 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
>>> but that doing better is a major project, not a minor one.
>
>> Specifically, the problem is that x = 4.0, where x is an integer, is
>> defined to mean x::numeric = 4.0, not x = 4.0::integer.  If it meant
>> the latter, then testing x = 3.5 would throw an error, whereas what
>> actually happens is it just returns false.
>
>> We could fix this by adding some special case logic that understands
>> properties of integers and numeric values and optimizes x =
>> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
>> That would be cool, in a way, but I'm not sure it's really worth the
>> code it would take, unless it falls naturally out of some larger
>> project in that area.
>
> I think that most of the practical problems around this case could be
> solved without such a hack.  What we should do instead is invent
> cross-type operators "int = numeric" etc and make them members of both
> the integer and numeric index opclasses.  There are reasons why that
> wouldn't work for integer versus float (read the last section of
> src/backend/access/nbtree/README) but right offhand it seems like it
> ought to be safe enough for numeric.  Now, it wouldn't be quite as fast
> as if we somehow downconverted numeric to integer beforehand, but at
> least you'd only be talking about a slow comparison operator and not a
> fundamentally stupider plan.  That's close enough for me, for what is
> in the end a stupidly written query.
>
> Of course, the above is still not exactly a small project, since you'd
> be talking about something like 36 new operators to cover all of int2,
> int4, int8.  But it's a straightforward extension.

Interesting.  Worth a TODO?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Bad query plan when the wrong data type is used

From:
Bruce Momjian
Date:
On Mon, Feb 28, 2011 at 02:04:53PM -0500, Robert Haas wrote:
> On Sun, Feb 27, 2011 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
> >>> but that doing better is a major project, not a minor one.
> >
> >> Specifically, the problem is that x = 4.0, where x is an integer, is
> >> defined to mean x::numeric = 4.0, not x = 4.0::integer.  If it meant
> >> the latter, then testing x = 3.5 would throw an error, whereas what
> >> actually happens is it just returns false.
> >
> >> We could fix this by adding some special case logic that understands
> >> properties of integers and numeric values and optimizes x =
> >> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
> >> That would be cool, in a way, but I'm not sure it's really worth the
> >> code it would take, unless it falls naturally out of some larger
> >> project in that area.
> >
> > I think that most of the practical problems around this case could be
> > solved without such a hack.  What we should do instead is invent
> > cross-type operators "int = numeric" etc and make them members of both
> > the integer and numeric index opclasses.  There are reasons why that
> > wouldn't work for integer versus float (read the last section of
> > src/backend/access/nbtree/README) but right offhand it seems like it
> > ought to be safe enough for numeric.  Now, it wouldn't be quite as fast
> > as if we somehow downconverted numeric to integer beforehand, but at
> > least you'd only be talking about a slow comparison operator and not a
> > fundamentally stupider plan.  That's close enough for me, for what is
> > in the end a stupidly written query.
> >
> > Of course, the above is still not exactly a small project, since you'd
> > be talking about something like 36 new operators to cover all of int2,
> > int4, int8.  But it's a straightforward extension.
>
> Interesting.  Worth a TODO?

Since we are discussing int2 casting, I wanted to bring up this other
casting issue from 2011, in case it helps the discussion.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +