Обсуждение: How to use index in WHERE int = float

Поиск

Список

Период

Сортировка

How to use index in WHERE int = float

От

"Andrus"

Дата:

05 ноября 2008 г., 18:04:51

I have table with index

CREATE TABLE firma2.dok(
...
  dokumnr serial NOT NULL,
...
  CONSTRAINT dok_pkey PRIMARY KEY (dokumnr),
...
);

CREATE INDEX dok_dokumnr_idx
  ON firma2.dok
  USING btree
  (dokumnr);

I ran analyze command on it.

explain analyze select * from firma2.dok where  dokumnr='1228137'::float8

"Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
time=43168.460..43176.063 rows=1 loops=1)"
"  Filter: ((dokumnr)::double precision = 1228137::double precision)"
"Total runtime: 43176.375 ms"

shows that index is not used and thus query takes very long time.

How to force PostgreSql to speed up without changing query ?

"PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC
i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0,
pie-8.7.9)"

Andrus.

Re: How to use index in WHERE int = float

От

Sam Mason

Дата:

05 ноября 2008 г., 18:16:57

On Wed, Nov 05, 2008 at 08:52:50PM +0200, Andrus wrote:
> explain analyze select * from firma2.dok where  dokumnr='1228137'::float8
>
> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
> time=43168.460..43176.063 rows=1 loops=1)"
> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
> "Total runtime: 43176.375 ms"
>
> shows that index is not used and thus query takes very long time.
>
> How to force PostgreSql to speed up without changing query ?

I'm not sure if PG 8.1 will recognize it, but you could try creating a
functional index on the column when cast to the appropriate type.  I.e:

  CREATE INDEX dok_dokumnr_float8_idx ON dok (((dokumnr)::double precision));


  Sam

Re: How to use index in WHERE int = float

От

Sam Mason

Дата:

05 ноября 2008 г., 20:07:53

On Wed, Nov 05, 2008 at 08:52:50PM +0200, Andrus wrote:
> explain analyze select * from firma2.dok where  dokumnr='1228137'::float8
>
> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
> time=43168.460..43176.063 rows=1 loops=1)"
> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
> "Total runtime: 43176.375 ms"

I've just noticed that "actual time" doesn't run from zero; is this part
of another query, or is something else going on?


  Sam

Re: How to use index in WHERE int = float

От

"Scott Marlowe"

Дата:

05 ноября 2008 г., 20:21:35

On Wed, Nov 5, 2008 at 11:52 AM, Andrus <kobruleht2@hot.ee> wrote:
> I have table with index
>
> CREATE TABLE firma2.dok(
> ...
>  dokumnr serial NOT NULL,
> ...
>  CONSTRAINT dok_pkey PRIMARY KEY (dokumnr),
> ...
> );
>
> CREATE INDEX dok_dokumnr_idx
>  ON firma2.dok
>  USING btree
>  (dokumnr);
>
> I ran analyze command on it.
>
> explain analyze select * from firma2.dok where  dokumnr='1228137'::float8
>
> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
> time=43168.460..43176.063 rows=1 loops=1)"
> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
> "Total runtime: 43176.375 ms"
>
> shows that index is not used and thus query takes very long time.
>
> How to force PostgreSql to speed up without changing query ?

Stop trying to compare exact and inexact types?

You do realize that a float is not an exact number.  What you and I
see as 1228137 might really be, internally, 1228136.9999999999999999 ?
 So it won't get an exact match.  What's wrong with trying to match to
an exact number instead?  I have a feeling you've given us a
simplified example of your problem.  It might help to see the real
problem.

Re: How to use index in WHERE int = float

От

"Andrus"

Дата:

05 ноября 2008 г., 20:42:27

> Stop trying to compare exact and inexact types?
>
> You do realize that a float is not an exact number.  What you and I
> see as 1228137 might really be, internally, 1228136.9999999999999999 ?
> So it won't get an exact match.  What's wrong with trying to match to
> an exact number instead?

My query contains

'1228137'::float8

I do'nt see

1228136.9999999999999999

in this query.

Those strange casts are auto-generated by ODBC parameter passing software
which I must use so I must live with it.

Only way to fix this it to replace parameters manually by creating strings
which I'm trying to do for slow queries.

> I have a feeling you've given us a
> simplified example of your problem.  It might help to see the real
> problem.

I has feeling that PostgreSql must determine that index is integer type and
convert float to integer itself to speed it up.

Another issue:

Wuery

   SELECT dokumnr
     FROM DOK
    where dokumnr IN (123)
 AND ( '0' or
  dokumnr IN (SELECT dokumnr FROM bilkaib WHERE
         alusdok='LG' AND masin LIKE 'a%') )

runs slowly.

If

'0' or

is removed form where clause it runs fast since it founds indexes.

It is real surpise that expression containing

'0' or

prevents PostgreSql to use indexes.


I have changed those queries to manually created better sql statements which
run fast.
however I expected that PsotgreSql can optimize those itself.

Andrus.

Re: How to use index in WHERE int = float

От

"Scott Marlowe"

Дата:

05 ноября 2008 г., 20:48:59

On Wed, Nov 5, 2008 at 2:42 PM, Andrus <kobruleht2@hot.ee> wrote:
>> Stop trying to compare exact and inexact types?
>>
>> You do realize that a float is not an exact number.  What you and I
>> see as 1228137 might really be, internally, 1228136.9999999999999999 ?
>> So it won't get an exact match.  What's wrong with trying to match to
>> an exact number instead?
>
> My query contains
>
> '1228137'::float8
>
> I do'nt see
>
> 1228136.9999999999999999
>
> in this query.

Did you read what I wrote?  Cause you just repeated it as an argument
against my point.

 I don't think you get the difference between exact and inexact types.
 Google it for more comprehensive reading on the subject.

> Those strange casts are auto-generated by ODBC parameter passing software
> which I must use so I must live with it.

Sorry to hear that.

PostgreSQL doesn't automatically use indexes when comparing
incompatible types (i.e. exact versus non-exact types) so it's up to
you to use casting to make them match up.  There are lots of things
you can do here, but if you're comparing ints with floats you will
eventually be bitten by some comparison that should match but doesn't.
 Stick to exact number types if possible.

Re: How to use index in WHERE int = float

От

Sam Mason

Дата:

05 ноября 2008 г., 20:49:42

On Wed, Nov 05, 2008 at 02:21:33PM -0700, Scott Marlowe wrote:
> On Wed, Nov 5, 2008 at 11:52 AM, Andrus <kobruleht2@hot.ee> wrote:
> > explain analyze select * from firma2.dok where  dokumnr='1228137'::float8
> >
> > How to force PostgreSql to speed up without changing query ?
>
> Stop trying to compare exact and inexact types?
>
> You do realize that a float is not an exact number.  What you and I
> see as 1228137 might really be, internally, 1228136.9999999999999999 ?

A double precision number is defined to have 52bits of precision;
which implies that integer values up to 4503599627370496 can be stored
accurately (I'd not recommend relying on this though).  In terms of
correctness, what he's doing is OK as the column is a 32bit int and its
complete range can be represented accurately.  Yet, from an aesthetic
point of view, the OP is loosing a lot of points here.  PG 8.3 would
even throw it out, unless dokumnr was explicitly cast to a float8 as
well.

Complaining about performance when doing this seems somewhat
contradictory!  Asking the database to do extra work (in this case,
converting an integer to a floating point number, and then doing a
floating point compare) for no good reason is never good.

  Sam

Re: How to use index in WHERE int = float

От

"Andrus"

Дата:

05 ноября 2008 г., 21:04:52

>> explain analyze select * from firma2.dok where  dokumnr='1228137'::float8
>>
>> "Seq Scan on dok  (cost=0.00..187766.23 rows=6255 width=1145) (actual
>> time=43168.460..43176.063 rows=1 loops=1)"
>> "  Filter: ((dokumnr)::double precision = 1228137::double precision)"
>> "Total runtime: 43176.375 ms"
>
> I've just noticed that "actual time" doesn't run from zero; is this part
> of another query, or is something else going on?

Is is standalone query.
This is exact statement executed in PgAdmin window and exact result copied
from pgadmin result windw.

I tried it again and got

"Seq Scan on dok  (cost=0.00..186943.67 rows=5974 width=1173) (actual
time=39275.713..39319.865 rows=1 loops=1)"
"  Filter: ((dokumnr)::double precision = 1228137::double precision)"
"Total runtime: 39320.030 ms"

Andrus.

Re: How to use index in WHERE int = float

От

Sam Mason

Дата:

05 ноября 2008 г., 21:26:50

On Wed, Nov 05, 2008 at 11:42:12PM +0200, Andrus wrote:
> Scott Marlowe wrote:
> >You do realize that a float is not an exact number.  What you and I
> >see as 1228137 might really be, internally, 1228136.9999999999999999 ?
>
> My query contains
> '1228137'::float8
> I do'nt see
> 1228136.9999999999999999
> in this query.

As Scott said, I'd definitely recommend reading about about how floating
point numbers are represented and manipulated.  The semantics can get a
bit awkward in the corner cases and having a rough idea of where these
are will help point you in the right direction should strangeness (e.g.
when does 1/3*3 not equal 1) start of occur.

> Those strange casts are auto-generated by ODBC parameter passing software
> which I must use so I must live with it.
>
> Only way to fix this it to replace parameters manually by creating strings
> which I'm trying to do for slow queries.

These last two statements seems mutually exclusive; either use the
"ODBC parameter passing software" or don't.  By the sounds of it, I'd
recommend not using it.

> I has feeling that PostgreSql must determine that index is integer type and
> convert float to integer itself to speed it up.

If you build a functional index as in an earlier email from me, it
should just work.  Whether it's a good idea for it to work this way is
another matter entirely!

> Another issue:
>
>   SELECT dokumnr
>     FROM DOK
>    where dokumnr IN (123)
> AND ( '0' or
>  dokumnr IN (SELECT dokumnr FROM bilkaib WHERE
>         alusdok='LG' AND masin LIKE 'a%') )

> If
> '0' or
> is removed form where clause it runs fast since it founds indexes.
> It is real surpise that expression containing
> '0' or
> prevents PostgreSql to use indexes.

Constant propagation is something that PG doesn't do amazingly well at;
I think the main reason is that it's assumed the programmer is capable
of removing most constants and so the optimizer doesn't worry too much
about it.

That said, in 8.3.4, if I run:

  EXPLAIN SELECT * FROM source_livestock
    WHERE FALSE OR id IN (
      SELECT id FROM source_livestock LIMIT 10);

I get:

   Seq Scan on source_livestock  (cost=0.22..848037.12 rows=18949156 width=58)
     Filter: (hashed subplan)
     SubPlan
       ->  Limit  (cost=0.00..0.20 rows=10 width=4)
             ->  Seq Scan on source_livestock  (cost=0.00..753291.12 rows=37898312 width=4)

and if I replace the FALSE with TRUE, giving:

  EXPLAIN SELECT * FROM source_livestock
    WHERE TRUE OR id IN (
      SELECT id FROM source_livestock LIMIT 10);

it realizes the subselect isn't needed and I just get:

   Seq Scan on source_livestock  (cost=0.00..753291.12 rows=37898312 width=58)

so it seems to some propagation is performed, but not all cases are
handled.  I don't mind much though; it's a case I know about now and can
deal with.

  Sam

Re: How to use index in WHERE int = float

От

"Andrus"

Дата:

05 ноября 2008 г., 21:34:53

> PG 8.3 would
> even throw it out, unless dokumnr was explicitly cast to a float8 as
> well.

I tried in 8.3

create temp table dok ( dokumnr serial primary key );
select * from dok where dokumnr='1'::float8

and this run without error.

So i do'nt understand how 8.3 throws out.

Andrus.

Re: How to use index in WHERE int = float

От

"Andrus"

Дата:

05 ноября 2008 г., 21:34:53

> Did you read what I wrote?  Cause you just repeated it as an argument
> against my point.

Lets re-visit the second issue in my reply.

I tried in 8.3

explain SELECT dokumnr
     FROM DOK
    where dokumnr IN (1227714)
 AND
 ( '0'  or
  dokumnr IN (SELECT dokumnr FROM firma1.bilkaib )
          )

"Index Scan using dok_pkey on dok  (cost=16.55..24.82 rows=1 width=4)"
"  Index Cond: (dokumnr = 1227714)"
"  Filter: (hashed subplan)"
"  SubPlan"
"    ->  Seq Scan on bilkaib  (cost=0.00..15.44 rows=444 width=4)"


and

  explain SELECT dokumnr
     FROM DOK
    where dokumnr IN (1227714)
 AND
 ( -- '0'  or
  dokumnr IN (SELECT dokumnr FROM firma1.bilkaib )
          )

"Nested Loop IN Join  (cost=0.00..16.55 rows=1 width=4)"
"  ->  Index Scan using dok_pkey on dok  (cost=0.00..8.27 rows=1 width=4)"
"        Index Cond: (dokumnr = 1227714)"
"  ->  Index Scan using bilkaib_dokumnr_idx on bilkaib  (cost=0.00..8.27
rows=1 width=4)"
"        Index Cond: (bilkaib.dokumnr = 1227714)"


As you see simply removing constant expression

 '0' or

produces different query plan which is much faster for large amoutnts of
data.
Same results are for large data set and for earlier postgresql versions.

Do you think that is OK and reasonable ?

Andrus.

Re: How to use index in WHERE int = float

От

Tom Lane

Дата:

05 ноября 2008 г., 23:13:42

"Andrus" <kobruleht2@hot.ee> writes:
> As you see simply removing constant expression
>  '0' or
> produces different query plan which is much faster for large amoutnts of
> data.

The IN-pullup code runs before constant-simplification does, so it
doesn't see that as a simple join condition.

            regards, tom lane

Re: How to use index in WHERE int = float

От

Sam Mason

Дата:

07 ноября 2008 г., 13:22:39

On Thu, Nov 06, 2008 at 12:08:57AM +0200, Andrus wrote:
> >PG 8.3 would
> >even throw it out, unless dokumnr was explicitly cast to a float8 as
> >well.
>
> I tried in 8.3
>
> create temp table dok ( dokumnr serial primary key );
> select * from dok where dokumnr='1'::float8
>
> and this run without error.

Hum, it doesn't seem to does it.  Sorry, I was thinking PG was more
strict than it appears to be.

> So i do'nt understand how 8.3 throws out.

It won't, I was getting confused!


  Sam

Re: How to use index in WHERE int = float

От

"Andrus"

Дата:

09 ноября 2008 г., 21:05:39

> The IN-pullup code runs before constant-simplification does, so it
> doesn't see that as a simple join condition.

Seems serious design flaw.
How to change expression parser so that contant parts of expressions are
removed before IN-pullup?

Andrus.

Re: How to use index in WHERE int = float

От

Tom Lane

Дата:

09 ноября 2008 г., 21:22:44

"Andrus" <kobruleht2@hot.ee> writes:
>> The IN-pullup code runs before constant-simplification does, so it
>> doesn't see that as a simple join condition.

> Seems serious design flaw.
> How to change expression parser so that contant parts of expressions are
> removed before IN-pullup?

It would be far simpler to fix your query generator to not emit the
useless "0 or".

Even if we made the planner deal with that, the number of cycles it
would expend to recover from the generator's stupidity would be several
orders of magnitude higher than the number of cycles needed to not be so
stupid.

            regards, tom lane

Re: How to use index in WHERE int = float

От

"Andrus"

Дата:

10 ноября 2008 г., 08:03:37

> It would be far simpler to fix your query generator to not emit the
> useless "0 or".

I'm using ODBC  and npgsql drivers. Those drivers replace parameters
automatically.

E.q. for npgsql or every other ADO .NET I can write

"SELECT * FROM (:param1 OR (x IN SELECT y FROM z) AND :param2) ...

etc.

param1, param2, ..., paramn etc. are entered by user and passed to driver as
parameters.

Proposed solution requires:

1. Stop using this type parameter replacement. Write case statements for
every possible paramn combination in application code,
total n! case statements.

or

2. Create expression parser which understands syntax for every backend
version and simplifies expression as string before passing to backend.

> Even if we made the planner deal with that, the number of cycles it
> would expend to recover from the generator's stupidity would be several
> orders of magnitude higher than the number of cycles needed to not be so
> stupid.

Proposed solution requires re-writing ODBC and npgsql and possibly every
other driver to add expression parser and constant optimization to it in
client side.
This seems to be huge work and no one dbms does not implement this.
In this case every query is processed two times. I do'nt understand how this
takes less time that processing query once in backend.

Andrus.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: How to use index in WHERE int = float