Discussion: Slow UPDATE, compared to INSERT


Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
Hello!

I am a relative newcomer to the SQL and PostgreSQL world, so please forgive
me if this question is stupid.

I am experiencing strange behaviour, where a simple UPDATE of one field is
very slow compared to an INSERT into a table with multiple indexes. I have
two tables - one with raw data records (about 24000), where one field
contains status information (varchar(10)). The first table has no indexes,
only a primary key (recid). The second table contains processed records -
some fields are the same as in the first table, others are calculated
during processing. Records are processed by a Python script, which uses
PyPgSQL for PostgreSQL access.

Processing is done by selecting all records from table1 whose status
matches certain criteria ('import'). Each record is processed and the
results are inserted into table2; after the insert, the status field of
the same record in table1 is updated with a new value ('done'). The update
statement itself is extremely simple:
"update table1 set status = 'done' where recid = ..."
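
In outline, the loop looks like this (a minimal sketch - the connection
settings, the table2 columns and process_record() are placeholders, not the
real script):

===
from pyPgSQL import PgSQL

def process_record(recid):
    # placeholder for the real per-record calculations
    return (recid, 0)

db = PgSQL.connect(database="mydb")    # hypothetical connection settings
rd = db.cursor()                       # cursor for reading table1
wr = db.cursor()                       # cursor for inserts/updates

rd.execute("SELECT recid FROM table1 WHERE status = %s", 'import')
row = rd.fetchone()
while row is not None:
    recid, total = process_record(row[0])
    wr.execute("INSERT INTO table2 (recid, total) VALUES (%s, %s)",
               recid, total)
    wr.execute("UPDATE table1 SET status = %s WHERE recid = %s",
               'done', recid)          # the UPDATE discussed below
    row = rd.fetchone()
db.commit()
===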

Most interesting is that the insert takes 0.004 seconds on average, but
the update takes 0.255 seconds on average. Processing of 24000 records took
around 1 hour 20 minutes.

Then I changed the processing logic not to update every record in table1
after processing. Instead I inserted the recid value into a temporary table
and updated the records in table1 after all records were processed and
inserted into table2:
UPDATE table1 SET Status = 'done' WHERE recid IN (SELECT recid FROM temptable)

This way I got the processing time for 24000 records down to about 16
minutes. About 13 minutes of this was taken by the last UPDATE statement.
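
Sketched under the same assumptions as above, the reworked flow becomes
(the temptable DDL is a guess):

===
wr.execute("CREATE TEMPORARY TABLE temptable (recid bigint)")
rd.execute("SELECT recid FROM table1 WHERE status = %s", 'import')
row = rd.fetchone()
while row is not None:
    recid, total = process_record(row[0])
    wr.execute("INSERT INTO table2 (recid, total) VALUES (%s, %s)",
               recid, total)
    wr.execute("INSERT INTO temptable (recid) VALUES (%s)", recid)
    row = rd.fetchone()
# one set-based UPDATE instead of 24000 individual ones
wr.execute("UPDATE table1 SET status = 'done' "
           "WHERE recid IN (SELECT recid FROM temptable)")
db.commit()
===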

Why is UPDATE so slow compared to INSERT? I would expect more or less
similar performance, or slower inserts, since table2 has four indexes
in addition to the primary key, while table1 has only a primary key,
which is used for the update. Am I doing something wrong, or is this normal?

I am using PostgreSQL 7.3.4, Debian GNU/Linux 3.0 (Woody),
kernel 2.4.21, Python 2.3.2, PyPgSQL 2.4

--
Ivar Zarans


Re: Slow UPDATE, compared to INSERT

From
Jeff
Date:
On Thu, 4 Dec 2003 20:57:51 +0200
Ivar Zarans <iff@alcaron.ee> wrote:

> table1 is updated with a new value ('done'). The update statement itself is
> extremely simple: "update table1 set status = 'done' where recid =
> ..."
>
> Most interesting is that the insert takes 0.004 seconds on average, but
> the update takes 0.255 seconds on average. Processing of 24000 records
> took around 1 hour 20 minutes.

Do you have an index on recid?

And did you vacuum analyze after you loaded up the data?

>
> Then I changed the processing logic not to update every record in table1
> after processing. Instead I inserted the recid value into a temporary
> table and updated the records in table1 after all records were processed
> and inserted into table2:
> UPDATE table1 SET Status = 'done' WHERE recid IN (SELECT recid FROM
> temptable)
>

"IN" queries are terribly slow on versions before 7.4

> Why is UPDATE so slow compared to INSERT? I would expect more or less
> similar performance, or slower inserts, since table2 has four indexes
> in addition to the primary key, while table1 has only a primary key,
> which is used for the update. Am I doing something wrong, or is this normal?
>

Remember, UPDATE has to do all the work of a SELECT and more.

And if you have 4 indexes, those will also add to the time (since it has
to update/add entries in each index tree).

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Thu, Dec 04, 2003 at 02:23:20PM -0500, Jeff wrote:

> > Most interesting is that the insert takes 0.004 seconds on average, but
> > the update takes 0.255 seconds on average. Processing of 24000 records
> > took around 1 hour 20 minutes.
>
> Do you have an index on recid?

Yes, this is the primary key of table1.

> And did you vacuum analyze after you loaded up the data?

No, this is running as a nightly cronjob. All tests were done during one
day, so no vacuum was done.

> "IN" queries are terribly slow on versions before 7.4

OK, this is useful to know :)

> > Why is UPDATE so slow compared to INSERT? I would expect more or less
> > similar performance, or slower inserts, since table2 has four indexes
> > in addition to the primary key, while table1 has only a primary key,
> > which is used for the update. Am I doing something wrong, or is this normal?

> Remember, UPDATE has to do all the work of a SELECT and more.
>
> And if you have 4 indexes, those will also add to the time (since it has
> to update/add entries in each index tree).

My primary concern is the performance difference between INSERT and UPDATE
in my first tests. There I selected a record from table1, processed it and
inserted the result into table2, then updated the status of the fetched
record in table1 - repeated in a cycle for as long as the fetch returned a
record. The average time for INSERT was 0.004 seconds; the average time for
UPDATE was 0.255 seconds. The update was done as "update table1 set status
= 'done' where recid = xxxx". As far as I understand, this type of simple
update should be faster than an INSERT into a table with four indexes, but
in my case it is more than 60 times slower. Why??

My second tests were done with a temporary table and an update query of:
"UPDATE table1 SET Status = 'done' WHERE recid IN (SELECT recid FROM
temptable)". It is still slower than INSERT, but more or less
acceptable. Compared to my first tests, overall processing time dropped
from 1 hour and 20 minutes to 16 minutes.

So my question remains - why is a simple update more than 60 times
slower than an INSERT? Any ideas?

--
Ivar Zarans

Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Thu, Dec 04, 2003 at 08:23:36PM +0000, Richard Huxton wrote:

> Ah - it's probably not the update but the IN. You can rewrite it using PG's
> non-standard FROM:
>
> UPDATE t1 SET status='done' FROM t_tmp WHERE t1.rec_id = t_tmp.rec_id;

Thanks for the hint. I'll try this.

> Now that doesn't explain why the update is taking so long. One fifth of a
> second is extremely slow. Are you certain that the index is being used?

Explain shows the following output:

explain update table1 set status = 'PROC' where recid = '199901';

Index Scan using table1_pkey on table1  (cost=0.00..6.01 rows=1 width=198)
  Index Cond: (recid = 199901::bigint)
(2 rows)



--
Ivar Zarans


Re: Slow UPDATE, compared to INSERT

From
William Yu
Date:
Ivar Zarans wrote:
>
> I am experiencing strange behaviour, where simple UPDATE of one field is
> very slow, compared to INSERT into table with multiple indexes. I have
> two tables - one with raw data records (about 24000), where one field

In Postgres and any other DB that uses MVCC (multi-version concurrency
control), UPDATEs will always be slower than INSERTs. With MVCC, what the
DB does is make a copy of the record, update that copy and then invalidate
the previous record. This maintains a consistent view for anybody who's
reading the DB and also avoids the requirement of row locks.

If you have to use UPDATE, make sure (1) your UPDATE WHERE clause is
properly indexed and (2) you are running ANALYZE/VACUUM periodically so
the query planner can optimize for your UPDATE statements.
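
The extra row version that each UPDATE creates can be watched through
PostgreSQL's system columns; a small psql sketch (table and key value are
illustrative):

===
-- every UPDATE writes a new row version at a new physical location
SELECT ctid, xmin, status FROM table1 WHERE recid = 1;
UPDATE table1 SET status = 'done' WHERE recid = 1;
SELECT ctid, xmin, status FROM table1 WHERE recid = 1;  -- ctid and xmin change

-- reclaim dead row versions and refresh planner statistics
VACUUM ANALYZE table1;
===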


Re: Slow UPDATE, compared to INSERT

From
Richard Huxton
Date:
On Thursday 04 December 2003 19:51, Ivar Zarans wrote:
>
> My second tests were done with temporary table and update query as:
> "UPDATE table1 SET Status = 'done' WHERE recid IN (SELECT recid FROM
> temptable)". It is still slower than INSERT, but more or less
> acceptable. Compared to my first tests overall processing time dropped
> from 1 hour and 20 minutes to 16 minutes.

Ah - it's probably not the update but the IN. You can rewrite it using PG's
non-standard FROM:

UPDATE t1 SET status='done' FROM t_tmp WHERE t1.rec_id = t_tmp.rec_id;

Now that doesn't explain why the update is taking so long. One fifth of a
second is extremely slow. Are you certain that the index is being used?

--
  Richard Huxton
  Archonet Ltd

Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Thu, Dec 04, 2003 at 08:23:36PM +0000, Richard Huxton wrote:

> Ah - it's probably not the update but the IN. You can rewrite it using PG's
> non-standard FROM:
>
> UPDATE t1 SET status='done' FROM t_tmp WHERE t1.rec_id = t_tmp.rec_id;

This was one *very useful* hint! Using this method I got my processing
time for 24000 records down to around 3 minutes 10 seconds. Compared
with the initial 1 hour 20 minutes, and then 16 minutes, this is an
impressive improvement!

> Now that doesn't explain why the update is taking so long. One fifth of a
> second is extremely slow. Are you certain that the index is being used?

I posted the results of "EXPLAIN" in my previous message. Meanwhile I tried
updating just one record using "psql", and also tried out "EXPLAIN
ANALYZE". This way I did not see any big delay - the total runtime for one
update was around 1 msec.

I am confused - does the slowness of UPDATE have something to do with
Python and PyPgSQL, since "psql" seems to have no delay whatsoever? Or is
it related to using two cursors, one for the select results and the other
for updates? Even if this is related to Python or cursors, how am I getting
such a big speed improvement just by using a different query?

--
Ivar


Re: Slow UPDATE, compared to INSERT

From
Richard Huxton
Date:
On Thursday 04 December 2003 19:59, William Yu wrote:
> Ivar Zarans wrote:
> > I am experiencing strange behaviour, where simple UPDATE of one field is
> > very slow, compared to INSERT into table with multiple indexes. I have
> > two tables - one with raw data records (about 24000), where one field
>
> In Postgres and any other DB that uses MVCC (multi-version concurrency
> control), UPDATEs will always be slower than INSERTs. With MVCC, what the
> DB does is make a copy of the record, update that copy and then invalidate
> the previous record.
[snip]

Yes, but he's seeing 0.25 secs to update one row - that's something odd.

--
  Richard Huxton
  Archonet Ltd

Re: Slow UPDATE, compared to INSERT

From
Richard Huxton
Date:
On Thursday 04 December 2003 22:13, Ivar Zarans wrote:
> On Thu, Dec 04, 2003 at 08:23:36PM +0000, Richard Huxton wrote:
> > Ah - it's probably not the update but the IN. You can rewrite it using
> > PG's non-standard FROM:
> >
> > UPDATE t1 SET status='done' FROM t_tmp WHERE t1.rec_id = t_tmp.rec_id;
>
> This was one *very useful* hint! Using this method i got my processing
> time of 24000 records down to around 3 minutes 10 seconds. Comparing
> with initial 1 hour 20 minutes and then 16 minutes, this is impressive
> improvement!

Be aware, this is specific to PG - I'm not aware of this construction working
on any other DB. Three minutes still doesn't sound brilliant, but that could
be down to tuning issues.

> > Now that doesn't explain why the update is taking so long. One fifth of a
> > second is extremely slow. Are you certain that the index is being used?
>
> I posted results of "EXPLAIN" in my previous message. Meanwhile i tried
> to update just one record, using "psql". Also tried out "EXPLAIN
> ANALYZE". This way i did not see any big delay - total runtime for one
> update was around 1 msec.

Yep - the explain looked fine. If you run EXPLAIN ANALYZE it will give you
timings too (real runtimes will be slightly less than the reported ones,
since PG won't be doing the timing/reporting work).
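
Note that EXPLAIN ANALYZE actually executes the statement, so an UPDATE
under test can be wrapped in a transaction and rolled back:

===
BEGIN;
EXPLAIN ANALYZE UPDATE table1 SET status = 'done' WHERE recid = '199901';
ROLLBACK;  -- undo the update that EXPLAIN ANALYZE performed
===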

> I am confused - does the slowness of UPDATE have something to do with
> Python and PyPgSQL, since "psql" seems to have no delay whatsoever? Or is
> it related to using two cursors, one for the select results and the other
> for updates? Even if this is related to Python or cursors, how am I getting
> such a big speed improvement just by using a different query?

Hmm - you didn't mention cursors. If this were a problem with PyPgSQL in
general, I suspect we'd know about it by now. It could, however, be some
cursor-related issue. In general, you're probably better off doing
updates/inserts as a single statement and letting PG manage things, rather
than processing one row at a time.

If you've got the time, try putting together a small test-script with some
dummy data and see if it's reproducible. I'm sure the other Python users
would be interested in seeing where the problem is.

--
  Richard Huxton
  Archonet Ltd

Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Thu, Dec 04, 2003 at 10:45:21PM +0000, Richard Huxton wrote:

> If you've got the time, try putting together a small test-script with some
> dummy data and see if it's reproducible. I'm sure the other Python users
> would be interested in seeing where the problem is.

I tried with a test-script, but it functioned normally (Murphy's law!).
Then I tweaked postgresql.conf and switched on the debugging options. The
results show (in my opinion) that Python has nothing to do with the slow
UPDATE. Timing from PostgreSQL itself shows a duration of 0.29 sec.

===
postgres[21247]: [2707] DEBUG:  StartTransactionCommand
postgres[21247]: [2708-1] LOG:  query:
postgres[21247]: [2708-2]     UPDATE
postgres[21247]: [2708-3]         imp_cdr_200311
postgres[21247]: [2708-4]     SET
postgres[21247]: [2708-5]         Status = 'SKIP'
postgres[21247]: [2708-6]     WHERE
postgres[21247]: [2708-7]         ImpRecID = '202425'
...
Skipped rewritten parse tree
...
postgres[21247]: [2710-1] LOG:  plan:
postgres[21247]: [2710-2]    { INDEXSCAN
postgres[21247]: [2710-3]    :startup_cost 0.00
postgres[21247]: [2710-4]    :total_cost 6.01
postgres[21247]: [2710-5]    :rows 1
postgres[21247]: [2710-6]    :width 199
postgres[21247]: [2710-7]    :qptargetlist (
...
Skipped target list
...
postgres[21247]: [2711] DEBUG:  CommitTransactionCommand
postgres[21247]: [2712] LOG:  duration: 0.292529 sec
===

Any suggestions for further investigation?

--
Ivar Zarans


Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
I have played around with explain and explain analyze and noticed one
interesting oddity:

===
explain UPDATE table1 SET status = 'SKIP' WHERE recid = 196641;

 Seq Scan on table1 (cost=0.00..16709.97 rows=1 width=199)
 Filter: (recid = 196641)

===

explain UPDATE table1 SET status = 'SKIP' WHERE recid = '196641';

 Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=199)
 Index Cond: (recid = 196641::bigint)

===

explain UPDATE table1 SET status = 'SKIP' WHERE recid = 196641::bigint;

 Index Scan using table1_pkey on table1 (cost=0.00..6.01 rows=1 width=199)
 Index Cond: (recid = 196641::bigint)

===

Why does the first example, where recid is given as a numeric constant,
use a sequential scan, while the second example, where recid is given as a
string constant, works with an index scan, as expected? The third example
shows that the numeric constant must be typecast in order to function
properly.

Is this normal behaviour for fields of bigint type?

--
Ivar Zarans


Re: Slow UPDATE, compared to INSERT

From
Christopher Kings-Lynne
Date:
> Why does the first example, where recid is given as a numeric constant,
> use a sequential scan, while the second example, where recid is given as a
> string constant, works with an index scan, as expected? The third example
> shows that the numeric constant must be typecast in order to function
> properly.
>
> Is this normal behaviour for fields of bigint type?

Yes, it's a known performance problem in PostgreSQL 7.4 and below.  I
believe it's been fixed in 7.5 CVS already.

Chris



Re: Slow UPDATE, compared to INSERT

From
Richard Huxton
Date:
On Friday 05 December 2003 02:07, Ivar Zarans wrote:
> I have played around with explain and explain analyze and noticed one
> interesting oddity:
[snip]
> Why does the first example, where recid is given as a numeric constant,
> use a sequential scan, while the second example, where recid is given as a
> string constant, works with an index scan, as expected? The third example
> shows that the numeric constant must be typecast in order to function
> properly.
>
> Is this normal behaviour for fields of bigint type?

As Christopher says, normal (albeit irritating). I'm not sure it applies
here - all the examples you've shown me are using the index.

Well - I must admit I'm stumped. Unless you have a *lot* of indexes and
foreign keys to check, I can't see why it would take so long to update a
single row. Can you post the schema for the table?
--
  Richard Huxton
  Archonet Ltd

Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Fri, Dec 05, 2003 at 10:08:20AM +0000, Richard Huxton wrote:

> > numeric constant must be typecast in order to function properly.
> >
> > Is this normal behaviour for fields of bigint type?
>
> As Christopher says, normal (albeit irritating). Not sure it applies here -
> all the examples you've shown me are using the index.

I guess I have solved this mystery. The problem appears to be exactly
this - the numeric constant representation in the query.

I am using PyPgSQL for PostgreSQL access and making update queries like this:

qry = "UPDATE table1 SET status = %s WHERE recid = %s"
cursor.execute(qry, status, recid)

The execute method of the cursor object is supposed to merge the "status"
and "recid" values into "qry", using proper quoting. When I started to play
around with the debug information I noticed that this query used a
sequential scan for "recid". Then I also noticed that the query sent to the
server looked like this:
"UPDATE table1 SET status = 'SKIP' WHERE recid = 199901"

Sure enough, when I used psql and EXPLAIN on this query, I got a query
plan with a sequential scan. Using the recid value as a string or as a
typecast integer gave correct results with an index scan. I wrote about
this in my previous message.

It seems that PyPgSQL query quoting is not aware of this performance
problem (to which Christopher referred), and the final query sent to the
server is correct SQL, but not correct considering PostgreSQL bugs.

One more explanation - previously I posted some logs showing a correct
query, using an index scan, but still taking 0.29 seconds. The reason for
this delay is the logging itself - it generates enough IO traffic to have
an impact on query speed. With logging disabled, this query takes around
0.0022 seconds, which is perfectly normal.

Finally - what would be the correct solution to this problem? Upgrading to
7.5 CVS is not an option :) One possibility is not to use PyPgSQL variable
substitution and to create every query "by hand" - not a very nice
solution, since variable substitution and quoting are quite convenient.

A second (and better) possibility is to ask the PyPgSQL developers to take
care of PostgreSQL's oddities.

Any other suggestions?

--
Ivar


Re: Slow UPDATE, compared to INSERT

From
Shridhar Daithankar
Date:
Ivar Zarans wrote:
> It seems that PyPgSQL query quoting is not aware of this performance
> problem (to which Christopher referred), and the final query sent to the
> server is correct SQL, but not correct considering PostgreSQL bugs.

Personally I don't consider it a bug, but anyway.. you are the one facing
the problem, so I understand..

> Finally - what would be correct solution to this problem? Upgrading to
> 7.5 CVS is not an option :) One possibility is not to use PyPgSQL
> variable substitution and create every query "by hand" - not very nice
> solution, since variable substitution and quoting is quite convenient.
>
> A second (and better) possibility is to ask the PyPgSQL developers to take
> care of PostgreSQL's oddities.
>
> Any other suggestions?

I know zero about Python, but just guessing..

Will the following help?

qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
cursor.execute(qry, status, recid)

  Just a thought..

  Shridhar


Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Fri, Dec 05, 2003 at 06:19:46PM +0530, Shridhar Daithankar wrote:

> >is correct SQL, but not correct, considering PostgreSQL bugs.
>
> Personally I don't consider a bug but anyways.. You are the one facing
> problem so I understand..

Well, if this is not a bug, then what is the reasoning behind this
behaviour? BTW, according to Christopher it is fixed in 7.5 CVS.
Why fix it if this is not a bug? :))

One more question - is this "feature" related only to "bigint" fields,
or are other datatypes affected as well?

> Will the following help?
>
> qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
> cursor.execute(qry, status, recid)

Yes, this helps. But then it sort of obsoletes PyPgSQL's own quoting
logic. I would prefer either to take care of all of this myself, or to
trust some underlying code to do it for me. And PyPgSQL is quite nice - it
checks the datatype and acts accordingly.

--
Ivar


Re: Slow UPDATE, compared to INSERT

From
Richard Huxton
Date:
On Friday 05 December 2003 12:49, Shridhar Daithankar wrote:
> Ivar Zarans wrote:
> > It seems that PyPgSQL query quoting is not aware of this performance
> > problem (to which Christopher referred), and the final query sent to the
> > server is correct SQL, but not correct considering PostgreSQL bugs.

>
> Will following help?
>
> qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
> cursor.execute(qry, status, recid)

Better IMHO would be: "UPDATE table1 SET status = %s WHERE recid = %s::int8"

PG is very strict regarding types - normally a good thing, but it can hit
you unexpectedly in this scenario. The reason is that the literal number is
treated as int4, whereas quoted it is marked as type unknown. Unknown gets
cast to int8, whereas int4 gets left as-is. If you want to know why int4
doesn't get promoted to int8 automatically, browse the hackers list for the
last couple of years.
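
Applied to the PyPgSQL snippet from earlier in the thread, that would read
as follows (the cast travels inside the query string, so the parameter
substitution itself is unchanged):

qry = "UPDATE table1 SET status = %s WHERE recid = %s::int8"
cursor.execute(qry, status, recid)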

--
  Richard Huxton
  Archonet Ltd

Re: Slow UPDATE, compared to INSERT

From
Shridhar Daithankar
Date:
Ivar Zarans wrote:
> On Fri, Dec 05, 2003 at 06:19:46PM +0530, Shridhar Daithankar wrote:
>
>
>>> is correct SQL, but not correct considering PostgreSQL bugs.
>> Personally I don't consider it a bug, but anyway.. you are the one facing
>> the problem, so I understand..
> Well, if this is not a bug, then what is the reasoning behind this
> behaviour? BTW, according to Christopher it is fixed in 7.5 CVS.
> Why fix it if this is not a bug? :))

This is not a bug. It is just that people find it confusing when the
postgresql planner considers seemingly identical types as different, e.g.
treating int8 as different from int4. The obvious thinking is that they
should be the same, but given postgresql's flexibility with CREATE TYPE,
it is difficult to promote types automatically.

AFAIK, the fix in CVS is to make indexes operable with seemingly
compatible types. That does not change the fact that postgresql cannot
upgrade data types on its own.

Write good queries which adhere to strict data typing. It is better to
understand what is going on anyway.

> One more question - is this "feature" related only to "bigint" fields,
> or are other datatypes affected as well?

Every data type is affected. int2 will not use an int4 index, and so on.

>> Will the following help?
>>
>>qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
>>cursor.execute(qry, status, recid)
>
>
> Yes, this helps. But then it sort of obsoletes PyPgSQL's own quoting
> logic. I would prefer either to take care of all of this myself, or to
> trust some underlying code to do it for me. And PyPgSQL is quite nice - it
> checks the datatype and acts accordingly.

Well, then pypgsql would have to query the pg catalogs to find the exact
type of the column. But that would be too cumbersome, I guess.

  Shridhar


Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Fri, Dec 05, 2003 at 01:23:43PM +0000, Richard Huxton wrote:

> Better IMHO would be: "UPDATE table1 SET status = %s WHERE recid = %s::int8"

Thanks for the hint!

> unexpectedly in this scenario. The reason is that the literal number is
> treated as int4, whereas quoted it is marked as type unknown. Unknown gets
> cast to int8, whereas int4 gets left as-is.

This explains a lot. Thanks!
BTW, is this mentioned somewhere in the PostgreSQL documentation? I can't
remember anything on this subject. Maybe I just somehow skipped it...

--
Ivar


Re: Slow UPDATE, compared to INSERT

From
Ivar Zarans
Date:
On Fri, Dec 05, 2003 at 07:21:38PM +0530, Shridhar Daithankar wrote:

> planner considers seemingly identical types as different, e.g. treating
> int8 as different from int4. The obvious thinking is that they should be
> the same, but given postgresql's flexibility with CREATE TYPE, it is
> difficult to promote types automatically.

OK, this makes sense and explains a lot. Thanks!

> Well, then pypgsql would have to query the pg catalogs to find the
> exact type of the column. But that would be too cumbersome, I guess.

Yes, so it seems. Time to rewrite my queries :)
Thanks again for the help and explanations!

--
Ivar


Re: Slow UPDATE, compared to INSERT

From
"Mike C. Fletcher"
Date:
I just spent 2 days tracking this error down in my own code, actually.
What I wound up doing is having the two places where I generate the
queries (everything in my system goes through those two points, as I'm
using a middleware layer) check the values used as identifying fields for
the presence of a bigint and, if one exists, replace it with a wrapper
that does the coerced-string representation:

class Wrap:
    # str()/repr() of the wrapped value is a quoted literal, which the
    # planner then casts to bigint and can match against the index
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return "'%s'::bigint" % (self.value,)
    __repr__ = __str__

value = Wrap(value)

I'm just doing that for the indexing/identifying values ATM. pyPgSQL will
fall back to using simple repr for the object (rather than raising an
error as it would if you were using a formatted string), but will
otherwise treat it as a regular value for quoting and the like, so no
other modifications to the code are required.
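
The check itself might look something like this (a hypothetical sketch -
wrap_identifiers and the params list are illustrative, not the middleware's
actual API):

def wrap_identifiers(params):
    # wrap Python longs (bigint keys) so that pyPgSQL's repr emits a
    # quoted literal the 7.3 planner will match against the index
    return [Wrap(p) if isinstance(p, long) else p for p in params]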

By no means an elegant fix, but since your post (well, the resulting
thread) managed to solve my problem, I figured I should at least tell
everyone thanks and how I worked around the problem. You wouldn't want
this kind of hack down at the pyPgSQL level, I would think, as it's
DB-version specific. I suppose you could alter the __repr__ of the
PgInt8 class/type to always use the string or coerced form, but that
seems wrong to me. I'm actually hesitant to include it in our own
middleware layer, but oh well, it does seem to be necessary for even
somewhat reasonable performance.

BTW, my case was a largish (88,000 record) table with a non-unique
bigint key; explain on the update shows a sequential scan, while with
'int'::bigint it goes to an index scan. I'm using pyPgSQL as the
interface to 7.3.4 and 7.3.3.

Enjoy,
Mike

Ivar Zarans wrote:

>On Fri, Dec 05, 2003 at 10:08:20AM +0000, Richard Huxton wrote:
>
>
...

>I am using PyPgSQL for PostgreSQL access and making update queries as this:
>
>
...

> It seems that PyPgSQL query quoting is not aware of this performance
> problem (to which Christopher referred), and the final query sent to the
> server is correct SQL, but not correct considering PostgreSQL bugs.
>
>
...

>Finally - what would be correct solution to this problem? Upgrading to
>7.5 CVS is not an option :) One possibility is not to use PyPgSQL
>variable substitution and create every query "by hand" - not very nice
>solution, since variable substitution and quoting is quite convenient.
>
> A second (and better) possibility is to ask the PyPgSQL developers to take
> care of PostgreSQL's oddities.
>
>Any other suggestions?
>
>

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/




Re: Slow UPDATE, compared to INSERT

From
"Mike C. Fletcher"
Date:
Ivar Zarans wrote:

>On Fri, Dec 05, 2003 at 01:23:43PM +0000, Richard Huxton wrote:
>
>
>>Better IMHO would be: "UPDATE table1 SET status = %s WHERE recid = %s::int8"
>>
>>
>
>Thanks for the hint!
>
>
Which makes the wrapper class need:

    def __str__(self):
        return "%s::int8" % (self.value,)

Enjoy,
Mike

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/




Re: Slow UPDATE, compared to INSERT

From
Greg Stark
Date:
Ivar Zarans <iff@alcaron.ee> writes:

> > qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
> > cursor.execute(qry, status, recid)
>
> Yes, this helps. But then it sort of obsoletes PyPgSQL's own quoting
> logic. I would prefer either to take care of all of this myself, or to
> trust some underlying code to do it for me. And PyPgSQL is quite nice - it
> checks the datatype and acts accordingly.

You should tell the PyPgSQL folk to use the new binary protocol for parameters
so that there are no quoting issues at all.

But if it's going to interpolate strings into the query then pyPgSQL really
ought to be doing '%s' as above even for numbers. This lets postgres decide
what the optimal datatype is based on what you're comparing it to. Skipping
the quotes will only cause headaches.

--
greg

Re: Slow UPDATE, compared to INSERT

From
Neil Conway
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> This is not a bug. It is just that people find it confusing when the
> postgresql planner considers seemingly identical types as different.

It certainly is a bug, or at least a deficiency: the PostgreSQL planner
*could* use the index to process the query, but it doesn't consider doing
so. The fact that it isn't able to do the necessary type coercion is the
*cause* of the bug, not a defence of this behavior.

> AFAIK, the fix in CVS is to make indexes operable with seemingly
> compatible types. That does not change the fact that postgresql cannot
> upgrade data types on its own.

I'm not sure what you mean by that. In any case, I just checked, and
it does seem Tom has fixed this in CVS:

template1=# create table abc (b int8);
CREATE TABLE
template1=# set enable_seqscan = false;
SET
template1=# create index abc_b_idx on abc (b);
CREATE INDEX
template1=# explain select * from abc where b = 4;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using abc_b_idx on abc  (cost=0.00..17.07 rows=5 width=8)
   Index Cond: (b = 4)
(2 rows)

Cool!

-Neil