Обсуждение: 7.3.4 on Linux: UPDATE .. foo=foo+1 degrades massivly over time
Hello, postgresql 7.3.4 on Debian or the redhat packaged 7.3.4-8 on RHEL AS3 - same issue, so I somewhat cut out RH is playing things on me. Tested on two different PCs, too (say, one debian, one RHEL). While running UPDATE banner SET counterhalf=counterhalf+1 WHERE BannerID=50 several thousand times, the return times degrade (somewhat linear). The relation banner has currently *seven* rows and thus it doesnt matter (and i checked :>) if counterhalf is indexed, or not. A following VACCUM brings back return times to 'start' - but I cannot run VACUUM any other minute (?). And it exactly vaccums as many tuples as I updated.. sure thing: INFO: Removed 5000 tuples in 95 pages. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: Pages 95: Changed 1, Empty 0; Tup 7: Vac 5000, Keep 0, UnUsed 3. Total CPU 0.01s/0.03u sec elapsed 0.04 sec. What I cant explain is the query statistics output: 'In the beginning': DEBUG: StartTransactionCommand LOG: query: UPDATE banner SET counterhalf=counterhalf+1 WHERE BannerID=50 DEBUG: ProcessQuery DEBUG: CommitTransactionCommand LOG: QUERY STATISTICS ! system usage stats: ! 0.001110 elapsed 0.000000 user 0.000000 system sec ! [0.940000 user 0.080000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/0 [437/192] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [0/0] voluntary/involuntary context switches ! buffer usage stats: ! Shared blocks: 0 read, 0 written, buffer hit rate = 100.00 % ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00% ! Direct blocks: 0 read, 0 written After 5000 updates: DEBUG: StartTransactionCommand LOG: query: UPDATE banner SET counterhalf=counterhalf+1 WHERE BannerID=50 DEBUG: ProcessQuery DEBUG: CommitTransactionCommand LOG: QUERY STATISTICS ! system usage stats: ! 0.002503 elapsed 0.000000 user 0.000000 system sec ! [8.400000 user 0.740000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/0 [711/192] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [0/0] voluntary/involuntary context switches ! buffer usage stats: ! Shared blocks: 0 read, 0 written, buffer hit rate = 100.00% ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00% ! Direct blocks: 0 read, 0 written I checked all 5000 entries, and (obviously?) never touches the filesystem. Where I stumble is that it keeps down with 'elapsed' time, but the user/sys times grow linear (which is corresponding to wallclock). The effect is the same (only in other ranges) with a default or "tuned" postgresql.conf and either on debian or the RHEL machine. I dont know where to go now. I was reading the whole changelog/history from 7.3.4 up to 7.4.2 and only found 'auto vacuum' - which might be a deal, yet it needs permanent statistics (really?) and thus would eat response time on the other hand then. And for the very record I tried this on a mysql4.0.18 where the return time is in general faster (dont care), but it also doesnt degrade over even 50.000 updates (do care here >:). Next thing is profiling postgres to see, where it loses the time, but maybe someone already can point me at something. Any pointer is appreciated.. link to an archived mail (search on archives is quite slow, too? :) ), pointer to some "hidden" doc I might have missed or a different SQL possibility to count banner-views in pgsql. Thanks for any consideration, -- Philipp Buehler, aka fips | <double-p> cvs -d /dev/myself commit -m "it's my life" dont/you/forget
On Wed, Apr 21, 2004 at 19:52:15 +0200, Philipp Buehler <pb-pgsql-g@mlsub.buehler.net> wrote: > > While running > UPDATE banner SET counterhalf=counterhalf+1 WHERE BannerID=50 > several thousand times, the return times degrade (somewhat linear). This is to be expected. Postgres uses MVCC and everytime you do an update a new row is created. > A following VACCUM brings back return times to 'start' - but I cannot > run VACUUM any other minute (?). And it exactly vaccums as many tuples > as I updated.. sure thing: Why not? You only have to vacuum this one table. Vacuuming it once a minute should be doable.
Philipp Buehler <pb-pgsql-g@mlsub.buehler.net> writes: > While running > UPDATE banner SET counterhalf=counterhalf+1 WHERE BannerID=50 > several thousand times, the return times degrade (somewhat linear). You need to vacuum occasionally ... > A following VACCUM brings back return times to 'start' - but I cannot > run VACUUM any other minute (?). Sure you can. regards, tom lane
I hope I understand your question... All the old tuples that were current before your updates are still in the heap. The executer has to do the equivelent of 'where tuple_visible_to_current_transaction' on every tuple in the heap. The more updates you do, the more tuples have to be visited on subsequent update runs. This is why vacuum exists, and it's the price we pay for the otherwise excellent transactional model in PG. HTH :-) Glen Parker > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Philipp Buehler > Sent: Wednesday, April 21, 2004 10:52 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] 7.3.4 on Linux: UPDATE .. foo=foo+1 degrades massivly > over time > > While running > UPDATE banner SET counterhalf=counterhalf+1 WHERE BannerID=50 > several thousand times, the return times degrade (somewhat linear). > The relation banner has currently *seven* rows and thus it doesnt matter > (and i checked :>) if counterhalf is indexed, or not. > > A following VACCUM brings back return times to 'start' - but I cannot > run VACUUM any other minute (?). And it exactly vaccums as many tuples > as I updated.. sure thing: > INFO: Removed 5000 tuples in 95 pages. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: Pages 95: Changed 1, Empty 0; Tup 7: Vac 5000, Keep 0, UnUsed 3. > Total CPU 0.01s/0.03u sec elapsed 0.04 sec. > > < big snip >
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, April 21, 2004 12:19 PM > To: Philipp Buehler > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] 7.3.4 on Linux: UPDATE .. foo=foo+1 > degrades massivly over time > > > Philipp Buehler <pb-pgsql-g@mlsub.buehler.net> writes: > > While running > > UPDATE banner SET counterhalf=counterhalf+1 WHERE > BannerID=50 several > > thousand times, the return times degrade (somewhat linear). > > You need to vacuum occasionally ... > > > A following VACCUM brings back return times to 'start' - > but I cannot > > run VACUUM any other minute (?). > > Sure you can. Look in contrib for pg_autovacuum Build that project Edit your Postgresql configuration and enable statistics Restart your database server After it settles down, start pg_autovacuum BTW, you can build it for Win32 if you disable the fork() option for logging purposes This should be part of the server itself (along with the large object cleanup). IMO-YMMV. See this article: http://www.bricolage.cc/docs/Bric/DBA.html And this one: http://www.argudo.org/postgresql/soft-tuning.php
On 21/04/2004, Tom Lane <tgl@sss.pgh.pa.us> wrote To Philipp Buehler: > > While running > > UPDATE banner SET counterhalf=counterhalf+1 WHERE BannerID=50 > > several thousand times, the return times degrade (somewhat linear). > > You need to vacuum occasionally ... > > > A following VACCUM brings back return times to 'start' - but I cannot > > run VACUUM any other minute (?). > > Sure you can. Yes, it's probably bearable. Just that I am sure now, it's a systematic thing I've to deal with and not some fubar. Thanks also for the other hints/URLs I got (pg_autovacuum in contrib, etc..) ciao -- Philipp Buehler, aka fips | <double-p> cvs -d /dev/myself commit -m "it's my life" dont/you/forget
> -----Original Message----- > From: Bruno Wolff III [mailto:bruno@wolff.to] > Sent: Wednesday, April 21, 2004 11:19 AM > To: Philipp Buehler > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] 7.3.4 on Linux: UPDATE .. foo=foo+1 > degrades massivly over time > > > On Wed, Apr 21, 2004 at 19:52:15 +0200, > Philipp Buehler <pb-pgsql-g@mlsub.buehler.net> wrote: > > > > While running > > UPDATE banner SET counterhalf=counterhalf+1 WHERE > BannerID=50 several > > thousand times, the return times degrade (somewhat linear). > > This is to be expected. Postgres uses MVCC and everytime you > do an update a new row is created. > > > A following VACCUM brings back return times to 'start' - > but I cannot > > run VACUUM any other minute (?). And it exactly vaccums as > many tuples > > as I updated.. sure thing: > > Why not? You only have to vacuum this one table. Vacuuming it > once a minute should be doable. Shouldn't the Database server be the entity that decides when vacuum is needed? Something is very, very strange about the whole PostgreSQL maintenance model. Oracle uses MVCC and I do not have to UPDATE STATISTICS constantly to keep the system from going into the toilet. Also, I should be able to do an update on every row in a database table without causing severe problems. Every other database system I know of does not have this problem. If I have a million row table with a column called is_current, and I do this: UPDATE tname SET is_current = 0; Horrible things happen. Just an idea: Why not recognize that more rows will be modified than the row setting can support and actually break the command into batches internally?
On Wed, Apr 21, 2004 at 14:55:51 -0700, Dann Corbit <DCorbit@connx.com> wrote: > > Shouldn't the Database server be the entity that decides when vacuum is > needed? At least in simple cases it should. That is what the auto vacuum project is trying to do. > Also, I should be able to do an update on every row in a database table > without causing severe problems. Every other database system I know of > does not have this problem. You can do this in postgres without causing too much trouble. The problem at the beginning of this thread was caused by updating a one row table thousands of times which can cause problems if you don't vacuum. > If I have a million row table with a column called is_current, and I do > this: > UPDATE tname SET is_current = 0; > Horrible things happen. Like what? At worst you will double the disk space used by this table. That isn't great, but it surely isn't horrible under normal circumstances. > Just an idea: > Why not recognize that more rows will be modified than the row setting > can support and actually break the command into batches internally? This doesn't make sense. There is no limit on the number of rows that can be modified at once.
Dann Corbit wrote: >>>A following VACCUM brings back return times to 'start' - >>> >>> >>but I cannot >> >> >>>run VACUUM any other minute (?). And it exactly vaccums as >>> >>> >>many tuples >> >> >>>as I updated.. sure thing: >>> >>> >>Why not? You only have to vacuum this one table. Vacuuming it >>once a minute should be doable. >> >> > >Shouldn't the Database server be the entity that decides when vacuum is >needed? > How is the database supposed to know when you want to purge records? Once a vacuum has been run, the table can not be rolled back or time traveled. >Something is very, very strange about the whole PostgreSQL maintenance >model. > Giving the administrator full control over database management is a good thing. If you want to write a cron job, to purge records automaticaly, thats your prerogative. Not every one needs to, nor want's to constantly purge records. Most of my databases collect information and changing information in them would be taboo. Since records are not updated or deleted their is no reason to vacuum the collection tables, and they collect between 400 K to 40 M records per period. >Oracle uses MVCC and I do not have to UPDATE STATISTICS constantly to >keep the system from going into the toilet. > Does Oracle purge records automaticaly? If so how do you configure it, and what are the default parameters? >Also, I should be able to do an update on every row in a database table >without causing severe problems. Every other database system I know of >does not have this problem. > >If I have a million row table with a column called is_current, and I do >this: >UPDATE tname SET is_current = 0; >Horrible things happen. > >Just an idea: >Why not recognize that more rows will be modified than the row setting >can support and actually break the command into batches internally? > It sounds like you have significant hardware limitations. I have a database I use for traffic analysys, that has over 40,000,000 records, I have done some complicated queries with multiple subselects and joins. The complicated queries take a long time to complete, but they work. I have also done updates that affected at least 5% of the records, then vacuumed the table shortly there after. The bigger the table the more "scatch pad" disk space, and memory you need.
On Wed, 21 Apr 2004, Dann Corbit wrote: > Shouldn't the Database server be the entity that decides when vacuum is > needed? when the autovacuum daemon is finished and integrated into the backend, it will. til then, you, the administrator decide when to run it. Note that the autovacuum daemon is quite usable as it right now, it's just not integrated. > Something is very, very strange about the whole PostgreSQL maintenance > model. Yes, it is. It's quite different from other databases in that it lets you decide if vacuum should happen now or later when there's less load. this isn't bad or good, just different. > Oracle uses MVCC and I do not have to UPDATE STATISTICS constantly to > keep the system from going into the toilet. That's because Oracle's MVCC works in a completely different way. Oracle uses roll back segments to maintain it's MVCC while postgresql does it directly in the data store. That means Oracle is limited to transactions that can fit in the roll back segment. If you want a bigger transaction, you have to diddle the size of said rollback segment. no such limit exists in PostgreSQL except for the size of your data storage. > Also, I should be able to do an update on every row in a database table > without causing severe problems. Every other database system I know of > does not have this problem. PostgreSQL doesn't either. As long as you vacuum up after yourself. It's a trade off. Neither better nor worse really, than the limitations of Oracles Roll back segment. Each database has limits / quirks due to the design choice. PostgreSQL's limit is that cleaning up after updates/deletes is handled by vacuuming. > If I have a million row table with a column called is_current, and I do > this: > UPDATE tname SET is_current = 0; > Horrible things happen. OK, imagine you're in Oracle and your rollback segment isn't big enough to hold all this. What happens? The whole operation rolls back. That's not a good thing either. And Oracle DBAs know that they need a big enough roll back segment to do such things. On PostgreSQL, instead of worrying about running out of room in your rollback segment, you have to worry about running out of room on your storage subsystem. and you have to vacuum. Different, not worse, not better, different. > Just an idea: > Why not recognize that more rows will be modified than the row setting > can support and actually break the command into batches internally? Huh? How would that possibly help? You'd still have to vacuum away the dead tuples at the end, whether you did them all at once or one at a time. And since we don't use rollback segments, you don't have to worry about your transaction being too big for it, just for your data store.
On 22/04/2004, Guy Fraser <guy@incentre.net> wrote To pgsql-general@postgresql.org: > >Shouldn't the Database server be the entity that decides when vacuum is > >needed? > > How is the database supposed to know when you want to purge records? > Once a vacuum has been run, the table can not be rolled back or time > traveled. Hmm, if the UPDATE is in a transaction block. After this block is committed, the deleted tuple could be purged if there is a flag. Like, either in the schema 'purge-on-commit', or as an option like 'UPDATE PURGE'? Just an idea.. (Brainfart?) :) ciao -- Philipp Buehler, aka fips | <double-p> cvs -d /dev/myself commit -m "it's my life" dont/you/forget
Philipp Buehler wrote: >On 22/04/2004, Guy Fraser <guy@incentre.net> wrote To pgsql-general@postgresql.org: > > >>>Shouldn't the Database server be the entity that decides when vacuum is >>>needed? >>> >>> >>How is the database supposed to know when you want to purge records? >>Once a vacuum has been run, the table can not be rolled back or time >>traveled. >> >> > >Hmm, if the UPDATE is in a transaction block. After this block is >committed, the deleted tuple could be purged if there is a flag. Like, >either in the schema 'purge-on-commit', or as an option like 'UPDATE PURGE'? > >Just an idea.. (Brainfart?) :) > >ciao > > What if your not the only person accessing the database and someone else has an active transaction that was initiated before your transaction was committed? If you delete the 'stale' data, then you would have to abort their transaction or their transaction would have data with mixed results from before your update and after your update. When to remove 'stale' data is not a simple thing to determine in an active database. In order to preserve transactional integrity, all transactions that are being handled during the transaction that modifies the data, must be completed before the data can be considered 'stale' and once the 'stale' data is purged roll backs can not be permitted for any transaction before the purge. Eventually the automatic purging of 'stale' data will be supported, but hopefully it will be configurable to allow 'time travel' when required, and allow for a reasonable time to be able to roll back transactions. I am not an RDBMS developer and don't claim to know how the inner workings work, but I have been using RDBMS application for many years and have worked with a number of different products. I prefer PostGreSQL over all others, but I do use others for some projects where their support or special features out way those of PostGreSQL.
Alvaro Herrera wrote: >On Mon, Apr 26, 2004 at 12:53:09PM -0600, Guy Fraser wrote: > > > >>Eventually the automatic purging of 'stale' data will be supported, >>but hopefully it will be configurable to allow 'time travel' when >>required, and allow for a reasonable time to be able to roll back >>transactions. >> >> > >Well, you are saying two different things here: to garbage-collect >automatically the database (rather than by manual VACUUMs), and to be >able to UNDO transactions, effectively going back in time. > >The former is likely to be supported in some not-too-distant future, >maybe hopefully the next release; the latter is not even planned, and in >the past it has been disregarded as too costly. Not implementation >time cost, mind you, but runtime cost. > > I realize that one you vacuum you can no longer time travel to before the vacuum. Although I never tried to use it, I thoughttime travel was a feature in PostGreSQL. My understanding of the time travel feature was to allow a query to be processedwith the data set as it was at a previous time. Since I did not have need for that feature fro any of the projects I have been involved in, I did not check to see how it worked, or followed it's development or demise as it maybe. Thank you for the update, I will not use time travel in further explanations of transactional integrity.:-)
On Mon, Apr 26, 2004 at 12:53:09PM -0600, Guy Fraser wrote: > Eventually the automatic purging of 'stale' data will be supported, > but hopefully it will be configurable to allow 'time travel' when > required, and allow for a reasonable time to be able to roll back > transactions. Well, you are saying two different things here: to garbage-collect automatically the database (rather than by manual VACUUMs), and to be able to UNDO transactions, effectively going back in time. The former is likely to be supported in some not-too-distant future, maybe hopefully the next release; the latter is not even planned, and in the past it has been disregarded as too costly. Not implementation time cost, mind you, but runtime cost. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Linux transformó mi computadora, de una `máquina para hacer cosas', en un aparato realmente entretenido, sobre el cual cada día aprendo algo nuevo" (Jaime Salinas)
On 26/04/2004, Guy Fraser <guy@incentre.net> wrote To Philipp Buehler: > What if your not the only person accessing the database and someone else > has an active transaction that was initiated before your transaction was > committed? There are row locks for a reason, too? OTOH that's somewhat like handling a filesystem. if you have a refcount=x. Just decrease the refcount for completed transaction and after it this count drops to 0 the tuple can be purged. > supported, but hopefully it will be configurable to allow 'time travel' > when required, and allow for a reasonable time to be able to roll back > transactions. Hm well. a comitted transaction is one, right? I dont get your time travel here now.. hmm. Ciao -- Philipp Buehler, aka fips | <double-p> cvs -d /dev/myself commit -m "it's my life" dont/you/forget
On Mon, Apr 26, 2004 at 01:58:57PM -0600, Guy Fraser wrote: > I realize that one you vacuum you can no longer time travel to before the > vacuum. Although I never tried to use it, I thought time travel was a > feature in PostGreSQL. I've heard that the Berkeley code had the time travel functionality, but apparently it was removed in very early PostgreSQL days, or maybe even before that. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "I call it GNU/Linux. Except the GNU/ is silent." (Ben Reiter)