Обсуждение: slow update on 1M rows (worse with indexes)
Hello, I experience significant performance issues with postgresql and updates. I have a table which contains ~1M rows. Layout: TOTO=# \d versions_9d; Table «public.versions_9d» Colonne | Type | Modificateurs ------------+------------------------+--------------- hash | character(32) | date | integer | default 0 diff | integer | default 0 flag | integer | default 0 size | bigint | default 0 zip_size | bigint | default 0 jds | integer | default 0 scanned | integer | default 0 dead | integer | default 0 Test case: Create a new DB and load a dump of the above database with 976009 rows, then i perform updates on the whole table. I recorded the time taken for each full update and the amount of extra disk space used. Each consecutive update of the table is slower than the previous beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=2" UPDATE 976009 real 0m41.542s beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=3" UPDATE 976009 real 0m45.140s (+480M) beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=4" UPDATE 976009 real 1m10.554s (+240M) beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=5" UPDATE 976009 real 1m24.065s (+127M) beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=6" UPDATE 976009 real 1m17.758s (+288M) beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=7" UPDATE 976009 real 1m26.777s (+288M) beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=8" UPDATE 976009 real 1m39.151s (+289M) Then i tried adding an index to the table on the column date (int) that stores unix timestamps. TOTO=# CREATE INDEX versions_index ON versions_9d (date); (-60M) disk space goes down on index creation beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=9" UPDATE 976009 real 3m8.219s (+328M) beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=8" UPDATE 976009 real 6m24.716s (+326M) beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=10" UPDATE 976009 real 8m25.274s (+321M) As a sanity check, i loaded mysql5 and tried the same database and updates. With mysql, the update always lasts ~8s. The conclusions I have come to is that update==insert+delete which seems very heavy when index are present (and heavy disk wise on big tables). Is there a switch i can flip to optimise this? Thanks in advance, Gabriel Biberian
On Thu, Feb 22, 2007 at 07:11:42PM +0100, Gabriel Biberian wrote: > Create a new DB and load a dump of the above database with 976009 rows, > then i perform updates on the whole table. I recorded the time taken > for each full update and the amount of extra disk space used. Each > consecutive update of the table is slower than the previous > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=2" > UPDATE 976009 > real 0m41.542s You're creating a huge amount of dead rows by this kind of procedure. Try a VACUUM in-between, or enable autovacuum. (Adjusting your WAL and checkpointing settings might help too.) Apart from that, do you really have a scenario that requires updating _all_ rows in your table regularly? /* Steinar */ -- Homepage: http://www.sesse.net/
n i tried adding an index to the table on the column date (int) that > stores unix timestamps. > TOTO=# CREATE INDEX versions_index ON versions_9d (date); > (-60M) disk space goes down on index creation > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=9" > UPDATE 976009 > real 3m8.219s (+328M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=8" > UPDATE 976009 > real 6m24.716s (+326M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=10" > UPDATE 976009 > real 8m25.274s (+321M) > > As a sanity check, i loaded mysql5 and tried the same database and > updates. With mysql, the update always lasts ~8s. Yes but with mysql did you use myisam or innodb? > The conclusions I have come to is that update==insert+delete which seems > very heavy when index are present (and heavy disk wise on big tables). > Is there a switch i can flip to optimise this? > > Thanks in advance, > Gabriel Biberian > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
how about saying: lock table versions_9d in EXCLUSIVE mode; UPDATE versions_9d SET flag=2; commit; Ismo On Thu, 22 Feb 2007, Gabriel Biberian wrote: > Hello, > > I experience significant performance issues with postgresql and updates. > I have a table which contains ~1M rows. > Layout: > TOTO=# \d versions_9d; > Table «public.versions_9d» > Colonne | Type | Modificateurs > ------------+------------------------+--------------- > hash | character(32) | > date | integer | default 0 > diff | integer | default 0 > flag | integer | default 0 > size | bigint | default 0 > zip_size | bigint | default 0 > jds | integer | default 0 > scanned | integer | default 0 > dead | integer | default 0 > > Test case: > Create a new DB and load a dump of the above database with 976009 rows, then i > perform updates on the whole table. I recorded the time taken for each full > update and the amount of extra disk space used. Each consecutive update of > the table is slower than the previous > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=2" > UPDATE 976009 > real 0m41.542s > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=3" > UPDATE 976009 > real 0m45.140s (+480M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=4" > UPDATE 976009 > real 1m10.554s (+240M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=5" > UPDATE 976009 > real 1m24.065s (+127M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=6" > UPDATE 976009 > real 1m17.758s (+288M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=7" > UPDATE 976009 > real 1m26.777s (+288M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=8" > UPDATE 976009 > real 1m39.151s (+289M) > > Then i tried adding an index to the table on the column date (int) that stores > unix timestamps. > TOTO=# CREATE INDEX versions_index ON versions_9d (date); > (-60M) disk space goes down on index creation > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=9" > UPDATE 976009 > real 3m8.219s (+328M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=8" > UPDATE 976009 > real 6m24.716s (+326M) > beebox@evobrik01:~$ time psql TOTO -c "UPDATE versions_9d SET flag=10" > UPDATE 976009 > real 8m25.274s (+321M) > > As a sanity check, i loaded mysql5 and tried the same database and updates. > With mysql, the update always lasts ~8s. > The conclusions I have come to is that update==insert+delete which seems very > heavy when index are present (and heavy disk wise on big tables). Is there a > switch i can flip to optimise this? > > Thanks in advance, > Gabriel Biberian > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate >