Discussion: Need to run CLUSTER to keep performance
Hello

This is a question about something we have seen sometimes in the last months. It happens with tables with a large amount of updates/selects compared with the amount of inserts/deletes. The sizes of these tables are small and the number of rows too.

The 'problem' is that performance decreases during the day and the only thing that helps is to run CLUSTER on the table with problems. VACUUM ANALYZE does not help.

Some information that can help to find out why this happens:

- PostgreSQL version: 8.1.9

------------------------------------------------------------------------------
scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
 pg_size_pretty
----------------
 12 MB
------------------------------------------------------------------------------
scanorama=# SELECT count(*) FROM hosts ;
 count
-------
 16402
------------------------------------------------------------------------------
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2771.56 rows=66756 width=314) (actual time=0.008..2013.415 rows=16402 loops=1)
 Total runtime: 2048.486 ms
------------------------------------------------------------------------------
scanorama=# VACUUM ANALYZE ;
VACUUM
------------------------------------------------------------------------------
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2718.57 rows=61357 width=314) (actual time=0.008..1676.283 rows=16402 loops=1)
 Total runtime: 1700.826 ms
------------------------------------------------------------------------------
scanorama=# CLUSTER hosts_pkey ON hosts ;
CLUSTER
------------------------------------------------------------------------------
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..680.02 rows=16402 width=314) (actual time=0.008..31.205 rows=16402 loops=1)
 Total runtime: 53.635 ms
------------------------------------------------------------------------------
scanorama=# SELECT * from pg_stat_all_tables WHERE relname LIKE 'hosts';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
 105805 | public     | hosts   |  1996430 |  32360280252 |  2736391 |       3301856 |       948 |   1403325 |       737

The information from pg_stat_all_tables is from the last 20 days.
------------------------------------------------------------------------------
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 2536 of 2536 pages, containing 16410 live rows and 57042 dead rows; 16410 rows in sample, 16410 estimated total rows
INFO:  free space map contains 191299 pages in 786 relations
DETAIL:  A total of 174560 page slots are in use (including overhead).
174560 page slots are required to track all free space.
Current limits are:  2000000 page slots, 4000 relations, using 12131 KB.
------------------------------------------------------------------------------

The tables with this 'problem' are not big, so CLUSTER finishes very fast and it does not have an impact on access because of locking. But we wonder why this happens.

Do you need more information?

Thanks in advance.
regards
-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez wrote:
> This is a question about something we have seen sometimes in the last
> months. It happens with tables with a large amount of updates/selects
> compared with the amount of inserts/deletes. The sizes of these tables
> are small and the amount of rows too.
>
> The 'problem' is that performance decrease during the day and the only
> thing that helps is to run CLUSTER on the table with problems. VACUUM
> ANALYZE does not help.
> [...]
> The tables with this 'problem' are not big, so CLUSTER finnish very fast
> and it does not have an impact in the access because of locking. But we
> wonder why this happens.

2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have you increased shared_buffers from the default? Which operating system are you using? Shared memory access is known to be slower on Windows.

On a small table like that you could run VACUUM every few minutes without much impact on performance. That should keep the table size in check.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Performance problems with heavily modified tables (UPDATE or DELETE) are usually caused by not vacuuming. There are two main modes VACUUM can run in (plain or full), and the latter works in a much more aggressive way (exclusive locking, etc.). Try to run VACUUM FULL VERBOSE on the table and see if it helps.

A way to fix this is usually a proper setting of the autovacuum daemon - it may work on the tables that are not modified heavily, but it does not work for the heavily modified ones. Do you have the autovacuum daemon enabled? What are its settings? Try to set it a little bit more aggressive (this can be done on a table level).

The stats from pg_stat_all_tables are nice, but I guess the stats that matter are located in the pg_class catalog, the most interesting being the reltuples and relpages columns - run

  SELECT relname, relpages, reltuples FROM pg_class WHERE relname LIKE 'hosts';

and observe the number of pages before and after the VACUUM FULL (or CLUSTER). I guess the number of pages increases quite fast and the autovacuum daemon is not able to reclaim the space - and this is probably why scanning 12 MB of data takes 2 sec, which is way too much: the table is actually much bigger, as it contains a lot of dead data.

Tomas

> Hello
>
> This is a question about something we have seen sometimes in the last
> months. It happens with tables with a large amount of updates/selects
> compared with the amount of inserts/deletes. The sizes of these tables
> are small and the amount of rows too.
>
> The 'problem' is that performance decrease during the day and the only
> thing that helps is to run CLUSTER on the table with problems. VACUUM
> ANALYZE does not help.
Heikki Linnakangas wrote:
> Rafael Martinez wrote:
>> The tables with this 'problem' are not big, so CLUSTER finnish very fast
>> and it does not have an impact in the access because of locking. But we
>> wonder why this happens.
>
> 2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have
> you increased shared_buffers from the default? Which operating system
> are you using? Shared memory access is known to be slower on Windows.

This is a server with 8GB of RAM, we are using 25% as shared_buffers. Linux RHELAS4 with a 2.6.9-55.0.9.ELsmp kernel / x86_64.

> On a small table like that you could run VACUUM every few minutes
> without much impact on performance. That should keep the table size in
> check.

Ok, we run VACUUM ANALYZE only once a day, every night. But we would expect the performance to be ok again after running vacuum, and it isn't. Only CLUSTER helps.

I can not see that we need to change the max_fsm_pages parameter, and pg_class and ANALYZE give us this information today (not long ago a CLUSTER was executed):

------------------------------------------------------------------------------
scanorama=# VACUUM VERBOSE ANALYZE hosts;
INFO:  vacuuming "public.hosts"
INFO:  index "hosts_pkey" now contains 20230 row versions in 117 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "hosts": found 0 removable, 20230 nonremovable row versions in 651 pages
DETAIL:  3790 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_376127"
INFO:  index "pg_toast_376127_index" now contains 131 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_376127": found 0 removable, 131 nonremovable row versions in 33 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 651 of 651 pages, containing 16440 live rows and 3790 dead rows; 16440 rows in sample, 16440 estimated total rows
VACUUM

scanorama=# SELECT relname, relpages, reltuples from pg_class WHERE relname LIKE 'hosts';
 relname | relpages | reltuples
---------+----------+-----------
 hosts   |      651 |     20230
------------------------------------------------------------------------------

Any more ideas?

regards,
-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez wrote:
> Hello
>
> This is a question about something we have seen sometimes in the last
> months. It happens with tables with a large amount of updates/selects
> compared with the amount of inserts/deletes. The sizes of these tables
> are small and the amount of rows too.
>
> The 'problem' is that performance decrease during the day and the only
> thing that helps is to run CLUSTER on the table with problems. VACUUM
> ANALYZE does not help.

Probably because all the live tuples are clustered at the end of the table, and the initial pages are polluted with dead tuples. Try vacuuming the table much more often, say every few minutes.

Your table is 2536 pages long, but it could probably be in the vicinity of 700 ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
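[Alvaro's diagnosis (dead tuples polluting the front of the table) can be checked directly if the contrib pgstattuple module is installed; 8.1 shipped it in contrib/. A sketch, assuming the module's functions have been loaded into this database:]

```sql
-- pgstattuple scans the whole relation, so reserve it for small tables
-- like this one; dead_tuple_percent shows how much of the 12 MB is dead.
SELECT table_len, tuple_percent, dead_tuple_percent, free_percent
FROM pgstattuple('hosts');
```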
Rafael Martinez wrote:
> Heikki Linnakangas wrote:
>> On a small table like that you could run VACUUM every few minutes
>> without much impact on performance. That should keep the table size in
>> check.
>
> Ok, we run VACUUM ANALYZE only one time a day, every night. But we would
> espect the performance to get ok again after running vacuum, and it
> doesn't. Only CLUSTER helps.

If the table is already bloated, a VACUUM won't usually shrink it. It only makes the space available for reuse, but a sequential scan still needs to go through a lot of pages.

CLUSTER on the other hand repacks the tuples and gets rid of all the unused space on pages. You need to run CLUSTER or VACUUM FULL once to shrink the relation, but after that frequent-enough VACUUMs should keep the table size down.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
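[Heikki's advice amounts to a two-step maintenance plan; a sketch, using the table and index names from this thread:]

```sql
-- Step 1: shrink the already-bloated table once. CLUSTER takes an
-- exclusive lock, but finishes fast on a table this small.
CLUSTER hosts_pkey ON hosts;   -- pre-8.3 syntax, as used earlier in the thread
ANALYZE hosts;                 -- CLUSTER does not refresh planner statistics

-- Step 2: from now on, vacuum frequently so that freed space is reused
-- and the table stops growing again.
VACUUM ANALYZE hosts;
```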
Alvaro Herrera wrote:
> Rafael Martinez wrote:
>> The 'problem' is that performance decrease during the day and the only
>> thing that helps is to run CLUSTER on the table with problems. VACUUM
>> ANALYZE does not help.
>
> Probably because all the live tuples are clustered at the end of the
> table, and the initial pages are polluted with dead tuples. Try
> vacuuming the table much more often, say every few minutes.
>
> Your table is 2536 pages long, but it could probably be in the vicinity
> of 700 ...

We ran VACUUM ANALYZE every 10 minutes for 2-3 days to see if it helped, but when it didn't we went back to the old configuration (once a day).

Yes, after a CLUSTER we are using 517 pages. But the table does not grow much, it is always around 12-20 MB, so it looks like vacuum works without problems.

regards,
-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
In response to Rafael Martinez <r.m.guerrero@usit.uio.no>:

> Heikki Linnakangas wrote:
>> 2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have
>> you increased shared_buffers from the default? Which operating system
>> are you using? Shared memory access is known to be slower on Windows.
>
> This is a server with 8GB of ram, we are using 25% as shared_buffers.
> Linux RHELAS4 with a 2.6.9-55.0.9.ELsmp kernel / x86_64.
>
>> On a small table like that you could run VACUUM every few minutes
>> without much impact on performance. That should keep the table size in
>> check.
>
> Ok, we run VACUUM ANALYZE only one time a day, every night. But we would
> espect the performance to get ok again after running vacuum, and it
> doesn't. Only CLUSTER helps.

If you have a large value for max_fsm_pages, but only vacuum once a day, you could end up with considerable bloat on a small table, but not enough to exceed max_fsm_pages (thus you wouldn't see any warnings/errors).

I recommend either:
a) autovacuum, with aggressive settings for that table
b) a more aggressive schedule for that particular table, maybe a cron job
   that vacuums that table every 5 minutes.

You could also do a combination, i.e. enable autovacuum with conservative settings and set a cron to vacuum the table every 10 minutes.

Vacuuming once a day is usually only enough if you have very minimal updates.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
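[Bill's option (b) can be sketched as a crontab entry for the postgres user; the psql path, socket path, and database name here are taken from the commands quoted later in this thread and would need adjusting:]

```
# Plain VACUUM of the heavily-updated table every 5 minutes
*/5 * * * * /local/opt/pgsql-8.1/bin/psql -h /tmp/pg_sockets/dbpg-meridien -p 5432 scanorama -c 'VACUUM ANALYZE hosts'
```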
Heikki Linnakangas wrote:
> If the table is already bloated, a VACUUM won't usually shrink it. It
> only makes the space available for reuse, but a sequential scan still
> needs to go through a lot of pages.
>
> CLUSTER on the other hand repacks the tuples and gets rid of all the
> unused space on pages. You need to run CLUSTER or VACUUM FULL once to
> shrink the relation, but after that frequent-enough VACUUMs should keep
> the table size down.

Ok, thanks for the advice. We will try this and come back with more information.

-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
> Heikki Linnakangas wrote:
>> On a small table like that you could run VACUUM every few minutes
>> without much impact on performance. That should keep the table size in
>> check.

> Ok, we run VACUUM ANALYZE only one time a day, every night.

There's your problem.

Reading between the lines I gather that you think an update is "free" in the sense of not creating a need for vacuum. It's not --- it's exactly equivalent to an insert + a delete, and it leaves behind a dead row that needs to be vacuumed. If you do a lot of updates, you need to vacuum.

			regards, tom lane
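[Tom's point can be demonstrated on a scratch table; a minimal sketch (the table name is made up for illustration):]

```sql
CREATE TABLE update_demo (id int PRIMARY KEY, v int);
INSERT INTO update_demo VALUES (1, 0);

-- Each UPDATE writes a new row version and leaves the old one dead:
UPDATE update_demo SET v = v + 1;
UPDATE update_demo SET v = v + 1;

-- VACUUM VERBOSE should now report the two dead row versions as removable
-- (assuming no other open transaction can still see them):
VACUUM VERBOSE update_demo;
```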
r.m.guerrero@usit.uio.no (Rafael Martinez) writes:
> Heikki Linnakangas wrote:
>> On a small table like that you could run VACUUM every few minutes
>> without much impact on performance. That should keep the table size in
>> check.
>
> Ok, we run VACUUM ANALYZE only one time a day, every night. But we would
> espect the performance to get ok again after running vacuum, and it
> doesn't. Only CLUSTER helps.

You have characterized the shape of the problem Right There.

If you only VACUUM that table once a day, then it has a whole day to get cluttered with dead tuples, which increases its size to encompass 651 pages, and NOTHING ever allows it to shrink back to a small size.

Plain VACUUM (or VACUUM ANALYZE) does not attempt to shrink table sizes. Only VACUUM FULL and CLUSTER do that.

Here are some options to "parameterize" your choices:

- If you vacuum the table often enough that only 10% of the table
  consists of dead tuples, then you can expect the table to perpetually
  have 10% of dead space.

- If you vacuum the table seldom enough that 90% of the table may be
  expected to consist of dead tuples, then you can expect this table to
  consistently have 90% of its space be "dead."

It sounds like this particular table needs to be vacuumed quite a bit more frequently than once a day.

On our systems, we have certain tables where tuples get killed off so frequently that we find it worthwhile to vacuum those tables once every two to three minutes. If we didn't, we'd see application performance bog down until it forced us to CLUSTER or VACUUM FULL the table.

-- 
"cbbrowne","@","acm.org"
http://linuxfinances.info/info/linux.html
"How much more helpful could I be than to provide you with the
appropriate e-mail address? I could engrave it on a clue-by-four and
deliver it to you in Chicago, I suppose." -- Seen on Slashdot...
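[Chris's parameterization can be made concrete with the numbers reported elsewhere in this thread (roughly 67000 updates/day against about 16400 live rows); a back-of-the-envelope sketch of the steady-state dead fraction, assuming each vacuum can actually reclaim the dead rows:]

```sql
-- Dead-space fraction at the moment VACUUM runs, for two schedules:
-- 4 vacuums/day vs. one vacuum every 15 minutes (96/day).
SELECT 67000 / 4   AS dead_rows_4x,
       round(100.0 * (67000 / 4)  / (16400 + 67000 / 4),  1) AS pct_dead_4x,
       67000 / 96  AS dead_rows_96x,
       round(100.0 * (67000 / 96) / (16400 + 67000 / 96), 1) AS pct_dead_96x;
-- roughly 50% dead space at 4 vacuums/day, but only about 4% at 96/day
```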
Tom Lane wrote:
> Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
>> Ok, we run VACUUM ANALYZE only one time a day, every night.
>
> There's your problem.
>
> Reading between the lines I gather that you think an update is "free"
> in the sense of not creating a need for vacuum. It's not --- it's
> exactly equivalent to an insert + a delete, and it leaves behind a
> dead row that needs to be vacuumed. If you do a lot of updates, you
> need to vacuum.

Hello again

We have more information about this 'problem'.

Tom, we have many other tables which are much bigger, have a larger amount of updates/deletes, and are working very well with our current vacuum configuration. We are aware of how important it is to run vacuum jobs and we think we have a good understanding of how/why vacuum works. We think the problem we are seeing sometimes with these small tables is something else.

As you all suggested, we increased the vacuum analyze jobs from one a day to four every hour (we have not run CLUSTER at all since we started with this new configuration). We started with this after a fresh CLUSTER of the table.

This has been in production since last week, and the performance of this table has only gotten worse and worse. After 4 days with the new maintenance jobs, it took more than 4 sec to run a select on this table. After running a CLUSTER we are down to around 50 ms again.

I can not believe 4 vacuum jobs every hour is not enough for this table. Looking at the statistics, it has only ca. 67000 updates/day, ca. 43 deletes/day and ca. 48 inserts/day. This is nothing compared with many of the systems we are administrating.

What we see in common between these tables (we have seen this a couple of times before) is:

- Small table size.
- Small amount of tuples in the table (almost constant).
- Large amount of updates compared to inserts/deletes, and compared to the
  amount of tuples in the table.

You who know the internals of postgres :), can you think of anything that could be causing this behavior? Any more suggestions? Do you need more data?

Thanks in advance :). We are sending all the data we had before the last CLUSTER command and after it.

----------------------------------------------------------------------
**** BEFORE CLUSTER ****
----------------------------------------------------------------------
INFO:  vacuuming "public.hosts"
INFO:  index "hosts_pkey" now contains 99933 row versions in 558 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "hosts": found 0 removable, 99933 nonremovable row versions in 3875 pages
DETAIL:  83623 dead row versions cannot be removed yet.
There were 12079 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.03u sec elapsed 0.06 sec.
INFO:  vacuuming "pg_toast.pg_toast_376272"
INFO:  index "pg_toast_376272_index" now contains 133 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_376272": found 0 removable, 133 nonremovable row versions in 65 pages
DETAIL:  2 dead row versions cannot be removed yet.
There were 127 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 3875 of 3875 pages, containing 16310 live rows and 83623 dead rows; 16310 rows in sample, 16310 estimated total rows

scanorama=# SELECT age(now(), pg_postmaster_start_time());
           age
-------------------------
 25 days 22:40:01.241036
(1 row)

scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
 pg_size_pretty
----------------
 30 MB
(1 row)

scanorama=# SELECT count(*) from hosts;
 count
-------
 16311
(1 row)

scanorama=# SELECT relname,relpages,reltuples,reltoastrelid,reltoastidxid from pg_class where relname = 'hosts';
 relname | relpages | reltuples | reltoastrelid | reltoastidxid
---------+----------+-----------+---------------+---------------
 hosts   |     3875 |    100386 |        376276 |             0
(1 row)

scanorama=# SELECT * from pg_stat_all_tables where schemaname = 'public' and relname = 'hosts';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
 105805 | public     | hosts   |  2412159 |  39109243131 |  3244406 |       9870886 |      1208 |   1685525 |      1088
(1 row)

scanorama=# EXPLAIN ANALYZE SELECT * from hosts;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on hosts  (cost=0.00..4878.86 rows=100386 width=314) (actual time=0.025..4719.082 rows=16311 loops=1)
 Total runtime: 4742.754 ms
(2 rows)

scanorama=# CLUSTER hosts_pkey ON hosts ;
CLUSTER

----------------------------------------------------------------------
**** AFTER CLUSTER ****
----------------------------------------------------------------------
scanorama=# VACUUM VERBOSE ANALYZE hosts;
INFO:  vacuuming "public.hosts"
INFO:  index "hosts_pkey" now contains 16321 row versions in 65 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "hosts": found 0 removable, 16321 nonremovable row versions in 514 pages
DETAIL:  10 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_383759"
INFO:  index "pg_toast_383759_index" now contains 131 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_383759": found 0 removable, 131 nonremovable row versions in 33 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 514 of 514 pages, containing 16311 live rows and 10 dead rows; 16311 rows in sample, 16311 estimated total rows
VACUUM

scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
 pg_size_pretty
----------------
 4112 kB
(1 row)

scanorama=# SELECT count(*) from hosts;
 count
-------
 16311
(1 row)

scanorama=# SELECT relname,relpages,reltuples,reltoastrelid,reltoastidxid from pg_class where relname = 'hosts';
 relname | relpages | reltuples | reltoastrelid | reltoastidxid
---------+----------+-----------+---------------+---------------
 hosts   |      514 |     16321 |        383763 |             0
(1 row)

scanorama=# SELECT * from pg_stat_all_tables where schemaname = 'public' and relname = 'hosts';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
 105805 | public     | hosts   |  2412669 |  39117480187 |  3244962 |       9887752 |      1208 |   1685857 |      1088
(1 row)

scanorama=# EXPLAIN ANALYZE SELECT * from hosts;
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on hosts  (cost=0.00..678.53 rows=16353 width=314) (actual time=0.006..32.143 rows=16311 loops=1)
 Total runtime: 57.408 ms
(2 rows)
----------------------------------------------------------------------

-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez wrote: > > We have more information about this 'problem'. > Sending this just in case it can help .... Checking all the log files from these vacuum jobs we have been running, we found one that looks difference from the rest, specially on the amount of removed pages. We are sending also the output before and after the one we are talking about: ############################################### 2007-11-11_0245.log ############################################### COMMAND: /local/opt/pgsql-8.1/bin/psql -h /tmp/pg_sockets/dbpg-meridien -p 5432 scanorama -c 'VACUUM VERBOSE ANALYZE hosts' CODE: 0 OUTPUT: INFO: vacuuming "public.hosts" INFO: index "hosts_pkey" now contains 110886 row versions in 554 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.02s/0.00u sec elapsed 0.87 sec. INFO: "hosts": found 0 removable, 110886 nonremovable row versions in 3848 pages DETAIL: 94563 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.05s/0.03u sec elapsed 0.94 sec. INFO: vacuuming "pg_toast.pg_toast_376272" INFO: index "pg_toast_376272_index" now contains 260 row versions in 2 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_376272": found 0 removable, 260 nonremovable row versions in 65 pages DETAIL: 129 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.02 sec. 
INFO: analyzing "public.hosts" INFO: "hosts": scanned 3848 of 3848 pages, containing 16323 live rows and 94563 dead rows; 16323 rows in sample, 16323 estimated total rows VACUUM ############################################### 2007-11-11_0301.log ############################################### COMMAND: /local/opt/pgsql-8.1/bin/psql -h /tmp/pg_sockets/dbpg-meridien -p 5432 scanorama -c 'VACUUM VERBOSE ANALYZE hosts' CODE: 0 OUTPUT: INFO: vacuuming "public.hosts" INFO: index "hosts_pkey" now contains 16782 row versions in 556 pages DETAIL: 94551 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.04s/0.09u sec elapsed 590.48 sec. INFO: "hosts": removed 94551 row versions in 3835 pages DETAIL: CPU 0.00s/0.03u sec elapsed 0.10 sec. INFO: "hosts": found 94551 removable, 16695 nonremovable row versions in 3865 pages DETAIL: 372 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.08s/0.16u sec elapsed 590.99 sec. INFO: vacuuming "pg_toast.pg_toast_376272" INFO: index "pg_toast_376272_index" now contains 131 row versions in 2 pages DETAIL: 129 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_376272": removed 129 row versions in 33 pages DETAIL: CPU 0.00s/0.00u sec elapsed 32.05 sec. INFO: "pg_toast_376272": found 129 removable, 131 nonremovable row versions in 65 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 51.96 sec. 
INFO: analyzing "public.hosts"
INFO: "hosts": scanned 3875 of 3875 pages, containing 16323 live rows and 576 dead rows; 16323 rows in sample, 16323 estimated total rows
VACUUM

###############################################
2007-11-11_0315.log
###############################################
COMMAND: /local/opt/pgsql-8.1/bin/psql -h /tmp/pg_sockets/dbpg-meridien -p 5432 scanorama -c 'VACUUM VERBOSE ANALYZE hosts'
CODE: 0

OUTPUT:
INFO: vacuuming "public.hosts"
INFO: index "hosts_pkey" now contains 17363 row versions in 556 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 1.39 sec.
INFO: "hosts": found 0 removable, 17362 nonremovable row versions in 3875 pages
DETAIL: 1039 dead row versions cannot be removed yet.
There were 94074 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.02u sec elapsed 1.43 sec.
INFO: vacuuming "pg_toast.pg_toast_376272"
INFO: index "pg_toast_376272_index" now contains 131 row versions in 2 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_376272": found 0 removable, 131 nonremovable row versions in 65 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 129 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO: analyzing "public.hosts"
INFO: "hosts": scanned 3875 of 3875 pages, containing 16323 live rows and 1040 dead rows; 16323 rows in sample, 16323 estimated total rows
VACUUM

After this last job the amount of dead rows just continued growing until today.

--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez wrote:
> DETAIL: 83623 dead row versions cannot be removed yet.

Looks like you have a long-running transaction in the background, so VACUUM can't remove all dead tuples. I didn't see that in the vacuum verbose outputs you sent earlier. Are there any backends in "idle in transaction" state if you run ps?

In 8.1, CLUSTER will remove those tuples anyway, but that's actually not correct. If the long-running transaction later does a SELECT on the hosts table, it will see an empty table because of that. That's been fixed in 8.3, but it also means that CLUSTER might no longer help you on 8.3. VACUUM FULL is safe in that sense in 8.1 as well.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
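[Editor's note: on 8.1 the same check can be done from SQL instead of ps. This is a sketch, assuming `stats_command_string = on` so that `current_query` is populated; the column names below (`procpid`, `current_query`) are the 8.1-era ones, renamed in later releases.]

```sql
-- Show backends sitting in an open but idle transaction (8.1 catalogs).
-- These hold back VACUUM's ability to remove dead row versions.
SELECT procpid, usename, backend_start, query_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY backend_start;
```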
On Nov 12, 2007 10:11 AM, Rafael Martinez <r.m.guerrero@usit.uio.no> wrote:
> Sending this just in case it can help ....
>
> Checking all the log files from these vacuum jobs we have been running,
> we found one that looks difference from the rest, specially on the
> amount of removed pages.
>
> We are sending also the output before and after the one we are talking
> about:
>
> ###############################################
> 2007-11-11_0245.log
> ###############################################
> COMMAND: /local/opt/pgsql-8.1/bin/psql -h /tmp/pg_sockets/dbpg-meridien
> -p 5432 scanorama -c 'VACUUM VERBOSE ANALYZE hosts'
> CODE: 0
>
> OUTPUT:
> INFO: vacuuming "public.hosts"
> INFO: index "hosts_pkey" now contains 110886 row versions in 554 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.02s/0.00u sec elapsed 0.87 sec.
> INFO: "hosts": found 0 removable, 110886 nonremovable row versions in
> 3848 pages
> DETAIL: 94563 dead row versions cannot be removed yet.
> There were 0 unused item pointers.

You see that right there? You've got 94k dead rows that cannot be removed.

Then, later on, they can:

> CPU 0.04s/0.09u sec elapsed 590.48 sec.
> INFO: "hosts": removed 94551 row versions in 3835 pages
> DETAIL: CPU 0.00s/0.03u sec elapsed 0.10 sec.
> INFO: "hosts": found 94551 removable, 16695 nonremovable row versions
> in 3865 pages

So, between the first and second vacuum you had a long running transaction that finally ended and let you clean up the dead rows.

> After this last job the amount of dead rows just continued growing until
> today.

I think you've got a long running transaction that's preventing you from recovering dead rows.
In response to Heikki Linnakangas <heikki@enterprisedb.com>:
> Rafael Martinez wrote:
> > DETAIL: 83623 dead row versions cannot be removed yet.
>
> Looks like you have a long-running transaction in the background, so
> VACUUM can't remove all dead tuples. I didn't see that in the vacuum
> verbose outputs you sent earlier. Is there any backends in "Idle in
> transaction" state, if you run ps?
>
> In 8.1, CLUSTER will remove those tuples anyway, but it's actually not
> correct. If the long-running transaction decides to do a select on
> hosts-table later on, it will see an empty table because of that. That's
> been fixed in 8.3, but it also means that CLUSTER might no longer help
> you on 8.3. VACUUM FULL is safe in that sense in 8.1 as well.

Considering how small the table is, you may want to just program the process holding the transaction open to do a VACUUM FULL of that table when it's done with its work.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
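[Editor's note: a minimal sketch of what that per-table maintenance step could run once the long transaction has committed. Note that on 8.1 VACUUM FULL takes an exclusive lock on the table for the duration, so it should go at the end of the batch job, not in the middle of it.]

```sql
-- Run after the long-running transaction has finished its work:
-- compact the small, heavily-updated table and refresh its statistics.
VACUUM FULL ANALYZE hosts;
```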
Scott Marlowe wrote:
> On Nov 12, 2007 10:11 AM, Rafael Martinez <r.m.guerrero@usit.uio.no> wrote:
>
>> Sending this just in case it can help ....
>>
>> Checking all the log files from these vacuum jobs we have been running,
>> we found one that looks difference from the rest, specially on the
>> amount of removed pages.
>>
>> We are sending also the output before and after the one we are talking
>> about:
>>
>> ###############################################
>> 2007-11-11_0245.log
>> ###############################################
>> COMMAND: /local/opt/pgsql-8.1/bin/psql -h /tmp/pg_sockets/dbpg-meridien
>> -p 5432 scanorama -c 'VACUUM VERBOSE ANALYZE hosts'
>> CODE: 0
>>
>> OUTPUT:
>> INFO: vacuuming "public.hosts"
>> INFO: index "hosts_pkey" now contains 110886 row versions in 554 pages
>> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
>> CPU 0.02s/0.00u sec elapsed 0.87 sec.
>> INFO: "hosts": found 0 removable, 110886 nonremovable row versions in
>> 3848 pages
>> DETAIL: 94563 dead row versions cannot be removed yet.
>> There were 0 unused item pointers.
>
> You see that right there? You've got 94k dead rows that cannot be removed.
>
> Then, later on, they can:
>
>> CPU 0.04s/0.09u sec elapsed 590.48 sec.
>> INFO: "hosts": removed 94551 row versions in 3835 pages
>> DETAIL: CPU 0.00s/0.03u sec elapsed 0.10 sec.
>> INFO: "hosts": found 94551 removable, 16695 nonremovable row versions
>> in 3865 pages
>
> So, between the first and second vacuum you had a long running
> transaction that finally ended and let you clean up the dead rows.

No, before 8.3, CLUSTER throws away non-removable dead tuples. So the long running transaction might still be there.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Nov 12, 2007 11:01 AM, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Scott Marlowe wrote:
> > So, between the first and second vacuum you had a long running
> > transaction that finally ended and let you clean up the dead rows.
>
> No, before 8.3, CLUSTER throws away non-removable dead tuples. So the
> long running transaction might still be there.

Wow, good to know. Why would it have changed in 8.3? Was it considered broken behaviour?
Scott Marlowe wrote:
> On Nov 12, 2007 11:01 AM, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Scott Marlowe wrote:
>>> So, between the first and second vacuum you had a long running
>>> transaction that finally ended and let you clean up the dead rows.
>> No, before 8.3, CLUSTER throws away non-removable dead tuples. So the
>> long running transaction might still be there.
>
> Wow, good to know. Why would it have changed in 8.3? Was it
> considered broken behaviour?

I certainly considered it broken, though it was a known issue all along.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas wrote:
> Rafael Martinez wrote:
>> DETAIL: 83623 dead row versions cannot be removed yet.
>
> Looks like you have a long-running transaction in the background, so
> VACUUM can't remove all dead tuples. I didn't see that in the vacuum
> verbose outputs you sent earlier. Is there any backends in "Idle in
> transaction" state, if you run ps?

I don't see any long transaction in progress (<IDLE> in transaction), and if we run the vacuum job manually just after checking this, it still cannot remove the dead rows. Any suggestions for finding out why vacuum cannot remove these dead rows?

> In 8.1, CLUSTER will remove those tuples anyway, but it's actually not
> correct.

In other words ... we have to be very careful not to run CLUSTER on a table being modified inside a transaction if we do not want to lose data?

Does this mean that if we run a transaction which updates/deletes many rows, run CLUSTER before the transaction finishes, and then roll back the transaction after CLUSTER has been executed, the rows updated/deleted by the transaction cannot be rolled back because they are not there anymore?

--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
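[Editor's note: one thing that holds back VACUUM without ever showing up as "<IDLE> in transaction" in ps or pg_stat_activity is an orphaned prepared transaction. Two-phase commit, and the `pg_prepared_xacts` view, exist as of 8.1, so a check along these lines may be worth adding to the diagnosis:]

```sql
-- Prepared transactions survive client disconnects and even server
-- restarts, and pin the xmin horizon exactly like an open transaction.
-- Anything old in this list prevents VACUUM from removing dead rows.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```

An orphaned entry can be cleared with `ROLLBACK PREPARED 'gid'` once it is confirmed to be abandoned.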
Rafael Martinez wrote:
> Heikki Linnakangas wrote:
>
>> In 8.1, CLUSTER will remove those tuples anyway, but it's actually not
>> correct.
>
> With other words, .... we have to be very carefull to not run CLUSTER on
> a table been modified inside a transaction if we do not want to lose
> data? ...
>
> Does this mean that if we run a transaction which update/delete many
> rows, run cluster before the transaction is finnish, and then rollback
> the transaction after cluster has been executed, all dead rows
> updated/deleted by the transaction can not be rollbacked back because
> they are not there anymore?

Stupid question, I could have checked this myself: CLUSTER waits for the transaction to finish before executing. I have just checked this.

--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/