Discussion: Unusually high IO for autovacuum worker
Hi everyone,

We are experiencing a strange situation after upgrading our Postgres from 8.4 to 9.1 and our OS from Debian 6 to Ubuntu 12.04 LTS.

We have an application that collects timestamp-based data. Since a month of data runs over 30M rows (and grows each month), we decided to partition the data on a year-and-month basis. We basically have one master table and partition tables like master_y2013m01 and so on.

Because of our application stack (Hibernate ORM), we use a before-insert trigger to insert rows into the right partition and into the master table, and then an after-insert trigger to delete them from the master table (we need the original insert to return the inserted row info). This means the master table quickly fills with dead tuples and triggers autovacuum frequently (we are using the default autovacuum settings).

After upgrading our DB from 8.4 to 9.1 (and moving from Debian 6 to Ubuntu 12.04) we noticed a large increase in IO and traced it back to autovacuum: iotop and collectd report bursts of 4-6 MB/s lasting 90 seconds, followed by a 30-second pause. That's a lot of data to chew on, especially since the master table is only a couple of megabytes (indexes included).

Can anyone please explain why this is happening and how we can get around it?

Thank you,
Vlad
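For reference, a minimal sketch of such a trigger pair, assuming a hypothetical master table with an id primary key and a ts timestamp column (all names here are illustrative, not from the actual schema):

CREATE OR REPLACE FUNCTION master_before_insert() RETURNS trigger AS $$
BEGIN
    -- Route the row into the partition matching its timestamp,
    -- e.g. master_y2013m01 for January 2013.
    EXECUTE format('INSERT INTO master_y%sm%s SELECT ($1).*',
                   to_char(NEW.ts, 'YYYY'), to_char(NEW.ts, 'MM'))
        USING NEW;
    -- Returning NEW lets the original INSERT proceed on the master
    -- table, so the client (Hibernate) still gets the inserted row back.
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION master_after_insert() RETURNS trigger AS $$
BEGIN
    -- Remove the duplicate row from the master table only (not the
    -- children); this is what generates the stream of dead tuples.
    DELETE FROM ONLY master WHERE id = NEW.id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER master_bi BEFORE INSERT ON master
    FOR EACH ROW EXECUTE PROCEDURE master_before_insert();

CREATE TRIGGER master_ai AFTER INSERT ON master
    FOR EACH ROW EXECUTE PROCEDURE master_after_insert();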
2013/2/1 Vlad Bailescu <vlad@mojitosoftware.com>:
> After upgrading our DB from 8.4 to 9.1 (and moving from Debian 6 to Ubuntu
> 12.04) we noticed a large increase in IO and traced it back to autovacuum:
> iotop and collectd report bursts of 4-6 MB/s lasting 90 seconds, followed
> by a 30-second pause.

We had a similar problem - you can try moving the statistics file to a ramdisk by setting stats_temp_directory to somewhere on the ramdisk:

http://www.postgresql.org/message-id/CAFj8pRCRPv3i-9WwsSCa=my9M+J5B7K1i1c7TWi0pFLRyjkpWQ@mail.gmail.com

Regards

Pavel Stehule
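A hedged sketch of what that could look like; the mount point and tmpfs size are assumptions, and stats_temp_directory takes effect on reload, no restart needed:

# /etc/fstab -- back the stats directory with RAM (illustrative path/size)
tmpfs  /var/run/postgresql/pg_stat_tmp  tmpfs  size=64M,uid=postgres,gid=postgres,mode=0700  0  0

# postgresql.conf
stats_temp_directory = '/var/run/postgresql/pg_stat_tmp'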
On Fri, Feb 1, 2013 at 1:44 PM, Vlad Bailescu <vlad@mojitosoftware.com> wrote:
> Can anyone please explain why this is happening and how we can get around
> it?

Do you know for sure that it's the master table that's generating all the vacuum traffic? What about the partition tables - do they get any updates/deletes? It might be useful to turn autovacuum logging ON, see which tables are getting vacuumed, and correlate that with the increased IO activity you're seeing.

Did you change any other parameters, like checkpoint timeout/segments etc.? It might be worthwhile to log checkpoint activity as well, to be doubly sure.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
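Both kinds of logging can be enabled without a restart; a sketch of the relevant postgresql.conf settings (the threshold of 0 is just a starting point that logs everything):

# postgresql.conf
log_autovacuum_min_duration = 0   # log every autovacuum/autoanalyze action
log_checkpoints = on              # log checkpoint starts, ends and timings

followed by a reload, e.g. SELECT pg_reload_conf();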
On Fri, Feb 1, 2013 at 11:01 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Do you know for sure that it's the master table that's generating all the
> vacuum traffic? What about the partition tables - do they get any
> updates/deletes?

I'm pretty sure the IO is from the autovacuum on the master table, because its last_autovacuum stat updates almost every minute and iotop shows something like:

Total DISK READ: 5.80 M/s | Total DISK WRITE: 115.85 K/s
  TID  PRIO  USER      DISK READ  DISK WRITE  SWAPIN    IO>    COMMAND
 7681  be/4  postgres   5.93 M/s    0.00 B/s  0.00 %  63.62 %  postgres: autovacuum worker process  fleet
15837  be/4  postgres   0.00 B/s   23.17 K/s  0.00 %   4.98 %  postgres: fleet fleet 127.0.0.1(53537) idle
16683  be/4  postgres   0.00 B/s   23.17 K/s  0.00 %   2.68 %  postgres: fleet fleet 127.0.0.1(53978) idle
18599  be/4  postgres   0.00 B/s    7.72 K/s  0.00 %   1.25 %  postgres: fleet fleet 127.0.0.1(59529) idle
15608  be/4  postgres   0.00 B/s   23.17 K/s  0.00 %   0.04 %  postgres: fleet fleet 127.0.0.1(53349) idle
10253  be/4  postgres   0.00 B/s    7.72 K/s  0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(50639) idle
16524  be/4  postgres   0.00 B/s    7.72 K/s  0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(53939) idle
10248  be/4  postgres   0.00 B/s    7.72 K/s  0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(50634) idle
18606  be/4  postgres   0.00 B/s    7.72 K/s  0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(59535) idle
14849  be/4  postgres   0.00 B/s    7.72 K/s  0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(52832) idle
30308  be/4  postgres   0.00 B/s    7.72 K/s  0.00 %   0.01 %  postgres: wal writer process
21528  be/4  www-data   0.00 B/s    3.86 K/s  0.00 %   0.00 %  nginx: worker process
30307  be/4  postgres   0.00 B/s   38.62 K/s  0.00 %   0.00 %  postgres: writer process

I just checked and our config only sets:

max_connections = 210
superuser_reserved_connections = 10
wal_buffers = 8MB
checkpoint_segments = 16
checkpoint_completion_target = 0.9
random_page_cost = 3.0
shared_buffers = 9GB
effective_cache_size = 28GB
work_mem = 88MB
maintenance_work_mem = 1GB

We're running on a server with 40GB RAM and a RAID 10 over six 10k HDDs.

Stats show only three tables get frequently autovacuumed:

Table: vehicle_position
  Tuples inserted / updated / deleted:  4689127 / 0 / 4689127 (HOT updated: 0)
  Live / dead tuples:                   0 / 1985
  Last autovacuum:                      2013-02-01 11:46:46.441227+02
  Last analyze:                         2013-01-31 14:48:00.775864+02
  Last autoanalyze:                     2013-02-01 11:46:46.494234+02
  Vacuum / autovacuum / analyze / autoanalyze counters: 0 / 3761 / 1 / 3754
  Size:                                 4016 kB

Table: vehicle
  Tuples inserted / updated / deleted:  2124 / 934640 / 0 (HOT updated: 924003)
  Live / dead tuples:                   2124 / 390
  Last autovacuum:                      2013-02-01 11:47:46.067695+02
  Last autoanalyze:                     2013-02-01 11:47:46.325444+02
  Vacuum / autovacuum / analyze / autoanalyze counters: 0 / 1293 / 0 / 2038
  Size:                                 2448 kB

Table: input_current
  Tuples inserted / updated / deleted:  10032 / 1649206 / 0 (HOT updated: 1635902)
  Live / dead tuples:                   10032 / 1728
  Last autovacuum:                      2013-02-01 11:45:46.0681+02
  Last autoanalyze:                     2013-02-01 11:45:46.215048+02
  Vacuum / autovacuum / analyze / autoanalyze counters: 0 / 442 / 0 / 1294
  Size:                                 2336 kB
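For reference, the same numbers can be pulled with a query along these lines (columns as in 9.1's pg_stat_user_tables; the table names are the ones from this thread):

SELECT relname,
       n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd,
       n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
       vacuum_count, autovacuum_count, analyze_count, autoanalyze_count,
       pg_size_pretty(pg_relation_size(relid)) AS size
FROM pg_stat_user_tables
WHERE relname IN ('vehicle_position', 'vehicle', 'input_current');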
On Fri, Feb 1, 2013 at 3:24 PM, Vlad Bailescu <vlad@mojitosoftware.com> wrote:
> Stats show only three tables get frequently autovacuumed:
>
> vehicle_position [...] 0 3761 1 3754 4016 kB
> vehicle [...] 0 1293 0 2038 2448 kB
> input_current [...] 0 442 0 1294 2336 kB

So there is surely more than just one table involved here. Also, the last column in the output above shows that these are fairly large tables. In fact, if the last column is to be trusted, they are as large as 37GB, 20GB and 12GB respectively. Note the kB at the end of those values. Also, given the large number of inserts/deletes, I would doubt the table can really remain that small. Maybe auto-vacuum is truncating most of it, and that's why you see only a few MBs before it swells again by the time the next vacuum cycle runs.

What's your rate of insertion/deletion in the first table? You also seem to be updating the last two tables quite heavily, but most of those are HOT updates, so your tables should not bloat too much.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On Fri, Feb 1, 2013 at 1:08 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> In fact, if the last column is to be trusted, they are as large as 37GB,
> 20GB and 12GB respectively. Note the kB at the end of those values.

Pavan, it seems there's a small issue with copy-paste and column text alignment. The table sizes are:

4136 kB
2448 kB
2336 kB

The actual partition tables are very large (e.g. vehicle_position_y2013_m01 is 6448 MB), but they don't seem to receive so much attention. The vehicle table gets updated very often because it has a foreign key to vehicle_position, and that changes with every batch of data.

We were able to reproduce this on a test VM with a minimal workload to trigger autovacuum.
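An unambiguous way to check the sizes, sidestepping the column-alignment problem (pg_total_relation_size counts indexes and TOAST on top of the heap):

SELECT relname,
       pg_size_pretty(pg_relation_size(oid))       AS heap_size,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('vehicle_position', 'vehicle', 'input_current',
                  'vehicle_position_y2013_m01');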
Here's the relevant log portion:

2012-12-05 00:32:57 EET LOG:  database system is ready to accept connections
2012-12-05 00:32:57 EET LOG:  autovacuum launcher started
2012-12-05 00:36:38 EET LOG:  automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 0.00s/0.00u sec elapsed 0.03 sec
2012-12-05 00:37:39 EET LOG:  automatic analyze of table "fleet.fleet.vehicle_position_default" system usage: CPU 0.00s/0.15u sec elapsed 0.99 sec
2012-12-05 00:44:23 EET LOG:  automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 4.46s/0.61u sec elapsed 465.09 sec
2012-12-05 00:44:38 EET LOG:  automatic vacuum of table "fleet.fleet.vehicle_position": index scans: 1
        pages: 8 removed, 0 remain
        tuples: 64 removed, 0 remain
        system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
2012-12-05 00:44:38 EET LOG:  automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2012-12-05 00:51:40 EET LOG:  automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 4.34s/0.53u sec elapsed 422.20 sec
2012-12-05 00:52:38 EET LOG:  automatic vacuum of table "fleet.fleet.vehicle_position": index scans: 1
        pages: 10 removed, 0 remain
        tuples: 30 removed, 0 remain
        system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
2012-12-05 00:52:38 EET LOG:  automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

During the analyze the IO was ~800-1000 K/s.

Vlad
On Fri, Feb 1, 2013 at 5:19 PM, Vlad Bailescu <vlad@mojitosoftware.com> wrote:
> Pavan, it seems there's a small issue with copy-paste and column
> text alignment. The table sizes are:
>
> 4136 kB, 2448 kB, 2336 kB

Ah OK. I see.

> 2012-12-05 00:44:23 EET LOG:  automatic analyze of table
> "fleet.fleet.vehicle_position" system usage: CPU 4.46s/0.61u sec elapsed
> 465.09 sec

This is the interesting piece of information. So it's the auto-analyze that's causing all the IO activity. That explains why it was read-only IO that we noticed earlier. What's happening here, and something that changed from 8.4 to 9.1, is that whenever the parent table is analyzed, the child tables are also automatically analyzed. I don't remember the rationale for this change, but in your case the analyze on the parent table itself is quite useless, because even though you are inserting a large number of new tuples, you are also immediately deleting them. I don't want to comment on the design aspect of that, but you should be able to fix this problem by disabling auto-analyze on the parent table.

Having said that, I don't see an easy way to disable just auto-analyze on a table. You can run ALTER TABLE foo SET (autovacuum_enabled = false), but that would also disable auto-vacuum, which you certainly don't want to do, because the parent table would just keep growing. You can set autovacuum_analyze_threshold to an artificially high value to mitigate the problem and reduce the frequency of auto-analyze on the table, or see if you can completely avoid insert/delete on the parent table.

ALTER TABLE vehicle_position SET (autovacuum_analyze_threshold = 2000000000);

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
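Applied to this case, with a check that the per-table option actually stuck (the threshold value is arbitrary, just "large enough to never fire"):

ALTER TABLE vehicle_position
    SET (autovacuum_analyze_threshold = 2000000000);

-- Verify the stored reloption:
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'vehicle_position';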
Vlad Bailescu <vlad@mojitosoftware.com> wrote:
> This means the master table becomes quickly populated with dead
> tuples and triggers the autovacuum frequently (we are using
> default settings for autovacuum).
>
> Can anyone please explain why this is happening and how we can
> get around it?

It sounds like it could be an issue which will be fixed in the next minor release:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5454344b968d6a189219cfd49af609a3e7d6af33

If this is the issue, a periodic manual VACUUM of the master table should help for now. When 9.1.8 is released you might want to upgrade without a lot of delay, and see if you are then OK without the manual vacuums.

-Kevin
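The interim manual vacuum could be as simple as a crontab entry for the postgres user along these lines (database and table names are the ones from this thread; the five-minute interval is a guess to match the autovacuum cadence; note that VACUUM on the parent does not recurse into the children):

*/5 * * * * psql -d fleet -c 'VACUUM vehicle_position;'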
On Fri, Feb 1, 2013 at 2:40 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> You can set autovacuum_analyze_threshold to an artificially high value
> to mitigate the problem and reduce the frequency of auto-analyze on the
> table, or see if you can completely avoid insert/delete on the parent
> table.

Great insight, Pavan. I see this change in ANALYZE has been documented since 9.0:

> If the table being analyzed has one or more children, ANALYZE will gather
> statistics twice: once on the rows of the parent table only, and a second
> time on the rows of the parent table with all of its children. The
> autovacuum daemon, however, will only consider inserts or updates on the
> parent table when deciding whether to trigger an automatic analyze. If that
> table is rarely inserted into or updated, the inheritance statistics will
> not be up to date unless you run ANALYZE manually.

We will try setting a higher threshold for autovacuum on the parent table.

As for the design, we could have gone this way or used a rule to redirect inserts from master to children (sketched below). My understanding is that performance for rules is not so good (especially since we're doing mostly single INSERTs). Unfortunately, the normal trigger approach will not work for us at the moment.

I saw your proposal on -hackers; I will keep an eye on it!

Thank you,
Vlad
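For completeness, the rule-based alternative would look roughly like this, one rule per partition (a sketch only: the ts column and the date bounds are assumptions, and a conditional rule like this cannot carry a RETURNING clause, which is the other reason it doesn't fit the Hibernate use case):

CREATE RULE vehicle_position_insert_y2013m01 AS
    ON INSERT TO vehicle_position
    WHERE NEW.ts >= DATE '2013-01-01' AND NEW.ts < DATE '2013-02-01'
    DO INSTEAD
        -- Redirect the row straight into the partition; nothing ever
        -- lands in the parent, so no dead tuples accumulate there.
        INSERT INTO vehicle_position_y2013_m01 VALUES (NEW.*);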