Discussion: Server vacuuming the same table again and again
Hi!
Half a day ago one of our production PG servers (arguably the busiest one) became very slow; I went to investigate and found that it was simultaneously running an '(auto)VACUUM ANALYZE recommendations' - the largest table on that server - and a checkpoint, giving 100% disk load, which resulted in a queue of queries that only made things worse, of course.
For a while I tried setting different ionice settings on the wal writer and checkpointer processes (-c 2 -n [5-7]) with no visible effect. Then I cancelled the autovacuum, and that seemed to help.
When things settled down and the day was nearing its end, I started a VACUUM ANALYZE of this table by hand and continued observing.
The vacuum finished in about two and a half hours. But soon I noticed that the server had started another autovacuum of the same table...
The problems returned and resolved after it finished (not 100% sure it was the cause though).
In the morning the autovacuum was back. Then it finished and I went to work. And now I'm here and there is an autovacuum again %)
And load too. But I have to say, sometimes there is an autovacuum and no load. I'm not really sure autovacuum is the culprit, but there is a correlation, and it behaves strangely anyway.
Nothing changed in the app code, I believe.
Any recommendations on where to dig further?
PG version: 9.2.8
Server hardware: E5-2690 x 2, 96GB RAM, 146GB 15k SAS x 8, HP P420i 2G RAID controller, raid 1 for system and raid 50 for DB.
Performance settings changed:
shared_buffers = 24GB
temp_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 1GB
effective_cache_size = 48GB
effective_io_concurrency = 6 (I just realised I have to set it to 4, right?)
synchronous_commit = off
checkpoint_segments = 64
checkpoint_timeout = 10min
checkpoint_completion_target = 0.8
checkpoint_warning = 3600s
Plus I set vm.dirty_background_bytes to 134217728 and vm.dirty_bytes to 1073741824.
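For reference, a quick unit conversion of those sysctl values (my own arithmetic, not from the thread):

```python
# My own unit conversion, not part of the original message: the sysctl
# values above expressed in binary units.
MiB = 2 ** 20
GiB = 2 ** 30

dirty_background_bytes = 134217728
dirty_bytes = 1073741824

print(dirty_background_bytes // MiB)  # 128 -> vm.dirty_background_bytes is 128 MiB
print(dirty_bytes // GiB)             # 1   -> vm.dirty_bytes is 1 GiB
```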
Also, I now believe that RAID 1 for the system might be a mistake. Maybe dedicate it to the WAL?
Dmitriy Shalashov
Hi Dmitry,
On Fri, Apr 25, 2014 at 9:47 AM, Дмитрий Шалашов <skaurus@gmail.com> wrote:
> cancelled autovacuum and it seems to help.
> In the morning autovacuum was back. And then it finished and I gone to work.
Actually, these two things are tightly bound and there is no way to avoid vacuum; you can only postpone it - this work eventually has to be done anyway.
What you really need to do first is configure your autovacuum aggressively enough, and then maybe ionice the autovacuum instead of the mission-critical checkpointer or bgwriter.
Which exact values do you have in the following settings:
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold
autovacuum_freeze_max_age
autovacuum_max_workers
autovacuum_naptime
autovacuum_vacuum_cost_delay
autovacuum_vacuum_cost_limit
autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
log_autovacuum_min_duration
?
Best regards, Ilya
--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
Hi Ilya!
> Actually, these two things are tightly bound and there is no way to avoid vacuum; you can only postpone it - this work eventually has to be done anyway.
I understand that autovacuum has to be done, but surely not right after the previous autovacuum? And then again and again.
And after cancelling that first autovacuum I started another one by hand; from then on, no autovacuum was cancelled.
> ionice the autovacuum instead of the mission-critical checkpointer or bgwriter
Yeah, that was a desperate move. I restarted the server when I had a chance, to reset my ionice settings back to defaults.
> Which exact values do you have in the following settings:
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_freeze_max_age = 200000000
autovacuum_max_workers = 3
autovacuum_naptime = 60
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
log_autovacuum_min_duration = 0
All defaults except the last one, I believe.
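For context, autovacuum triggers a vacuum of a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (per the PostgreSQL docs). A sketch with the defaults above; the 100-million-row table size is invented for illustration, not the real size of `recommendations`:

```python
# Autovacuum's vacuum trigger (per the PostgreSQL docs):
#   n_dead_tup > autovacuum_vacuum_threshold
#                + autovacuum_vacuum_scale_factor * reltuples
# The 100M-row table size is invented for illustration.

def vacuum_trigger(reltuples, scale_factor=0.2, threshold=50):
    """Dead tuples that must accumulate before autovacuum fires."""
    return threshold + scale_factor * reltuples

reltuples = 100_000_000
print(int(vacuum_trigger(reltuples)))            # 20000050 with the defaults above
print(int(vacuum_trigger(reltuples, 0.01, 10)))  # 1000010 with a much tighter factor
```

The point: with the default 0.2 scale factor, a huge table accumulates an enormous pile of dead tuples before each vacuum, so every vacuum pass is a heavy one.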
Meanwhile I noticed in the night logs:
checkpoints are occurring too frequently (138 seconds apart)
Consider increasing the configuration parameter "checkpoint_segments".
Increased checkpoint_segments to 256 and reloaded config.
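A back-of-envelope check of that warning (my own arithmetic, assuming the default 16 MB WAL segment size): 64 segments filled every 138 seconds implies a substantial sustained WAL rate, and at the same rate 256 segments last around nine minutes - close to the 10-minute checkpoint_timeout.

```python
# Back-of-envelope WAL math for the log warning above (my own arithmetic,
# assuming the default 16 MB WAL segment size).
SEGMENT_MB = 16

def wal_rate_mb_per_s(segments, seconds_between_checkpoints):
    """Sustained WAL rate that fills `segments` segments in that interval."""
    return segments * SEGMENT_MB / seconds_between_checkpoints

rate = wal_rate_mb_per_s(64, 138)   # checkpoint_segments=64, checkpoints 138 s apart
print(round(rate, 1))               # 7.4 MB/s of WAL

# With 256 segments at the same rate, segment-driven checkpoints come
# about 9.2 minutes apart, so checkpoint_timeout = 10min starts to matter too.
print(round(256 * SEGMENT_MB / rate / 60, 1))  # 9.2
```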
Dmitriy Shalashov
And right now we have a new kind of problem.
Previously during load the disk was 100% busy; now we have around 100 queries in active state and the CPU at 100%, but the disk is virtually idle... Normally we have under 10 active queries.
Any hints on that?
Dmitriy Shalashov
Dmitry,
How exactly is the filesystem under the database mounted? (mount -l)
And just in case: while increasing checkpoint_segments, better to also increase checkpoint_timeout, otherwise all checkpoints will still be frequent because the segment threshold will never be reached. You could monitor pg_stat_bgwriter to understand which type of checkpoint happens more frequently.
On Fri, Apr 25, 2014 at 10:22 AM, Дмитрий Шалашов <skaurus@gmail.com> wrote:
> I understand that autovacuum has to be done, but not right after previous
> autovacuum? And then again and again.
That is exactly what happens: your autovacuum is not aggressive enough, which is why it runs constantly instead of doing its job in small portions. You should try something like this:
autovacuum | on
autovacuum_analyze_scale_factor | 0.05
autovacuum_analyze_threshold | 5
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 10 # 10 for example; if they are all working constantly, maybe you need more - or fewer if not
autovacuum_multixact_freeze_max_age | 400000000
autovacuum_naptime | 1
autovacuum_vacuum_cost_delay | 10
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.01
autovacuum_vacuum_threshold | 10
log_autovacuum_min_duration | -1
Best regards, Ilya
--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
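To act on the pg_stat_bgwriter suggestion: that view's checkpoints_timed and checkpoints_req counters (real columns in 9.2) tell you how many checkpoints were started by the timer versus forced by WAL volume. A small sketch of how one might read them; the sample numbers are invented:

```python
# Interpreting pg_stat_bgwriter's checkpoint counters (checkpoints_timed
# and checkpoints_req are real columns in 9.2; the sample numbers below
# are invented for illustration).

def requested_fraction(checkpoints_timed, checkpoints_req):
    """Share of checkpoints forced by WAL volume rather than the timer.
    A high share suggests checkpoint_segments is too small for the
    server's write rate."""
    total = checkpoints_timed + checkpoints_req
    return checkpoints_req / total if total else 0.0

print(requested_fraction(12, 88))   # 0.88 -> almost all checkpoints WAL-forced
print(requested_fraction(95, 5))    # 0.05 -> checkpoints mostly timer-driven
```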
On Fri, Apr 25, 2014 at 10:29 AM, Дмитрий Шалашов <skaurus@gmail.com> wrote:
> Previously during load disk was 100% busy; now we have around 100 active
> state queries, 100% loaded proc, but disk is virtually idle...
Did that happen after changing the checkpoint_segments setting?
--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
> How exactly is the filesystem under the database mounted?
ext3 (rw)
Thanks, we'll try the new autovacuum settings!
First, I have to say that the load comes and goes in waves - we don't yet understand why.
All the new waves have that behaviour - free disk, idle CPU.
The first such wave was before the checkpoint_segments change, the next waves after. No more warnings about too-frequent checkpoints though.
Dmitriy Shalashov
Dmitry,
On Fri, Apr 25, 2014 at 11:31 AM, Дмитрий Шалашов <skaurus@gmail.com> wrote:
> Thanks, we'll try the new autovacuum settings!
I think things with vacuum will be much better. If not, try to find out whether you have long-running transactions (several minutes or more) and try to avoid them.
> First, I have to say that the load comes and goes in waves - we don't yet
> understand why.
> All the new waves have that behaviour - free disk, idle CPU.
> The first such wave was before the checkpoint_segments change, the next waves after.
That could be a complicated problem caused by many things, from suboptimal SQL queries to network issues - not easy to guess.
- How many locks do you have during a wave in comparison with the normal workload?
- Do you use some connection pooling (pgbouncer etc.)?
- How about the long-running transactions I mentioned above?
- Are you using pg_stat_statements or any other method for detecting slow queries?
--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
On 25/04/14 09:47, Дмитрий Шалашов wrote:
> Half a day ago one of our production PG servers (arguably busiest one)
> become very slow; I went to investigate the issue and found that it runs
> simultaneously '(auto)VACUUM ANALYZE recommendations' - largest table on
> that server - and checkpoint, giving a 100% disk load
Maybe the table has reached the state where it needs a VACUUM FREEZE. Autovacuum does that for you, but it requires a complete scan of the table.
Torsten
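A sketch of the anti-wraparound trigger Torsten describes: once a table's oldest unfrozen transaction ID is older than autovacuum_freeze_max_age (200000000 in the settings earlier in the thread), autovacuum forces a freeze vacuum regardless of dead-tuple counts. The age values below are invented; on a real server you would check age(relfrozenxid) from pg_class:

```python
# Anti-wraparound (freeze) vacuum trigger: fires once the table's oldest
# unfrozen transaction ID exceeds autovacuum_freeze_max_age, regardless
# of dead-tuple counts. The age values below are invented for illustration.

AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # value from the settings in this thread

def forces_freeze_vacuum(relfrozenxid_age, max_age=AUTOVACUUM_FREEZE_MAX_AGE):
    return relfrozenxid_age > max_age

print(forces_freeze_vacuum(150_000_000))  # False - still headroom
print(forces_freeze_vacuum(210_000_000))  # True  - full-table freeze scan forced
```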
Turns out yesterday we fixed a bug and introduced a new one, which had previously been hidden by yet another bug, which in turn we had fixed last week... %)
As a result, the latest fix led to a greatly increased number of requests to the database.
But still, thanks to that we found out about the too-frequent checkpoints and that our recommendations table has three times more dead tuples than live ones.
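Three times more dead tuples than live means three quarters of the heap is dead rows, far beyond the default autovacuum trigger. A quick sketch (the live-row count is invented for illustration):

```python
# Dead-tuple arithmetic for a table where dead rows are 3x the live rows.
# The live-row count is invented for illustration.

def dead_fraction(n_live, n_dead):
    return n_dead / (n_live + n_dead)

n_live = 10_000_000
n_dead = 3 * n_live

print(dead_fraction(n_live, n_dead))   # 0.75 -> three quarters of the heap is dead
print(n_dead > 50 + 0.2 * n_live)      # True -> far past the default trigger
```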
We will fix our autovacuum configuration.
As for the other problem - 100% CPU load with idle disks - it is no longer reproducing, and we don't want it to :)
Thanks Ilya and Torsten!
Dmitriy Shalashov