Обсуждение: Ineffective autovacuum

Поиск
Список
Период
Сортировка

Ineffective autovacuum

От
Royce Ausburn
Дата:
Hi all,

I have a problem with autovacuum apparently not doing the job I need it to do.

I have a table named datasession that is frequently inserted, updated and deleted from.  Typically the table will have a few thousand rows in it.  Each row typically survives a few days and is updated every 5 - 10 mins.  The application receives unreliable, potentially duplicate data from its source, so this table is heavily used for synchronising application threads as well.  A typical access pattern is:

- tx begin
- SELECT FOR UPDATE on a single row
- Do some application processing (1 - 100 ms)
- Possibly UPDATE the row
- tx commit

In a few instances of our application we're seeing this table grow obscenely to the point where our monitoring servers get us out of bed to manually vacuum.  I like sleep, so I want to fix this =D

I've read some recent threads and found a discussion (below) on auto vacuum that mentions auto vacuum will be cancelled when a client requests a lock that auto vacuum is using… My questions:

1) Does it look like I'm affected by the same problem as in the below discussion?

2) Are there better solutions to this problem than a periodic task that vacuums/truncates-and-rebuilds the table?  


Perhaps relevant info:


# select version();
                                             version                                              
--------------------------------------------------------------------------------------------------
 PostgreSQL 8.3.12 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
(1 row)

Auto vacuum and vacuum parameters are set to the factory defaults.

Cheers,

--Royce

From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [GENERAL] Vacuum as "easily obtained" locks 
Date: 4 August 2011 1:52:02 AM AEST
To: Michael Graham <mgraham@bloxx.com>

On Wed, 2011-08-03 at 11:40 -0400, Tom Lane wrote:
The other problem is that once autovacuum has gotten the lock, it has
to keep it for long enough to re-scan the truncatable pages (to make
sure they're still empty).  And it is set up so that any access to the
table will kick autovacuum off the lock.  An access pattern like that
would very likely prevent it from ever truncating, if there are a lot
of pages that need to be truncated.  (There's been some discussion of
modifying this behavior, but nothing's been done about it yet.) 

Michael Graham <mgraham@bloxx.com> writes:
Ah!  This looks like it is very much the issue.  Since I've got around
150GB of data that should be truncatable and a select every ~2s.

Just to confirm would postgres write:

2011-08-03 16:09:55 BST ERROR:  canceling autovacuum task
2011-08-03 16:09:55 BST CONTEXT:  automatic vacuum of table
"traffic.public.logdata5queue"

Under those circumstances?

Yup ...

If you do a manual VACUUM, it won't allow itself to get kicked off the
lock ... but as noted upthread, that will mean your other queries get
blocked till it's done.  Not sure there's any simple fix for this that
doesn't involve some downtime.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Ineffective autovacuum

От
Tom Lane
Дата:
Royce Ausburn <royce.ml@inomial.com> writes:
> I have a problem with autovacuum apparently not doing the job I need it to do.

Hm, I wonder whether you're getting bit by bug #5759, which was fixed
after 8.3.12.

> I have a table named datasession that is frequently inserted, updated and deleted from.  Typically the table will
havea few thousand rows in it.  Each row typically survives a few days and is updated every 5 - 10 mins.  The
applicationreceives unreliable, potentially duplicate data from its source, so this table is heavily used for
synchronisingapplication threads as well.  A typical access pattern is: 

> - tx begin
> - SELECT FOR UPDATE on a single row
> - Do some application processing (1 - 100 ms)
> - Possibly UPDATE the row
> - tx commit

Transactions of that form would not interfere with autovacuum.  You'd
need something that wants exclusive lock, like a schema change.

> I've read some recent threads and found a discussion (below) on auto vacuum that mentions auto vacuum will be
cancelledwhen a client requests a lock that auto vacuum is using� My questions: 
> 1) Does it look like I'm affected by the same problem as in the below discussion?

Not unless you're seeing a lot of "canceling autovacuum task" messages
in the postmaster log.

            regards, tom lane

Re: Ineffective autovacuum

От
Royce Ausburn
Дата:


On 27/09/2011, at 2:21 PM, Tom Lane wrote:

> Royce Ausburn <royce.ml@inomial.com> writes:
>> I have a problem with autovacuum apparently not doing the job I need it to do.
>
> Hm, I wonder whether you're getting bit by bug #5759, which was fixed
> after 8.3.12.

If this were the case, would I see lots of auto vacuum worker processes in ps that are essentially doing nothing
becausethey're sleeping all the time?  If so, then I think perhaps not. 

>
>> I have a table named datasession that is frequently inserted, updated and deleted from.  Typically the table will
havea few thousand rows in it.  Each row typically survives a few days and is updated every 5 - 10 mins.  The
applicationreceives unreliable, potentially duplicate data from its source, so this table is heavily used for
synchronisingapplication threads as well.  A typical access pattern is: 
>
>> - tx begin
>> - SELECT FOR UPDATE on a single row
>> - Do some application processing (1 - 100 ms)
>> - Possibly UPDATE the row
>> - tx commit
>
> Transactions of that form would not interfere with autovacuum.  You'd
> need something that wants exclusive lock, like a schema change.
>
>> I've read some recent threads and found a discussion (below) on auto vacuum that mentions auto vacuum will be
cancelledwhen a client requests a lock that auto vacuum is using∑ My questions: 
>> 1) Does it look like I'm affected by the same problem as in the below discussion?
>
> Not unless you're seeing a lot of "canceling autovacuum task" messages
> in the postmaster log.

Okay - This is not the case.

Since sending this first email I've up'd the autovacuum log level and I've noticed that the same tables seem to be auto
vacuum'dover and over again… Some of the tables are a bit surprising in that they're updated semi-regularly, but not
enough(I'd think) to warrant an autovacuum every few minutes… Is this unusual? 


Perhaps unrelated: I've done some digging around and happened across a nightly task doing:

select pg_stat_reset()

on each of the databases in the cluster…. I've no idea why we're doing that (and our usual sysadmin / DBA has resigned,
soI doubt I'll ever know).  There must have been a reason at the time, but I wonder if this might be interfering with
things?

At any rate, I think the logs might glean some more interesting information, I'll let it alone for a few hours and
hopefullyI'll have some more useful information. 

--Royce


Re: Ineffective autovacuum

От
Marti Raudsepp
Дата:
1. First things first: vacuum cannot delete tuples that are still
visible to any old running transactions. You might have some very long
queries or transactions that prevent it from cleaning properly:

select * from pg_stat_activity where xact_start < now()-interval '10 minutes';

2. On 8.3 and earlier servers with large tables, it's critical that
your max_fsm_pages and max_fsm_relations are tuned properly. Failing
that, autovacuum will permanently leak space that can only be fixed
with a VACUUM FULL (which will take an exclusive lock and run for a
very long time)

PostgreSQL version 8.4 addressed this problem, but for the
unfortunate, you have to follow the tuning advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#autovacuum_max_fsm_pages.2C_max_fsm_relations

On Tue, Sep 27, 2011 at 08:08, Royce Ausburn <royce.ml@inomial.com> wrote:
> I've noticed that the same tables seem to be auto vacuum'd over and over again… Some of the tables are a bit
surprisingin that they're updated semi-regularly, but not enough (I'd think) to warrant an autovacuum every few
minutes…Is this unusual? 

Maybe they're just auto-analyze processes? Those get triggered on
insert-only tables too, when vacuum normally wouldn't run.

> Perhaps unrelated: I've done some digging around and happened across a nightly task doing:
> select pg_stat_reset()

AFAIK (but I could be wrong), vacuum uses a separate set of statistics
not affected by pg_stat_reset.

Regards,
Marti

Re: Ineffective autovacuum

От
Tom Lane
Дата:
Royce Ausburn <royce.ml@inomial.com> writes:
> Since sending this first email I've up'd the autovacuum log level and I've noticed that the same tables seem to be
autovacuum'd over and over again… Some of the tables are a bit surprising in that they're updated semi-regularly, but
notenough (I'd think) to warrant an autovacuum every few minutes… Is this unusual? 

Well, that proves autovacuum isn't getting blocked anyway.  At this
point I suspect that Marti has fingered the correct issue: you likely
need to increase the FSM settings.  You should try running a manual
VACUUM VERBOSE and see if it suggests that more FSM space is needed
(there'll be some FSM stats at the end of the verbose printout).

> Perhaps unrelated: I've done some digging around and happened across a nightly task doing:
> select pg_stat_reset()
> on each of the databases in the cluster…. I've no idea why we're doing that (and our usual sysadmin / DBA has
resigned,so I doubt I'll ever know).  There must have been a reason at the time, but I wonder if this might be
interferingwith things? 

Hmm, it's not helping any.  Anything that needs vacuuming, but less
often than once a day, would get missed due to the stats getting
forgotten.

            regards, tom lane

Re: Ineffective autovacuum

От
Scott Marlowe
Дата:
On Tue, Sep 27, 2011 at 7:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Royce Ausburn <royce.ml@inomial.com> writes:
>> Since sending this first email I've up'd the autovacuum log level and I've noticed that the same tables seem to be
autovacuum'd over and over again… Some of the tables are a bit surprising in that they're updated semi-regularly, but
notenough (I'd think) to warrant an autovacuum every few minutes… Is this unusual? 
>
> Well, that proves autovacuum isn't getting blocked anyway.  At this
> point I suspect that Marti has fingered the correct issue: you likely
> need to increase the FSM settings.  You should try running a manual
> VACUUM VERBOSE and see if it suggests that more FSM space is needed
> (there'll be some FSM stats at the end of the verbose printout).

That's the probably the best first step, a good second one might be to
increase the aggressiveness of autovac by lowering the delay, and
increasing the cost limit.

OP: You need to watch it a little closer during the day.  first do as
suggested and increase the max_fsm_pages.  High settings on it don't
cost a lot as they're only 6 bytes per page.  So 1M max_fsm_pages
costs 6M of shared RAM.  After that run vacuum verbose every hour or
two to keep an eye on the trend of how many pages it says are needed.
If that number doesn't stabilize, but just keeps growing then you're
not vacuuming aggressively enough.  Up autovacuum_vacuum_cost_limit by
a couple of factors, and lower autovacuum_vacuum_cost_delay to 5ms or
less.  Make sure you don't swamp your IO subsystem.  On big machines
with lots of spindles it's hard to swamp the IO.  On smaller
workstation class machines it's pretty easy.

Re: Ineffective autovacuum

От
Royce Ausburn
Дата:

On 27/09/2011, at 8:29 PM, Marti Raudsepp wrote:

1. First things first: vacuum cannot delete tuples that are still
visible to any old running transactions. You might have some very long
queries or transactions that prevent it from cleaning properly:

select * from pg_stat_activity where xact_start < now()-interval '10 minutes';

Thanks -- that query is very handy.  I suspect this might be the cause of our woes as this query results in a handful of long lived connections, however they're connections to databases other than the one that I'm having trouble with.  

I've checked up on the FSM as you suggested, I don't think that's the problem as there're no warnings in the verbose output nor the logs.  But another clue:

DETAIL:  93 dead row versions cannot be removed yet.

After clearing those stuffed transactions vacuum verbose manages to clear away all the dead rows… That's confirmation enough for me - Now to find the application bugs - Thanks Tom, Marti & Scott for your help!

--Royce