Обсуждение: long running transactions
While doing a verbose vacuum, I'm constantly hitting things like: DETAIL: 3606 dead row versions cannot be removed yet. I believe this is a problem, because I still do have some empty tables requireing up to 3-400 ms just to check if the table is empty (see thread "slow queue-like empty table"). If pg_stat_activity.query_start actually is the start time of the transaction, then we've gotten rid of all the real long-running transactions. Then again, if pg_stat_activity.query_start actually was the start time of the transaction, the attribute would have been called pg_stat_activity.transaction_start, right? Is there any way to find the longest running transaction?
Tobias Brox <tobias@nordicbet.com> writes:
> If pg_stat_activity.query_start actually is the start time of the
> transaction,
... but it isn't.
> Is there any way to find the longest running transaction?
Look in pg_locks to see the lowest-numbered transaction ID --- each
transaction will be holding exclusive lock on its own XID. You can
correlate that back to pg_stat_activity via the PID.
regards, tom lane
[Tom Lane - Tue at 12:23:40PM -0400] > Look in pg_locks to see the lowest-numbered transaction ID --- each > transaction will be holding exclusive lock on its own XID. You can > correlate that back to pg_stat_activity via the PID. Thanks a lot for the quick reply - I've already identified one long-running transaction. (I'm not allowed to order by xid, and not allowed to cast it to anything, how come?)
[Tobias Brox - Tue at 06:39:13PM +0200] > Thanks a lot for the quick reply - I've already identified one > long-running transaction. belonging to autovacuum ... how come?
Tobias Brox <tobias@nordicbet.com> writes:
> [Tobias Brox - Tue at 06:39:13PM +0200]
>> Thanks a lot for the quick reply - I've already identified one
>> long-running transaction.
> belonging to autovacuum ... how come?
Blocked on someone else's lock, maybe?
regards, tom lane
[Tom Lane - Tue at 12:42:52PM -0400] > > belonging to autovacuum ... how come? > > Blocked on someone else's lock, maybe? hardly, the autovacuum is the only one having such a low transaction id, and also the only one hanging around when waiting a bit and rechecking the pg_locks table.
Tobias Brox <tobias@nordicbet.com> writes:
>> Blocked on someone else's lock, maybe?
> hardly, the autovacuum is the only one having such a low transaction id,
> and also the only one hanging around when waiting a bit and rechecking
> the pg_locks table.
Hmph. Is the autovac process actually doing anything (strace would be
revealing)? If not, can you attach to the autovac process with gdb and
get a stack trace to see where it's blocked?
regards, tom lane
[Tom Lane - Tue at 01:09:52PM -0400] > Hmph. Is the autovac process actually doing anything (strace would be > revealing)? If not, can you attach to the autovac process with gdb and > get a stack trace to see where it's blocked? Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the config would restart autovacuum. Well, whatever, we still have the nightly vacuum crontab.
Tobias Brox <tobias@nordicbet.com> writes:
> [Tom Lane - Tue at 01:09:52PM -0400]
>> Hmph. Is the autovac process actually doing anything (strace would be
>> revealing)? If not, can you attach to the autovac process with gdb and
>> get a stack trace to see where it's blocked?
> Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the
> config would restart autovacuum.
It'll come back after the autovacuum naptime. If it gets stuck again,
please investigate.
regards, tom lane
[Tom Lane - Tue at 01:18:27PM -0400] > >> Hmph. Is the autovac process actually doing anything (strace would be > >> revealing)? It's definitively doing something; mostly reading, but also some few writes, semops and opens. > If not, can you attach to the autovac process with gdb and > >> get a stack trace to see where it's blocked? (gdb) bt #0 0xb7c599f8 in select () from /lib/tls/libc.so.6 #1 0x08253c53 in pg_usleep () #2 0x0812ee93 in vacuum_delay_point () #3 0x0812f2a5 in lazy_vacuum_rel () #4 0x0812ef7b in lazy_vacuum_rel () #5 0x0812b4b6 in vac_update_relstats () #6 0x0812a995 in vacuum () #7 0x0818d2ca in autovac_stopped () #8 0x0818ceae in autovac_stopped () #9 0x0818c848 in autovac_stopped () #10 0x0818c4e2 in autovac_start () #11 0x08192c11 in PostmasterMain () #12 0x08191dcf in PostmasterMain () #13 0x081541b1 in main () > It'll come back after the autovacuum naptime. If it gets stuck again, > please investigate. It seems stuck, has had the same transid for a long while, and the number of undeletable dead rows in our tables are increasing.
Tobias Brox <tobias@nordicbet.com> writes:
> (gdb) bt
> #0 0xb7c599f8 in select () from /lib/tls/libc.so.6
> #1 0x08253c53 in pg_usleep ()
> #2 0x0812ee93 in vacuum_delay_point ()
> #3 0x0812f2a5 in lazy_vacuum_rel ()
> #4 0x0812ef7b in lazy_vacuum_rel ()
> #5 0x0812b4b6 in vac_update_relstats ()
That doesn't look particularly blocked, and if you are seeing
reads/writes too, then it's doing something.
> It seems stuck, has had the same transid for a long while, and the
> number of undeletable dead rows in our tables are increasing.
Perhaps you have overly aggressive vacuum cost delay settings?
regards, tom lane
[Tom Lane - Tue at 02:04:55PM -0400] > > It seems stuck, has had the same transid for a long while, and the > > number of undeletable dead rows in our tables are increasing. > > Perhaps you have overly aggressive vacuum cost delay settings? Perhaps, though I wouldn't expect it to sleep in the middle of a transaction - and also, it really did seem to me that it's doing work rather than only sleeping. The transaction id for the vacuum process is the same now as when I wrote the previous email, and the number of dead unremovable rows have increased steadily. The settings in effect are: autovacuum_vacuum_cost_delay = 500 autovacuum_vacuum_cost_limit = 200
Tobias Brox <tobias@nordicbet.com> writes:
>> Perhaps you have overly aggressive vacuum cost delay settings?
> autovacuum_vacuum_cost_delay = 500
> autovacuum_vacuum_cost_limit = 200
Well, that's going to cause it to sleep half a second after every dozen
or so page I/Os. I think you'd be well advised to reduce the delay.
regards, tom lane
[Tom Lane - Tue at 02:26:53PM -0400] > > autovacuum_vacuum_cost_delay = 500 > > autovacuum_vacuum_cost_limit = 200 > > Well, that's going to cause it to sleep half a second after every dozen > or so page I/Os. I think you'd be well advised to reduce the delay. Modified it to 20/250, and it definitively helped. Sorry for the list verbosity; I should have been able to resolve this myself already some 2-3 emails ago :-) I wanted a "soft" introduction of autovac in production, and assumed that it was better to begin with too much sleep than too little! Well, well.