Thread: Deadlocks in HS (on 9.0 :( )

Deadlocks in HS (on 9.0 :( )

From
Greg Stark
Date:
We've observed a 9.0 database have undetected deadlocks repeatedly in
hot standby mode.

I think what's happening is that autovacuum is kicking off a VACUUM of
some system catalogs -- it seems to usually be pg_statistic's toast
table actually. At the end of the vacuum it briefly gets the exclusive
lock to truncate the table. On the standby it replays that and records
the exclusive lock being taken. It then sees a cleanup record that
pauses replay because a HS standby transaction is running that can see
the xid being cleaned up. That transaction then blocks against the
exclusive lock and deadlocks against recovery.
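
To make the cycle concrete, here is a toy wait-for graph for the scenario above. It is an illustration only, not PostgreSQL code: the process names and the `find_cycle` helper are made up for this sketch. One plausible reason the hang goes undetected is that only one of the two edges is an ordinary lock wait; that is an assumption here, not something established in this thread.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return a cycle in a single-successor wait-for graph, or None."""
    for start in waits_for:
        path = [start]
        node = waits_for.get(start)
        # Follow "waits for" edges until we run out or revisit a node.
        while node is not None and node not in path:
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            # node closes the loop; report only the looping part.
            return path[path.index(node):]
    return None


# Hypothetical process names, chosen for illustration.
waits_for = {
    # The standby query wants a lock that conflicts with the exclusive
    # lock recorded while replaying the truncation.
    "standby_query": "startup_process",
    # Replay is paused on the cleanup record until the query's snapshot
    # goes away.
    "startup_process": "standby_query",
}

cycle = find_cycle(waits_for)
print(cycle)
```

Each process waits on the other, so neither can make progress unless something outside the cycle cancels one of them.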

We expect upgrading to 9.3 to fix the problem for us due to the xid
feedback mechanism. But is this still a known problem when feedback is
not enabled? And is it a problem we should try to find a backpatchable
fix for?

I'm pondering whether we really need to log the exclusive lock taken
by vacuum when truncating. Worst case is a scan is in progress,
perhaps we can make scans understand how to handle tables that have
been truncated concurrently? We could always make the truncate replay
command acquire the lock and release it itself right away.
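
A rough sketch of the difference between the two approaches, as a toy replay loop. The record names and their ordering are assumptions made for illustration; real WAL records and lock handling are more involved.

```python
from typing import List


def lock_held_during_cleanup(records: List[str]) -> bool:
    """Replay records in order and report whether the exclusive lock is
    held when the cleanup record (the one that may pause replay) arrives."""
    held = False
    for rec in records:
        if rec == "lock":
            # Logged lock: taken on replay and kept until commit.
            held = True
        elif rec == "truncate_self_locking":
            # Proposed behaviour: the truncate replay takes the lock
            # itself and drops it again before moving on.
            held = True
            held = False
        elif rec == "commit":
            held = False
        elif rec == "cleanup":
            return held
        # A plain "truncate" record does not change the lock state here.
    return False


# Assumed record order, for illustration only.
current = ["lock", "truncate", "cleanup", "commit"]
proposed = ["truncate_self_locking", "cleanup", "commit"]

print(lock_held_during_cleanup(current))
print(lock_held_during_cleanup(proposed))
```

In this model the standby query can only block on the lock in the first sequence, which is the half of the cycle the proposal would remove.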

-- 
greg



Re: Deadlocks in HS (on 9.0 :( )

From
Noah Misch
Date:
On Tue, Jul 15, 2014 at 04:54:05PM +0100, Greg Stark wrote:
> We've observed a 9.0 database have undetected deadlocks repeatedly in
> hot standby mode.
> 
> I think what's happening is that autovacuum is kicking off a VACUUM of
> some system catalogs -- it seems to usually be pg_statistic's toast
> table actually. At the end of the vacuum it briefly gets the exclusive
> lock to truncate the table. On the standby it replays that and records
> the exclusive lock being taken. It then sees a cleanup record that
> pauses replay because a HS standby transaction is running that can see
> the xid being cleaned up. That transaction then blocks against the
> exclusive lock and deadlocks against recovery.
> 
> We expect upgrading to 9.3 to fix the problem for us due to the xid
> feedback mechanism. But is this still a known problem when feedback is
> not enabled?

This is the first I've heard of the problem.

> And is it a problem we should try to find a backpatchable
> fix for?

Yes.  Undetected deadlock entirely within the confines of the system is a
clear bug, so let's back-patch if the fix proves suitable for that.

> I'm pondering whether we really need to log the exclusive lock taken
> by vacuum when truncating. Worst case is a scan is in progress,
> perhaps we can make scans understand how to handle tables that have
> been truncated concurrently? We could always make the truncate replay
> command acquire the lock and release it itself right away.

Perhaps so.  Heikki had a broader design in that area:
http://www.postgresql.org/message-id/flat/5193AB47.3070801@vmware.com

The lock VACUUM takes before truncating a relation is the main (only?) source
of spontaneous recovery conflicts not addressed by hot_standby_feedback, so
any of the above would constitute a nice step forward.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Deadlocks in HS (on 9.0 :( )

From
Andres Freund
Date:
On 2014-07-15 16:54:05 +0100, Greg Stark wrote:
> We've observed a 9.0 database have undetected deadlocks repeatedly in
> hot standby mode.
> 
> I think what's happening is that autovacuum is kicking off a VACUUM of
> some system catalogs -- it seems to usually be pg_statistic's toast
> table actually. At the end of the vacuum it briefly gets the exclusive
> lock to truncate the table. On the standby it replays that and records
> the exclusive lock being taken. It then sees a cleanup record that
> pauses replay because a HS standby transaction is running that can see
> the xid being cleaned up. That transaction then blocks against the
> exclusive lock and deadlocks against recovery.

Hm. Does that resolve itself after max_standby_streaming_delay? Because
I don't really see how it'd actually have an undetected deadlock in that
case.
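
The expectation behind that question can be written as a small decision rule. This is a simplified sketch: the function name and return strings are hypothetical, and the real setting measures the delay from when the WAL data was received rather than from when the conflict started.

```python
def recovery_action(waited_ms: int, max_standby_streaming_delay_ms: int) -> str:
    """Decide what replay does about a conflicting standby query
    (simplified model, not PostgreSQL code)."""
    if max_standby_streaming_delay_ms < 0:
        # -1 means replay waits for conflicting queries indefinitely.
        return "wait"
    if waited_ms >= max_standby_streaming_delay_ms:
        # Past the limit: conflicting queries get cancelled, which would
        # also break a wait cycle like the one described above.
        return "cancel_query"
    return "wait"


print(recovery_action(10_000, 30_000))
print(recovery_action(30_000, 30_000))
print(recovery_action(10**9, -1))
```

Under this model the hang only persists if the delay is set to -1 or if the cancellation itself fails to take effect.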

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services