Thread: Reducing Catalog Locking

Reducing Catalog Locking

From:
Simon Riggs
Date:
Recent work on parallel query has opened my eyes to exactly how
frequently we request locks on various catalog tables. (Attached file
is a lock analysis on a representative Pg server).

Given these are catalog tables, we aren't doing much to them that
requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
touch those tables like that. When we do that, pretty much everything
else hangs, cos you can't get much done while fundamental tables are
locked.

So my proposal is that we invent a "big catalog lock". The benefit of
this is that we greatly reduce lock request traffic, as well as being
able to control exactly when such requests occur. (Fine grained
locking isn't always helpful).

Currently, SearchCatCache() requests locks on individual catalog
tables. Alternatively, we could request an AccessShareLock on a "big
catalog lock" that must be accessed first before a strong
relation-specific lock is requested. We just need to change the lockid
used for each cache.

We can still CREATE, ALTER, DROP and VACUUM all catalog tables. This
idea would block VACUUM FULL on them, but that would have been blocked
anyway by general activity.

We reduce lock traffic by having SearchCatCache() use a new call
heap_catalog_open() which calls a new LockCatalogLock() which
specifically caches whether we have already locked the BigCatalogLock
or not. That cache can be cleared easily and cheaply at EOX. And it
can be set quickly and easily in parallel worker tasks.
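
To make the caching concrete, here is a minimal self-contained C sketch,
not actual backend code: heap_catalog_open(), LockCatalogLock() and
BigCatalogLock are just the names proposed above, and the lock-manager
call is a stub.

#include <stdbool.h>

/* Stub for the real lock-manager call; assume it takes the shared
 * "big catalog lock" in AccessShareLock mode. */
extern void AcquireBigCatalogLockShared(void);

/* Backend-local cache: have we already taken the BigCatalogLock in
 * this transaction? */
static bool BigCatalogLockHeld = false;

/* Called from heap_catalog_open() before any catalog access; only the
 * first call per transaction reaches the lock manager. */
void
LockCatalogLock(void)
{
    if (!BigCatalogLockHeld)
    {
        AcquireBigCatalogLockShared();
        BigCatalogLockHeld = true;
    }
}

/* Called at end of transaction; the lock itself is released by normal
 * transaction cleanup, so clearing the flag is the whole cache reset. */
void
AtEOX_ResetCatalogLockCache(void)
{
    BigCatalogLockHeld = false;
}

A parallel worker could simply set the flag once the leader's lock has
been made available to it, which is what makes the scheme cheap to set
up in worker tasks.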

We then add a special locking clause for VAC FULL on catalog tables.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Reducing Catalog Locking

From:
Robert Haas
Date:
On Fri, Oct 31, 2014 at 6:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Recent work on parallel query has opened my eyes to exactly how
> frequently we request locks on various catalog tables. (Attached file
> is a lock analysis on a representative Pg server).

That analysis is interesting.

> Given these are catalog tables, we aren't doing much to them that
> requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
> touch those tables like that. When we do that, pretty much everything
> else hangs, cos you can't get much done while fundamental tables are
> locked.

True, although it's currently the case that catalog tables are only
locked for the minimum time necessary.  So, VF on pg_class will block
nearly any new query from starting up, but already-running queries may
be able to keep going, and idle transactions don't cause a problem.
If we held locks on system catalogs until transaction commit, a VF on pg_class
would basically wait until every other transaction in the system
completed and preclude any other transaction from starting until it
finished.  That's significantly more problematic in my view.

I think that the fast-path locking mechanism prevents the overwhelming
majority of lock-related pain for these kinds of things.  Most system
catalog locks are "weak" within the meaning of fast-path locking, so
the main lock table is rarely touched at all, and manipulating our own
PGPROC - which generally nobody else is touching - just isn't that
expensive.
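
For reference, the "weak lock" test behind that boils down to roughly
the following. This is a simplified self-contained C model, not the
real lock.c code, which additionally requires an ordinary relation lock
in the current database and consults a shared counter of declared
strong locks.

#include <stdbool.h>

/* Simplified model of PostgreSQL lock modes, weakest to strongest. */
typedef enum
{
    NoLock = 0,
    AccessShareLock,            /* plain SELECT, catalog cache reads */
    RowShareLock,
    RowExclusiveLock,           /* INSERT/UPDATE/DELETE */
    ShareUpdateExclusiveLock,   /* VACUUM (not FULL), ANALYZE */
    ShareLock,
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock         /* CLUSTER, VACUUM FULL */
} LockMode;

#define FP_LOCK_SLOTS_PER_BACKEND 16    /* size of the per-backend array in 9.4-era sources */

/*
 * A lock can be recorded in the backend's own PGPROC (the fast path)
 * only if it is "weak" (below ShareUpdateExclusiveLock), a slot is free,
 * and nobody has declared a conflicting strong lock on the relation.
 */
static bool
eligible_for_fastpath(LockMode mode, int slots_in_use, bool strong_lock_declared)
{
    return mode < ShareUpdateExclusiveLock &&
           slots_in_use < FP_LOCK_SLOTS_PER_BACKEND &&
           !strong_lock_declared;
}

Catalog cache reads take AccessShareLock, so in the common case they
never touch the shared lock table at all.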

On a related note, I've previously had the thought that it would be
nice to have a "big DDL lock" - that is, a lock that prevents
concurrent DDL without preventing anything else - so that pg_dump
could get just that one lock and then not worry about the state of the
world changing under it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Reducing Catalog Locking

From:
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Recent work on parallel query has opened my eyes to exactly how
> frequently we request locks on various catalog tables. (Attached file
> is a lock analysis on a representative Pg server).

> Given these are catalog tables, we aren't doing much to them that
> requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
> touch those tables like that. When we do that, pretty much everything
> else hangs, cos you can't get much done while fundamental tables are
> locked.

So don't do that --- I'm not aware that either operation is ever
considered recommended on catalogs.

> So my proposal is that we invent a "big catalog lock". The benefit of
> this is that we greatly reduce lock request traffic, as well as being
> able to control exactly when such requests occur. (Fine grained
> locking isn't always helpful).

> Currently, SearchCatCache() requests locks on individual catalog
> tables. Alternatively, we could request an AccessShareLock on a "big
> catalog lock" that must be accessed first before a strong
> relation-specific lock is requested. We just need to change the lockid
> used for each cache.

I doubt that this can ever be safe, because it will effectively assume
that all operations on catalog tables are done by code that knows that it
is accessing a catalog.  What about manual DML, or even DDL, on a catalog?
Miss even one place that can modify a table, and you have a problem.

More to the point, how would using a big lock not make the contention
situation *worse* rather than better?  At least if you decide you need
to cluster pg_statistic, you aren't blocking sessions that don't need
to touch pg_statistic --- and furthermore, they aren't blocking you.
I think the proposal would render it completely impossible to ever get a
strong lock on a catalog table in a busy system, not even a little-used
catalog.

In fact, since we can assume that a transaction trying to do "CLUSTER
pg_class" will have touched at least one syscache during startup, this
proposal would absolutely guarantee that would fail (even in a completely
idle system) because it would already hold the BigLock, and that would
have to be seen as existing use of the table.
        regards, tom lane



Re: Reducing Catalog Locking

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On a related note, I've previously had the thought that it would be
> nice to have a "big DDL lock" - that is, a lock that prevents
> concurrent DDL without preventing anything else - so that pg_dump
> could get just that one lock and then not worry about the state of the
> world changing under it.

Hm ... how would that work exactly?  Every DDL operation has to take
the BigDDLLock in shared mode, and then pg_dump takes it in exclusive
mode?  That would preclude two pg_dumps running in parallel, which
maybe isn't a mainstream usage but still there's never been such a
restriction before.  Parallel pg_dump might have an issue in particular.

But more to the point, this seems like optimizing pg_dump startup by
adding overhead everywhere else, which doesn't really sound like a
great tradeoff to me.
        regards, tom lane



Re: Reducing Catalog Locking

From:
Andres Freund
Date:
On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On a related note, I've previously had the thought that it would be
> > nice to have a "big DDL lock" - that is, a lock that prevents
> > concurrent DDL without preventing anything else - so that pg_dump
> > could get just that one lock and then not worry about the state of the
> > world changing under it.
> 
> Hm ... how would that work exactly?  Every DDL operation has to take
> the BigDDLLock in shared mode, and then pg_dump takes it in exclusive
> mode?
>
> That would preclude two pg_dumps running in parallel, which
> maybe isn't a mainstream usage but still there's never been such a
> restriction before.  Parallel pg_dump might have an issue in particular.

It should probably be a heavyweight lock. Then every DDL operation can
take it in RowExclusiveLock mode and pg_dump can take ShareLock. As
RowExclusive is a fastpath-eligible lock, that'll not even hit the
global lock table most of the time.
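
The reason those two modes fit is visible in the standard lock conflict
matrix: RowExclusiveLock does not conflict with itself, so DDL never
blocks other DDL; ShareLock does not conflict with itself either, so two
pg_dumps can hold the lock at once; and the two modes do conflict with
each other, which is the one interaction wanted. A tiny self-contained C
model of just those matrix entries:

#include <stdbool.h>
#include <stdio.h>

/* The two table-lock modes involved, from the standard conflict matrix. */
typedef enum { ROW_EXCLUSIVE, SHARE } Mode;

/* ROW EXCLUSIVE and SHARE conflict with each other but not with themselves. */
static bool
conflicts(Mode a, Mode b)
{
    return (a == ROW_EXCLUSIVE && b == SHARE) ||
           (a == SHARE && b == ROW_EXCLUSIVE);
}

int
main(void)
{
    printf("DDL vs DDL:         %s\n", conflicts(ROW_EXCLUSIVE, ROW_EXCLUSIVE) ? "blocks" : "ok");
    printf("pg_dump vs pg_dump: %s\n", conflicts(SHARE, SHARE) ? "blocks" : "ok");
    printf("pg_dump vs DDL:     %s\n", conflicts(SHARE, ROW_EXCLUSIVE) ? "blocks" : "ok");
    return 0;
}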

> But more to the point, this seems like optimizing pg_dump startup by
> adding overhead everywhere else, which doesn't really sound like a
> great tradeoff to me.

Well, it'd finally make pg_dump "correct" under concurrent DDL. That's
quite a worthwhile thing.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Reducing Catalog Locking

From:
Robert Haas
Date:
On Fri, Oct 31, 2014 at 9:54 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> But more to the point, this seems like optimizing pg_dump startup by
>> adding overhead everywhere else, which doesn't really sound like a
>> great tradeoff to me.
>
> Well, it'd finally make pg_dump "correct" under concurrent DDL. That's
> quite a worthwhile thing.

Yeah, exactly.  I agree with Tom that the overhead might be a concern.
But on the other hand, nobody has been more concerned about the
failure of pg_dump to handle this issue correctly than Tom.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Reducing Catalog Locking

From:
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
>> But more to the point, this seems like optimizing pg_dump startup by
>> adding overhead everywhere else, which doesn't really sound like a
>> great tradeoff to me.

> Well, it'd finally make pg_dump "correct" under concurrent DDL. That's
> quite a worthwhile thing.

I lack adequate caffeine at the moment, so explain to me how this adds
any guarantees whatsoever?  It sounded like only a performance
optimization from here.
        regards, tom lane



Re: Reducing Catalog Locking

From:
Simon Riggs
Date:
On 31 October 2014 13:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I doubt that this can ever be safe, because it will effectively assume
> that all operations on catalog tables are done by code that knows that it
> is accessing a catalog.

> What about manual DML, or even DDL, on a catalog?

I've never really understood why you think it's a good idea to allow
such commands.

It's pretty easy to see that this can screw things up a million ways.

It would be easy enough to make the superuser check acquire the
BigCatalogLock before it does anything else. That way only the
superuser code path would be affected by the special case required to
get around that problem.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Reducing Catalog Locking

From:
Andres Freund
Date:
On 2014-10-31 10:02:28 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
> >> But more to the point, this seems like optimizing pg_dump startup by
> >> adding overhead everywhere else, which doesn't really sound like a
> >> great tradeoff to me.
> 
> > Well, it'd finally make pg_dump "correct" under concurrent DDL. That's
> > quite a worthwhile thing.
> 
> I lack adequate caffeine at the moment, so explain to me how this adds
> any guarantees whatsoever?  It sounded like only a performance
> optimization from here.

A performance optimization might be what Simon intended, but it isn't
primarily what I (and presumably Robert) thought it'd be useful for.

Consider the example in
http://archives.postgresql.org/message-id/20130507141526.GA6117%40awork2.anarazel.de

If pg_dump were to take the 'ddl lock' *before* acquiring the snapshot
to lock all tables, that scenario couldn't happen anymore. As soon as
pg_dump has acquired the actual locks the ddl lock could be released
again.
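
A sketch of that ordering from the pg_dump side, using libpq.
pg_acquire_ddl_lock() and pg_release_ddl_lock() are purely hypothetical
stand-ins for whatever SQL-level interface would expose the DDL lock
(assume a session-level lock so it survives across BEGIN); error
handling is omitted.

#include <libpq-fe.h>

/* Hypothetical dump-startup sequence; the two pg_*_ddl_lock() calls do
 * not exist and stand in for a future SQL interface to the DDL lock. */
static void
dump_startup(PGconn *conn)
{
    /* 1. Block concurrent DDL before looking at anything. */
    PQclear(PQexec(conn, "SELECT pg_acquire_ddl_lock()"));

    /* 2. Only then take the snapshot the dump will use. */
    PQclear(PQexec(conn, "BEGIN ISOLATION LEVEL REPEATABLE READ"));
    PQclear(PQexec(conn, "SELECT 1"));  /* first statement fixes the snapshot */

    /* 3. Lock each table to be dumped, as pg_dump already does. */
    PQclear(PQexec(conn, "LOCK TABLE public.some_table IN ACCESS SHARE MODE"));

    /* 4. With the per-table locks held, the DDL lock can be released. */
    PQclear(PQexec(conn, "SELECT pg_release_ddl_lock()"));
}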

Taking the ddl lock from SQL would probably require some 'backup' or
superuser permission, but luckily there seems to be movement around
that.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Reducing Catalog Locking

From:
Simon Riggs
Date:
On 31 October 2014 13:03, Robert Haas <robertmhaas@gmail.com> wrote:

>> Given these are catalog tables, we aren't doing much to them that
>> requires a strong lock. Specifically, only CLUSTER and VACUUM FULL
>> touch those tables like that. When we do that, pretty much everything
>> else hangs, cos you can't get much done while fundamental tables are
>> locked.
>
> True, although it's currently the case that catalog tables are only
> locked for the minimum time necessary.  So, VF on pg_class will block
> nearly any new query from starting up, but already-running queries may
> be able to keep going, and idle transactions don't cause a problem.
> If we held locks on system catalogs until transaction commit, a VF on pg_class
> would basically wait until every other transaction in the system
> completed and preclude any other transaction from starting until it
> finished.  That's significantly more problematic in my view.

No, not really. As soon as you put that VF in there, queries will
begin to block. It doesn't really matter at what point they block, so
it doesn't make the problem worse.

VFs on pg_class are very rare and not usually run while trying to make
a normal workload happen, so it's a strange thing to care about how
well that is optimized.

VACUUM FULL on pg_class only ever happens because of temp tables
anyway. I have been investigating that for other purposes, see new
thread soon.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Reducing Catalog Locking

From:
Simon Riggs
Date:
On 31 October 2014 14:49, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-10-31 10:02:28 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > On 2014-10-31 09:48:52 -0400, Tom Lane wrote:
>> >> But more to the point, this seems like optimizing pg_dump startup by
>> >> adding overhead everywhere else, which doesn't really sound like a
>> >> great tradeoff to me.
>>
>> > Well, it'd finally make pg_dump "correct" under concurrent DDL. That's
>> > quite a worthwhile thing.
>>
>> I lack adequate caffeine at the moment, so explain to me how this adds
>> any guarantees whatsoever?  It sounded like only a performance
>> optimization from here.
>
> A performance optimization might be what Simon intended, but it isn't
> primarily what I (and presumably Robert) thought it'd be useful for.
>
> Consider the example in
> http://archives.postgresql.org/message-id/20130507141526.GA6117%40awork2.anarazel.de
>
> If pg_dump were to take the 'ddl lock' *before* acquiring the snapshot
> to lock all tables, that scenario couldn't happen anymore. As soon as
> pg_dump has acquired the actual locks the ddl lock could be released
> again.
>
> Taking the ddl lock from SQL would probably require some 'backup' or
> superuser permission, but luckily there seems to be movement around
> that.

Good idea. But it is a different idea. I can do that as well...

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services