Thread: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
I wrote:
> I bet Alvaro's spotted the problem.  ALTER INDEX RENAME doesn't seem to
> take any lock on the index's parent table, only on the index itself.
> That means that a query on "onek" could be trying to read the pg_class
> entries for onek's indexes concurrently with someone trying to commit
> a pg_class update to rename an index.  If the query manages to visit
> the new and old versions of the row in that order, and the commit
> happens between, *neither* of the versions would look valid.  MVCC
> doesn't save us because this is all SnapshotNow.

On further reflection, it seems this is a special case of the general problem that we ought to lock a relation *before* we attempt to load the relcache entry, not after. We've seen previous complaints about this, for instance a process erroring out with a message about pg_trigger having a different number of entries than it expected, because someone had committed an ADD/DROP TRIGGER between it reading pg_class and reading pg_trigger.

A long time ago (30 Oct 2002) I wrote:
> Thinking further, it's really kinda bogus that LockRelation() works on
> an already-opened Relation; if possible we should acquire the lock
> before attempting to create a relcache entry.  (We only need to know the
> OID and the relisshared status before we can make a locktag, so it'd be
> possible to acquire the lock using only the contents of the pg_class row.)
> Not sure how much code restructuring might be involved to make this
> happen, but it'd be worth thinking about for 7.4.

but that never got off the back burner. The current example shows that "read the pg_class row and then lock" doesn't work anyway, because we might fail to find any version of the pg_class row that passes SnapshotNow.

One thing that has improved since 7.3 is that we only do relcache loads by OID, never by name. This means the only thing stopping us from taking the lock before we invoke the relcache is lack of knowledge about the rel's relisshared status.
Given that the set of shared relations is pretty small, and fixed in any given backend version, it wouldn't be unreasonable to handle this by keeping a hardwired list of shared relation OIDs somewhere in the backend.

Anyway, I really think we need to move to a coding rule that says thou shalt not access a relcache entry without *already* having some kind of lock on the relation. Comments?

regards, tom lane
I wrote:
> ... This means the only thing stopping us from
> taking lock before we invoke relcache is lack of knowledge about the
> rel's relisshared status.  Given that the set of shared relations is
> pretty small, and fixed in any given backend version, it wouldn't be
> unreasonable to handle this by keeping a hardwired list of shared
> relation OIDs somewhere in the backend.

On further investigation, there is one small stumbling block in that plan. We currently have hard-wired OIDs for shared catalogs and their indexes ... but not for their toast tables (nor the indexes thereon).

I think the best solution for this might be to put the responsibility for creating system catalogs' toast tables into the bootstrap phase instead of making initdb do it afterwards. This would be a Good Thing anyway, since currently we are incapable of dealing with bootstrap-time insertions of values large enough to need toasting. I'm imagining adding macros to the include/catalog/*.h files along the lines of

	TOAST_CATALOG(pg_class, 4242, 4243)

where the numbers are hand-assigned OIDs for the toast table and index. The existing mechanisms for creating the .bki bootstrap script would turn these into commands in the .bki script.

We could then get rid of ALTER TABLE ... CREATE TOAST TABLE as a SQL command altogether, which seems like a good thing to me. Anyone want to argue for keeping it? There really shouldn't be any case where a user needs to invoke it.

Thoughts, objections?

regards, tom lane
I wrote:
>> ... This means the only thing stopping us from
>> taking lock before we invoke relcache is lack of knowledge about the
>> rel's relisshared status.

While digging through all the places that open relcache entries, I've realized that there's another problem, specifically the way that we lock indexes. The current theory is that index_open() takes no lock, and then we establish a lock just for the duration of an index scan. The comments for index_open explain:

 * Note: we acquire no lock on the index.  A lock is not needed when
 * simply examining the index reldesc; the index's schema information
 * is considered to be protected by the lock that the caller had better
 * be holding on the parent relation.  Some type of lock should be
 * obtained on the index before physically accessing it, however.
 * This is handled automatically for most uses by index_beginscan
 * and index_endscan for scan cases, or by ExecOpenIndices and
 * ExecCloseIndices for update cases.  Other callers will need to
 * obtain their own locks.

However, the lionfish failure makes the folly of this approach evident (it was in fact index_open that failed in that example, unless my theory about it is all wrong). If we're going to move to lock-before-open then we've got to fix index_open too.

I envision making index_open just about like heap_open, ie add a lockmode parameter, and then get rid of the separate lock step in index_beginscan. There is one small problem with doing that, which is this code in ExecOpenIndices:

	indexDesc = index_open(indexOid);

	if (indexDesc->rd_am->amconcurrent)
		LockRelation(indexDesc, RowExclusiveLock);
	else
		LockRelation(indexDesc, AccessExclusiveLock);

IOW, you need to already have the index open to find out what sort of lock to take on it. I have a modest proposal for fixing that: let's get rid of amconcurrent, so that RowExclusiveLock is always the right lock to take here.
All of the currently supported AMs have amconcurrent = true, and so does the proposed bitmap index patch, and given the project's current focus on highly concurrent performance I cannot imagine that we'd accept a future patch to add a nonconcurrent index type. So let's just legislate that all AMs have to support concurrent updates (or at least that they can't rely on the main system to protect them from the case).

Any objections?

regards, tom lane
Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
From:
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I think the best solution for this might be to put the responsibility
> for creating system catalogs' toast tables into the bootstrap phase
> instead of making initdb do it afterwards.  This would be a Good Thing
> anyway since currently we are incapable of dealing with bootstrap-time
> insertions of values large enough to need toasting.  I'm imagining
> adding macros to the include/catalog/*.h files along the lines of

Would this make it much more difficult to support user-defined indexes on system catalogs? It looks like we don't support that at the moment, but as we see larger Postgres installations it seems likely we'll need to. I don't really consider myself a very heavy Postgres user but I've got databases w/ > 30k entries in pg_class and near 300k in pg_attribute... Those aren't shared, but it sounds like you were talking about all of them above anyway.

Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I think the best solution for this might be to put the responsibility
>> for creating system catalogs' toast tables into the bootstrap phase
>> instead of making initdb do it afterwards.

> Would this make it much more difficult to support user-defined indexes
> on system catalogs?

AFAICS the problems with that are orthogonal to this. You'll never have user-defined (as in "added after initdb") indexes on shared catalogs, because there is no way to update their pg_class descriptions in all databases at once. For non-shared catalogs there's nothing except access permissions stopping you from adding ordinary indexes now. We don't support partial or functional indexes on system catalogs, but the implementation reasons for that are unrelated to what I'm doing.

> It looks like we don't support that at the moment
> but as we see larger Postgres installations it seems likely we'll need
> to.  I don't really consider myself a very heavy Postgres user but I've
> got databases w/ > 30k entries in pg_class and near 300k in
> pg_attribute...

And are you seeing any performance issues related to lack of indexes? For the system catalogs we understand the access patterns pretty well (I think), and I thought we pretty much had the right indexes on them already.

regards, tom lane
Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
From:
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> I think the best solution for this might be to put the responsibility
> >> for creating system catalogs' toast tables into the bootstrap phase
> >> instead of making initdb do it afterwards.
>
> > Would this make it much more difficult to support user-defined indexes
> > on system catalogs?
>
> AFAICS the problems with that are orthogonal to this.  You'll never have
> user-defined (as in "added after initdb") indexes on shared catalogs,
> because there is no way to update their pg_class descriptions in all
> databases at once.

Ok.

> For non-shared catalogs there's nothing except
> access permissions stopping you from adding ordinary indexes now.

I had thought this might be the case, since I had some recollection of indexes on catalogs either being speculated about or suggested on -perform. The error message isn't entirely clear about this fact, though; from src/backend/catalog/index.c:495 (or so):

	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
	 errmsg("user-defined indexes on system catalog tables are not supported")));

> And are you seeing any performance issues related to lack of indexes?

Depends on the eye of the beholder to some extent, I suppose.

> For the system catalogs we understand the access patterns pretty well
> (I think), and I thought we pretty much had the right indexes on them
> already.

The case that I was specifically thinking about was relowner in pg_class not being indexed.
	tsf=> explain analyze select cl.relname from pg_authid a join pg_class cl on (a.oid = cl.relowner) where a.rolname = 'postgres';
	                                                        QUERY PLAN
	------------------------------------------------------------------------------------------------------------------------
	 Hash Join  (cost=2.54..1970.25 rows=383 width=64) (actual time=0.113..77.950 rows=223 loops=1)
	   Hash Cond: ("outer".relowner = "inner".oid)
	   ->  Seq Scan on pg_class cl  (cost=0.00..1881.59 rows=16459 width=68) (actual time=0.036..46.607 rows=17436 loops=1)
	   ->  Hash  (cost=2.54..2.54 rows=1 width=4) (actual time=0.057..0.057 rows=1 loops=1)
	         ->  Seq Scan on pg_authid a  (cost=0.00..2.54 rows=1 width=4) (actual time=0.047..0.050 rows=1 loops=1)
	               Filter: (rolname = 'postgres'::name)
	 Total runtime: 78.358 ms
	(7 rows)

It's not exactly *slow*, but an index might speed it up. I was trying to create one and couldn't figure out the right incantation to make it happen; 'allow_system_table_mods = true' wasn't working in postgresql.conf (it wouldn't start) for some reason...

Other system-catalog queries that I've been a little unhappy about the performance of (though I don't know if indexes would help, so this is really just me complaining) are: the initial table list in ODBC w/ Access (takes *forever* when you have a lot of tables...); schema/table lists in phppgadmin when there are a lot of schemas/tables; information_schema queries (try looking at information_schema.columns for a given table when you've got a lot of tables... over 10x slower than looking at pg_class/pg_attribute directly, 3 seconds vs. 200ms, or so).

Thanks, Stephen