Re: Low hanging fruit in lazy-XID-assignment patch?
From | Tom Lane |
---|---|
Subject | Re: Low hanging fruit in lazy-XID-assignment patch? |
Date | |
Msg-id | 23762.1189203341@sss.pgh.pa.us |
In reply to | Re: Low hanging fruit in lazy-XID-assignment patch? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Low hanging fruit in lazy-XID-assignment patch? ("Florian G. Pflug" <fgp@phlo.org>) |
List | pgsql-hackers |
Here's some revised text for the README file, based on using Florian's idea of a global latestCompletedXid variable. As I worked through it I realized that in this design, XidGenLock gates entry of new XIDs into the ProcArray while ProcArrayLock gates their removal. Which is an interesting sort of symmetry property. It also turns out that the reason we need to gate entry with XidGenLock is to keep from breaking GetOldestXmin, rather than to ensure correctness of snapshots per se.

(Note: I refer in the text to ProcArrayEndTransaction(), which is a function I'm thinking of putting into procarray.c to replace the current inline-in-xact.c code that clears xid and related fields.)

Comments?

regards, tom lane


Interlocking transaction begin, transaction end, and snapshots
--------------------------------------------------------------

We try hard to minimize the amount of overhead and lock contention involved in the frequent activities of beginning/ending a transaction and taking a snapshot. Unfortunately, we must have some interlocking for this, because we must ensure consistency about the commit order of transactions.

For example, suppose an UPDATE in xact A is blocked by xact B's prior update of the same row, and xact B is doing commit while xact C gets a snapshot. Xact A can complete and commit as soon as B releases its locks. If xact C's GetSnapshotData sees xact B as still running, then it had better see xact A as still running as well, or it will be able to see two tuple versions - one deleted by xact B and one inserted by xact A. Another reason why this would be bad is that C would see (in the row inserted by A) earlier changes by B, and it would be inconsistent for C not to see any of B's changes elsewhere in the database.

Formally, the correctness requirement is "if a snapshot A considers transaction X as committed, and any of transaction X's snapshots considered transaction Y as committed, then snapshot A must consider transaction Y as committed".

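To make the formal requirement concrete, here is a small Python sketch that checks it against the A/B/C scenario above. It is only a toy check, not anything in the PostgreSQL tree: the function `violations`, the `snapshots_of` mapping, and the choice to represent a snapshot as the set of transactions it considers committed are all assumptions made for illustration.

```python
def violations(snapshot, snapshots_of):
    """Return (x, y) pairs that break the rule for `snapshot`.

    snapshot:     set of transactions this snapshot considers committed.
    snapshots_of: dict mapping a transaction to the list of snapshots
                  (sets, same representation) that it took itself.
    """
    bad = []
    for x in sorted(snapshot):
        for snap_x in snapshots_of.get(x, []):
            # Every y that x saw as committed must be committed for us too.
            for y in sorted(snap_x - snapshot):
                bad.append((x, y))
    return bad


# Scenario from the text: A proceeded only after B released its locks,
# so (hypothetically) A took a snapshot in which B was committed.
snapshots_of = {"A": [{"B"}], "B": [set()]}

print(violations({"A"}, snapshots_of))       # C sees A but not B: [('A', 'B')]
print(violations({"A", "B"}, snapshots_of))  # C sees both: []
print(violations(set(), snapshots_of))       # C sees neither: []
```

Only the first case is forbidden: C's snapshot treats A as committed while still treating B as running.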
What we actually enforce is strict serialization of commits and rollbacks with snapshot-taking: we do not allow any transaction to exit the set of running transactions while a snapshot is being taken. (This rule is stronger than necessary for consistency, but is relatively simple to enforce, and it assists with some other issues as explained below.) The implementation of this is that GetSnapshotData takes the ProcArrayLock in shared mode (so that multiple backends can take snapshots in parallel), but ProcArrayEndTransaction must take the ProcArrayLock in exclusive mode while clearing MyProc->xid at transaction end (either commit or abort).

ProcArrayEndTransaction also holds the lock while advancing the shared latestCompletedXid variable. This allows GetSnapshotData to use latestCompletedXid + 1 as xmax for its snapshot: there can be no transaction >= this xid value that the snapshot needs to consider as completed.

In short, then, the rule is that no transaction may exit the set of currently-running transactions between the time we fetch latestCompletedXid and the time we finish building our snapshot. However, this restriction only applies to transactions that have an XID --- read-only transactions can end without acquiring ProcArrayLock, since they don't affect anyone else's snapshot nor latestCompletedXid.

Transaction start, per se, doesn't have any interlocking with these considerations, since we no longer assign an XID immediately at transaction start. But when we do decide to allocate an XID, GetNewTransactionId must store the new XID into the shared ProcArray before releasing XidGenLock. This ensures that all top-level XIDs <= latestCompletedXid are either present in the ProcArray, or not running anymore. (This guarantee doesn't apply to subtransaction XIDs, because of the possibility that there's not room for them in the subxid array; instead we guarantee that they are present or the overflow flag is set.)
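
The locking rules described so far can be sketched as a toy model in Python. This is an illustration under stated assumptions, not the PostgreSQL C code: the class `ToyProcArray` and its method names are hypothetical analogues of the functions named in the text, both locks are plain mutexes (the real ProcArrayLock also has a shared mode), XIDs are plain integers with no wraparound, and subtransactions are left out. The model also omits XIDs >= xmax from the running list, on the reasoning that a snapshot treats them as running anyway.

```python
import threading


class ToyProcArray:
    def __init__(self, n_backends, first_xid=100):
        # Stand-ins for ProcArrayLock and XidGenLock (both exclusive-only here).
        self.proc_array_lock = threading.Lock()
        self.xid_gen_lock = threading.Lock()
        self.xids = [None] * n_backends  # per-backend xid, None = no XID
        self.next_xid = first_xid
        self.latest_completed_xid = first_xid - 1

    def get_new_transaction_id(self, backend):
        with self.xid_gen_lock:
            xid = self.next_xid
            self.next_xid += 1
            # Publish the XID before XidGenLock is released.
            # ProcArrayLock is deliberately not taken here.
            self.xids[backend] = xid
        return xid

    def end_transaction(self, backend):
        xid = self.xids[backend]
        if xid is None:
            return  # XID-less transaction: no ProcArrayLock needed
        with self.proc_array_lock:
            self.xids[backend] = None
            self.latest_completed_xid = max(self.latest_completed_xid, xid)

    def get_snapshot_data(self):
        with self.proc_array_lock:
            xmax = self.latest_completed_xid + 1
            running = sorted(
                x for x in self.xids if x is not None and x < xmax
            )
        xmin = running[0] if running else xmax
        return xmin, xmax, running


pa = ToyProcArray(n_backends=3)
pa.get_new_transaction_id(0)   # backend 0 gets XID 100
pa.get_new_transaction_id(1)   # backend 1 gets XID 101
pa.end_transaction(1)          # 101 completes first
print(pa.get_snapshot_data())  # (100, 102, [100])
pa.end_transaction(0)
print(pa.get_snapshot_data())  # (102, 102, [])
```

In this model the only writer that removes an XID from the array is `end_transaction`, and it does so under the same lock that `get_snapshot_data` holds, so a snapshot never observes an XID disappearing halfway through its scan.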
If a backend released XidGenLock before storing its XID into MyProc, then it would be possible for another backend to allocate and commit a later XID, causing latestCompletedXid to pass the first backend's XID, before that value became visible in the ProcArray. That would break GetOldestXmin, as discussed below.

We allow GetNewTransactionId to store the XID into MyProc->xid (or the subxid array) without taking ProcArrayLock. This was once necessary to avoid deadlock; while that is no longer the case, it's still beneficial for performance. We are thereby relying on fetch/store of an XID to be atomic, else other backends might see a partially-set XID. This also means that readers of the ProcArray xid fields must be careful to fetch a value only once, rather than assume they can read it multiple times and get the same answer each time.

Another important activity that uses the shared ProcArray is GetOldestXmin, which must determine a lower bound for the oldest xmin of any active MVCC snapshot, system-wide. Each individual backend advertises the smallest xmin of its own snapshots in MyProc->xmin, or zero if it currently has no live snapshots (eg, if it's between transactions or hasn't yet set a snapshot for a new transaction). GetOldestXmin takes the MIN() of the valid xmin fields. It does this with only shared lock on ProcArrayLock, which means there is a potential race condition against other backends doing GetSnapshotData concurrently: we must be certain that a concurrent backend that is about to set its xmin does not compute an xmin less than what GetOldestXmin returns. We ensure that by including all the active XIDs into the MIN() calculation, along with the valid xmins. The rule that transactions can't exit without taking exclusive ProcArrayLock ensures that concurrent holders of shared ProcArrayLock will compute the same minimum of currently-active XIDs: no xact, in particular not the oldest, can exit while we hold shared ProcArrayLock.
So GetOldestXmin's view of the minimum active XID will be the same as that of any concurrent GetSnapshotData, and so it can't produce an overestimate. If there is no active transaction at all, GetOldestXmin returns latestCompletedXid + 1, which is a lower bound for the xmin that might be computed by concurrent or later GetSnapshotData calls. (We know that no XID less than this could be about to appear in the ProcArray, because of the XidGenLock interlock discussed above.)

GetSnapshotData also performs an oldest-xmin calculation (which had better match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used for some tuple age cutoff checks where a fresh call of GetOldestXmin seems too expensive. Note that while it is certain that two concurrent executions of GetSnapshotData will compute the same xmin for their own snapshots, as argued above, it is not certain that they will arrive at the same estimate of RecentGlobalXmin. This is because we allow XID-less transactions to clear their MyProc->xmin asynchronously (without taking ProcArrayLock), so one execution might see what had been the oldest xmin, and another not. This is OK since RecentGlobalXmin need only be a valid lower bound. As noted above, we are already assuming that fetch/store of the xid fields is atomic, so assuming it for xmin as well is no extra risk.
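
The GetOldestXmin argument above can be illustrated with a short Python sketch. It is a toy model rather than the actual procarray.c logic: the function name `get_oldest_xmin`, the use of `None` for an unset field (the text uses zero), and starting the minimum at latestCompletedXid + 1 are assumptions of the sketch, and XIDs are plain integers with no wraparound. A caller is assumed to hold the equivalent of shared ProcArrayLock.

```python
def get_oldest_xmin(xids, xmins, latest_completed_xid):
    """Lower bound for the oldest xmin of any active snapshot (toy model).

    xids, xmins: per-backend advertised values, None when unset.
    """
    result = latest_completed_xid + 1  # answer when nothing is active
    for value in list(xids) + list(xmins):
        # Each shared field is read exactly once, into `value`.
        if value is not None and value < result:
            result = value
    return result


# XID 100 was published before XidGenLock was released, so it is
# already visible by the time XID 101 has completed.
print(get_oldest_xmin([100, None], [None, None], 101))   # 100

# Hypothetical broken ordering: 100 is allocated but not yet published.
# The result overshoots the xmin (100) a later snapshot would compute.
print(get_oldest_xmin([None, None], [None, None], 101))  # 102

# An XID-less backend advertising xmin 98 may clear it at any moment;
# both answers remain valid lower bounds.
print(get_oldest_xmin([100, None], [100, 98], 101))      # 98
print(get_oldest_xmin([100, None], [100, None], 101))    # 100
```

The second call shows why XIDs must enter the ProcArray under XidGenLock, and the last two show why concurrent estimates of RecentGlobalXmin can differ without either being wrong.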