Обсуждение: XMIN semantic at peril ?
Hi all, we use XMIN in our application (http://wiki.gnumed.de) as a lightweight database-provided "data-changed" marker in a cross-transaction optimistic locking scheme. The one characteristic of XMIN we rely on in this scheme is that "XMIN changes when a (writing) transaction commits successfully". We don't care *how* it changes, just *that* it does so. Now, one kind soul has reviewed our model and suggested that we really shouldn't rely on XMIN at the end of the day for various fairly good reasons: - XMIN is an implementation detail of PostgreSQL (similar to row OIDs) - GNUmed shouldn't rely on PostgreSQL implementation details I argued that - GNUmed only relies on a very general assumption about XMIN behaviour - PG is likely to always need a way to mark tuples committed one way or another as that is part of tuple visibility (it needn't be or be named XMIN, though) - PG is unlikely (given its track record) of obscuring visibility of that commit marker Surely, good counter-arguments were brought forth: - row OIDs did disappear (or, rather, were defaulted to off) - it is uncertain whether tuple visibility semantics of PG will in the future continue to lend themselves to our cross-tx collision detection/optimistic locking scheme - particularly in the light of 8.3 HOT patches - more so in the light of the "Skytools extended transaction ID module" Of course, we do know how to alleviate the situation technically by just defining our own marker column and adding appropriate triggers to the relevant tables to update said marker. Which, in a perfect world of unlimited resources, we'd just go ahead and do. In the real world, however, I'd like to assess the likelihood of our need to embark on such a mission. So, eventually, in the near- to mid-future: How likely is XMIN (or equivalent) to become invisible to SQL level user space ? How likely is XMIN (or equivalent) to NOT change on each successful (write) transaction commit anymore ? Thanks for any insight, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes: > How likely is XMIN (or equivalent) to become invisible to > SQL level user space ? No one has suggested this. I suppose the argument could be made that the system columns are an unwarranted intrusion on users' column namespace, but we'd probably handle that by demoting them to second-class citizens, not hiding them entirely --- there are far too many apps that rely on ctid, for instance, and I think some that are doing like you do with xmin. So as long as you don't create a user column named xmin in your tables, you could expect to access the system column. > How likely is XMIN (or equivalent) to NOT change on each > successful (write) transaction commit anymore ? No chance of that, unless we abandon MVCC for something else, which again seems highly unlikely. One question I'd have though is whether "freezing" of old tuples is likely to confuse your app. That process might get more aggressive in the future (it already is more aggressive in 8.2 than before, depending on where vacuum_freeze_min_age is set). The only argument you cited that seems impressive to me is the one about it being a Postgres-ism. Are you willing to have GNUmed tied tightly to Postgres? regards, tom lane
On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote: > The only argument you cited that seems impressive to me is the one > about it being a Postgres-ism. Are you willing to have GNUmed tied > tightly to Postgres? Personally, yes, no problem and thanks for it :-)) Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
2007/10/11, Tom Lane <tgl@sss.pgh.pa.us>: > The only argument you cited that seems impressive to me is the one > about it being a Postgres-ism. Are you willing to have GNUmed tied > tightly to Postgres? Well, at least not in all aspects, if I may cut in. (from http://www.gnumed.org/) error insert into WebLog values(586,31,"2007-10-12","80.177.203.190") Can't open file: 'WebLog.MYD'. (errno: 145) SCNR Filip
On Thu, Oct 11, 2007 at 04:18:30PM +0100, Filip Rembiałkowski wrote: > Subject: Re: [GENERAL] XMIN semantic at peril ? > > 2007/10/11, Tom Lane <tgl@sss.pgh.pa.us>: > > The only argument you cited that seems impressive to me is the one > > about it being a Postgres-ism. Are you willing to have GNUmed tied > > tightly to Postgres? > > Well, at least not in all aspects, if I may cut in. > > (from http://www.gnumed.org/) > error > insert into WebLog values(586,31,"2007-10-12","80.177.203.190") > Can't open file: 'WebLog.MYD'. (errno: 145) Well, that's just a tool we use and got nothing to do with GNUmed proper. We don't have control over the domain since the current holder doesn't seem willing to hand it over (not that that's much of a problem, we do have edit rights to that homegrown mysql based wiki). This is where all the action is, documentation-wise: http://wiki.gnumed.de Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote: > > How likely is XMIN (or equivalent) to NOT change on each > > successful (write) transaction commit anymore ? > > No chance of that, unless we abandon MVCC for something else, which > again seems highly unlikely. I figured as much but I'm just a loney MD. > One question I'd have though is whether "freezing" of old tuples is > likely to confuse your app. That process might get more aggressive > in the future (it already is more aggressive in 8.2 than before, > depending on where vacuum_freeze_min_age is set). Well, what we do is this: - read row including XMIN - do some UI stuff without open transactions - update row with "... where pk = ... and XMIN = old_xmin_from_read" If in the meantime another writer changed the data we originally read we would detect that by xmin having changed hence no row to be updated. So, yes, there is a *tiny* failure condition: - read row including XMIN - do UI stuff w/o open transaction - wait a LONG time in which - old_xmin_from_read gets frozen - and recycled on the *very same* row - attempt to write initial data with ... WHERE XMIN = old_xmin_from_initial_read In those very circumstances we would not be able to detect the collision. Now, how likely is that given sane values for vacuum_freeze_min_age and average ambulatory care patient encounter times of 5-20 minutes (which would in GNUmed be equivalent to cross-tx medical record open times - IOW the time our XMIN based optimistic locks would normally be held). Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes: > On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote: >> One question I'd have though is whether "freezing" of old tuples is >> likely to confuse your app. > Well, what we do is this: > - read row including XMIN > - do some UI stuff without open transactions > - update row with "... where pk = ... and XMIN = old_xmin_from_read" > If in the meantime another writer changed the data we > originally read we would detect that by xmin having changed > hence no row to be updated. So, yes, there is a *tiny* > failure condition: Hmm. I think the failure condition is not what you are thinking: in your example, you'd correctly conclude that some other transaction modified the row. The problem case is - read (a rather old) row including XMIN - VACUUM comes along and decides to set XMIN = FrozenTransactionId - update row with "... where pk = ... and XMIN = old_xmin_from_read" - update fails, when there is no need to fail As long as the failure is "soft", ie, you recover reasonably, this shouldn't be a big problem. But it's certainly not a scenario you should dismiss as not credible because of timescales. regards, tom lane
On Oct 11, 2007, at 11:03 AM, Tom Lane wrote: > Karsten Hilbert <Karsten.Hilbert@gmx.net> writes: >> On Thu, Oct 11, 2007 at 10:44:17AM -0400, Tom Lane wrote: >>> One question I'd have though is whether "freezing" of old tuples is >>> likely to confuse your app. > >> Well, what we do is this: > >> - read row including XMIN >> - do some UI stuff without open transactions >> - update row with "... where pk = ... and XMIN = old_xmin_from_read" > >> If in the meantime another writer changed the data we >> originally read we would detect that by xmin having changed >> hence no row to be updated. So, yes, there is a *tiny* >> failure condition: > > Hmm. I think the failure condition is not what you are thinking: in > your example, you'd correctly conclude that some other transaction > modified the row. The problem case is > > - read (a rather old) row including XMIN > - VACUUM comes along and decides to set XMIN = FrozenTransactionId > - update row with "... where pk = ... and XMIN = old_xmin_from_read" > - update fails, when there is no need to fail > > As long as the failure is "soft", ie, you recover reasonably, this > shouldn't be a big problem. But it's certainly not a scenario you > should dismiss as not credible because of timescales. If the query is always based on a primary key + XMIN, and since vacuum is the only thing that sets FrozenTransactionId, would it be unsane to change the update to - update row with "... where pk=... and XMIN IN (old_xmin_from_read, FrozenTransactionId) ? Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Erik Jones <erik@myemma.com> writes: > If the query is always based on a primary key + XMIN, and since > vacuum is the only thing that sets FrozenTransactionId, would it be > unsane to change the update to > - update row with "... where pk=... and XMIN IN (old_xmin_from_read, > FrozenTransactionId) I wouldn't risk it, because that *would* fail in the case Karsten mentioned of an update sequence taking so long that a concurrent update got frozen. regards, tom lane
On Thu, Oct 11, 2007 at 01:48:10PM -0400, Tom Lane wrote: > > If the query is always based on a primary key + XMIN, It is. > > and since > > vacuum is the only thing that sets FrozenTransactionId, would it be > > unsane to change the update to > > > - update row with "... where pk=... and XMIN IN (old_xmin_from_read, > > FrozenTransactionId) > > I wouldn't risk it, because that *would* fail in the case Karsten > mentioned of an update sequence taking so long that a concurrent > update got frozen. Which is the failure case we should be more worried about: "not detecting a valid collision" rather than "detecting a non-collision" as could happen in the scenario Tom pointed out. False-Negative vs. False-Positive, that is to say ;-) I was unable to find any reference to FrozenTransationId in the 8.2 documentation (found it in the source, though). Hence one would assume its value cannot be derived (at the SQL level) at runtime and hence it needed to be hardcoded in our Python code. Which, I dare say, would mean relying on even more obscure implementation details which was part of the argument that triggered the whole process of thought. Ne'ertheless, thanks for the suggestion. I am gaining a much better understanding of the factors involved. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Thu, Oct 11, 2007 at 12:03:47PM -0400, Tom Lane wrote: > > If in the meantime another writer changed the data we > > originally read we would detect that by xmin having changed > > hence no row to be updated. So, yes, there is a *tiny* > > failure condition: > > Hmm. I think the failure condition is not what you are thinking: in > your example, you'd correctly conclude that some other transaction > modified the row. Not really, notice: - original XMIN read, transaction is closed - a loooong time passes: - original row gets frozen, XMIN changed to FrozenTransactionId - XMIN wraps - original XMIN is *reused* on the *exact same* original row by another concurrent writer without changing the primary key - the original row is written back to the database with "... where xmin = original_xmin and pk = original_pk ..." Now the update succeeds, although the data DID change ! This is a worst-case failure but should be *very* rare. > The problem case is > > - read (a rather old) row including XMIN > - VACUUM comes along and decides to set XMIN = FrozenTransactionId > - update row with "... where pk = ... and XMIN = old_xmin_from_read" > - update fails, when there is no need to fail > > As long as the failure is "soft", ie, you recover reasonably, this > shouldn't be a big problem. But it's certainly not a scenario you > should dismiss as not credible because of timescales. Very true. I documented this in our code and set up a TODO item to switch to a AFTER-trigger updated non-system oplock column. Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346