Re: [PATCH] Transaction traceability - txid_status(bigint)
От | Craig Ringer |
---|---|
Тема | Re: [PATCH] Transaction traceability - txid_status(bigint) |
Дата | |
Msg-id | CAMsr+YFOLdvWbKSUwWh+JZvYu==ieT6qeJAks3Sk0doEzpr_Rw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [PATCH] Transaction traceability - txid_status(bigint) (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: [PATCH] Transaction traceability - txid_status(bigint)
(Andres Freund <andres@anarazel.de>)
Re: [PATCH] Transaction traceability - txid_status(bigint) (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On 24 August 2016 at 03:10, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > > Also fine by me. You're right, keep it simple. It means the potential set of > > values isn't discoverable the same way, but ... meh. Using it usefully means > > reading the docs anyway. > > > > The remaining 2 patches of interest are attached - txid_status() and > > txid_convert_if_recent(). Thanks for committing txid_current_if_assigned(). > > > > Now I'd best stop pretending I'm in a sensible timezone. > > I reviewed this version some more and found some more problems. Thanks. It took me a few days to prep a new patch as I found another issue in the process. Updated patch attached. The updated series starts (0001) with a change to slru.c to release the control lock when throwing an exception so that we don't deadlock with ourselves when re-entering slru.c; explanation below. Then there's the txid_status (0002) patch with fixes, then txid_convert_if_recent(0003). I omitted txid_incinerate() ; I have an updated version that sets xact status to aborted for burned xacts instead of leaving them at 0 (in-progress), but haven't had time to finish it so it doesn't try to blindly set the status of xacts on pages where it didn't hold XidGenLock when the page was added to the clog. > + uint32 xid_epoch = (uint32)(xid_with_epoch >>32); > + TransactionId xid = (TransactionId)(xid_with_epoch); > > I think this is not project style. In particular, I think that the > first one needs a space after the cast and another space before the > 32; and I think the second one has an unnecessary set of parentheses > and needs a space added. OK, no problems. I didn't realise spacing around casts was specified. > > +/* > + * Underlying implementation of txid_status, which is mapped to an enum in > + * system_views.sql. > + */ > > Not any more... That's why I shouldn't revise a patch at 1am ;) > > + if (TransactionIdIsCurrentTransactionId(xid) || > TransactionIdIsInProgress(xid)) > + status = gettext_noop("in progress"); > + else if (TransactionIdDidCommit(xid)) > + status = gettext_noop("committed"); > + else if (TransactionIdDidAbort(xid)) > + status = gettext_noop("aborted"); > + else > + /* shouldn't happen */ > + ereport(ERROR, > + (errmsg_internal("unable to determine commit status of > xid "UINT64_FORMAT, xid))); > > Maybe I'm all wet here, but it seems like there might be a problem > here if the XID is older than the CLOG truncation point but less than > one epoch old. get_xid_in_recent_past only guarantees that the > transaction is less than one epoch old, not that we still have CLOG > data for it. Good point. The call would then fail with something like ERROR: could not access status of transaction 778793573 DETAIL: could not open file "pg_clog/02E6": No such file or directory This probably didn't come up in my wraparound testing because I'm aggressively forcing wraparound by writing a lot of clog very quickly under XidGenLock, and because I'm mostly looking at xacts that are either recent or past the xid boundary. I've added better add coverage for xacts around 2^30 behind the nextXid to the wraparound tests; can't add them to txid.sql since the xid never gets that far in normal regression testing. What I'd really like is to be able to ask transam.c to handle the xid_in_recent_past logic, treating an attempt to read an xid from beyond the clog truncation threshold as a soft error indicating unknown xact state. But that involves delving into slru.c, and I really, really don't want to touch that for what should be a simple and pretty noninvasive utility function. A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient, except for two issues: * I see no accepted way to access the errcode etc from within the PG_CATCH block, though. PG_RE_THROW() can do it because pg_re_throw() is in elog.c. I couldn't find any existing code that seems to check details about an error thrown in a PG_TRY block, only SPI calls. I don't want to ignore all types of errors and potentially hide problems, so I just used geterrcode() - which is meant for errcontext callbacks - and changed the comment to say it can be used in PG_CATCH too. I don't see why it shouldn't be. We should probably have some sort of PG_CATCH_INFO(varname) that exposes the top ErrorData, but that's not needed for this patch so I left it alone. * TransactionIdGetStatus() releases the CLogControlLock taken by SimpleLruReadPage_ReadOnly() on normal exit but not after an exception thrown from SlruReportIOError(). It seems appropriate for SimpleLruReadPage() to release the LWLock before calling SlruReportIOError(), so I've done that as a separate patch (now 0001). I also removed the TransactionIdInProgress check in txid_status and just assume it's in progress if it isn't our xid, committed or aborted. TransactionIdInProgress looks like it's potentially more expensive, and most of the time we'll be looking at committed or aborted xacts anyway. I can't sanity-check TransactionIdInProgress after commited/aborted, because there's then a race where the xact can commit or abort after we decide it's not committed/aborted so it's not in progress when we go to check that. > And there's nothing to keep NextXID from advancing under > us, so if somebody asks about a transaction that's just under 2^32 > transactions old, then get_xid_in_recent_past could say it's valid, > then NextXID advances and we look up the XID extracted from the txid > and get the status of the new transaction instead of the old one! Hm, yeah. Though due to the clog truncation issue you noticed it probably won't happen. We could require that XidGenLock be held at least as LW_SHARED when entering get_xid_in_recent_past(), but I'd rather not since that'd be an otherwise-unnecessary lwlock for txid_convert_ifrecent(). Instead, I think I'll rename the wraparound flag to too_old and set it if the xact is more than say 2^30 from the epoch struct's last_xid, leaving a race window so ridiculously improbable that the nearly impossible chance of failing with a clog access error is not a worry. If the server's managing to have a proc stuck that long then it's already on fire. We're only interested in reasonably recent xacts since we can only work with xacts before wraparound / clog truncation. This just moves the threshold for "reasonably recent" a bit closer. All this certainly reinforces my view that users handling 'xid' directly or trying to extract it from a bigint epoch-extended xid is a bad idea that needs to go away soon. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: