Обсуждение: [PATCH] Transaction traceability - txid_status(bigint)

Поиск
Список
Период
Сортировка

[PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
Hi all

Following on from 

bigint txids vs 'xid' type, new txid_recent(bigint) => xid


here's a patch that implements a txid_status(bigint) function to report the commit status of a function.

If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc, then an application may not be
sure whether or not a commit completed successfully or was rolled
back. While two-phase commit solves this it does so at a considerable
overhead, so introduce a lighter alternative.

txid_status(bigint) lets an application determine the status of a
a commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.

Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned. 

A future protocol enhancement to report txid assignment would be very useful, but quite separate to this.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 20 August 2016 at 21:24, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all

Following on from 

bigint txids vs 'xid' type, new txid_recent(bigint) => xid


Ahem. Forgot to squash in a fixup commit. Updated patch of txid_status(bigint) attachd.

A related patch follows, adding a new txid_current_ifassigned(bigint) function as suggested by Jim Nasby. It's usefully connected to txid_status() and might as well be added at the same time.

Since it builds on the same history I've also attached an updated version of txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the above-linked thread.

Finally, and not intended for commit, is a useful test function I wrote to cause extremely rapid xid wraparound, bundled up into a src/test/regress test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2 seconds if fsync is off, making it handy for testing.  Posting so others can use it for their own test needs later and because it's useful for testing these patches that touch on the xid epoch.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Sat, Aug 20, 2016 at 9:46 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Ahem. Forgot to squash in a fixup commit. Updated patch of
> txid_status(bigint) attachd.
>
> A related patch follows, adding a new txid_current_ifassigned(bigint)
> function as suggested by Jim Nasby. It's usefully connected to txid_status()
> and might as well be added at the same time.
>
> Since it builds on the same history I've also attached an updated version of
> txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the
> above-linked thread.
>
> Finally, and not intended for commit, is a useful test function I wrote to
> cause extremely rapid xid wraparound, bundled up into a src/test/regress
> test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2
> seconds if fsync is off, making it handy for testing.  Posting so others can
> use it for their own test needs later and because it's useful for testing
> these patches that touch on the xid epoch.

I think you should use underscores to separate all of the words
instead of only some of them.

Also, note that the corresponding internal function is
GetTopTransactionIdIfAny(), which might suggest txid_current_if_any()
rather than txid_current_if_assigned(), but you could argue that your
naming is better...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com> wrote:
 
I think you should use underscores to separate all of the words
instead of only some of them.

Right. Will fix.

Thanks for taking a look.
 
Also, note that the corresponding internal function is
GetTopTransactionIdIfAny(), which might suggest txid_current_if_any()
rather than txid_current_if_assigned(), but you could argue that your
naming is better.

Yeah, I do argue that in this case. Not a hugely strong opinion, but we refer to txid_current() in the docs as:

"get current transaction ID, assigning a new one if the current transaction does not have one"

so I thought it'd be worth being consistent with that. It's not like txid_current() mirrors the name of GetTopTransactionId() after all ;) and I think it's more important to be consistent with what the user sees than with the code.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com> wrote:
 

I think you should use underscores to separate all of the words
instead of only some of them.


ifassigned => if_assigned

ifrecent=> if_recent

Updated patch series attached. As before, 0-4 intended for commit, 5 just because it'll be handy to have around for people doing wraparound related testing.

Again, thanks for taking a look. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Petr Jelinek
Дата:
On 23/08/16 02:55, Craig Ringer wrote:
> On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com
> <mailto:robertmhaas@gmail.com>> wrote:
>
>
>
>     I think you should use underscores to separate all of the words
>     instead of only some of them.
>
>
> ifassigned => if_assigned
>
> ifrecent=> if_recent
>
> Updated patch series attached. As before, 0-4 intended for commit, 5
> just because it'll be handy to have around for people doing wraparound
> related testing.

I guess you mean 0-3 for commit and 4 is just handy?
From the point of code this patch seems good to me.

I do wonder about the 3rd patch though. I wonder if it would not be 
better to have the opposite function instead - converting xid to txid as 
that will always work and does not have to have the NULL case and would 
be simpler in terms of code.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
<p dir="ltr">On 23 Aug 2016 16:02, "Petr Jelinek" <<a
href="mailto:petr@2ndquadrant.com">petr@2ndquadrant.com</a>>wrote:<br /> ><br /> > On 23/08/16 02:55, Craig
Ringerwrote:<br /> >><br /> >> On 23 August 2016 at 01:03, Robert Haas <<a
href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a><br/> >> <mailto:<a
href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>>wrote:<br /> >><br /> >><br />
>><br/> >>     I think you should use underscores to separate all of the words<br /> >>     instead
ofonly some of them.<br /> >><br /> >><br /> >> ifassigned => if_assigned<br /> >><br />
>>ifrecent=> if_recent<br /> >><br /> >> Updated patch series attached. As before, 0-4 intended
forcommit, 5<br /> >> just because it'll be handy to have around for people doing wraparound<br /> >>
relatedtesting.<br /> ><br /> ><br /> > I guess you mean 0-3 for commit and 4 is just handy?<p dir="ltr">Er.
Right.1-3. 4 just as handy test/tool.<p dir="ltr">1 most important and useful. Then 2. Then 3.<p dir="ltr">> From
thepoint of code this patch seems good to me.<p dir="ltr">Thanks.<p dir="ltr">> I do wonder about the 3rd patch
though.I wonder if it would not be better to have the opposite function instead - converting xid to txid as that will
alwayswork and does not have to have the NULL case and would be simpler in terms of code.<p dir="ltr">Yeah, but it
wouldn'tsolve the need to take txid_current() output and do stuff with it other than ordinal comparison. Like pass to
committs functions and others that take xid. If we extend all funcs that take xid to take bigint instead, they just get
touse the same epoch logic in them, complete with some way to deal with wrapped xids sensibly. It has to be done
somewhere.Though it's prettier if hidden from the user.<p dir="ltr">More importantly imo, txid => bigint has to
assumethe current epoch. We have no way to make sure the user doesn't try to use something already wrapped.<p
dir="ltr">Idon't mind if everyone decides it's better to make xid go away and use bigint everywhere user facing. Or
evena new bigxid type. More work than I can really afford but can manage; shouldn't block #1 and #2 though as they
alreadyuse bigint. 

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Petr Jelinek
Дата:
On 23/08/16 11:27, Craig Ringer wrote:
> On 23 Aug 2016 16:02, "Petr Jelinek" <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>>
>> On 23/08/16 02:55, Craig Ringer wrote:
>>>
>>> On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com
> <mailto:robertmhaas@gmail.com>
>>> <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>> wrote:
>>>
>>>
>>>
>>>     I think you should use underscores to separate all of the words
>>>     instead of only some of them.
>>>
>>>
>>> ifassigned => if_assigned
>>>
>>> ifrecent=> if_recent
>>>
>>> Updated patch series attached. As before, 0-4 intended for commit, 5
>>> just because it'll be handy to have around for people doing wraparound
>>> related testing.
>>
>>
>> I guess you mean 0-3 for commit and 4 is just handy?
>
> Er. Right. 1-3. 4 just as handy test/tool.
>
> 1 most important and useful. Then 2. Then 3.
>
>> From the point of code this patch seems good to me.
>
> Thanks.
>
>> I do wonder about the 3rd patch though. I wonder if it would not be
> better to have the opposite function instead - converting xid to txid as
> that will always work and does not have to have the NULL case and would
> be simpler in terms of code.
>
> Yeah, but it wouldn't solve the need to take txid_current() output and
> do stuff with it other than ordinal comparison. Like pass to commit ts
> functions and others that take xid. If we extend all funcs that take xid
> to take bigint instead, they just get to use the same epoch logic in
> them, complete with some way to deal with wrapped xids sensibly. It has
> to be done somewhere. Though it's prettier if hidden from the user.
>
> More importantly imo, txid => bigint has to assume the current epoch. We
> have no way to make sure the user doesn't try to use something already
> wrapped.
>

Okay, fair points.

> I don't mind if everyone decides it's better to make xid go away and use
> bigint everywhere user facing. Or even a new bigxid type. More work than
> I can really afford but can manage; shouldn't block #1 and #2 though as
> they already use bigint.
>

I don't think that would be very straightforward to be honest. I guess 
for what you want to achieve the approach chosen is the best one then.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Updated patch series attached. As before, 0-4 intended for commit, 5 just
> because it'll be handy to have around for people doing wraparound related
> testing.
>
> Again, thanks for taking a look.

/me reviews a bit more deeply.

In 0001, it seems to me that "in-progress" should be "in progress".  I
don't think it's normal to hyphenate that.  We have admittedly
sometimes done so, but:

[rhaas pgsql]$ git grep 'in-progress' | wc -l     63
[rhaas pgsql]$ git grep 'in progress' | wc -l    346

It may make sense to speak of an "in-progress transaction" but I would
say "the transaction is in progress" not "the transaction is
in-progress", which seems to me to argue for a space as the proper
separator here.

Also:

+CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted');
+
+CREATE FUNCTION
+  txid_status(txid bigint)
+RETURNS txid_status
+LANGUAGE sql
+VOLATILE PARALLEL SAFE
+AS $$
+SELECT CASE
+  WHEN s IS NULL THEN NULL::txid_status
+  WHEN s = -1 THEN 'aborted'::txid_status
+  WHEN s = 0 THEN 'in-progress'::txid_status
+  WHEN s = 1 THEN 'committed'::txid_status
+END
+FROM pg_catalog.txid_status_internal($1) s;
+$$;
+
+COMMENT ON FUNCTION txid_status(bigint)
+IS 'get commit status of given recent xid or null if too old';

I'm not really that keen on this approach.  I don't think we need to
introduce a new data type for this, and I would rather not use SQL,
either.  It would be faster and simpler to just return the appropriate
string from a C function defined directly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Tue, Aug 23, 2016 at 10:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Updated patch series attached. As before, 0-4 intended for commit, 5 just
>> because it'll be handy to have around for people doing wraparound related
>> testing.
>>
>> Again, thanks for taking a look.
>
> /me reviews a bit more deeply.
>
> In 0001, ...

0002 looks good, so I committed it.   You forgot a function prototype
for the new SQL-callable function, though, so I added that.  For me,
it generates a compiler warning if that's missing; you might want to
try to achieve a similar setup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 August 2016 at 22:18, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Updated patch series attached. As before, 0-4 intended for commit, 5 just
> because it'll be handy to have around for people doing wraparound related
> testing.
>
> Again, thanks for taking a look.

/me reviews a bit more deeply.

In 0001, it seems to me that "in-progress" should be "in progress". 

Fine by me. I was on the fence about it anyway.

+CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted');
 
I'm not really that keen on this approach.  I don't think we need to
introduce a new data type for this, and I would rather not use SQL,
either.  It would be faster and simpler to just return the appropriate
string from a C function defined directly.

Also fine by me. You're right, keep it simple. It means the potential set of values isn't discoverable the same way, but ... meh. Using it usefully means reading the docs anyway.

The remaining 2 patches of interest are attached - txid_status() and txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().

Now I'd best stop pretending I'm in a sensible timezone.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Also fine by me. You're right, keep it simple. It means the potential set of
> values isn't discoverable the same way, but ... meh. Using it usefully means
> reading the docs anyway.
>
> The remaining 2 patches of interest are attached - txid_status() and
> txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().
>
> Now I'd best stop pretending I'm in a sensible timezone.

I reviewed this version some more and found some more problems.

+    uint32 xid_epoch = (uint32)(xid_with_epoch >>32);
+    TransactionId xid = (TransactionId)(xid_with_epoch);

I think this is not project style.  In particular, I think that the
first one needs a space after the cast and another space before the
32; and I think the second one has an unnecessary set of parentheses
and needs a space added.

+/*
+ * Underlying implementation of txid_status, which is mapped to an enum in
+ * system_views.sql.
+ */

Not any more...

+                 errmsg("transaction ID "UINT64_FORMAT" is an invalid xid",
+                    xid_with_epoch)));

Spacing.

+    if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
+        status = gettext_noop("in progress");
+    else if (TransactionIdDidCommit(xid))
+        status = gettext_noop("committed");
+    else if (TransactionIdDidAbort(xid))
+        status = gettext_noop("aborted");
+    else
+        /* shouldn't happen */
+        ereport(ERROR,
+            (errmsg_internal("unable to determine commit status of
xid "UINT64_FORMAT, xid)));

Maybe I'm all wet here, but it seems like there might be a problem
here if the XID is older than the CLOG truncation point but less than
one epoch old. get_xid_in_recent_past only guarantees that the
transaction is less than one epoch old, not that we still have CLOG
data for it.  And there's nothing to keep NextXID from advancing under
us, so if somebody asks about a transaction that's just under 2^32
transactions old, then get_xid_in_recent_past could say it's valid,
then NextXID advances and we look up the XID extracted from the txid
and get the status of the new transaction instead of the old one!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 24 August 2016 at 03:10, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > Also fine by me. You're right, keep it simple. It means the potential set of
> > values isn't discoverable the same way, but ... meh. Using it usefully means
> > reading the docs anyway.
> >
> > The remaining 2 patches of interest are attached - txid_status() and
> > txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().
> >
> > Now I'd best stop pretending I'm in a sensible timezone.
>
> I reviewed this version some more and found some more problems.


Thanks. It took me a few days to prep a new patch as I found another
issue in the process. Updated patch attached.

The updated series starts (0001) with a change to slru.c to release
the control lock when throwing an exception so that we don't deadlock
with ourselves when re-entering slru.c; explanation below.

Then there's the txid_status (0002) patch with fixes, then
txid_convert_if_recent(0003).

I omitted txid_incinerate() ; I have an updated version that sets xact
status to aborted for burned xacts instead of leaving them at 0
(in-progress), but haven't had time to finish it so it doesn't try to
blindly set the status of xacts on pages where it didn't hold
XidGenLock when the page was added to the clog.

> +    uint32 xid_epoch = (uint32)(xid_with_epoch >>32);
> +    TransactionId xid = (TransactionId)(xid_with_epoch);
>
> I think this is not project style.  In particular, I think that the
> first one needs a space after the cast and another space before the
> 32; and I think the second one has an unnecessary set of parentheses
> and needs a space added.


OK, no problems. I didn't realise spacing around casts was specified.

>
> +/*
> + * Underlying implementation of txid_status, which is mapped to an enum in
> + * system_views.sql.
> + */
>
> Not any more...


That's why I shouldn't revise a patch at 1am ;)

>
> +    if (TransactionIdIsCurrentTransactionId(xid) ||
> TransactionIdIsInProgress(xid))
> +        status = gettext_noop("in progress");
> +    else if (TransactionIdDidCommit(xid))
> +        status = gettext_noop("committed");
> +    else if (TransactionIdDidAbort(xid))
> +        status = gettext_noop("aborted");
> +    else
> +        /* shouldn't happen */
> +        ereport(ERROR,
> +            (errmsg_internal("unable to determine commit status of
> xid "UINT64_FORMAT, xid)));
>
> Maybe I'm all wet here, but it seems like there might be a problem
> here if the XID is older than the CLOG truncation point but less than
> one epoch old. get_xid_in_recent_past only guarantees that the
> transaction is less than one epoch old, not that we still have CLOG
> data for it.


Good point. The call would then fail with something like

  ERROR:  could not access status of transaction 778793573
  DETAIL:  could not open file "pg_clog/02E6": No such file or directory

This probably didn't come up in my wraparound testing because I'm
aggressively forcing wraparound by writing a lot of clog very quickly
under XidGenLock, and because I'm mostly looking at xacts that are
either recent or past the xid boundary. I've added better add coverage
for xacts around 2^30 behind the nextXid to the wraparound tests;
can't add them to txid.sql since the xid never gets that far in normal
regression testing.

What I'd really like is to be able to ask transam.c to handle the
xid_in_recent_past logic, treating an attempt to read an xid from
beyond the clog truncation threshold as a soft error indicating
unknown xact state. But that involves delving into slru.c, and I
really, really don't want to touch that for what should be a simple
and pretty noninvasive utility function.

A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
except for two issues:

* I see no accepted way to access the errcode etc from within the
PG_CATCH block, though. PG_RE_THROW() can do it because pg_re_throw()
is in elog.c. I couldn't find any existing code that seems to check
details about an error thrown in a PG_TRY block, only SPI calls. I
don't want to ignore all types of errors and potentially hide
problems, so I just used geterrcode() - which is meant for errcontext
callbacks - and changed the comment to say it can be used in PG_CATCH
too. I don't see why it shouldn't be.
We should probably have some sort of PG_CATCH_INFO(varname) that
exposes the top ErrorData, but that's not needed for this patch so I
left it alone.

* TransactionIdGetStatus() releases the CLogControlLock taken by
SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
thrown from SlruReportIOError(). It seems appropriate for
SimpleLruReadPage() to release the LWLock before calling
SlruReportIOError(), so I've done that as a separate patch (now 0001).


I also removed the TransactionIdInProgress check in txid_status and
just assume it's in progress if it isn't our xid, committed or
aborted. TransactionIdInProgress looks like it's potentially more
expensive, and most of the time we'll be looking at committed or
aborted xacts anyway. I can't sanity-check TransactionIdInProgress
after commited/aborted, because there's then a race where the xact can
commit or abort after we decide it's not committed/aborted so it's not
in progress when we go to check that.



> And there's nothing to keep NextXID from advancing under
> us, so if somebody asks about a transaction that's just under 2^32
> transactions old, then get_xid_in_recent_past could say it's valid,
> then NextXID advances and we look up the XID extracted from the txid
> and get the status of the new transaction instead of the old one!

Hm, yeah. Though due to the clog truncation issue you noticed it
probably won't happen.

We could require that XidGenLock be held at least as LW_SHARED when
entering get_xid_in_recent_past(), but I'd rather not since that'd be
an otherwise-unnecessary lwlock for txid_convert_ifrecent().

Instead, I think I'll rename the wraparound flag to too_old and set it
if the xact is more than say 2^30 from the epoch struct's last_xid,
leaving a race window so ridiculously improbable that the nearly
impossible chance of failing with a clog access error is not a worry.
If the server's managing to have a proc stuck that long then it's
already on fire. We're only interested in reasonably recent xacts
since we can only work with xacts before wraparound / clog truncation.
This just moves the threshold for "reasonably recent" a bit closer.

All this certainly reinforces my view that users handling 'xid'
directly or trying to extract it from a bigint epoch-extended xid is a
bad idea that needs to go away soon.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Andres Freund
Дата:
Hi,

On 2016-08-29 11:25:39 +0800, Craig Ringer wrote:
>   ERROR:  could not access status of transaction 778793573
>   DETAIL:  could not open file "pg_clog/02E6": No such file or directory
> 
> What I'd really like is to be able to ask transam.c to handle the
> xid_in_recent_past logic, treating an attempt to read an xid from
> beyond the clog truncation threshold as a soft error indicating
> unknown xact state. But that involves delving into slru.c, and I
> really, really don't want to touch that for what should be a simple
> and pretty noninvasive utility function.

Can't you "just" check this against ShmemVariableCache->oldestXid while
holding appropriate locks?


> A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
> except for two issues:

It seems like a bad idea to PG_CATCH and not re-throw an error. That
generally is quite error prone. At the very least locking and such gets
a lot more complicated (as you noticed below).

> * TransactionIdGetStatus() releases the CLogControlLock taken by
> SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
> thrown from SlruReportIOError(). It seems appropriate for
> SimpleLruReadPage() to release the LWLock before calling
> SlruReportIOError(), so I've done that as a separate patch (now 0001).

We normally prefer to handle this via the "bulk" releases in the error
handlers. It's otherwise hard to write code that handles these
situations reliably. It's different for spinlocks, but those normally
protect far smaller regions of code.


Andres



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Sun, Aug 28, 2016 at 11:25 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> What I'd really like is to be able to ask transam.c to handle the
> xid_in_recent_past logic, treating an attempt to read an xid from
> beyond the clog truncation threshold as a soft error indicating
> unknown xact state. But that involves delving into slru.c, and I
> really, really don't want to touch that for what should be a simple
> and pretty noninvasive utility function.

I think you're going to have to bite the bullet and do that, though, because ...

> A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,

...I don't think this has any chance of being acceptable.  You can't
catch errors and not re-throw them.  That's bad news.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 29 August 2016 at 11:45, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-08-29 11:25:39 +0800, Craig Ringer wrote:
>>   ERROR:  could not access status of transaction 778793573
>>   DETAIL:  could not open file "pg_clog/02E6": No such file or directory
>>
>> What I'd really like is to be able to ask transam.c to handle the
>> xid_in_recent_past logic, treating an attempt to read an xid from
>> beyond the clog truncation threshold as a soft error indicating
>> unknown xact state. But that involves delving into slru.c, and I
>> really, really don't want to touch that for what should be a simple
>> and pretty noninvasive utility function.
>
> Can't you "just" check this against ShmemVariableCache->oldestXid while
> holding appropriate locks?

Hm. Yeah, I should've thought of that. Thank you.

>> A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
>> except for two issues:
>
> It seems like a bad idea to PG_CATCH and not re-throw an error. That
> generally is quite error prone. At the very least locking and such gets
> a lot more complicated (as you noticed below).

Yeah, and as I remember from the "fun" of trying to write apply errors
to tables in BDR. It wasn't my first choice.

>> * TransactionIdGetStatus() releases the CLogControlLock taken by
>> SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
>> thrown from SlruReportIOError(). It seems appropriate for
>> SimpleLruReadPage() to release the LWLock before calling
>> SlruReportIOError(), so I've done that as a separate patch (now 0001).
>
> We normally prefer to handle this via the "bulk" releases in the error
> handlers. It's otherwise hard to write code that handles these
> situations reliably. It's different for spinlocks, but those normally
> protect far smaller regions of code.

Fair enough. It's not a complex path, but there are a _lot_ of
callers, and while I can't really imagine any of them relying on the
CLogControLock being held on error it's not something I was keen to
change. I thought complicating the clog with a soft-error interface
was worse and didn't come up with a better approach.

Said better approach attached in revised series. Thanks.

My only real complaint with doing this is that it's a bit more
conservative. But in practice clog truncation probably won't follow
that far behind oldestXmin so except in fairly contrived circumstances
it won't hurt. Apps that need guarantees about how old an xid they can
get status on can hold down xmin with a replication slot, a dummy
prepared xact, or whatever. If we find that becomes a common need that
should be made simpler then appropriate API to allow apps to hold down
clog truncation w/o blocking vacuuming can be added down the track.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 29 August 2016 at 15:53, Craig Ringer <craig@2ndquadrant.com> wrote:

> Said better approach attached in revised series. Thanks.

Here's another minor update to the txid_status() and
txid_convert_if_recent() patches. The only change is moving
get_xid_in_recent_past from src/backend/utils/adt/txid.c to
src/backend/access/transam/xlog.c to permit its use by other code.
Specifically, I think it'll be needed for logical decoding on standby.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 1 September 2016 at 13:08, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 29 August 2016 at 15:53, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> Said better approach attached in revised series. Thanks.
>
> Here's another minor update to the txid_status() and
> txid_convert_if_recent() patches. The only change is moving
> get_xid_in_recent_past from src/backend/utils/adt/txid.c to
> src/backend/access/transam/xlog.c to permit its use by other code.
> Specifically, I think it'll be needed for logical decoding on standby.

Ahem, wrong patches. Attached correctly now.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
Here's another update to these patches. While working on support for
logical decoding on standbys I noticed that I needed something like
get_xid_in_recent_past(...) there too. So I've moved it to xlog.c as
TransactionIdInRecentPast too and flipped its arguments to be more
convenient. No other changes.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Simon Riggs
Дата:
On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote:

> So I've moved it to xlog.c...

I'm pretty sure it shouldn't live in xlog.c, but there may be some
good reason I can't see yet.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
<p dir="ltr"><p dir="ltr">On 2 Sep. 2016 8:30 pm, "Simon Riggs" <<a
href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br /> ><br /> > On 2 September 2016 at
13:16,Craig Ringer <<a href="mailto:craig@2ndquadrant.com">craig@2ndquadrant.com</a>> wrote:<br /> ><br />
>> So I've moved it to xlog.c...<br /> ><br /> > I'm pretty sure it shouldn't live in xlog.c, but there may
besome<br /> > good reason I can't see yet.<p dir="ltr"> Ugh. Yes. transam.c would be rather saner.<p dir="ltr">Only
forthe helper to determine if an xid is recent though; txid_ status stays in adt/xact.c where it belongs along with
txid_current()etc.<p dir="ltr">> --<br /> > Simon Riggs                <a
href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a><br/> > PostgreSQL Development, 24x7 Support,
RemoteDBA, Training & Services<br /> 

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 2 September 2016 at 20:38, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 2 Sep. 2016 8:30 pm, "Simon Riggs" <simon@2ndquadrant.com> wrote:
>>
>> On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote:
>>
>> > So I've moved it to xlog.c...
>>
>> I'm pretty sure it shouldn't live in xlog.c, but there may be some
>> good reason I can't see yet.
>
> Ugh. Yes. transam.c would be rather saner.
>
> Only for the helper to determine if an xid is recent though; txid_ status
> stays in adt/xact.c where it belongs along with txid_current() etc.

Fixed, moved to transam.c.




--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 2 September 2016 at 21:01, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 2 September 2016 at 20:38, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 2 Sep. 2016 8:30 pm, "Simon Riggs" <simon@2ndquadrant.com> wrote:
>>>
>>> On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote:
>>>
>>> > So I've moved it to xlog.c...
>>>
>>> I'm pretty sure it shouldn't live in xlog.c, but there may be some
>>> good reason I can't see yet.
>>
>> Ugh. Yes. transam.c would be rather saner.
>>
>> Only for the helper to determine if an xid is recent though; txid_ status
>> stays in adt/xact.c where it belongs along with txid_current() etc.
>
> Fixed, moved to transam.c.

Missed that this causes an undefined reference to GetNextXidAndEpoch()
which is in xlog.c. I knew there was a reason I put it there. So most
recent patch is wrong, use the prior one.

GetNextXidAndEpoch() needs to be in xlog.c because it uses XLogCtl's
shmem copy of checkPoint.nextXidEpoch. So either transam.c needs to
#include xlog.h (which seems a bit backwards) or
TransactionIdInRecentPast() should go in xlog.c.

I don't like either really. Opinion? I'm sure we'll want this
functionality in other places as part of dealing with the problems
discussed upthread with 'xid' exposed to users.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Petr Jelinek
Дата:
On 02/09/16 15:46, Craig Ringer wrote:
> On 2 September 2016 at 21:01, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 2 September 2016 at 20:38, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> On 2 Sep. 2016 8:30 pm, "Simon Riggs" <simon@2ndquadrant.com> wrote:
>>>>
>>>> On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote:
>>>>
>>>>> So I've moved it to xlog.c...
>>>>
>>>> I'm pretty sure it shouldn't live in xlog.c, but there may be some
>>>> good reason I can't see yet.
>>>
>>> Ugh. Yes. transam.c would be rather saner.
>>>
>>> Only for the helper to determine if an xid is recent though; txid_ status
>>> stays in adt/xact.c where it belongs along with txid_current() etc.
>>
>> Fixed, moved to transam.c.
>
> Missed that this causes an undefined reference to GetNextXidAndEpoch()
> which is in xlog.c. I knew there was a reason I put it there. So most
> recent patch is wrong, use the prior one.
>
> GetNextXidAndEpoch() needs to be in xlog.c because it uses XLogCtl's
> shmem copy of checkPoint.nextXidEpoch. So either transam.c needs to
> #include xlog.h (which seems a bit backwards) or
> TransactionIdInRecentPast() should go in xlog.c.
>
> I don't like either really. Opinion? I'm sure we'll want this
> functionality in other places as part of dealing with the problems
> discussed upthread with 'xid' exposed to users.
>

You could put it to txid.c where all the other txid stuff is in?

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 2 September 2016 at 23:29, Petr Jelinek <petr@2ndquadrant.com> wrote:

> You could put it to txid.c where all the other txid stuff is in?

Yeah, even though it's in adt/ I think it'll do.

I thought I'd need get_xid_in_recent_past() for catalog_xmin hot
standby feedback, but upon closer examination the needed logic isn't
the same anymore. txid_status() wants to ensure clog lookups are safe
and limit by oldest xid, wheras the walsender doesn't actually care
about that and is just avoiding wrapped xids.

I'm just going back to how it was, all in adt/txid.c, and making it
static again. We can move it and make it non-static if a need to do so
comes up.

Attached rebased patch updated and vs master.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Thu, Sep 15, 2016 at 8:52 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 2 September 2016 at 23:29, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> You could put it to txid.c where all the other txid stuff is in?
>
> Yeah, even though it's in adt/ I think it'll do.
>
> I thought I'd need get_xid_in_recent_past() for catalog_xmin hot
> standby feedback, but upon closer examination the needed logic isn't
> the same anymore. txid_status() wants to ensure clog lookups are safe
> and limit by oldest xid, wheras the walsender doesn't actually care
> about that and is just avoiding wrapped xids.
>
> I'm just going back to how it was, all in adt/txid.c, and making it
> static again. We can move it and make it non-static if a need to do so
> comes up.
>
> Attached rebased patch updated and vs master.

You appear to have attached the wrong patch set.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 16 September 2016 at 21:28, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 15, 2016 at 8:52 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 2 September 2016 at 23:29, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>
>>> You could put it to txid.c where all the other txid stuff is in?
>>
>> Yeah, even though it's in adt/ I think it'll do.
>>
>> I thought I'd need get_xid_in_recent_past() for catalog_xmin hot
>> standby feedback, but upon closer examination the needed logic isn't
>> the same anymore. txid_status() wants to ensure clog lookups are safe
>> and limit by oldest xid, wheras the walsender doesn't actually care
>> about that and is just avoiding wrapped xids.
>>
>> I'm just going back to how it was, all in adt/txid.c, and making it
>> static again. We can move it and make it non-static if a need to do so
>> comes up.
>>
>> Attached rebased patch updated and vs master.
>
> You appear to have attached the wrong patch set.

Whoops, multitasking fail.

Sorry for the late response, hospitals are "fun".


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Mon, Sep 19, 2016 at 9:54 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> You appear to have attached the wrong patch set.
>
> Whoops, multitasking fail.
>
> Sorry for the late response, hospitals are "fun".

I did some cleanup of 0001 (see attached) and was all set to commit it
when I realized what I think is a problem: holding XidGenLock doesn't
seem to help with the race condition between this function and CLOG
truncation, because vac_truncate_clog() updates the shared memory
limits AFTER performing the truncation.  If the order of those
operations were reversed, we'd be fine, because it would get stuck
trying to update the shared memory limits and wouldn't be able to
truncate until it did - and conversely, if it updated the shared
memory limits before we examined them, that would be OK, too, because
we'd be sure not to consult the pages that are about to be truncated.
As it is, though, I don't see that there's any real interlock here.

(BTW, the stuff you moved from clog.c to clog.h doesn't actually need
to be moved; one of the changes I made here was to undo that.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 20 September 2016 at 22:46, Robert Haas <robertmhaas@gmail.com> wrote:

> I did some cleanup of 0001 (see attached) and was all set to commit it
> when I realized what I think is a problem: holding XidGenLock doesn't
> seem to help with the race condition between this function and CLOG
> truncation, because vac_truncate_clog() updates the shared memory
> limits AFTER performing the truncation.

Thanks ... and drat.

> If the order of those
> operations were reversed, we'd be fine, because it would get stuck
> trying to update the shared memory limits and wouldn't be able to
> truncate until it did - and conversely, if it updated the shared
> memory limits before we examined them, that would be OK, too, because
> we'd be sure not to consult the pages that are about to be truncated.

I'm hesitant to mess with something that fundamental for what I was
hoping was a low-impact feature, albeit one that seems to be trying
hard not to be at every turn.  It looks pretty reasonable to update
oldestXid before clog truncation but I don't want to be wrong and
create some obscure race or crash recovery issue related to wraparound
and clog truncation. It could well be a problem if we're very close to
wraparound.

So far nothing has had any reason to care about this, since there's no
way to attempt to access an older xid in a normally functioning
system. The commit timestamp lookup code doesn't care whether the xid
is still in clog or not and nothing else does lookups based on xids
supplied by the user. If anything else did or does in future it will
have the same problem as txid_status.

We've already ruled out releasing the slru LWLock in SlruReportIOError
then PG_TRY / PG_CATCH()ing clog access errors in txid_status() per my
original approach to this issue.

So I see a few options now:

* Do nothing. Permit this race to exist, document the error, and
expect apps to cope. I'm pretty tempted to go for exactly this since
it pushes the cost onto users of the feature and doesn't make a mess
elsewhere. People who use this will typically be invoking it as a
standalone toplevel function anyway, so it's mostly just a bit of
noise in the error logs if you lose the race - and we have plenty of
other sources of that already.

* Rather than calling TransactionIdDidCommit / TransactionIdDidAbort,
call clog.c's TransactionIdGetStatus with a new missing_ok flag. That
sets a bool* missing  param added to SimpleLruReadPage_ReadOnly(...)
and in turn to SimpleLruReadPage(...) that's set instead of calling
SlruReportIOError(...). This seems rather intrusive and will add
little-used params and paths to fairly hot slru.c code so I'm not
keen.

* Take CLogControlLock LW_SHARED in txid_status() to prevent
truncation before reading oldestXid. We'd need a way to pass an
"already locked" state through TransactionIdGetStatus(...) to
SimpleLruReadPage_ReadOnly(...), which isn't great since again it's
pretty hot code.

* Don't release the slru LWLock in SlruReportIOError; instead release
CLogControlLock from txid_status on clog access failure. As before
means PG_TRY / PG_CATCH without PG_RE_THROW, but it means the only
place it affects is callers of txid_status(...). For added safety,
restrict txid_status() to being called in a toplevel virtual xact so
we know we'll finish up promptly. It's a horrible layering violation
having txid_status(...) release the clog control lock though, and
seems risky.

The only non-horrible one of those is IMO to just let the caller see
an error if they lose the race. It's a function that's intended for
use when you're dealing with indeterminate transaction state after a
server or application error anyway, so it's part of an error path
already. So I say we just document the behaviour. Because slru.c
doesn't release its LWLock on error we also need to ensure
txid_status(...) is also only called from a toplevel xact so the user
doesn't attempt to wrap it in plpgsql BEGIN ... EXCEPTION block and it
causes the xact to abort.

Will follow up with just that.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Wed, Sep 21, 2016 at 3:40 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> The only non-horrible one of those is IMO to just let the caller see
> an error if they lose the race. It's a function that's intended for
> use when you're dealing with indeterminate transaction state after a
> server or application error anyway, so it's part of an error path
> already. So I say we just document the behaviour.

I am not keen on that idea.  The errors we're likely to be exposing
are going to look like low-level internal failures, which might scare
some DBA.  Even if they don't, I think it's playing with fire.  The
system is designed on the assumption that nobody will try to look up
an XID that's too old, and if we start to violate that assumption I
think we're undermining the design integrity of the system in a way
we'll likely come to regret.  To put that more plainly, when the code
is written with the assumption that X will never happen, it's usually
a bad idea to casually add code that does X.

> Because slru.c
> doesn't release its LWLock on error we also need to ensure
> txid_status(...) is also only called from a toplevel xact so the user
> doesn't attempt to wrap it in plpgsql BEGIN ... EXCEPTION block and it
> causes the xact to abort.

I think this is muddled, because an error aborts the transaction, and
AbortTransaction() and AbortSubTransaction() start with
LWLockReleaseAll().

It might not be too hard to add a second copy of oldestXid in shared
memory that is updated before truncation rather than afterward... but
yeah, like you, I'm not finding this nearly as straightforward as
might have been hoped.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 21 September 2016 at 22:16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 3:40 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> The only non-horrible one of those is IMO to just let the caller see
>> an error if they lose the race. It's a function that's intended for
>> use when you're dealing with indeterminate transaction state after a
>> server or application error anyway, so it's part of an error path
>> already. So I say we just document the behaviour.
>
> I am not keen on that idea.  The errors we're likely to be exposing
> are going to look like low-level internal failures, which might scare
> some DBA.  Even if they don't, I think it's playing with fire.  The
> system is designed on the assumption that nobody will try to look up
> an XID that's too old, and if we start to violate that assumption I
> think we're undermining the design integrity of the system in a way
> we'll likely come to regret.  To put that more plainly, when the code
> is written with the assumption that X will never happen, it's usually
> a bad idea to casually add code that does X.

Fair point.

> [snip]
>
> It might not be too hard to add a second copy of oldestXid in shared
> memory that is updated before truncation rather than afterward... but
> yeah, like you, I'm not finding this nearly as straightforward as
> might have been hoped.

Yeah.

I suspect that'll be the way to go, to add another copy that's updated
before clog truncation. It just seems ... unclean. Like it shouldn't
be necessary, something else isn't right. But it's probably the lowest
pain option.

I'm going to take a step back on this and see if I can spot an alternative.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 27 September 2016 at 09:23, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 21 September 2016 at 22:16, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> It might not be too hard to add a second copy of oldestXid in shared
>> memory that is updated before truncation rather than afterward... but
>> yeah, like you, I'm not finding this nearly as straightforward as
>> might have been hoped.
>
> Yeah.
>
> I suspect that'll be the way to go, to add another copy that's updated
> before clog truncation. It just seems ... unclean. Like it shouldn't
> be necessary, something else isn't right. But it's probably the lowest
> pain option.

OK, so summarizing the problem:

slru.c and clog.c have no soft-failure entrypoints where we can look
it up and fail gracefully if the xid isn't in the clog.
vac_truncate_clog() calls TruncateCLOG to chop SLRU pages from the
clog before it takes XidGenLock to advance oldestXid. So we cannot use
oldestXid protected by XidGenLock as an interlock against concurrent
clog truncation to prevent SLRU access errors looking up an xid, and
there's no other candidate lock.

This hasn't been an issue before because nobody looks up arbitrary
user-supplied XIDs in clog, we only look up things that're protected
by datfrozenxid etc. But the whole point of txid_status is to look up
user-supplied xids.

We can't just take ClogControlLock from txid_status() to block
truncation because clog.c expects to own that lock, and takes it (via
slru.c) in TransactionIdGetStatus, with no way to pass an
already_locked state. We'd self-deadlock.

Adding a second copy of oldestXid  - say pendingOldestXid - won't
actually help us unless we also take some suitable LWLock before
updating it, otherwise truncation can continue after we look at
pendingOldestXid but before we do the clog lookup. That means an extra
LWLock for each clog truncation, but compared to the I/O done during
clog truncation and the cost of the SlruScanDirectory() pre-check done
by TruncateCLOG it's nothing.

We could take XidGenLock twice, once to update this new
pendingOldestXid field and once to update oldestXid, but that's a
highly contested lock I'd rather not mess with even on a path that's
not hit much.

Instead, I've added a new LWLock, ClogTruncationLock, for that
purpose. vac_truncate_clog() takes it if it decides to attempt clog
truncation. This lock is held throughout the whole process of clog
truncation and oldestXid advance, so there's no need for a new
pendingOldestXid field in ShmemVariableCache. We just take the lock
then look at oldestXid, knowing that it's guaranteed to correspond to
an existing clog page that won't go away while we're looking.
ClogTruncationLock is utterly uncontested so it's going to have no
meaningful impact compared to all the work we do scanning the clog to
decide whether we're even going to try truncating it, etc.


(BTW, it seems like a pity that lwlocknames.txt doesn't have comments
on each lock. We could have
src/backend/storage/lmgr/generate-lwlocknames.pl transform the #
comments into /* comments */ on the generated header. Thoughts?)




-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Wed, Dec 21, 2016 at 3:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Instead, I've added a new LWLock, ClogTruncationLock, for that
> purpose. vac_truncate_clog() takes it if it decides to attempt clog
> truncation. This lock is held throughout the whole process of clog
> truncation and oldestXid advance, so there's no need for a new
> pendingOldestXid field in ShmemVariableCache. We just take the lock
> then look at oldestXid, knowing that it's guaranteed to correspond to
> an existing clog page that won't go away while we're looking.
> ClogTruncationLock is utterly uncontested so it's going to have no
> meaningful impact compared to all the work we do scanning the clog to
> decide whether we're even going to try truncating it, etc.

That makes everything that happens between when we acquire that lock
and when we release it non-interruptible, which seems undesirable.  I
think that extra copy of oldestXid is a nicer approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 22 December 2016 at 00:30, Robert Haas <robertmhaas@gmail.com> wrote:

> That makes everything that happens between when we acquire that lock
> and when we release it non-interruptible, which seems undesirable.  I
> think that extra copy of oldestXid is a nicer approach.

That's a side-effect I didn't realise. Given that, yes, I agree.

Since we don't truncate clog much, do you think it's reasonable to
just take XidGenLock again before we proceed? I'm reluctant to add
another acquisition of a frequently contested lock for something 99.9%
of the codebase won't care about, so I think it's probably better to
add a new LWLock, and I'll resubmit on that basis, but figure it's
worth asking.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 22 December 2016 at 07:49, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 22 December 2016 at 00:30, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> That makes everything that happens between when we acquire that lock
>> and when we release it non-interruptible, which seems undesirable.  I
>> think that extra copy of oldestXid is a nicer approach.
>
> That's a side-effect I didn't realise. Given that, yes, I agree.
>
> Since we don't truncate clog much, do you think it's reasonable to
> just take XidGenLock again before we proceed? I'm reluctant to add
> another acquisition of a frequently contested lock for something 99.9%
> of the codebase won't care about, so I think it's probably better to
> add a new LWLock, and I'll resubmit on that basis, but figure it's
> worth asking.

Updated.

If you think it's better to just take XidGenLock again, let me know.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 22 December 2016 at 09:55, Craig Ringer <craig@2ndquadrant.com> wrote:

> Updated.
>
> If you think it's better to just take XidGenLock again, let me know.

Here's a further update that merges in Robert's changes from the patch
posted upthread.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Thu, Dec 22, 2016 at 12:12 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 22 December 2016 at 09:55, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> Updated.
>>
>> If you think it's better to just take XidGenLock again, let me know.
>
> Here's a further update that merges in Robert's changes from the patch
> posted upthread.

This patch contains unresolved merge conflicts.  Maybe
SetPendingTransactionIdLimit could happen in TruncateCLOG rather than
the caller.  Instead of using an LWLock, how about a spin lock?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 December 2016 at 01:26, Robert Haas <robertmhaas@gmail.com> wrote:

> This patch contains unresolved merge conflicts.

Ah, it conflicts with fe0a0b59, the PostmasterRandom changes. I
thought I'd rebased more recently than that.

> Maybe SetPendingTransactionIdLimit could happen in TruncateCLOG rather than
> the caller.

I've spent a while looking at how committs and multixact handle the
race hazard of the gap between clog truncation and shmem state change
- and the related race on standby, where the hazard is the gap between
replay of CLOG_TRUNCATE records and replay of the next checkpoint
containing updated limits.

pg_xact_commit_timestamp() and the underlying
TransactionIdGetCommitTsData are supposed to accept arbitrary xids,
like txid_status(), so they should handle this.

However, as far as I can tell, committs has the same issues presented
by txid_status(). We only update ShmemVariableCache->oldestCommitTsXid
_after_ we truncate the committs SLRU, no lock is held over the
duration, and no other lock-protected var is set before truncation. We
don't record the new limit values in COMMIT_TS_TRUNCATE records so
they only becomes known to the standby at replay of the next
checkpoint.

multixact instead runs a single critical section to write xlog, set
shmem state, and _then_ truncate the SLRUs. It's more complex than
clog or commit_ts since it has two inter-related SLRUs, though.

So: I think we should be setting ShmemVariableCache->oldestXid (under
XidGenLock) from TruncateCLOG, after we write xlog but _before_
truncation. Then the rest of SetTransactionIdLimit(...) proceeds as
before. I don't _think_ we need the critical section since we only
have the one SLRU to worry about. We should also add oldestXid to
CLOG_TRUNCATE xlog records so it is up to date on standbys.

It's a little annoying to take XidGenLock twice - once for oldestXid,
once for the other xid limits - but we can just delay advancing
oldestXid entirely until we've got a clog page to truncate away, in
which case there's a big enough cost that it doesn't matter. We have
to wait to advance the other limits until after we truncate clog to
prevent new (wrapped) xids trying to set values in clog for the
before-wrap xids we're about to truncate away.

On the upside, the need for pendingOldestXid, which always felt hacky,
goes away.

While we're at it, lets do the same thing for commit_ts. Advance its
limit before truncation, not after, and record that limit in its xlog
records for redo. There's no need for any special dancing around
there.



>  Instead of using an LWLock, how about a spin lock?

I wrote an explanation of how that wouldn't work, but then I found the
problem with standby, so it no longer matters.

I'll have to follow up with a patch soon, as it's Toddler o'Clock.

(It's interesting that most of what I'm doing at the moment is
wrestling with resource retention limits and how they're often quite
implicit. All the catalog_xmin stuff for logical decoding on standby
has parallels to this, for example, though not enough to create useful
overlap of functionality.)

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 December 2016 at 18:00, Craig Ringer <craig@2ndquadrant.com> wrote:

> I'll have to follow up with a patch soon, as it's Toddler o'Clock.

Here we go.

This patch advances oldestXid, under XidGenLock, _before_ truncating clog.

txid_status() holds XidGenLock from when it tests oldestXid until it's
done looking up clog, thus eliminating the race.

CLOG_TRUNCATE records now contain the oldestXid, so they can advance
oldestXid on a standby, or when we've truncated clog since the most
recent checkpoint on the master during recovery. It's advanced under
XidGenLock during redo to protect against this race on standby.

As outlined in my prior mail I think this is the right approach. I
don't like taking XidGenLock twice, but we don't advance datfrozenxid
much so it's not a big concern. While a separate ClogTruncationLock
could be added like in my earlier patch, oldestXid is currently under
XidGenLock and I'd rather not change that.

The biggest change here is that oldestXid is advanced separately to
the vac limits in the rest of ShmemVariableCache. As far as I can tell
we don't prevent two manual VACUUMs on different DBs from trying to
concurrently run vac_truncate_clog, so this has to be safe against two
invocations racing each other. Rather than try to lock out such
concurrency, the patch ensures that oldestXid can never go backwards.
It doesn't really matter if the vac limits go backwards, it's no worse
than what can already happen in the current code.

We cannot advance the vacuum limits before we truncate the clog away,
in case someone tries to access a very new xid (if we're near
wraparound)

I'm pretty sure that commit timestamps suffer from the same flaw as
Robert identified upthread with clog. This patch fixes the clog race,
but not the similar one in commit timestamps. Unlike the clog race
with txid_status(), the commit timestamps one is already potentially
user-visible since we allow arbitrary xids to be looked up for commit
timestamps. I'll address that separately.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 28 December 2016 at 18:00, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 23 December 2016 at 18:00, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> I'll have to follow up with a patch soon, as it's Toddler o'Clock.
>
> Here we go.
>
> This patch advances oldestXid, under XidGenLock, _before_ truncating clog.
>
> txid_status() holds XidGenLock from when it tests oldestXid until it's
> done looking up clog, thus eliminating the race.
>
> CLOG_TRUNCATE records now contain the oldestXid, so they can advance
> oldestXid on a standby, or when we've truncated clog since the most
> recent checkpoint on the master during recovery. It's advanced under
> XidGenLock during redo to protect against this race on standby.
>
> As outlined in my prior mail I think this is the right approach. I
> don't like taking XidGenLock twice, but we don't advance datfrozenxid
> much so it's not a big concern. While a separate ClogTruncationLock
> could be added like in my earlier patch, oldestXid is currently under
> XidGenLock and I'd rather not change that.
>
> The biggest change here is that oldestXid is advanced separately to
> the vac limits in the rest of ShmemVariableCache. As far as I can tell
> we don't prevent two manual VACUUMs on different DBs from trying to
> concurrently run vac_truncate_clog, so this has to be safe against two
> invocations racing each other. Rather than try to lock out such
> concurrency, the patch ensures that oldestXid can never go backwards.
> It doesn't really matter if the vac limits go backwards, it's no worse
> than what can already happen in the current code.
>
> We cannot advance the vacuum limits before we truncate the clog away,
> in case someone tries to access a very new xid (if we're near
> wraparound)
>
> I'm pretty sure that commit timestamps suffer from the same flaw as
> Robert identified upthread with clog. This patch fixes the clog race,
> but not the similar one in commit timestamps. Unlike the clog race
> with txid_status(), the commit timestamps one is already potentially
> user-visible since we allow arbitrary xids to be looked up for commit
> timestamps. I'll address that separately.

Rebased patch attached. I've split the clog changes out from
txid_status() its self.

There is relevant discussion on the commit timestamp truncation fix
thread where the similar fix for commit_ts got committed.


https://www.postgresql.org/message-id/flat/979ff13d-0b8e-4937-01e8-2925c0adc306%402ndquadrant.com#979ff13d-0b8e-4937-01e8-2925c0adc306@2ndquadrant.com

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Mon, Jan 23, 2017 at 1:32 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Rebased patch attached. I've split the clog changes out from
> txid_status() its self.

I think it's fairly surprising that TruncateCLOG() has responsibility
for writing the xlog record that protects advancing
ShmemVariableCache->oldestXid, but not the responsibility for actually
advancing that value.  In other words, I think the AdvanceOldestXid()
call in vac_truncate_clog() should be moved into TruncateClog().
(Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the
responsibility of TruncateCommitTs().)

I think it is not correct to advance oldestXid but not oldestXidDB.
Otherwise, GetNewTransactionId() might complain about the wrong
database.

The way that SetTransactionIdLimit() now works looks a bit dangerous.
xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the
passed-in oldestXid value and written straight into shared memory.
But the shared memory copy of oldestXid could have a different value.
I'm not sure if that breaks anything, but it certainly weakens any
confidence callers might have had that all those values are consistent
with each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 24 January 2017 at 23:49, Robert Haas <robertmhaas@gmail.com> wrote:

> I think it's fairly surprising that TruncateCLOG() has responsibility
> for writing the xlog record that protects advancing
> ShmemVariableCache->oldestXid, but not the responsibility for actually
> advancing that value.  In other words, I think the AdvanceOldestXid()
> call in vac_truncate_clog() should be moved into TruncateClog().
> (Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the
> responsibility of TruncateCommitTs().)

Agreed, and done in attached.

I haven't done the same for commit_ts but will do so on the thread for
the commit_ts race fix if we go ahead with this change here.

> I think it is not correct to advance oldestXid but not oldestXidDB.
> Otherwise, GetNewTransactionId() might complain about the wrong
> database.

That's a clear oversight. Fixed in attached.

> The way that SetTransactionIdLimit() now works looks a bit dangerous.
> xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the
> passed-in oldestXid value and written straight into shared memory.
> But the shared memory copy of oldestXid could have a different value.
> I'm not sure if that breaks anything, but it certainly weakens any
> confidence callers might have had that all those values are consistent
> with each other.

This was my main hesitation with the whole thing too.

It's necessary to advance oldestXmin before we xlog the advance and
truncate clog, and necessary to advance the vacuum limits only
afterwards.

I thought it sensible to be conservative about the xid limit to
minimise the change from current behaviour, i.e. advance only up to
xid limits calculated from the oldestXid determined by the currently
vac_truncate_clog. But the current one is actually wrong; in the
(unlikely if it's possible at all) case where two vac_truncate_clog()s
A and B run with ordering A(advanceOldestXmin) B(advanceOldestXmin)
A(truncate) B(truncate) B(advanceLimits) A(advanceLimits), A's advance
of the limits would clobber  b's. Not too bad, but it'd delay
advancing the limits until the next vacuum, and some xid could get
allocated before the limits go backwards.

It's safer to take XidGenLock for slightly longer in order to capture
the oldestXid from shmem and calculate using it. That ensures we never
have any risk of going backwards. The attached updated patch does so.

BTW, I find it quite amusing that this was intended to be a small,
unintrusive patch, and now it's messing with xid limits and clog
truncation. I'm almost tempted to say we should commit it with the
(tiny) race with clog truncation in place. We certainly haven't had
any complaints about the same race from people using commit_ts. But
the race window on standby is quite large and I'd rather not knowingly
introduce new bugs.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 25 January 2017 at 13:44, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 24 January 2017 at 23:49, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think it's fairly surprising that TruncateCLOG() has responsibility
>> for writing the xlog record that protects advancing
>> ShmemVariableCache->oldestXid, but not the responsibility for actually
>> advancing that value.  In other words, I think the AdvanceOldestXid()
>> call in vac_truncate_clog() should be moved into TruncateClog().
>> (Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the
>> responsibility of TruncateCommitTs().)
>
> Agreed, and done in attached.
>
> I haven't done the same for commit_ts but will do so on the thread for
> the commit_ts race fix if we go ahead with this change here.
>
>> I think it is not correct to advance oldestXid but not oldestXidDB.
>> Otherwise, GetNewTransactionId() might complain about the wrong
>> database.
>
> That's a clear oversight. Fixed in attached.

New attached also records it in xlog and applies it to the standby.

If we want to save the 4 bytes per xmin advance (probably not worth
caring) we can instead skip setting it on the standby, in which case
it'll be potentially wrong until the next checkpoint. I'd rather make
sure it stays correct.


-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Michael Paquier
Дата:
On Wed, Jan 25, 2017 at 4:02 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> If we want to save the 4 bytes per xmin advance (probably not worth
> caring) we can instead skip setting it on the standby, in which case
> it'll be potentially wrong until the next checkpoint. I'd rather make
> sure it stays correct.

Those patches still apply and no reviews have come in yet, so moved to
CF 2017-03.
-- 
Michael



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Wed, Jan 25, 2017 at 12:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> The way that SetTransactionIdLimit() now works looks a bit dangerous.
>> xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the
>> passed-in oldestXid value and written straight into shared memory.
>> But the shared memory copy of oldestXid could have a different value.
>> I'm not sure if that breaks anything, but it certainly weakens any
>> confidence callers might have had that all those values are consistent
>> with each other.
>
> This was my main hesitation with the whole thing too.
>
> It's necessary to advance oldestXmin before we xlog the advance and
> truncate clog, and necessary to advance the vacuum limits only
> afterwards.

Well, that's why I tried to advocate that their should be two separate
XID limits in shared memory: leave what's there now the way it is, and
then add a new limit for "don't try to look up XIDs before this point:
splat".  I still think that'd be less risky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 10 March 2017 at 02:55, Robert Haas <robertmhaas@gmail.com> wrote:

> Well, that's why I tried to advocate that their should be two separate
> XID limits in shared memory: leave what's there now the way it is, and
> then add a new limit for "don't try to look up XIDs before this point:
> splat".  I still think that'd be less risky.

I'm coming back to this "cold" after an extended break, so I hope I
get the details right.

TL;DR: doing it that way duplicates a bunch of stuff and is ugly
without offering significant benefit over just fixing the original.


I started out with the approach you suggested, but it turns out to be
less helpful than you'd think. Simply advancing a new lower limit
field before beginning truncation is no help; there's then a race
where the lower-limit field can be advanced after we test it but
before we actually do the SLRU read from clog. To guard against this
it's necessary for SLRU truncation to take an exclusive lock during
which it advances the lower bound. That way a concurrent reader can
take the lock in shared mode before reading the lower bound and hold
it until it finishes the clog read, knowing that vacuum cannot then
truncate the data out from under it because it'll block trying to
advance the lower limit.

A spinlock isn't suitable for this. While we can take the lock only
briefly to update the limit field in vac_truncate_clog, in
txid_status() we have to hold it from when we test the boundary
through to when we finish the SLRU clog lookup, and that lookup does
I/O and might sleep. If we release it after testing the lower bound
but before the SLRU lookup our race comes back since vacuum can jump
in and truncate it out from under us. So now we need a new LWLock used
only for vac_truncate_clog before advancing the clog truncation.

I implemented just that - a new ClogTruncateLog in the lwlocks array
and a new field in ShmemVariableCache for the lower xid bound, IIRC.
Other than requiring an extra lwlock acquisition for vac_truncate_clog
it works fine ... for the master.

But it doesn't fix the much bigger race on the standby. We only emit
WAL for xid limits after we truncate clog, and the clog truncation
record doesn't record the new limit.

So now we need a new, somewhat redundant, xlog record and redo
function for this lower clog bound pointer. Which, really, is only
actually tracking a slightly more up to date version of oldestXid.

At that point I was just papering around a race that should just be
fixed at its source instead. Advance oldestXid before truncating clog,
and write a clog truncation record that includes the new oldestXid.

So... I can go back to the old approach and just add the new xlog
record and redo method, new lwlock, new shmemvariablecache field, etc,
if you're really concerned this approach is risky. But I'd rather fix
the original problem instead.

It might be helpful if I separate out the patch that touches oldestXid
from the rest for separate review, so I'll update here with a 2-patch
series instead of a squashed single patch soon.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Fri, Mar 10, 2017 at 2:00 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 10 March 2017 at 02:55, Robert Haas <robertmhaas@gmail.com> wrote:
>> Well, that's why I tried to advocate that their should be two separate
>> XID limits in shared memory: leave what's there now the way it is, and
>> then add a new limit for "don't try to look up XIDs before this point:
>> splat".  I still think that'd be less risky.
>
> I'm coming back to this "cold" after an extended break, so I hope I
> get the details right.

Yeah, sorry I've been away from this for a while.

> TL;DR: doing it that way duplicates a bunch of stuff and is ugly
> without offering significant benefit over just fixing the original.
>
> I started out with the approach you suggested, but it turns out to be
> less helpful than you'd think. Simply advancing a new lower limit
> field before beginning truncation is no help; there's then a race
> where the lower-limit field can be advanced after we test it but
> before we actually do the SLRU read from clog. To guard against this
> it's necessary for SLRU truncation to take an exclusive lock during
> which it advances the lower bound. That way a concurrent reader can
> take the lock in shared mode before reading the lower bound and hold
> it until it finishes the clog read, knowing that vacuum cannot then
> truncate the data out from under it because it'll block trying to
> advance the lower limit.

That's a good point which I hadn't fully considered.  On the other
hand, there really are two separate notions of the "oldest" XID.
There's the oldest XID that we can safely look up, and then there's
the oldest XID that we can't reuse.  These two are the same when no
truncation is in progress, but when a truncation is in progress then
they're different: the oldest XID that's safe to look up is the first
one after whatever we're truncating away, but the oldest XID that we
can't reuse is the newest one preceding the stuff that we're busy
truncating.  IOW, when truncation is happening, there's a portion of
the XID space whose clog files are being removed - and the XIDs that
are in that range aren't safe to look up any more, but are also not
available for reuse to prevent wraparound.  Right now, all of the
relevant fields in VariableCacheData are based on the ready-for-reuse
concept, and I don't think that switching some but not all of them to
be based on the safe-to-look-up concept necessarily qualifies as an
improvement.  It's different, but I'm not sure it's better.

What if we approached this problem from the other end?  Suppose we use
a heavyweight lock on, say, transaction ID 1 to represent the right to
truncate CLOG.  We grab this lock in exclusive mode before beginning
to truncate, and in shared mode while looking up XIDs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 11 March 2017 at 05:09, Robert Haas <robertmhaas@gmail.com> wrote:

> On the other
> hand, there really are two separate notions of the "oldest" XID.
> There's the oldest XID that we can safely look up, and then there's
> the oldest XID that we can't reuse.  These two are the same when no
> truncation is in progress, but when a truncation is in progress then
> they're different: the oldest XID that's safe to look up is the first
> one after whatever we're truncating away, but the oldest XID that we
> can't reuse is the newest one preceding the stuff that we're busy
> truncating.

Right.

My view here is that the oldest xid we cannot reuse is already guarded
by xidWrapLimit, which we advance after clog truncation. Whether as
this advances at the same time as or after we advance oldestXid and
truncate clog doesn't actually matter, we must just ensure that it
never advances _before_.

So tracking a second copy of oldestXid whose only purpose is to
recalculate xidWrapLimit serves no real purpose. It's redundant except
during vac_truncate_clog, during which time local state is sufficient
*if* we add oldestXid to the clog truncation xlog record, which we
must do anyway because:

Any number of locking hoop-jumping schemes fail to solve the problem
of outdated oldestXid information on standbys. Right now we truncate
clog and xlog the truncation before we write the new oldestXid limit
to xlog. In fact, we don't write the new xid limit to xlog until the
next checkpoint. So the standby has a huge window where its idea of
oldestXid is completely wrong, and unless we at least add the new
oldestXid to the clog truncation xlog record we can't fix that.

We only get away with this now because there's no way to look up an
arbitrary xid's status.

No locking scheme on the master can solve this, because the locks on
the master do not affect the standby or vice versa.

Therefore, we _must_ advance oldestXid (or a copy of it used only for
"oldest xid still in clog) before truncating clog.

If we're going to do that we might as well just make sure the
standby's xid limits are updated correctly when we truncate clog
rather than doing it lazily at checkpoints. Advance oldestXid before
truncating clog away, and record the new xid in the clog truncation
xlog record. On redo after master crash, and on standbys, we're
guaranteed to re-do the whole clog truncation operation - advance
oldestXid, truncate clog, advance xidWrapLimit etc - and everything
stays consistent.

I'll extract this part of the patch so it can be looked at separately,
it'll be clearer that way.

I think of it as slightly contracting then slightly expanding the xid
range window during clog truncation. Advance the oldest xid slightly
before the xidWrapLimit, so temporarily the range of xids is narrower
than 2^31. xlog it first so we ensure it's all redone on crash and on
standby. Because no lock is held throughout all of vac_truncate_clog,
make sure the ordering of the different phases between concurrent
vac_truncate_xlog runs doesn't matter.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 11 March 2017 at 14:32, Craig Ringer <craig@2ndquadrant.com> wrote:

> I'll extract this part of the patch so it can be looked at separately,
> it'll be clearer that way.

Apparently I thought that last time too since I already posted it
split up. Ahem. Working on too many different things at once.

Last-posted patches apply fine, no need for an update.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Sat, Mar 11, 2017 at 1:32 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 11 March 2017 at 05:09, Robert Haas <robertmhaas@gmail.com> wrote:
>> On the other
>> hand, there really are two separate notions of the "oldest" XID.
>> There's the oldest XID that we can safely look up, and then there's
>> the oldest XID that we can't reuse.  These two are the same when no
>> truncation is in progress, but when a truncation is in progress then
>> they're different: the oldest XID that's safe to look up is the first
>> one after whatever we're truncating away, but the oldest XID that we
>> can't reuse is the newest one preceding the stuff that we're busy
>> truncating.
>
> Right.
>
> My view here is that the oldest xid we cannot reuse is already guarded
> by xidWrapLimit, which we advance after clog truncation. Whether as
> this advances at the same time as or after we advance oldestXid and
> truncate clog doesn't actually matter, we must just ensure that it
> never advances _before_.
>
> So tracking a second copy of oldestXid whose only purpose is to
> recalculate xidWrapLimit serves no real purpose.

Hmm, so what this patch is doing changed quite a bit between January
23rd and January 25th.  In the January 23rd version, oldestXid and
oldestXidDB are changed to track the oldest XID that we can safely
look up, and the remaining related fields are still relative to the
oldest XID that can be reused.  That seems, as I said before, scary.
But in the January 25th version, *all* of the related fields have been
changed to track the oldest XID that we can safely look up, because
SetTransactionIdLimit() now uses the values set by AdvanceOldestXid()
to compute all of the other values, which seems flat-out incorrect.
AdvanceOldestXid() is called to advance that limit before clog
truncation happens, and if somebody then calls SetTransactionIdLimit()
before clog truncation is complete, we'll advanced those derived
limits prematurely.

For example, suppose vacuum #1 comes along, advances the limits,
truncates clog, and then gets descheduled.  Now vacuum #2 comes along,
advances the limits further, and then gets descheduled.  Now vacuum #1
wakes up and calls SetTransactionIdLimit() and prematurely advances
xidWrapLimit.  Oops.

The way you put it is that having a second copy of oldestXid whose
only purpose is to recompute xidWrapLimit is pointless, but the way
I'd say is that you're trying to make one variable do two jobs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 14 March 2017 at 05:43, Robert Haas <robertmhaas@gmail.com> wrote:

> For example, suppose vacuum #1 comes along, advances the limits,
> truncates clog, and then gets descheduled.  Now vacuum #2 comes along,
> advances the limits further, and then gets descheduled.  Now vacuum #1
> wakes up and calls SetTransactionIdLimit() and prematurely advances
> xidWrapLimit.  Oops.

Mm, right. And without a lock held from when oldestXid advances
through to completion of clog truncation, then taking the same lock in
SetTransactionIdLimit, there's not really a way around it.

I'm embarrassed not to have seen that.

Doing things the other way around, per the earlier patch, can cause
SetTransactionIdLimit to not to advance as far as it should.

OK, I'm convinced, a new field is safer, even if it's redundant most
of the time.

I'll introduce a new LWLock, ClogTruncationLock, which will be held
from when we advance the new clogOldestXid field through to when clog
truncation completes.

Most of the rest can stay the same. In particular, the expanded xlog
record for clog truncation will still be required.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I'll introduce a new LWLock, ClogTruncationLock, which will be held
> from when we advance the new clogOldestXid field through to when clog
> truncation completes.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 14 March 2017 at 19:57, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'll introduce a new LWLock, ClogTruncationLock, which will be held
>> from when we advance the new clogOldestXid field through to when clog
>> truncation completes.
>
> +1.

OK, here's the revised patch. Relevant comments added where needed.

It still changes the xlog record for clog truncation to carry the new
xid, but I've left advancing oldestXid until the checkpoint as is
currently the case, and only advanced oldestClogXid when we replay
clog truncation.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Mon, Mar 20, 2017 at 1:38 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 14 March 2017 at 19:57, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> I'll introduce a new LWLock, ClogTruncationLock, which will be held
>>> from when we advance the new clogOldestXid field through to when clog
>>> truncation completes.
>>
>> +1.
>
> OK, here's the revised patch. Relevant comments added where needed.
>
> It still changes the xlog record for clog truncation to carry the new
> xid, but I've left advancing oldestXid until the checkpoint as is
> currently the case, and only advanced oldestClogXid when we replay
> clog truncation.

/me smacks forehead.  Actually, it should be CLogTruncationLock, with
a capital L, for consistency with CLogControlLock.

The new lock needs to be added to the table in monitoring.sgml.

I don't think the new header comments in TransactionIdDidCommit and
TransactionIdDidAbort are super-clear.  I'm not sure you're going to
be able to explain it there in a reasonable number of words, but I
think that speaking of "testing against oldestClogXid" will leave
people wondering what exactly that means. Maybe just write "caller is
responsible for ensuring that the clog records covering XID being
looked up can't be truncated away while the lookup is in progress",
and then leave the bit about CLogTruncationLock to be explained by the
callers that do that.  Or you could drop these comments entirely.

Overall, though, I think that 0001 looks far better than any previous
iteration.  It's simple.  It looks safe.  It seems unlikely to break
anything that works now.  Woo hoo!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 22 March 2017 at 01:49, Robert Haas <robertmhaas@gmail.com> wrote:

> /me smacks forehead.  Actually, it should be CLogTruncationLock, with
> a capital L, for consistency with CLogControlLock.

Will do.

> The new lock needs to be added to the table in monitoring.sgml.

Same.

> I don't think the new header comments in TransactionIdDidCommit and
> TransactionIdDidAbort are super-clear.  I'm not sure you're going to
> be able to explain it there in a reasonable number of words, but I
> think that speaking of "testing against oldestClogXid" will leave
> people wondering what exactly that means. Maybe just write "caller is
> responsible for ensuring that the clog records covering XID being
> looked up can't be truncated away while the lookup is in progress",
> and then leave the bit about CLogTruncationLock to be explained by the
> callers that do that.  Or you could drop these comments entirely.

OK. I'll revisit and see if I can clean it up, otherwise remove it.

> Overall, though, I think that 0001 looks far better than any previous
> iteration.  It's simple.  It looks safe.  It seems unlikely to break
> anything that works now.  Woo hoo!

Funny that this started with "hey, here's a simple, non-invasive
function for looking up the status of an arbitrary xid".

Mature, complex systems eh?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 22 March 2017 at 09:49, Craig Ringer <craig@2ndquadrant.com> wrote:

>> Overall, though, I think that 0001 looks far better than any previous
>> iteration.  It's simple.  It looks safe.  It seems unlikely to break
>> anything that works now.  Woo hoo!
>
> Funny that this started with "hey, here's a simple, non-invasive
> function for looking up the status of an arbitrary xid".

Changes made per discussion.

Removed the comments on TransactionIdDidCommit and
TransactionIdDidAbort . It's not going to be relevant for the immense
majority of callers anyway, and callers that are looking up arbitrary
user supplied XIDs will (hopefully) be looking at
TransactionIdInRecentPast anyway.

I'll be leaving the 'xid' vs 'bigint' issues elsewhere in Pg for next
release, nowhere near time for that now.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Simon Riggs
Дата:
On 22 March 2017 at 03:35, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 22 March 2017 at 09:49, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>>> Overall, though, I think that 0001 looks far better than any previous
>>> iteration.  It's simple.  It looks safe.  It seems unlikely to break
>>> anything that works now.  Woo hoo!
>>
>> Funny that this started with "hey, here's a simple, non-invasive
>> function for looking up the status of an arbitrary xid".
>
> Changes made per discussion.

This looks good to me also. Does what we need it to do.

I was a little worried by possible performance of new lock, but its
not intended to be run frequently, so its OK.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Wed, Mar 22, 2017 at 3:13 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Changes made per discussion.
>
> This looks good to me also. Does what we need it to do.
>
> I was a little worried by possible performance of new lock, but its
> not intended to be run frequently, so its OK.

Agreed.

Reviewing 0002:

+        if (!TransactionIdIsValid(xid))
+        {
+            LWLockRelease(XidGenLock);
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("transaction ID " UINT64_FORMAT " is an invalid xid",
+                        xid_with_epoch)));
+        }

It's unnecessary to release LWLockRelease() before throwing an error,
and it's also wrong because we haven't acquired XidGenLock in this
code path.  But maybe it would be better to just remove this entirely
and instead have TransactionIdInRecentPast return false for
InvalidTransactionId.  Then we'd avoid adding a translatable message
for a case that is basically harmless to allow.

+        if (TransactionIdIsCurrentTransactionId(xid))
+            status = gettext_noop("in progress");
+        else if (TransactionIdDidCommit(xid))
+            status = gettext_noop("committed");
+        else if (TransactionIdDidAbort(xid))
+            status = gettext_noop("aborted");
+        else
+
+            /*
+             * can't test TransactionIdIsInProgress here or we race with
+             * concurrent commit/abort. There's no point anyway, since it
+             * might then commit/abort just after we check.
+             */
+            status = gettext_noop("in progress");

I am not sure this is going to do the right thing for transactions
that are aborted by a crash without managing to write an abort record.
It seems that the first check will say the transaction isn't in
progress, and the second and third checks will say it isn't either
committed or aborted since, if I am reading this correctly, they just
read clog directly.  Compare HeapTupleSatisfiesMVCC, which assumes
that a not-in-progress, not-committed transaction must necessarily
have aborted.  I think your comment is pointing to a real issue,
though.  It seems like what might be needed is to add one more check.
Before where you have the "else" clause, check if the XID is old, e.g.
precedes our snapshot's xmin.  If so, it must be committed or aborted
and, since it didn't commit, it aborted.  If not, it must've changed
from in progress to not-in-progress just as we were in the midst
checking, so labeling it as in progress is fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Simon Riggs
Дата:
On 22 March 2017 at 17:41, Robert Haas <robertmhaas@gmail.com> wrote:

> +        if (TransactionIdIsCurrentTransactionId(xid))
> +            status = gettext_noop("in progress");
> +        else if (TransactionIdDidCommit(xid))
> +            status = gettext_noop("committed");
> +        else if (TransactionIdDidAbort(xid))
> +            status = gettext_noop("aborted");
> +        else
> +
> +            /*
> +             * can't test TransactionIdIsInProgress here or we race with
> +             * concurrent commit/abort. There's no point anyway, since it
> +             * might then commit/abort just after we check.
> +             */
> +            status = gettext_noop("in progress");
>
> I am not sure this is going to do the right thing for transactions
> that are aborted by a crash without managing to write an abort record.

Yes, perhaps we should report that state as "aborted - incomplete".

And of course, we might return "subcommitted" also, which could
technically also be an abort in some cases, so we'd need to do a wait
loop on that.

Which makes me think it would be confusing to say "in progress" for
when it is our current xid, since the user might wait until it is
complete and then wait forever. Prefer it if it said "in progress -
current transaction"

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Wed, Mar 22, 2017 at 2:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 22 March 2017 at 17:41, Robert Haas <robertmhaas@gmail.com> wrote:
>> +        if (TransactionIdIsCurrentTransactionId(xid))
>> +            status = gettext_noop("in progress");
>> +        else if (TransactionIdDidCommit(xid))
>> +            status = gettext_noop("committed");
>> +        else if (TransactionIdDidAbort(xid))
>> +            status = gettext_noop("aborted");
>> +        else
>> +
>> +            /*
>> +             * can't test TransactionIdIsInProgress here or we race with
>> +             * concurrent commit/abort. There's no point anyway, since it
>> +             * might then commit/abort just after we check.
>> +             */
>> +            status = gettext_noop("in progress");
>>
>> I am not sure this is going to do the right thing for transactions
>> that are aborted by a crash without managing to write an abort record.
>
> Yes, perhaps we should report that state as "aborted - incomplete".
>
> And of course, we might return "subcommitted" also, which could
> technically also be an abort in some cases, so we'd need to do a wait
> loop on that.

I actually don't think those are things we should expose to users.
They're internal implementation details.  The user had better not care
whether an abort was the type of abort that wrote an abort record or
the type that didn't.

> Which makes me think it would be confusing to say "in progress" for
> when it is our current xid, since the user might wait until it is
> complete and then wait forever. Prefer it if it said "in progress -
> current transaction"

Hmm, or just "current transaction", maybe?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 March 2017 at 02:08, Simon Riggs <simon@2ndquadrant.com> wrote:

> And of course, we might return "subcommitted" also, which could
> technically also be an abort in some cases, so we'd need to do a wait
> loop on that.

Users generally don't see subxact IDs, so it wasn't something I was
overly concerned by. Most notably,  txid_current() doesn't report
them.

Users who want to know the commit status of an xact that had a commit
in-flight when they lost access to it due to server crash, network
loss, etc aren't going to care about subxacts. If you lose your
connection after a RELEASE SAVEPOINT you know the outer xact will get
aborted or the state of the individual subxacts.

They're visible in heap tuples, but you can only see uncommitted heap
tuples from your own top-level xid. For anything else, if you can see
the xid in xmin you know it committed. There isn't really any reason
you'd be looking up tuple xids with txid_status anyway, and if you did
you'd have to pay attention to mess like multixacts in xmax ... but
you can't tell from the xid alone if it's a multixact or not, so this
doesn't make sense with xids that could be multis anyway.

If we're going to handle subxacts specially, we should probably report
them as "sub-committed" if we find that the current xid is a committed
subxact member of an outer xact that is still in-progress. But IMO
it's pretty pointless since you won't be dealing with subxact IDs in
the application anyway.

> Which makes me think it would be confusing to say "in progress" for
> when it is our current xid, since the user might wait until it is
> complete and then wait forever. Prefer it if it said "in progress -
> current transaction"

I'm fine with "current transaction". Though I think it's kind of a
moot point as I don't see any reason for an application to ever be
doing the equivalent of
   txid_status(txid_current())

in the first place. But it's harmless enough.



-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 March 2017 at 01:41, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 3:13 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Changes made per discussion.
>>
>> This looks good to me also. Does what we need it to do.
>>
>> I was a little worried by possible performance of new lock, but its
>> not intended to be run frequently, so its OK.
>
> Agreed.
>
> Reviewing 0002:
>
> +        if (!TransactionIdIsValid(xid))
> +        {
> +            LWLockRelease(XidGenLock);
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("transaction ID " UINT64_FORMAT " is an invalid xid",
> +                        xid_with_epoch)));
> +        }
>
> It's unnecessary to release LWLockRelease() before throwing an error,
> and it's also wrong because we haven't acquired XidGenLock in this
> code path.  But maybe it would be better to just remove this entirely
> and instead have TransactionIdInRecentPast return false for
> InvalidTransactionId.  Then we'd avoid adding a translatable message
> for a case that is basically harmless to allow.

Agreed, that's better.

> +        if (TransactionIdIsCurrentTransactionId(xid))
> +            status = gettext_noop("in progress");
> +        else if (TransactionIdDidCommit(xid))
> +            status = gettext_noop("committed");
> +        else if (TransactionIdDidAbort(xid))
> +            status = gettext_noop("aborted");
> +        else
> +
> +            /*
> +             * can't test TransactionIdIsInProgress here or we race with
> +             * concurrent commit/abort. There's no point anyway, since it
> +             * might then commit/abort just after we check.
> +             */
> +            status = gettext_noop("in progress");
>
> I am not sure this is going to do the right thing for transactions
> that are aborted by a crash without managing to write an abort record.

You're right. It'll report in-progress, since the clog entry will
still be 0. We don't appear to explicitly update the clog during
recovery.

Funny, I got so focused on the clog access safety stuff in the end
that I missed the bigger picture.

Given that part of the point of this is xact status after crash,
that's kind of important. I've added a TAP test for this, as part of a
new test set, "011_crash_recovery.pl". The recovery tests don't really
have much on crash behaviour at the moment so might as well start it
and add more as we go. It doesn't make sense to add a whole new top
level TAP test just for txid_status.

(I *love* the TAP framework now, it took 10 mins to write a test
that's trivial to repeat in 5 seconds per run for this).

> It seems that the first check will say the transaction isn't in
> progress, and the second and third checks will say it isn't either
> committed or aborted since, if I am reading this correctly, they just
> read clog directly.

Right. They're not concerned with subxacts or multixacts.

> Compare HeapTupleSatisfiesMVCC, which assumes
> that a not-in-progress, not-committed transaction must necessarily
> have aborted.

Right.

We don't have a HeapTuple here, of course, so we can't test flags.
Most importantly this means we can't handle multixacts. If we ever did
decide to expose multixacts as bigint epoch-qualified xids for general
users, we'd probably reserve the high bit of the epoch the bigint xid
representation for the equivalent of HEAP_XMAX_IS_MULTI, and maybe
it's worth reserving that bit now and ERROR'ing if we find it set. But
right now we never produce a bigint xid with a multixact.

I wonder if a note in the docs warning not to cast xid to bigint is
worth it. Probably not. You have to cast xid::text::bigint which is
kind of a hint, plus it doesn't make sense to test txid_status() for
tuples' xmin/xmax . I guess you might use it with xids from
pg_stat_activity, but there's not much point when you can just look at
pg_stat_activity again to see if the proc of interest is still there
and has the same xid.

Anyway...

> I think your comment is pointing to a real issue,
> though.  It seems like what might be needed is to add one more check.
> Before where you have the "else" clause, check if the XID is old, e.g.
> precedes our snapshot's xmin.  If so, it must be committed or aborted
> and, since it didn't commit, it aborted.  If not, it must've changed
> from in progress to not-in-progress just as we were in the midst
> checking, so labeling it as in progress is fine.

That seems clear and sensible.

XidInMVCCSnapshot simply tests TransactionIdPrecedes(xid,
snapshot->xmin), as does HeapTupleSatisfiesHistoricMVCC . It seems
reasonable enough to just examine the snapshot xmin directly in
txid_status too; it's already done in replication/logical/snapbuild.c
and lmgr/predicate.c.

We're running in an SQL-called function so we'll have a good snapshot
and GetSnapshotData will have run, so GetActiveSnapshot()->xmin should
be sufficient.

Amended patch attached, with added TAP test included.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 23 March 2017 at 11:25, Craig Ringer <craig@2ndquadrant.com> wrote:

> Amended patch attached, with added TAP test included.

Managed to omit it, sigh.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Changes made per discussion.

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Changes made per discussion.
>
> Committed 0001.

Much appreciated.

Here's the 2nd patch rebased on top of master, with the TAP test
included this time. Looks ready to go.

I really appreciate the time you've taken to look at this. Point me at
anything from your team you want some outside review on.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Fri, Mar 24, 2017 at 2:27 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> Changes made per discussion.
>>
>> Committed 0001.
>
> Much appreciated.
>
> Here's the 2nd patch rebased on top of master, with the TAP test
> included this time. Looks ready to go.

Committed.

> I really appreciate the time you've taken to look at this. Point me at
> anything from your team you want some outside review on.

Thanks, that is a valuable offer which I will be pleased to accept!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Peter Eisentraut
Дата:
On 3/24/17 02:27, Craig Ringer wrote:
> On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> Changes made per discussion.
>>
>> Committed 0001.
> 
> Much appreciated.
> 
> Here's the 2nd patch rebased on top of master, with the TAP test
> included this time. Looks ready to go.

I'm experiencing hangs in the new t/011_crash_recovery.pl test.  It
seems to hang after the first call to SELECT txid_current();.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:
On 25 March 2017 at 07:31, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/24/17 02:27, Craig Ringer wrote:
>> On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>>> Changes made per discussion.
>>>
>>> Committed 0001.
>>
>> Much appreciated.
>>
>> Here's the 2nd patch rebased on top of master, with the TAP test
>> included this time. Looks ready to go.
>
> I'm experiencing hangs in the new t/011_crash_recovery.pl test.  It
> seems to hang after the first call to SELECT txid_current();.

if you add

note "txid is $xid";

after

+chomp($xid);

does it report the xid?

Alternately, can you see a 'psql' process and a backend doing a SELECT
when it stops progressing?

I'm wondering if this is a perl version/platform issue around
   $tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/;

where we're not recognising the required output from psql when we get it.

What's in src/test/recovery/tmp_check/log/regress_log_011* ?

I couldn't use PostgresNode::psql or PostgresNode::safe_psql here
because the session must stay open.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Peter Eisentraut
Дата:
> I'm wondering if this is a perl version/platform issue around
> 
>     $tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/;
> 
> where we're not recognising the required output from psql when we get it.
> 
> What's in src/test/recovery/tmp_check/log/regress_log_011* ?
> 
> I couldn't use PostgresNode::psql or PostgresNode::safe_psql here
> because the session must stay open.

The problem was that psql needs to be called with -X.  Fix committed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
David Steele
Дата:
On 3/25/17 12:12 AM, Peter Eisentraut wrote:
>> I'm wondering if this is a perl version/platform issue around
>>
>>     $tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/;
>>
>> where we're not recognising the required output from psql when we get it.
>>
>> What's in src/test/recovery/tmp_check/log/regress_log_011* ?
>>
>> I couldn't use PostgresNode::psql or PostgresNode::safe_psql here
>> because the session must stay open.
>
> The problem was that psql needs to be called with -X.  Fix committed.

It's not clear to me what remains to be done on this patch.  I feel, at 
the least, that patches 3 and 4 need to be rebased and the status set 
back to "Needs Review".

This thread has been idle for six days.  Please respond with a new patch 
by 2017-04-04 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".

-- 
-David
david@pgmasters.net



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Craig Ringer
Дата:


On 31 Mar. 2017 22:31, "David Steele" <david@pgmasters.net> wrote:
On 3/25/17 12:12 AM, Peter Eisentraut wrote:
I'm wondering if this is a perl version/platform issue around

    $tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/;

where we're not recognising the required output from psql when we get it.

What's in src/test/recovery/tmp_check/log/regress_log_011* ?

I couldn't use PostgresNode::psql or PostgresNode::safe_psql here
because the session must stay open.

The problem was that psql needs to be called with -X.  Fix committed.

It's not clear to me what remains to be done on this patch.  I feel, at the least, that patches 3 and 4 need to be rebased and the status set back to "Needs Review".

This thread has been idle for six days.  Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

I consider the feature complete and committed.

Patch 3 is optional and per discussion buried upthread we generally agreed that it'd be better to replace most user visible uses of the 'xid' data type with a new epoch extended 'xid64' or similar.

Patch 4, txid_incinerate, has never been intended for commit. It's a testing tool.

Patches 1 and 2 were the key parts and thanks to Robert's helpful review, advice and edits they're committed now.

Committed, done. Yay.

Re: [PATCH] Transaction traceability - txid_status(bigint)

От
David Steele
Дата:
On 3/31/17 10:46 AM, Craig Ringer wrote:
>
> Patches 1 and 2 were the key parts and thanks to Robert's helpful
> review, advice and edits they're committed now.
>
> Committed, done. Yay.

Excellent.  I have marked this a "Committed" by Robert.

One down...

-- 
-David
david@pgmasters.net



Re: [PATCH] Transaction traceability - txid_status(bigint)

От
Robert Haas
Дата:
On Fri, Mar 31, 2017 at 11:08 AM, David Steele <david@pgmasters.net> wrote:
> On 3/31/17 10:46 AM, Craig Ringer wrote:
>> Patches 1 and 2 were the key parts and thanks to Robert's helpful
>> review, advice and edits they're committed now.
>>
>> Committed, done. Yay.
>
> Excellent.  I have marked this a "Committed" by Robert.
>
> One down...

...71 to go?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company