Thread: multixacts woes


multixacts woes

From:
Robert Haas
Date:
My colleague Thomas Munro and I have been working with Alvaro, and
also with Kevin and Amit, to fix bug #12990, a multixact-related data
corruption bug.  I somehow did not realize until very recently that we
actually use two SLRUs to keep track of multixacts: one for the
multixacts themselves (pg_multixacts/offsets) and one for the members
(pg_multixacts/members). Confusingly, members are sometimes called
offsets, and offsets are sometimes called IDs, or multixacts.  If
either of these SLRUs wraps around, we get data loss.  This comment in
multixact.c explains it well:
       /*
        * Since multixacts wrap differently from transaction IDs, this logic is
        * not entirely correct: in some scenarios we could go for longer than 2
        * billion multixacts without seeing any data loss, and in some others we
        * could get in trouble before that if the new pg_multixact/members data
        * stomps on the previous cycle's data.  For lack of a better mechanism we
        * use the same logic as for transaction IDs, that is, start taking action
        * halfway around the oldest potentially-existing multixact.
        */
       multiWrapLimit = oldest_datminmxid + (MaxMultiXactId >> 1);
       if (multiWrapLimit < FirstMultiXactId)
               multiWrapLimit += FirstMultiXactId;
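
(For anyone who wants to poke at that arithmetic, here is a tiny standalone
sketch of the same computation.  The constants mirror what I believe
multixact.h defines, with 0 being the invalid multixact ID; treat this as
illustrative code, not the server's.)

/*
 * Standalone illustration of the wrap-limit arithmetic above.
 */
#include <stdio.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

#define FirstMultiXactId    ((MultiXactId) 1)
#define MaxMultiXactId      ((MultiXactId) 0xFFFFFFFF)

int
main(void)
{
    MultiXactId oldest_datminmxid = 3000000000u;    /* example value */
    MultiXactId multiWrapLimit;

    /* Start taking action halfway around from the oldest existing mxid. */
    multiWrapLimit = oldest_datminmxid + (MaxMultiXactId >> 1);
    if (multiWrapLimit < FirstMultiXactId)          /* skip the invalid ID 0 */
        multiWrapLimit += FirstMultiXactId;

    printf("oldest_datminmxid = %u, multiWrapLimit = %u\n",
           (unsigned) oldest_datminmxid, (unsigned) multiWrapLimit);
    return 0;
}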

Apparently, we have been hanging our hat since the release of 9.3.0 on
the theory that the average multixact won't ever have more than two
members, and therefore the members SLRU won't overwrite itself and
corrupt data.  This is not good enough: we need to prevent multixact
IDs from wrapping around, and we separately need to prevent multixact
members from wrapping around, and the current code was conflating
those things in a way that simply didn't work.  Recent commits by
Alvaro and by me have mostly fixed this, but there are a few loose
ends:

1. I believe that there is still a narrow race condition that causes
the multixact code to go crazy and delete all of its data when
operating very near the threshold for member space exhaustion. See
http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
for the scenario and proposed fix.

2. We have some logic that causes autovacuum to run in spite of
autovacuum=off when wraparound threatens.  My commit
53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
anti-wraparound protections for multixact members that exist for
multixact IDs and for regular XIDs, but this remains an outstanding
issue.  I believe I know how to fix this, and will work up an
appropriate patch based on some of Thomas's earlier work.

3. It seems to me that there is a danger that some users could see
extremely frequent anti-mxid-member-wraparound vacuums as a result of
this work.  Granted, that beats data corruption or errors, but it
could still be pretty bad.  The default value of
autovacuum_multixact_freeze_max_age is 400000000.
Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the
addressable space, or 1073741824 total members.  So, if your typical
multixact has more than 1073741824/400000000 = ~2.68 members, you're
going to see more autovacuum activity as a result of this change.
We're effectively capping autovacuum_multixact_freeze_max_age at
1073741824/(average size of your multixacts).  If your multixacts are
just a couple of members (like 3 or 4) this is probably not such a big
deal.  If your multixacts typically run to 50 or so members, your
effective freeze age is going to drop from 400m to ~21.4m.  At that
point, I think it's possible that relminmxid advancement might start
to force full-table scans more often than would be required for
relfrozenxid advancement.  If so, that may be a problem for some
users.
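
To make the clamping concrete, here is a throwaway calculation using only the
numbers above (the 1073741824-member trigger and the 400 million default);
nothing below is taken from the server code:

/*
 * Effective autovacuum_multixact_freeze_max_age for a few average
 * multixact sizes, given a member trigger of 1073741824 slots.
 */
#include <stdio.h>

int
main(void)
{
    const double member_trigger = 1073741824.0;
    const double freeze_max_age = 400000000.0;
    const double avg_members[] = {2.0, 4.0, 50.0};

    for (int i = 0; i < 3; i++)
    {
        double cap = member_trigger / avg_members[i];
        double effective = (cap < freeze_max_age) ? cap : freeze_max_age;

        printf("avg %.0f members/mxid -> effective freeze age %.2f million\n",
               avg_members[i], effective / 1e6);
    }
    return 0;
}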

What can we do about this?  Alvaro proposed back-porting his fix for
bug #8470, which avoids locking a row if a parent subtransaction
already has the same lock.  Alvaro tells me (via chat) that on some
workloads this can dramatically reduce multixact size, which is
certainly appealing.  But the fix looks fairly invasive - it changes
the return value of HeapTupleSatisfiesUpdate in certain cases, for
example - and I'm not sure it's been thoroughly code-reviewed by
anyone, so I'm a little nervous about the idea of back-porting it at
this point.  I am inclined to think it would be better to release the
fixes we have - after handling items 1 and 2 - and then come back to
this issue.  Another thing to consider here is that if the high rate
of multixact consumption is organic rather than induced by lots of
subtransactions of the same parent locking the same tuple, this fix
won't help.

Another thought that occurs to me is that if we had a freeze map, it
would radically decrease the severity of this problem, because
freezing would become vastly cheaper.  I wonder if we ought to try to
get that into 9.5, even if it means holding up 9.5.  Quite aside from
multixacts, repeated wraparound autovacuuming of static data is a
progressively more serious problem as data set sizes and transaction
volumes increase.  The possibility that multixact freezing may in some
scenarios exacerbate that problem is just icing on the cake.  The
fundamental problem is that a 32-bit address space just isn't that big
on modern hardware, and the problem is worse for multixact members
than it is for multixact IDs, because a given multixact consumes only
one multixact ID, but as many slots in the multixact member
space as it has members.
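
A trivial bit of arithmetic makes the asymmetry plain; the only inputs are the
2^32-sized member space and an assumed average multixact size:

/*
 * A multixact with k members burns one ID but k member slots, so the
 * 2^32-slot member space fills after 2^32/k multixacts.  Purely
 * illustrative arithmetic.
 */
#include <stdio.h>

int
main(void)
{
    const double member_space = 4294967296.0;   /* 2^32 member slots */

    for (double k = 2.0; k <= 64.0; k *= 2.0)
        printf("avg %2.0f members/mxid: member space full after %.0f multixacts"
               " (%.1f%% of the ID space)\n",
               k, member_space / k, 100.0 / k);
    return 0;
}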

Thoughts, advice, etc. are most welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: multixacts woes

From:
andres@anarazel.de (Andres Freund)
Date:
Hi,

On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
> Apparently, we have been hanging our hat since the release of 9.3.0 on
> the theory that the average multixact won't ever have more than two
> members, and therefore the members SLRU won't overwrite itself and
> corrupt data.

It's essentially a much older problem - it has essentially existed since
multixacts were introduced (8.1?). The consequences of it were much
lower before 9.3 though.

> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work.  Granted, that beats data corruption or errors, but it
> could still be pretty bad.

It's certainly possible to have workloads triggering that, but I think
it's relatively uncommon.  In most cases I've checked, the multixact
consumption rate is much lower than the xid consumption. There are some
exceptions, but often that's pretty bad code.

> At that
> point, I think it's possible that relminmxid advancement might start
> to force full-table scans more often than would be required for
> relfrozenxid advancement.  If so, that may be a problem for some
> users.

I think it's the best we can do right now.

> What can we do about this?  Alvaro proposed back-porting his fix for
> bug #8470, which avoids locking a row if a parent subtransaction
> already has the same lock.  Alvaro tells me (via chat) that on some
> workloads this can dramatically reduce multixact size, which is
> certainly appealing.  But the fix looks fairly invasive - it changes
> the return value of HeapTupleSatisfiesUpdate in certain cases, for
> example - and I'm not sure it's been thoroughly code-reviewed by
> anyone, so I'm a little nervous about the idea of back-porting it at
> this point.  I am inclined to think it would be better to release the
> fixes we have - after handling items 1 and 2 - and then come back to
> this issue.  Another thing to consider here is that if the high rate
> of multixact consumption is organic rather than induced by lots of
> subtransactions of the same parent locking the same tuple, this fix
> won't help.

I'm not inclined to backport it at this stage. Maybe if we get some
field reports about too many anti-wraparound vacuums due to this, *and*
the code has been tested in 9.5.

> Another thought that occurs to me is that if we had a freeze map, it
> would radically decrease the severity of this problem, because
> freezing would become vastly cheaper.  I wonder if we ought to try to
> get that into 9.5, even if it means holding up 9.5

I think that's not realistic. Doing this right isn't easy. And doing it
wrong can lead to quite bad results, i.e. data corruption. Doing it
under the pressure of delaying a release further and further seems like a
recipe for disaster.

> Quite aside from multixacts, repeated wraparound autovacuuming of
> static data is a progressively more serious problem as data set sizes
> and transaction volumes increase.

Yes. Agreed.

> The possibility that multixact freezing may in some
> scenarios exacerbate that problem is just icing on the cake.  The
> fundamental problem is that a 32-bit address space just isn't that big
> on modern hardware, and the problem is worse for multixact members
> than it is for multixact IDs, because a given multixact consumes only
> one multixact ID, but as many slots in the multixact member
> space as it has members.

FWIW, I intend to either work on this myself, or help whoever seriously
tackles this, in the next cycle.

Greetings,

Andres Freund



Re: multixacts woes

From:
Robert Haas
Date:
On Fri, May 8, 2015 at 2:27 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
>> Apparently, we have been hanging our hat since the release of 9.3.0 on
>> the theory that the average multixact won't ever have more than two
>> members, and therefore the members SLRU won't overwrite itself and
>> corrupt data.
>
> It's essentially a much older problem - it has essentially existed since
> multixacts were introduced (8.1?). The consequences of it were much
> lower before 9.3 though.

OK, I wasn't aware of that.  What exactly were the consequences before 9.3?

> I'm not inclined to backport it at this stage. Maybe if we get some
> field reports about too many anti-wraparound vacuums due to this, *and*
> the code has been tested in 9.5.

That was about what I was thinking, too.

>> Another thought that occurs to me is that if we had a freeze map, it
>> would radically decrease the severity of this problem, because
>> freezing would become vastly cheaper.  I wonder if we ought to try to
>> get that into 9.5, even if it means holding up 9.5
>
> I think that's not realistic. Doing this right isn't easy. And doing it
> wrong can lead to quite bad results, i.e. data corruption. Doing it
> under the pressure of delaying a release further and further seems like
> recipe for disaster.

Those are certainly good things to worry about.

> FWIW, I intend to either work on this myself, or help whoever seriously
> tackles this, in the next cycle.

That would be great.  I'll investigate what resources EnterpriseDB can
commit to this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: multixacts woes

From:
Andres Freund
Date:
On 2015-05-08 14:32:14 -0400, Robert Haas wrote:
> On Fri, May 8, 2015 at 2:27 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
> >> Apparently, we have been hanging our hat since the release of 9.3.0 on
> >> the theory that the average multixact won't ever have more than two
> >> members, and therefore the members SLRU won't overwrite itself and
> >> corrupt data.
> >
> > It's essentially a much older problem - it has essentially existed since
> > multixacts were introduced (8.1?). The consequences of it were much
> > lower before 9.3 though.
> 
> OK, I wasn't aware of that.  What exactly were the consequences before 9.3?

I think just problems when locking a row. That's obviously much less bad
than problems when reading a row.

> > FWIW, I intend to either work on this myself, or help whoever seriously
> > tackles this, in the next cycle.
> 
> That would be great.

With "this" I mean freeze avoidance. While I obviously, having proposed
it as well at some point, think that freeze maps are a possible
solution, I'm not yet sure that it's the best solution.

> I'll investigate what resources EnterpriseDB can commit to this.

Cool.

Greetings,

Andres Freund



Re: multixacts woes

From:
Josh Berkus
Date:
On 05/08/2015 11:27 AM, Andres Freund wrote:
> Hi,
> 
> On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
>> 3. It seems to me that there is a danger that some users could see
>> extremely frequent anti-mxid-member-wraparound vacuums as a result of
>> this work.  Granted, that beats data corruption or errors, but it
>> could still be pretty bad.
> 
> It's certainly possible to have workloads triggering that, but I think
> it's relatively uncommon.  In most cases I've checked, the multixact
> consumption rate is much lower than the xid consumption. There are some
> exceptions, but often that's pretty bad code.

I have a couple workloads in my pool which do consume mxids faster than
xids, due to (I think) exceptional numbers of FK conflicts.  It's
definitely unusual, though, and I'm sure they'd rather have corruption
protection and endure some more vacuums.  If we do this, though, it
might be worthwhile to backport the multixact age function, so that
affected users can check and schedule mxact wraparound vacuums
themselves, something you currently can't do on 9.3.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: multixacts woes

From:
Andres Freund
Date:
On 2015-05-08 12:57:17 -0700, Josh Berkus wrote:
> I have a couple workloads in my pool which do consume mxids faster than
> xids, due to (I think) exceptional numbers of FK conflicts.  It's
> definitely unusual, though, and I'm sure they'd rather have corruption
> protection and endure some more vacuums.  If we do this, though, it
> might be worthwhile to backport the multixact age function, so that
> affected users can check and schedule mxact wraparound vacuums
> themselves, something you currently can't do on 9.3.

That's not particularly realistic due to the requirement of an initdb to
change the catalog.

Greetings,

Andres Freund



Re: multixacts woes

From:
Alvaro Herrera
Date:
Josh Berkus wrote:

> I have a couple workloads in my pool which do consume mxids faster than
> xids, due to (I think) exceptional numbers of FK conflicts.  It's
> definitely unusual, though, and I'm sure they'd rather have corruption
> protection and endure some more vacuums.  If we do this, though, it
> might be worthwhile to backport the multixact age function, so that
> affected users can check and schedule mxact wraparound vacuums
> themselves, something you currently can't do on 9.3.

Backporting that is difficult in core, but you can do it with an
extension without too much trouble.  Also, the multixact age function
does not give you the "oldest member", which is what you need to properly
monitor the whole of this; you can add that to an extension too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: multixacts woes

From:
Alvaro Herrera
Date:
Robert Haas wrote:
> My colleague Thomas Munro and I have been working with Alvaro, and
> also with Kevin and Amit, to fix bug #12990, a multixact-related data
> corruption bug.

Thanks for this great summary of the situation.


> 1. I believe that there is still a narrow race condition that causes
> the multixact code to go crazy and delete all of its data when
> operating very near the threshold for member space exhaustion. See
> http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
> for the scenario and proposed fix.

I agree that there is a problem here.

> 2. We have some logic that causes autovacuum to run in spite of
> autovacuum=off when wraparound threatens.  My commit
> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
> anti-wraparound protections for multixact members that exist for
> multixact IDs and for regular XIDs, but this remains an outstanding
> issue.  I believe I know how to fix this, and will work up an
> appropriate patch based on some of Thomas's earlier work.

I believe autovacuum=off is fortunately uncommon, but certainly getting
this issue fixed is a good idea.

> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work.

I agree with the idea that the long term solution to this issue is to
make the freeze process cheaper.  I don't have any good ideas on how to
make this less severe in the interim.  You say the fix for #8470 is not
tested thoroughly enough to back-patch it just yet, and I can get behind
that; so let's wait until 9.5 has been tested a bit more.

Another avenue not mentioned and possibly worth exploring is making some
more use of the multixact cache, and reusing multixacts that were
previously issued and have the same effects as the one you're interested
in: for instance, if you want a multixact with locking members
(10,20,30) and you have one for (5,10,20,30) but transaction 5 has
finished, then essentially both have the same semantics (because locks
have no effect once the transaction has finished), so we can use it
instead of creating a new one.  I have no idea how to implement this;
obviously, having to run TransactionIdIsCurrentTransactionId for each
member on each multixact in the cache each time you want to create a new
multixact is not very reasonable.
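
Just to illustrate the shape of the test, here is a rough self-contained
sketch, with simplified stand-in types and a placeholder liveness check
instead of the real one; it ignores lock modes entirely and does nothing
about the cost problem I just described:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/*
 * Placeholder for a real liveness check (in the server this is roughly the
 * question TransactionIdIsInProgress answers).  For the demo in main(),
 * pretend only transaction 5 has finished.
 */
static bool
locker_still_running(TransactionId xid)
{
    return xid != 5;
}

static int
cmp_xid(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    return (xa > xb) - (xa < xb);
}

/*
 * Could a cached multixact whose (lock-only) members are "cached" stand in
 * for a request that wants exactly the members in "wanted"?  Cached members
 * whose locker has finished are discarded first, since their locks no longer
 * have any effect.  Lock modes are ignored entirely in this sketch.
 */
static bool
cached_mxact_is_equivalent(const TransactionId *cached, int ncached,
                           TransactionId *wanted, int nwanted)
{
    TransactionId *live;
    int     nlive = 0;
    bool    result = true;

    live = malloc(sizeof(TransactionId) * (size_t) (ncached > 0 ? ncached : 1));
    if (live == NULL)
        return false;

    for (int i = 0; i < ncached; i++)
        if (locker_still_running(cached[i]))
            live[nlive++] = cached[i];

    if (nlive != nwanted)
        result = false;
    else
    {
        /* compare the surviving members as sets */
        qsort(live, (size_t) nlive, sizeof(TransactionId), cmp_xid);
        qsort(wanted, (size_t) nwanted, sizeof(TransactionId), cmp_xid);
        for (int i = 0; i < nlive; i++)
            if (live[i] != wanted[i])
            {
                result = false;
                break;
            }
    }

    free(live);
    return result;
}

int
main(void)
{
    /* Cached (5,10,20,30) vs wanted (10,20,30), with 5 already finished. */
    TransactionId cached[] = {5, 10, 20, 30};
    TransactionId wanted[] = {10, 20, 30};

    return cached_mxact_is_equivalent(cached, 4, wanted, 3) ? 0 : 1;
}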

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: multixacts woes

From:
Robert Haas
Date:
On Fri, May 8, 2015 at 5:39 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> 1. I believe that there is still a narrow race condition that causes
>> the multixact code to go crazy and delete all of its data when
>> operating very near the threshold for member space exhaustion. See
>> http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
>> for the scenario and proposed fix.
>
> I agree that there is a problem here.

OK, I'm glad we now agree on that, since it seemed like you were
initially unconvinced.

>> 2. We have some logic that causes autovacuum to run in spite of
>> autovacuum=off when wraparound threatens.  My commit
>> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
>> anti-wraparound protections for multixact members that exist for
>> multixact IDs and for regular XIDs, but this remains an outstanding
>> issue.  I believe I know how to fix this, and will work up an
>> appropriate patch based on some of Thomas's earlier work.
>
> I believe autovacuum=off is fortunately uncommon, but certainly getting
> this issue fixed is a good idea.

Right.

>> 3. It seems to me that there is a danger that some users could see
>> extremely frequent anti-mxid-member-wraparound vacuums as a result of
>> this work.
>
> I agree with the idea that the long term solution to this issue is to
> make the freeze process cheaper.  I don't have any good ideas on how to
> make this less severe in the interim.  You say the fix for #8470 is not
>> tested thoroughly enough to back-patch it just yet, and I can get behind
> that; so let's wait until 9.5 has been tested a bit more.

Sounds good.

> Another avenue not mentioned and possibly worth exploring is making some
> more use of the multixact cache, and reuse multixacts that were
> previously issued and have the same effects as the one you're interested
> in: for instance, if you want a multixact with locking members
> (10,20,30) and you have one for (5,10,20,30) but transaction 5 has
> finished, then essentially both have the same semantics (because locks
> have no effect once the transaction has finished), so we can use it
> instead of creating a new one.  I have no idea how to implement this;
> obviously, having to run TransactionIdIsCurrentTransactionId for each
> member on each multixact in the cache each time you want to create a new
> multixact is not very reasonable.

This sounds to me like it's probably too clever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: multixacts woes

From:
José Luis Tallón
Date:
On 05/08/2015 09:57 PM, Josh Berkus wrote:
> [snip]
>> It's certainly possible to have workloads triggering that, but I think
>> it's relatively uncommon.  In most cases I've checked, the multixact
>> consumption rate is much lower than the xid consumption. There are some
>> exceptions, but often that's pretty bad code.
> I have a couple workloads in my pool which do consume mxids faster than
> xids, due to (I think) exceptional numbers of FK conflicts.  It's
> definitely unusual, though, and I'm sure they'd rather have corruption
> protection and endure some more vacuums.

We've seen corruption happen recently with OpenBravo on PostgreSQL 9.3.6
(Debian; binaries upgraded from 9.3.2) in a cluster pg_upgraded from 9.2.4
(albeit with quite insufficient autovacuum / a poorly configured Postgres).

I fear that this might be more widespread than we thought, depending on 
the exact workload/activity pattern.
If it would help, I can try to get hold of a copy of the cluster in
question (if the customer keeps any copy at all).

> If we do this, though, it
> might be worthwhile to backport the multixact age function, so that
> affected users can check and schedule mxact wraparound vacuums
> themselves, something you currently can't do on 9.3.

Thanks,
    J.L.




Re: multixacts woes

From:
Noah Misch
Date:
On Fri, May 08, 2015 at 02:15:44PM -0400, Robert Haas wrote:
> My colleague Thomas Munro and I have been working with Alvaro, and
> also with Kevin and Amit, to fix bug #12990, a multixact-related data
> corruption bug.

Thanks Alvaro, Amit, Kevin, Robert and Thomas for mobilizing to get this fixed.

> 1. I believe that there is still a narrow race condition that causes
> the multixact code to go crazy and delete all of its data when
> operating very near the threshold for member space exhaustion. See
> http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
> for the scenario and proposed fix.

For anyone else following along, Thomas's subsequent test verified this threat
beyond reasonable doubt:

http://www.postgresql.org/message-id/CAEepm=3C32VPJLOo45y0c3-3KWXNV2xM4jaPTSVjCRD2VG0Qgg@mail.gmail.com

> 2. We have some logic that causes autovacuum to run in spite of
> autovacuum=off when wraparound threatens.  My commit
> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
> anti-wraparound protections for multixact members that exist for
> multixact IDs and for regular XIDs, but this remains an outstanding
> issue.  I believe I know how to fix this, and will work up an
> appropriate patch based on some of Thomas's earlier work.

That would be good to have, and its implementation should be self-contained.

> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work.  Granted, that beats data corruption or errors, but it
> could still be pretty bad.  The default value of
> autovacuum_multixact_freeze_max_age is 400000000.
> Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the
> addressable space, or 1073741824 total members.  So, if your typical
> multixact has more than 1073741824/400000000 = ~2.68 members, you're
> going to see more autovacuum activity as a result of this change.
> We're effectively capping autovacuum_multixact_freeze_max_age at
> 1073741824/(average size of your multixacts).  If your multixacts are
> just a couple of members (like 3 or 4) this is probably not such a big
> deal.  If your multixacts typically run to 50 or so members, your
> effective freeze age is going to drop from 400m to ~21.4m.  At that
> point, I think it's possible that relminmxid advancement might start
> to force full-table scans more often than would be required for
> relfrozenxid advancement.  If so, that may be a problem for some
> users.

I don't know whether this deserves prompt remediation, but if it does, I would
look no further than the hard-coded 25% figure.  We permit users to operate
close to XID wraparound design limits.  GUC maximums force an anti-wraparound
vacuum at no later than 93.1% of design capacity.  XID assignment warns at
99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
bulkiest mandatory cushion yet, an anti-wraparound vacuum when
pg_multixact/members is just 25% full.  The pgsql-bugs thread driving that
patch did reject making it GUC-controlled, essentially on the expectation that
25% should be adequate for everyone:

http://www.postgresql.org/message-id/CA+Tgmoap6-o_5ESu5X2mBRVht_F+KNoY+oO12OvV_WekSA=ezQ@mail.gmail.com
http://www.postgresql.org/message-id/20150506143418.GT2523@alvh.no-ip.org
http://www.postgresql.org/message-id/1570859840.1241196.1430928954257.JavaMail.yahoo@mail.yahoo.com
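
For reference, those percentages fall out of simple arithmetic on the
2^31/2^32-sized counters; here is a quick throwaway program that reproduces
them (I am restating the warn/stop offsets, 11 million and 1 million XIDs
before wraparound, from memory, so treat the output as illustrative):

#include <stdio.h>

int
main(void)
{
    const double xid_space    = 2147483648.0;   /* 2^31 usable XIDs */
    const double mxid_space   = 4294967296.0;   /* 2^32 multixact IDs */
    const double member_space = 4294967296.0;   /* 2^32 member slots */

    printf("XID: max freeze age 2e9 -> vacuum by %.1f%% of capacity\n",
           100.0 * 2000000000.0 / xid_space);
    printf("XID: warn at %.1f%%, stop at %.2f%%\n",
           100.0 * (xid_space - 11000000.0) / xid_space,
           100.0 * (xid_space - 1000000.0) / xid_space);
    printf("pg_multixact/offsets: max freeze age 2e9 -> vacuum by %.1f%%, stop at 50.0%%\n",
           100.0 * 2000000000.0 / mxid_space);
    printf("pg_multixact/members: forced vacuum at 25%% full = %.0f member slots\n",
           member_space / 4.0);
    return 0;
}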

> What can we do about this?  Alvaro proposed back-porting his fix for
> bug #8470, which avoids locking a row if a parent subtransaction
> already has the same lock.

Like Andres and yourself, I would not back-patch it.

> Another thought that occurs to me is that if we had a freeze map, it
> would radically decrease the severity of this problem, because
> freezing would become vastly cheaper.  I wonder if we ought to try to
> get that into 9.5, even if it means holding up 9.5.

Declaring that a release will wait for a particular feature has consistently
ended badly for PostgreSQL, and this feature is just in the planning stages.
If folks are ready to hit the ground running on the project, I suggest they do
so; a non-WIP submission to the first 9.6 CF would be a big accomplishment.
The time to contemplate slipping it into 9.5 comes after the patch is done.

If these aggressive ideas earn more than passing consideration, the 25%
threshold should become user-controllable after all.



Re: multixacts woes

From:
Andrew Dunstan
Date:
On 05/10/2015 10:30 AM, Robert Haas wrote:


>>> 2. We have some logic that causes autovacuum to run in spite of
>>> autovacuum=off when wraparound threatens.  My commit
>>> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
>>> anti-wraparound protections for multixact members that exist for
>>> multixact IDs and for regular XIDs, but this remains an outstanding
>>> issue.  I believe I know how to fix this, and will work up an
>>> appropriate patch based on some of Thomas's earlier work.
>> I believe autovacuum=off is fortunately uncommon, but certainly getting
>> this issue fixed is a good idea.
> Right.
>
>


I suspect it's quite a bit more common than many people imagine.

cheers

andrew



Re: multixacts woes

From:
Jim Nasby
Date:
On 5/8/15 1:15 PM, Robert Haas wrote:
> I somehow did not realize until very recently that we
> actually use two SLRUs to keep track of multixacts: one for the
> multixacts themselves (pg_multixacts/offsets) and one for the members
> (pg_multixacts/members). Confusingly, members are sometimes called
> offsets, and offsets are sometimes called IDs, or multixacts.

FWIW, since I had to re-read this bit...

 * We use two SLRU areas, one for storing the offsets at which the data
 * starts for each MultiXactId in the other one.  This trick allows us to
 * store variable length arrays of TransactionIds.


Another way this could be 'fixed' would be to bump MultiXactOffset (but 
NOT MultiXactId) to uint64. That would increase the number of total 
members we could keep by a factor of 2^32. At that point wraparound 
wouldn't even be possible, because you can't have more than 2^31 members 
in an MXID (and there can only be 2^31 MXIDs). It may not be a trivial 
change though, because SLRUs are currently capped at 2^32 pages.

This probably isn't a good long-term solution, but it would eliminate 
the risk of really frequent freeze vacuums. It sounds like Josh at least
knows some people for whom that could cause big problems.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: multixacts woes

From:
Robert Haas
Date:
On Sun, May 10, 2015 at 1:40 PM, Noah Misch <noah@leadboat.com> wrote:
> I don't know whether this deserves prompt remediation, but if it does, I would
> look no further than the hard-coded 25% figure.  We permit users to operate
> close to XID wraparound design limits.  GUC maximums force an anti-wraparound
> vacuum at no later than 93.1% of design capacity.  XID assignment warns at
> 99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
> pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
> 50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
> bulkiest mandatory cushion yet, an anti-wraparound vacuum when
> pg_multixact/members is just 25% full.

That's certainly one possible approach.  I had discounted it because
you can't really get more than a small multiple out of it, but getting
2-3x more room might indeed be enough to help some people quite a bit.
Just raising the threshold from 25% to say 40% would buy back a
healthy amount.

Or, as you suggest, we could just add a GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: multixacts woes

From:
Noah Misch
Date:
On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
> On Sun, May 10, 2015 at 1:40 PM, Noah Misch <noah@leadboat.com> wrote:
> > I don't know whether this deserves prompt remediation, but if it does, I would
> > look no further than the hard-coded 25% figure.  We permit users to operate
> > close to XID wraparound design limits.  GUC maximums force an anti-wraparound
> > vacuum at no later than 93.1% of design capacity.  XID assignment warns at
> > 99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
> > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
> > 50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
> > bulkiest mandatory cushion yet, an anti-wraparound vacuum when
> > pg_multixact/members is just 25% full.
> 
> That's certainly one possible approach.  I had discounted it because
> you can't really get more than a small multiple out of it, but getting
> 2-3x more room might indeed be enough to help some people quite a bit.
> Just raising the threshold from 25% to say 40% would buy back a
> healthy amount.

Right.  It's fair to assume that the new VACUUM burden would be discountable
at a 90+% threshold, because the installations that could possibly find it
expensive are precisely those experiencing corruption today.  These reports
took eighteen months to appear, whereas some corruption originating in commit
0ac5ad5 saw reports within three months.  Therefore, sites burning
pg_multixact/members proportionally faster than both pg_multixact/offsets and
XIDs must be unusual.  Bottom line: if we do need to reduce VACUUM burden
caused by the commits you cited upthread, we almost certainly don't need more
than a 4x improvement.



Re: multixacts woes

From:
Robert Haas
Date:
On Mon, May 11, 2015 at 12:56 AM, Noah Misch <noah@leadboat.com> wrote:
> On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
>> On Sun, May 10, 2015 at 1:40 PM, Noah Misch <noah@leadboat.com> wrote:
>> > I don't know whether this deserves prompt remediation, but if it does, I would
>> > look no further than the hard-coded 25% figure.  We permit users to operate
>> > close to XID wraparound design limits.  GUC maximums force an anti-wraparound
>> > vacuum at no later than 93.1% of design capacity.  XID assignment warns at
>> > 99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
>> > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
>> > 50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
>> > bulkiest mandatory cushion yet, an anti-wraparound vacuum when
>> > pg_multixact/members is just 25% full.
>>
>> That's certainly one possible approach.  I had discounted it because
>> you can't really get more than a small multiple out of it, but getting
>> 2-3x more room might indeed be enough to help some people quite a bit.
>> Just raising the threshold from 25% to say 40% would buy back a
>> healthy amount.
>
> Right.  It's fair to assume that the new VACUUM burden would be discountable
> at a 90+% threshold, because the installations that could possibly find it
> expensive are precisely those experiencing corruption today.  These reports
> took eighteen months to appear, whereas some corruption originating in commit
> 0ac5ad5 saw reports within three months.  Therefore, sites burning
> pg_multixact/members proportionally faster than both pg_multixact/offsets and
> XIDs must be unusual.  Bottom line: if we do need to reduce VACUUM burden
> caused by the commits you cited upthread, we almost certainly don't need more
> than a 4x improvement.

I looked into the approach of adding a GUC called
autovacuum_multixact_freeze_max_members to set the threshold.  I
thought to declare it this way:
       {
+               {"autovacuum_multixact_freeze_max_members", PGC_POSTMASTER, AUTOVACUUM,
+                       gettext_noop("# of multixact members at which autovacuum is forced to prevent multixact member wraparound."),
+                       NULL
+               },
+               &autovacuum_multixact_freeze_max_members,
+               2000000000, 10000000, 4000000000,
+               NULL, NULL, NULL
+       },

Regrettably, I think that's not going to work, because 4000000000
overflows int.  We will evidently need to denote this GUC in some
other units, unless we want to introduce config_int64.
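
One possibility, offered only as a sketch and not something I am proposing to
commit as-is, would be to declare the GUC in units of a million members, so
that the maximum becomes 4000 and fits comfortably in an int; a hypothetical
entry along those lines would look something like this:

        {
                /* hypothetical: value expressed in millions of members */
                {"autovacuum_multixact_freeze_max_members", PGC_POSTMASTER, AUTOVACUUM,
                        gettext_noop("Millions of multixact members at which autovacuum is forced to prevent multixact member wraparound."),
                        NULL
                },
                &autovacuum_multixact_freeze_max_members,
                2000, 10, 4000,
                NULL, NULL, NULL
        },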

Given your concerns, and the need to get a fix for this out the door
quickly, what I'm inclined to do for the present is go bump the
threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
changing anything else.  Your analysis shows that this is more in line
with the existing policy for multixact IDs than what I did, and it
will reduce the threat of frequent wraparound scans.  Now, it will
also increase the chances of somebody hitting the wall before
autovacuum can bail them out.  But maybe not that much.  If we need
75% of the multixact member space to complete one cycle of
anti-wraparound vacuums, we're actually very close to the point where
the system just cannot work.  If that's one big table, we're done.

Also, if somebody does have a workload where the auto-clamping doesn't
provide them with enough headroom, they can still improve things by
reducing autovacuum_multixact_freeze_max_age to a value less than the
value to which we're auto-clamping it.  If they need an effective
value of less than 10 million they are out of luck, but if that is the
case then there is a good chance that they are hosed anyway - an
anti-wraparound vacuum every 10 million multixacts sounds awfully
painful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: multixacts woes

From:
Noah Misch
Date:
On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
> Given your concerns, and the need to get a fix for this out the door
> quickly, what I'm inclined to do for the present is go bump the
> threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
> changing anything else.

+1

> Your analysis shows that this is more in line
> with the existing policy for multixact IDs than what I did, and it
> will reduce the threat of frequent wraparound scans.  Now, it will
> also increase the chances of somebody hitting the wall before
> autovacuum can bail them out.  But maybe not that much.  If we need
> 75% of the multixact member space to complete one cycle of
> anti-wraparound vacuums, we're actually very close to the point where
> the system just cannot work.  If that's one big table, we're done.

Agreed.



Re: multixacts woes

From:
Robert Haas
Date:
On Mon, May 11, 2015 at 10:11 AM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
>> Given your concerns, and the need to get a fix for this out the door
>> quickly, what I'm inclined to do for the present is go bump the
>> threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
>> changing anything else.
>
> +1
>
>> Your analysis shows that this is more in line
>> with the existing policy for multixact IDs than what I did, and it
>> will reduce the threat of frequent wraparound scans.  Now, it will
>> also increase the chances of somebody hitting the wall before
>> autovacuum can bail them out.  But maybe not that much.  If we need
>> 75% of the multixact member space to complete one cycle of
>> anti-wraparound vacuums, we're actually very close to the point where
>> the system just cannot work.  If that's one big table, we're done.
>
> Agreed.

OK, I have made this change.  Barring further trouble reports, this
completes the multixact work I plan to do for the next release.  Here
is what is outstanding:

1. We might want to introduce a GUC to control the point at which
member offset utilization begins clamping
autovacuum_multixact_freeze_max_age.  It doesn't seem wise to do
anything about this before pushing a minor release out.  It's not
entirely trivial, and it may be helpful to learn more about how the
changes already made work out in practice before proceeding.  Also, we
might not back-patch this anyway.

2. The recent changes adjust things - for good reason - so that the
safe threshold for multixact member creation is advanced only at
checkpoint time.  This means it's theoretically possible to have a
situation where autovacuum has done all it can, but because no
checkpoint has happened yet, the user can't create any more
multixacts.  Thanks to some good work by Thomas, autovacuum will
realize this and avoid spinning uselessly over every table in the
system, which is good, but you're still stuck with errors until the
next checkpoint.  Essentially, we're hoping that autovacuum will clean
things up far enough in advance of hitting the threshold where we have
to throw an error that a checkpoint will intervene before the error
starts happening.  It's possible we could improve this further, but I
think it would be unwise to mess with it right now.  It may be that
there is no real-world problem here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: multixacts woes

From:
Josh Berkus
Date:
On 05/11/2015 09:54 AM, Robert Haas wrote:
> OK, I have made this change.  Barring further trouble reports, this
> completes the multixact work I plan to do for the next release.  Here
> is what is outstanding:
> 
> 1. We might want to introduce a GUC to control the point at which
> member offset utilization begins clamping
> autovacuum_multixact_freeze_max_age.  It doesn't seem wise to do
> anything about this before pushing a minor release out.  It's not
> entirely trivial, and it may be helpful to learn more about how the
> changes already made work out in practice before proceeding.  Also, we
> might not back-patch this anyway.

-1 on back-patching a new GUC.  People don't know what to do with the
existing multixact GUCs, and without an age(multixact) function
built-in, any adjustments a user tries to make are likely to do more
harm than good.

In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
this instead of adding a new GUC?  We already have a bunch of freezing
GUCs which fewer than 1% of our user base has any idea how to set.

> 2. The recent changes adjust things - for good reason - so that the
> safe threshold for multixact member creation is advanced only at
> checkpoint time.  This means it's theoretically possible to have a
> situation where autovacuum has done all it can, but because no
> checkpoint has happened yet, the user can't create any more
> multixacts.  Thanks to some good work by Thomas, autovacuum will
> realize this and avoid spinning uselessly over every table in the
> system, which is good, but you're still stuck with errors until the
> next checkpoint.  Essentially, we're hoping that autovacuum will clean
> things up far enough in advance of hitting the threshold where we have
> to throw an error that a checkpoint will intervene before the error
> starts happening.  It's possible we could improve this further, but I
> think it would be unwise to mess with it right now.  It may be that
> there is no real-world problem here.

Given that our longest possible checkpoint timeout is an hour, is it
even hypothetically possible that we would hit a limit in that time?
How many mxact members are we talking about?
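
Back-of-the-envelope, assuming the 2^32 member-slot figure from upthread and a
one-hour window:

#include <stdio.h>

int
main(void)
{
    const double member_space = 4294967296.0;   /* 2^32 member slots */
    const double one_hour = 3600.0;

    printf("25%% of member space in one hour = %.0f members/sec\n",
           member_space * 0.25 / one_hour);
    printf("50%% of member space in one hour = %.0f members/sec\n",
           member_space * 0.50 / one_hour);
    return 0;
}

That works out to several hundred thousand member insertions per second,
sustained for the whole hour.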

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: multixacts woes

From:
Alvaro Herrera
Date:
Josh Berkus wrote:

> In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
> this instead of adding a new GUC?  We already have a bunch of freezing
> GUCs which fewer than 1% of our user base has any idea how to set.

If you have development resources to pour into 9.5, I think it would be
better spent changing multixact usage tracking so that oldestOffset is
included in pg_control; also make pg_multixact truncation be WAL-logged.
With those changes, the need for a lot of pretty complicated code would
go away.  The fact that truncation is done by both vacuum and checkpoint
caused a lot of the mess we were in (and out of which Robert and Thomas
got us --- thanks guys!).  Such a change is the first step towards
auto-tuning, I think.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: multixacts woes

From:
Alvaro Herrera
Date:
Robert Haas wrote:

> OK, I have made this change.  Barring further trouble reports, this
> completes the multixact work I plan to do for the next release.

Many thanks for all the effort here -- much appreciated.

> 2. The recent changes adjust things - for good reason - so that the
> safe threshold for multixact member creation is advanced only at
> checkpoint time.  This means it's theoretically possible to have a
> situation where autovacuum has done all it can, but because no
> checkpoint has happened yet, the user can't create any more
> multixacts.  Thanks to some good work by Thomas, autovacuum will
> realize this and avoid spinning uselessly over every table in the
> system, which is good, but you're still stuck with errors until the
> next checkpoint.  Essentially, we're hoping that autovacuum will clean
> things up far enough in advance of hitting the threshold where we have
> to throw an error that a checkpoint will intervene before the error
> starts happening.  It's possible we could improve this further, but I
> think it would be unwise to mess with it right now.  It may be that
> there is no real-world problem here.

See my response to Josh.  I think much of the current rube-goldbergian
design is due to the fact that pg_control cannot be changed in back
branches.  Going forward, I think a better plan is to include more info
in pg_control, WAL-log more operations, remove checkpoint from the loop
and have everything happen at vacuum time.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: multixacts woes

From:
"Joshua D. Drake"
Date:
On 05/11/2015 10:24 AM, Josh Berkus wrote:

> In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
> this instead of adding a new GUC?  We already have a bunch of freezing
> GUCs which fewer than 1% of our user base has any idea how to set.

That is a documentation problem, not a user problem. Although I agree
that yet another GUC for an obscure "feature" that should be internally 
intelligent is likely the wrong direction.

JD

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.