Thread: multixacts woes
My colleague Thomas Munro and I have been working with Alvaro, and also with Kevin and Amit, to fix bug #12990, a multixact-related data corruption bug.

I somehow did not realize until very recently that we actually use two SLRUs to keep track of multixacts: one for the multixacts themselves (pg_multixacts/offsets) and one for the members (pg_multixacts/members). Confusingly, members are sometimes called offsets, and offsets are sometimes called IDs, or multixacts. If either of these SLRUs wraps around, we get data loss. This comment in multixact.c explains it well:

    /*
     * Since multixacts wrap differently from transaction IDs, this logic is
     * not entirely correct: in some scenarios we could go for longer than 2
     * billion multixacts without seeing any data loss, and in some others we
     * could get in trouble before that if the new pg_multixact/members data
     * stomps on the previous cycle's data.  For lack of a better mechanism we
     * use the same logic as for transaction IDs, that is, start taking action
     * halfway around the oldest potentially-existing multixact.
     */
    multiWrapLimit = oldest_datminmxid + (MaxMultiXactId >> 1);
    if (multiWrapLimit < FirstMultiXactId)
        multiWrapLimit += FirstMultiXactId;

Apparently, we have been hanging our hat since the release of 9.3.0 on the theory that the average multixact won't ever have more than two members, and therefore the members SLRU won't overwrite itself and corrupt data. This is not good enough: we need to prevent multixact IDs from wrapping around, and we separately need to prevent multixact members from wrapping around, and the current code was conflating those things in a way that simply didn't work.

Recent commits by Alvaro and by me have mostly fixed this, but there are a few loose ends:

1. I believe that there is still a narrow race condition that can cause the multixact code to go crazy and delete all of its data when operating very near the threshold for member space exhaustion.
See http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com for the scenario and proposed fix.

2. We have some logic that causes autovacuum to run in spite of autovacuum=off when wraparound threatens. My commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the anti-wraparound protections for multixact members that exist for multixact IDs and for regular XIDs, but this remains an outstanding issue. I believe I know how to fix this, and will work up an appropriate patch based on some of Thomas's earlier work.

3. It seems to me that there is a danger that some users could see extremely frequent anti-mxid-member-wraparound vacuums as a result of this work. Granted, that beats data corruption or errors, but it could still be pretty bad. The default value of autovacuum_multixact_freeze_max_age is 400000000. Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the addressable space, or 1073741824 total members. So, if your typical multixact has more than 1073741824/400000000 = ~2.68 members, you're going to see more autovacuum activity as a result of this change. We're effectively capping autovacuum_multixact_freeze_max_age at 1073741824/(average size of your multixacts). If your multixacts have just a couple of members (like 3 or 4), this is probably not such a big deal. If your multixacts typically run to 50 or so members, your effective freeze age is going to drop from 400m to ~21.4m. At that point, I think it's possible that relminmxid advancement might start to force full-table scans more often than would be required for relfrozenxid advancement. If so, that may be a problem for some users.

What can we do about this? Alvaro proposed back-porting his fix for bug #8470, which avoids locking a row if a parent subtransaction already has the same lock. Alvaro tells me (via chat) that on some workloads this can dramatically reduce multixact size, which is certainly appealing.
But the fix looks fairly invasive - it changes the return value of HeapTupleSatisfiesUpdate in certain cases, for example - and I'm not sure it's been thoroughly code-reviewed by anyone, so I'm a little nervous about the idea of back-porting it at this point. I am inclined to think it would be better to release the fixes we have - after handling items 1 and 2 - and then come back to this issue. Another thing to consider here is that if the high rate of multixact consumption is organic rather than induced by lots of subtransactions of the same parent locking the same tuple, this fix won't help.

Another thought that occurs to me is that if we had a freeze map, it would radically decrease the severity of this problem, because freezing would become vastly cheaper. I wonder if we ought to try to get that into 9.5, even if it means holding up 9.5. Quite aside from multixacts, repeated wraparound autovacuuming of static data is a progressively more serious problem as data set sizes and transaction volumes increase. The possibility that multixact freezing may in some scenarios exacerbate that problem is just icing on the cake. The fundamental problem is that a 32-bit address space just isn't that big on modern hardware, and the problem is worse for multixact members than it is for multixact IDs, because a given multixact consumes only one multixact ID, but as many slots in the multixact member space as it has members.

Thoughts, advice, etc. are most welcome.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
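[Editor's note: the capping arithmetic in item 3 above can be put in code as follows. This is an illustrative, standalone sketch using the constants quoted in the thread, not server code; the function name is invented.]

```c
#include <stdint.h>

/* 2^32 addressable member slots; member-wraparound autovacuum kicks in
 * at 25% of that, i.e. 1073741824 members. */
#define MAX_MEMBERS           (UINT64_C(1) << 32)
#define MEMBER_SAFE_THRESHOLD (MAX_MEMBERS / 4)

/*
 * Effective autovacuum_multixact_freeze_max_age once member-space
 * pressure is taken into account: the configured value, capped at
 * threshold / (average members per multixact).
 */
static uint64_t
effective_freeze_max_age(uint64_t configured_age, uint64_t avg_members)
{
    uint64_t cap = MEMBER_SAFE_THRESHOLD / avg_members;

    return configured_age < cap ? configured_age : cap;
}
```

With the default of 400000000 and ~2.68 or fewer members per multixact, the cap never bites; at 50 members per multixact the effective age drops to 1073741824 / 50 = 21474836, the ~21.4m figure above.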
Hi,

On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
> Apparently, we have been hanging our hat since the release of 9.3.0 on
> the theory that the average multixact won't ever have more than two
> members, and therefore the members SLRU won't overwrite itself and
> corrupt data.

It's essentially a much older problem - it has essentially existed since multixacts were introduced (8.1?). The consequences of it were much lower before 9.3 though.

> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work. Granted, that beats data corruption or errors, but it
> could still be pretty bad.

It's certainly possible to have workloads triggering that, but I think it's relatively uncommon. In most cases I've checked, the multixact consumption rate is much lower than the xid consumption rate. There are some exceptions, but often that's pretty bad code.

> At that
> point, I think it's possible that relminmxid advancement might start
> to force full-table scans more often than would be required for
> relfrozenxid advancement. If so, that may be a problem for some
> users.

I think it's the best we can do right now.

> What can we do about this? Alvaro proposed back-porting his fix for
> bug #8470, which avoids locking a row if a parent subtransaction
> already has the same lock. Alvaro tells me (via chat) that on some
> workloads this can dramatically reduce multixact size, which is
> certainly appealing. But the fix looks fairly invasive - it changes
> the return value of HeapTupleSatisfiesUpdate in certain cases, for
> example - and I'm not sure it's been thoroughly code-reviewed by
> anyone, so I'm a little nervous about the idea of back-porting it at
> this point. I am inclined to think it would be better to release the
> fixes we have - after handling items 1 and 2 - and then come back to
> this issue.
> Another thing to consider here is that if the high rate
> of multixact consumption is organic rather than induced by lots of
> subtransactions of the same parent locking the same tuple, this fix
> won't help.

I'm not inclined to backport it at this stage. Maybe if we get some field reports about too many anti-wraparound vacuums due to this, *and* the code has been tested in 9.5.

> Another thought that occurs to me is that if we had a freeze map, it
> would radically decrease the severity of this problem, because
> freezing would become vastly cheaper. I wonder if we ought to try to
> get that into 9.5, even if it means holding up 9.5

I think that's not realistic. Doing this right isn't easy. And doing it wrong can lead to quite bad results, i.e. data corruption. Doing it under the pressure of delaying a release further and further seems like a recipe for disaster.

> Quite aside from multixacts, repeated wraparound autovacuuming of
> static data is a progressively more serious problem as data set sizes
> and transaction volumes increase.

Yes. Agreed.

> The possibility that multixact freezing may in some
> scenarios exacerbate that problem is just icing on the cake. The
> fundamental problem is that a 32-bit address space just isn't that big
> on modern hardware, and the problem is worse for multixact members
> than it is for multixact IDs, because a given multixact consumes
> only one multixact ID, but as many slots in the multixact member
> space as it has members.

FWIW, I intend to either work on this myself, or help whoever seriously tackles this, in the next cycle.

Greetings,

Andres Freund
On Fri, May 8, 2015 at 2:27 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
>> Apparently, we have been hanging our hat since the release of 9.3.0 on
>> the theory that the average multixact won't ever have more than two
>> members, and therefore the members SLRU won't overwrite itself and
>> corrupt data.
>
> It's essentially a much older problem - it has essentially existed since
> multixacts were introduced (8.1?). The consequences of it were much
> lower before 9.3 though.

OK, I wasn't aware of that. What exactly were the consequences before 9.3?

> I'm not inclined to backport it at this stage. Maybe if we get some
> field reports about too many anti-wraparound vacuums due to this, *and*
> the code has been tested in 9.5.

That was about what I was thinking, too.

>> Another thought that occurs to me is that if we had a freeze map, it
>> would radically decrease the severity of this problem, because
>> freezing would become vastly cheaper. I wonder if we ought to try to
>> get that into 9.5, even if it means holding up 9.5
>
> I think that's not realistic. Doing this right isn't easy. And doing it
> wrong can lead to quite bad results, i.e. data corruption. Doing it
> under the pressure of delaying a release further and further seems like
> a recipe for disaster.

Those are certainly good things to worry about.

> FWIW, I intend to either work on this myself, or help whoever seriously
> tackles this, in the next cycle.

That would be great. I'll investigate what resources EnterpriseDB can commit to this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-05-08 14:32:14 -0400, Robert Haas wrote:
> On Fri, May 8, 2015 at 2:27 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
> >> Apparently, we have been hanging our hat since the release of 9.3.0 on
> >> the theory that the average multixact won't ever have more than two
> >> members, and therefore the members SLRU won't overwrite itself and
> >> corrupt data.
> >
> > It's essentially a much older problem - it has essentially existed since
> > multixacts were introduced (8.1?). The consequences of it were much
> > lower before 9.3 though.
>
> OK, I wasn't aware of that. What exactly were the consequences before 9.3?

I think just problems when locking a row. That's obviously much less bad than problems when reading a row.

> > FWIW, I intend to either work on this myself, or help whoever seriously
> > tackles this, in the next cycle.
>
> That would be great.

With "this" I mean freeze avoidance. While I obviously, having proposed it as well at some point, think that freeze maps are a possible solution, I'm not yet sure that it's the best solution.

> I'll investigate what resources EnterpriseDB can commit to this.

Cool.

Greetings,

Andres Freund
On 05/08/2015 11:27 AM, Andres Freund wrote:
> Hi,
>
> On 2015-05-08 14:15:44 -0400, Robert Haas wrote:
>> 3. It seems to me that there is a danger that some users could see
>> extremely frequent anti-mxid-member-wraparound vacuums as a result of
>> this work. Granted, that beats data corruption or errors, but it
>> could still be pretty bad.
>
> It's certainly possible to have workloads triggering that, but I think
> it's relatively uncommon. In most cases I've checked, the multixact
> consumption rate is much lower than the xid consumption rate. There are
> some exceptions, but often that's pretty bad code.

I have a couple workloads in my pool which do consume mxids faster than xids, due to (I think) exceptional numbers of FK conflicts. It's definitely unusual, though, and I'm sure they'd rather have corruption protection and endure some more vacuums.

If we do this, though, it might be worthwhile to backport the multixact age function, so that affected users can check and schedule mxact wraparound vacuums themselves, something you currently can't do on 9.3.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2015-05-08 12:57:17 -0700, Josh Berkus wrote:
> I have a couple workloads in my pool which do consume mxids faster than
> xids, due to (I think) exceptional numbers of FK conflicts. It's
> definitely unusual, though, and I'm sure they'd rather have corruption
> protection and endure some more vacuums. If we do this, though, it
> might be worthwhile to backport the multixact age function, so that
> affected users can check and schedule mxact wraparound vacuums
> themselves, something you currently can't do on 9.3.

That's not particularly realistic due to the requirement of an initdb to change the catalog.

Greetings,

Andres Freund
Josh Berkus wrote:
> I have a couple workloads in my pool which do consume mxids faster than
> xids, due to (I think) exceptional numbers of FK conflicts. It's
> definitely unusual, though, and I'm sure they'd rather have corruption
> protection and endure some more vacuums. If we do this, though, it
> might be worthwhile to backport the multixact age function, so that
> affected users can check and schedule mxact wraparound vacuums
> themselves, something you currently can't do on 9.3.

Backporting that is difficult in core, but you can do it with an extension without too much trouble. Also, the multixact age function does not give you the "oldest member," which is what you need to properly monitor the whole of this; you can add that to an extension too.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Robert Haas wrote:
> My colleague Thomas Munro and I have been working with Alvaro, and
> also with Kevin and Amit, to fix bug #12990, a multixact-related data
> corruption bug.

Thanks for this great summary of the situation.

> 1. I believe that there is still a narrow race condition that can cause
> the multixact code to go crazy and delete all of its data when
> operating very near the threshold for member space exhaustion. See
> http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
> for the scenario and proposed fix.

I agree that there is a problem here.

> 2. We have some logic that causes autovacuum to run in spite of
> autovacuum=off when wraparound threatens. My commit
> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
> anti-wraparound protections for multixact members that exist for
> multixact IDs and for regular XIDs, but this remains an outstanding
> issue. I believe I know how to fix this, and will work up an
> appropriate patch based on some of Thomas's earlier work.

I believe autovacuum=off is fortunately uncommon, but certainly getting this issue fixed is a good idea.

> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work.

I agree with the idea that the long-term solution to this issue is to make the freeze process cheaper. I don't have any good ideas on how to make this less severe in the interim. You say the fix for #8470 is not tested thoroughly enough to back-patch it just yet, and I can get behind that; so let's wait until 9.5 has been tested a bit more.
Another avenue not mentioned and possibly worth exploring is making some more use of the multixact cache, and reusing multixacts that were previously issued and have the same effects as the one you're interested in: for instance, if you want a multixact with locking members (10,20,30) and you have one for (5,10,20,30) but transaction 5 has finished, then essentially both have the same semantics (because locks don't have any effect once the transaction has finished), so we can use it instead of creating a new one. I have no idea how to implement this; obviously, having to run TransactionIdIsCurrentTransactionId for each member of each multixact in the cache each time you want to create a new multixact is not very reasonable.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
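[Editor's note: a rough sketch of the equivalence test Alvaro describes. The helper names and the representation of "dead" transactions as an explicit list are hypothetical; in the server, the liveness check would involve the transaction-status machinery, which is exactly the cost that makes the idea expensive.]

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical helper: is x present in the given set? */
static bool
xid_in(TransactionId x, const TransactionId *set, int n)
{
    for (int i = 0; i < n; i++)
        if (set[i] == x)
            return true;
    return false;
}

/*
 * Does cached multixact 'have' (sorted, nhave members) cover exactly the
 * wanted set 'want' (sorted, nwant members), once members whose
 * transactions have finished ('dead') are ignored?
 */
static bool
cached_multixact_usable(const TransactionId *have, int nhave,
                        const TransactionId *want, int nwant,
                        const TransactionId *dead, int ndead)
{
    int j = 0;

    for (int i = 0; i < nhave; i++)
    {
        if (j < nwant && have[i] == want[j])
            j++;                        /* wanted member is present */
        else if (!xid_in(have[i], dead, ndead))
            return false;               /* live member we did not ask for */
        /* dead, unwanted member: its lock has no effect, ignore it */
    }
    return j == nwant;                  /* every wanted member was found */
}
```

For Alvaro's example, have = (5,10,20,30) and want = (10,20,30) with transaction 5 finished yields true; if 5 were still running, it would yield false.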
On Fri, May 8, 2015 at 5:39 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> 1. I believe that there is still a narrow race condition that can cause
>> the multixact code to go crazy and delete all of its data when
>> operating very near the threshold for member space exhaustion. See
>> http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
>> for the scenario and proposed fix.
>
> I agree that there is a problem here.

OK, I'm glad we now agree on that, since it seemed like you were initially unconvinced.

>> 2. We have some logic that causes autovacuum to run in spite of
>> autovacuum=off when wraparound threatens. My commit
>> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
>> anti-wraparound protections for multixact members that exist for
>> multixact IDs and for regular XIDs, but this remains an outstanding
>> issue. I believe I know how to fix this, and will work up an
>> appropriate patch based on some of Thomas's earlier work.
>
> I believe autovacuum=off is fortunately uncommon, but certainly getting
> this issue fixed is a good idea.

Right.

>> 3. It seems to me that there is a danger that some users could see
>> extremely frequent anti-mxid-member-wraparound vacuums as a result of
>> this work.
>
> I agree with the idea that the long-term solution to this issue is to
> make the freeze process cheaper. I don't have any good ideas on how to
> make this less severe in the interim. You say the fix for #8470 is not
> tested thoroughly enough to back-patch it just yet, and I can get behind
> that; so let's wait until 9.5 has been tested a bit more.

Sounds good.
> Another avenue not mentioned and possibly worth exploring is making some
> more use of the multixact cache, and reuse multixacts that were
> previously issued and have the same effects as the one you're interested
> in: for instance, if you want a multixact with locking members
> (10,20,30) and you have one for (5,10,20,30) but transaction 5 has
> finished, then essentially both have the same semantics (because locks
> don't have any effect once the transaction has finished) so we can use it
> instead of creating a new one. I have no idea how to implement this;
> obviously, having to run TransactionIdIsCurrentTransactionId for each
> member on each multixact in the cache each time you want to create a new
> multixact is not very reasonable.

This sounds to me like it's probably too clever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 05/08/2015 09:57 PM, Josh Berkus wrote:
> [snip]
>> It's certainly possible to have workloads triggering that, but I think
>> it's relatively uncommon. In most cases I've checked, the multixact
>> consumption rate is much lower than the xid consumption rate. There are
>> some exceptions, but often that's pretty bad code.
>
> I have a couple workloads in my pool which do consume mxids faster than
> xids, due to (I think) exceptional numbers of FK conflicts. It's
> definitely unusual, though, and I'm sure they'd rather have corruption
> protection and endure some more vacuums.

I've seen corruption happen recently with OpenBravo on PostgreSQL 9.3.6 (Debian; binaries upgraded from 9.3.2) in a cluster pg_upgraded from 9.2.4 (albeit with quite insufficient autovacuum / poorly configured Postgres).

I fear that this might be more widespread than we thought, depending on the exact workload/activity pattern. If it would help, I can try to get hold of a copy of the cluster in question (if the customer keeps any copy at all).

> If we do this, though, it
> might be worthwhile to backport the multixact age function, so that
> affected users can check and schedule mxact wraparound vacuums
> themselves, something you currently can't do on 9.3.

Thanks,

J.L.
On Fri, May 08, 2015 at 02:15:44PM -0400, Robert Haas wrote:
> My colleague Thomas Munro and I have been working with Alvaro, and
> also with Kevin and Amit, to fix bug #12990, a multixact-related data
> corruption bug.

Thanks Alvaro, Amit, Kevin, Robert and Thomas for mobilizing to get this fixed.

> 1. I believe that there is still a narrow race condition that can cause
> the multixact code to go crazy and delete all of its data when
> operating very near the threshold for member space exhaustion. See
> http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
> for the scenario and proposed fix.

For anyone else following along, Thomas's subsequent test verified this threat beyond reasonable doubt:
http://www.postgresql.org/message-id/CAEepm=3C32VPJLOo45y0c3-3KWXNV2xM4jaPTSVjCRD2VG0Qgg@mail.gmail.com

> 2. We have some logic that causes autovacuum to run in spite of
> autovacuum=off when wraparound threatens. My commit
> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
> anti-wraparound protections for multixact members that exist for
> multixact IDs and for regular XIDs, but this remains an outstanding
> issue. I believe I know how to fix this, and will work up an
> appropriate patch based on some of Thomas's earlier work.

That would be good to have, and its implementation should be self-contained.

> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work. Granted, that beats data corruption or errors, but it
> could still be pretty bad. The default value of
> autovacuum_multixact_freeze_max_age is 400000000.
> Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the
> addressable space, or 1073741824 total members. So, if your typical
> multixact has more than 1073741824/400000000 = ~2.68 members, you're
> going to see more autovacuum activity as a result of this change.
> We're effectively capping autovacuum_multixact_freeze_max_age at
> 1073741824/(average size of your multixacts). If your multixacts have
> just a couple of members (like 3 or 4), this is probably not such a big
> deal. If your multixacts typically run to 50 or so members, your
> effective freeze age is going to drop from 400m to ~21.4m. At that
> point, I think it's possible that relminmxid advancement might start
> to force full-table scans more often than would be required for
> relfrozenxid advancement. If so, that may be a problem for some
> users.

I don't know whether this deserves prompt remediation, but if it does, I would look no further than the hard-coded 25% figure. We permit users to operate close to XID wraparound design limits. GUC maximums force an anti-wraparound vacuum at no later than 93.1% of design capacity. XID assignment warns at 99.5%, then stops at 99.95%. PostgreSQL mandates a larger cushion for pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at 50.0%. Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the bulkiest mandatory cushion yet, an anti-wraparound vacuum when pg_multixact/members is just 25% full.

The pgsql-bugs thread driving that patch did reject making it GUC-controlled, essentially on the expectation that 25% should be adequate for everyone:

http://www.postgresql.org/message-id/CA+Tgmoap6-o_5ESu5X2mBRVht_F+KNoY+oO12OvV_WekSA=ezQ@mail.gmail.com
http://www.postgresql.org/message-id/20150506143418.GT2523@alvh.no-ip.org
http://www.postgresql.org/message-id/1570859840.1241196.1430928954257.JavaMail.yahoo@mail.yahoo.com

> What can we do about this? Alvaro proposed back-porting his fix for
> bug #8470, which avoids locking a row if a parent subtransaction
> already has the same lock.

Like Andres and yourself, I would not back-patch it.
> Another thought that occurs to me is that if we had a freeze map, it
> would radically decrease the severity of this problem, because
> freezing would become vastly cheaper. I wonder if we ought to try to
> get that into 9.5, even if it means holding up 9.5.

Declaring that a release will wait for a particular feature has consistently ended badly for PostgreSQL, and this feature is just in the planning stages. If folks are ready to hit the ground running on the project, I suggest they do so; a non-WIP submission to the first 9.6 CF would be a big accomplishment. The time to contemplate slipping it into 9.5 comes after the patch is done.

If these aggressive ideas earn more than passing consideration, the 25% threshold should become user-controllable after all.
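[Editor's note: the percentages Noah cites can be reproduced from the GUC maximum of 2000000000 and the address-space sizes discussed upthread. The constants below are an illustrative reading of those figures, not server code.]

```c
#include <stdint.h>

/* XIDs are compared modulo 2^31, so that is their usable design span;
 * the multixact ID and member spaces are addressed with 32 bits. */
#define XID_SPAN    (UINT64_C(1) << 31)
#define MXACT_SPAN  (UINT64_C(1) << 32)

/* Percentage of a span consumed. */
static double
pct(uint64_t used, uint64_t span)
{
    return 100.0 * (double) used / (double) span;
}
```

With these, pct(2000000000, XID_SPAN) ≈ 93.1 (the autovacuum_freeze_max_age maximum), pct(XID_SPAN - 10000000, XID_SPAN) ≈ 99.5 and pct(XID_SPAN - 1000000, XID_SPAN) ≈ 99.95 (the warn and stop cushions), pct(2000000000, MXACT_SPAN) ≈ 46.6 for pg_multixact/offsets, and pct(MXACT_SPAN / 4, MXACT_SPAN) = 25.0 for members.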
On 05/10/2015 10:30 AM, Robert Haas wrote:
>>> 2. We have some logic that causes autovacuum to run in spite of
>>> autovacuum=off when wraparound threatens. My commit
>>> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
>>> anti-wraparound protections for multixact members that exist for
>>> multixact IDs and for regular XIDs, but this remains an outstanding
>>> issue. I believe I know how to fix this, and will work up an
>>> appropriate patch based on some of Thomas's earlier work.
>>
>> I believe autovacuum=off is fortunately uncommon, but certainly getting
>> this issue fixed is a good idea.
>
> Right.

I suspect it's quite a bit more common than many people imagine.

cheers

andrew
On 5/8/15 1:15 PM, Robert Haas wrote:
> I somehow did not realize until very recently that we
> actually use two SLRUs to keep track of multixacts: one for the
> multixacts themselves (pg_multixacts/offsets) and one for the members
> (pg_multixacts/members). Confusingly, members are sometimes called
> offsets, and offsets are sometimes called IDs, or multixacts.

FWIW, since I had to re-read this bit...

 * We use two SLRU areas, one for storing the offsets at which the data
 * starts for each MultiXactId in the other one.  This trick allows us to
 * store variable length arrays of TransactionIds.

Another way this could be 'fixed' would be to bump MultiXactOffset (but NOT MultiXactId) to uint64. That would increase the number of total members we could keep by a factor of 2^32. At that point wraparound wouldn't even be possible, because you can't have more than 2^31 members in an MXID (and there can only be 2^31 MXIDs). It may not be a trivial change though, because SLRUs are currently capped at 2^32 pages.

This probably isn't a good long-term solution, but it would eliminate the risk of really frequent freeze vacuums. It sounds like Josh at least knows some people that could cause big problems for.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
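[Editor's note: a back-of-the-envelope check of Jim's claim, using hypothetical names; this is arithmetic, not a patch.]

```c
#include <stdint.h>

/* At most 2^31 MultiXactIds can exist at once, each with at most 2^31
 * members, so the worst-case total member count is 2^62. */
#define MAX_MXIDS            (UINT64_C(1) << 31)
#define MAX_MEMBERS_PER_MXID (UINT64_C(1) << 31)

/* floor(log2(n)): how many bits an offset needs to index n slots. */
static unsigned
floor_log2(uint64_t n)
{
    unsigned b = 0;

    while (n > 1)
    {
        n >>= 1;
        b++;
    }
    return b;
}
```

floor_log2(MAX_MXIDS * MAX_MEMBERS_PER_MXID) is 62, comfortably within 64 bits, which is why a uint64 MultiXactOffset could never wrap; the 2^32-page SLRU cap Jim mentions is a separate obstacle.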
On Sun, May 10, 2015 at 1:40 PM, Noah Misch <noah@leadboat.com> wrote:
> I don't know whether this deserves prompt remediation, but if it does, I would
> look no further than the hard-coded 25% figure. We permit users to operate
> close to XID wraparound design limits. GUC maximums force an anti-wraparound
> vacuum at no later than 93.1% of design capacity. XID assignment warns at
> 99.5%, then stops at 99.95%. PostgreSQL mandates a larger cushion for
> pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
> 50.0%. Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
> bulkiest mandatory cushion yet, an anti-wraparound vacuum when
> pg_multixact/members is just 25% full.

That's certainly one possible approach. I had discounted it because you can't really get more than a small multiple out of it, but getting 2-3x more room might indeed be enough to help some people quite a bit. Just raising the threshold from 25% to say 40% would buy back a healthy amount. Or, as you suggest, we could just add a GUC.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
> On Sun, May 10, 2015 at 1:40 PM, Noah Misch <noah@leadboat.com> wrote:
> > I don't know whether this deserves prompt remediation, but if it does, I would
> > look no further than the hard-coded 25% figure. We permit users to operate
> > close to XID wraparound design limits. GUC maximums force an anti-wraparound
> > vacuum at no later than 93.1% of design capacity. XID assignment warns at
> > 99.5%, then stops at 99.95%. PostgreSQL mandates a larger cushion for
> > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
> > 50.0%. Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
> > bulkiest mandatory cushion yet, an anti-wraparound vacuum when
> > pg_multixact/members is just 25% full.
>
> That's certainly one possible approach. I had discounted it because
> you can't really get more than a small multiple out of it, but getting
> 2-3x more room might indeed be enough to help some people quite a bit.
> Just raising the threshold from 25% to say 40% would buy back a
> healthy amount.

Right. It's fair to assume that the new VACUUM burden would be discountable at a 90+% threshold, because the installations that could possibly find it expensive are precisely those experiencing corruption today. These reports took eighteen months to appear, whereas some corruption originating in commit 0ac5ad5 saw reports within three months. Therefore, sites burning pg_multixact/members proportionally faster than both pg_multixact/offsets and XIDs must be unusual. Bottom line: if we do need to reduce VACUUM burden caused by the commits you cited upthread, we almost certainly don't need more than a 4x improvement.
On Mon, May 11, 2015 at 12:56 AM, Noah Misch <noah@leadboat.com> wrote:
> On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
>> On Sun, May 10, 2015 at 1:40 PM, Noah Misch <noah@leadboat.com> wrote:
>> > I don't know whether this deserves prompt remediation, but if it does, I would
>> > look no further than the hard-coded 25% figure. We permit users to operate
>> > close to XID wraparound design limits. GUC maximums force an anti-wraparound
>> > vacuum at no later than 93.1% of design capacity. XID assignment warns at
>> > 99.5%, then stops at 99.95%. PostgreSQL mandates a larger cushion for
>> > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
>> > 50.0%. Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
>> > bulkiest mandatory cushion yet, an anti-wraparound vacuum when
>> > pg_multixact/members is just 25% full.
>>
>> That's certainly one possible approach. I had discounted it because
>> you can't really get more than a small multiple out of it, but getting
>> 2-3x more room might indeed be enough to help some people quite a bit.
>> Just raising the threshold from 25% to say 40% would buy back a
>> healthy amount.
>
> Right. It's fair to assume that the new VACUUM burden would be discountable
> at a 90+% threshold, because the installations that could possibly find it
> expensive are precisely those experiencing corruption today. These reports
> took eighteen months to appear, whereas some corruption originating in commit
> 0ac5ad5 saw reports within three months. Therefore, sites burning
> pg_multixact/members proportionally faster than both pg_multixact/offsets and
> XIDs must be unusual. Bottom line: if we do need to reduce VACUUM burden
> caused by the commits you cited upthread, we almost certainly don't need more
> than a 4x improvement.

I looked into the approach of adding a GUC called autovacuum_multixact_freeze_max_members to set the threshold.
I thought to declare it this way:

 	{
+		{"autovacuum_multixact_freeze_max_members", PGC_POSTMASTER, AUTOVACUUM,
+			gettext_noop("# of multixact members at which autovacuum is forced to prevent multixact member wraparound."),
+			NULL
+		},
+		&autovacuum_multixact_freeze_max_members,
+		2000000000, 10000000, 4000000000,
+		NULL, NULL, NULL
+	},

Regrettably, I think that's not going to work, because 4000000000
overflows int. We will evidently need to denote this GUC in some other
units, unless we want to introduce config_int64.

Given your concerns, and the need to get a fix for this out the door
quickly, what I'm inclined to do for the present is go bump the
threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
changing anything else. Your analysis shows that this is more in line
with the existing policy for multixact IDs than what I did, and it
will reduce the threat of frequent wraparound scans. Now, it will also
increase the chances of somebody hitting the wall before autovacuum
can bail them out. But maybe not that much. If we need 75% of the
multixact member space to complete one cycle of anti-wraparound
vacuums, we're actually very close to the point where the system just
cannot work. If that's one big table, we're done.

Also, if somebody does have a workload where the auto-clamping doesn't
provide them with enough headroom, they can still improve things by
reducing autovacuum_multixact_freeze_max_age to a value less than the
value to which we're auto-clamping it. If they need an effective value
of less than 10 million they are out of luck, but if that is the case
then there is a good chance that they are hosed anyway - an
anti-wraparound vacuum every 10 million multixacts sounds awfully
painful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
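[Editor's note: the overflow problem and the "other units" workaround can be sketched numerically. This is illustrative only; the unit scaling shown is a hypothetical example, not committed code.]

```python
# Why a GUC maximum of 4000000000 cannot live in an int-valued GUC,
# and how denominating the GUC in coarser units would keep it in range.

INT32_MAX = 2**31 - 1                 # 2147483647: ceiling for a C int GUC

desired_max = 4_000_000_000
print(desired_max > INT32_MAX)        # True: the proposed maximum overflows int

# Hypothetical workaround: store the GUC in millions of members.
max_in_millions = desired_max // 1_000_000          # 4000
default_in_millions = 2_000_000_000 // 1_000_000    # 2000
min_in_millions = 10_000_000 // 1_000_000           # 10
print(all(v <= INT32_MAX
          for v in (max_in_millions, default_in_millions, min_in_millions)))
# True: every bound now fits comfortably in an int
```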
On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
> Given your concerns, and the need to get a fix for this out the door
> quickly, what I'm inclined to do for the present is go bump the
> threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
> changing anything else.

+1

> Your analysis shows that this is more in line
> with the existing policy for multixact IDs than what I did, and it
> will reduce the threat of frequent wraparound scans. Now, it will
> also increase the chances of somebody hitting the wall before
> autovacuum can bail them out. But maybe not that much. If we need
> 75% of the multixact member space to complete one cycle of
> anti-wraparound vacuums, we're actually very close to the point where
> the system just cannot work. If that's one big table, we're done.

Agreed.
On Mon, May 11, 2015 at 10:11 AM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
>> Given your concerns, and the need to get a fix for this out the door
>> quickly, what I'm inclined to do for the present is go bump the
>> threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
>> changing anything else.
>
> +1
>
>> Your analysis shows that this is more in line
>> with the existing policy for multixact IDs than what I did, and it
>> will reduce the threat of frequent wraparound scans. Now, it will
>> also increase the chances of somebody hitting the wall before
>> autovacuum can bail them out. But maybe not that much. If we need
>> 75% of the multixact member space to complete one cycle of
>> anti-wraparound vacuums, we're actually very close to the point where
>> the system just cannot work. If that's one big table, we're done.
>
> Agreed.

OK, I have made this change. Barring further trouble reports, this
completes the multixact work I plan to do for the next release. Here
is what is outstanding:

1. We might want to introduce a GUC to control the point at which
member offset utilization begins clamping
autovacuum_multixact_freeze_max_age. It doesn't seem wise to do
anything about this before pushing a minor release out. It's not
entirely trivial, and it may be helpful to learn more about how the
changes already made work out in practice before proceeding. Also, we
might not back-patch this anyway.

2. The recent changes adjust things - for good reason - so that the
safe threshold for multixact member creation is advanced only at
checkpoint time. This means it's theoretically possible to have a
situation where autovacuum has done all it can, but because no
checkpoint has happened yet, the user can't create any more
multixacts.
Thanks to some good work by Thomas, autovacuum will realize this and
avoid spinning uselessly over every table in the system, which is
good, but you're still stuck with errors until the next checkpoint.
Essentially, we're hoping that autovacuum will clean things up far
enough in advance of hitting the threshold where we have to throw an
error that a checkpoint will intervene before the error starts
happening. It's possible we could improve this further, but I think it
would be unwise to mess with it right now. It may be that there is no
real-world problem here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
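[Editor's note: the clamping discussed in item 1, and earlier in the thread, can be modeled with a few lines of arithmetic. This is a back-of-the-envelope sketch of the behavior described in the thread, not code from the patches.]

```python
# Model of how member-space usage effectively clamps
# autovacuum_multixact_freeze_max_age: anti-wraparound vacuums must kick in
# once (member budget / average multixact size) falls below the configured
# freeze age.

def effective_freeze_age(configured, avg_members_per_mxid, member_budget):
    return min(configured, member_budget // avg_members_per_mxid)

CONFIGURED = 400_000_000        # default autovacuum_multixact_freeze_max_age
BUDGET_25 = 2**32 // 4          # original 25% member-space threshold
BUDGET_50 = 2**32 // 2          # threshold after the change made in this thread

print(effective_freeze_age(CONFIGURED, 2, BUDGET_25))   # 400000000: no clamp
print(effective_freeze_age(CONFIGURED, 50, BUDGET_25))  # 21474836: ~21.4M, heavy clamp
print(effective_freeze_age(CONFIGURED, 50, BUDGET_50))  # 42949672: 2x the headroom
```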
On 05/11/2015 09:54 AM, Robert Haas wrote:
> OK, I have made this change. Barring further trouble reports, this
> completes the multixact work I plan to do for the next release. Here
> is what is outstanding:
>
> 1. We might want to introduce a GUC to control the point at which
> member offset utilization begins clamping
> autovacuum_multixact_freeze_max_age. It doesn't seem wise to do
> anything about this before pushing a minor release out. It's not
> entirely trivial, and it may be helpful to learn more about how the
> changes already made work out in practice before proceeding. Also, we
> might not back-patch this anyway.

-1 on back-patching a new GUC. People don't know what to do with the
existing multixact GUCs, and without an age(multixact) function
built-in, any adjustments a user tries to make are likely to do more
harm than good.

In terms of adding a new GUC in 9.5: can't we take a stab at
auto-tuning this instead of adding a new GUC? We already have a bunch
of freezing GUCs which fewer than 1% of our user base has any idea how
to set.

> 2. The recent changes adjust things - for good reason - so that the
> safe threshold for multixact member creation is advanced only at
> checkpoint time. This means it's theoretically possible to have a
> situation where autovacuum has done all it can, but because no
> checkpoint has happened yet, the user can't create any more
> multixacts. Thanks to some good work by Thomas, autovacuum will
> realize this and avoid spinning uselessly over every table in the
> system, which is good, but you're still stuck with errors until the
> next checkpoint. Essentially, we're hoping that autovacuum will clean
> things up far enough in advance of hitting the threshold where we have
> to throw an error that a checkpoint will intervene before the error
> starts happening. It's possible we could improve this further, but I
> think it would be unwise to mess with it right now. It may be that
> there is no real-world problem here.
Given that our longest possible checkpoint timeout is an hour, is it
even hypothetically possible that we would hit a limit in that time?
How many mxact members are we talking about?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
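[Editor's note: a rough answer to Josh's question, as arithmetic rather than measurement. This assumes the worst case of an entire half of the member space standing between the 50% vacuum threshold and the hard stop, and a one-hour checkpoint interval; real workloads would have far less headroom left when the question matters.]

```python
# How fast would a workload have to create multixact members to burn
# through half the 2**32 member space within one hour?

MEMBER_SPAN = 2**32
headroom = MEMBER_SPAN // 2     # members between 50% threshold and wraparound
seconds = 3600                  # longest checkpoint_timeout

rate = headroom // seconds      # members per second required
print(rate)                     # 596523
```

So a workload would need to generate roughly 600k multixact members per second, sustained for the full hour, to exhaust that much headroom between checkpoints; the practical risk is when far less headroom remains at the time autovacuum finishes.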
Josh Berkus wrote:
> In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
> this instead of adding a new GUC? We already have a bunch of freezing
> GUCs which fewer than 1% of our user base has any idea how to set.

If you have development resources to pour onto 9.5, I think it would
be better spent changing multixact usage tracking so that oldestOffset
is included in pg_control; also make pg_multixact truncation be
WAL-logged. With those changes, the need for a lot of pretty
complicated code would go away. The fact that truncation is done by
both vacuum and checkpoint causes a lot of the mess we were in (and
from which Robert and Thomas took us --- thanks guys!). Such a change
is the first step towards auto-tuning, I think.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Robert Haas wrote:
> OK, I have made this change. Barring further trouble reports, this
> completes the multixact work I plan to do for the next release.

Many thanks for all the effort here -- much appreciated.

> 2. The recent changes adjust things - for good reason - so that the
> safe threshold for multixact member creation is advanced only at
> checkpoint time. This means it's theoretically possible to have a
> situation where autovacuum has done all it can, but because no
> checkpoint has happened yet, the user can't create any more
> multixacts. Thanks to some good work by Thomas, autovacuum will
> realize this and avoid spinning uselessly over every table in the
> system, which is good, but you're still stuck with errors until the
> next checkpoint. Essentially, we're hoping that autovacuum will clean
> things up far enough in advance of hitting the threshold where we have
> to throw an error that a checkpoint will intervene before the error
> starts happening. It's possible we could improve this further, but I
> think it would be unwise to mess with it right now. It may be that
> there is no real-world problem here.

See my response to Josh. I think much of the current rube-goldbergian
design is due to the fact that pg_control cannot be changed in back
branches. Going forward, I think a better plan is to include more info
in pg_control, WAL-log more operations, remove checkpoint from the
loop and have everything happen at vacuum time.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/11/2015 10:24 AM, Josh Berkus wrote:
> In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
> this instead of adding a new GUC? We already have a bunch of freezing
> GUCs which fewer than 1% of our user base has any idea how to set.

That is a documentation problem, not a user problem. Although I agree
that yet another GUC for an obscure "feature" that should be
internally intelligent is likely the wrong direction.

JD

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.