Обсуждение: BUG #14195: "MultiXactId XXXXXX has not been created yet -- apparent wraparound" after upgrade from 9.2

Поиск
Список
Период
Сортировка

BUG #14195: "MultiXactId XXXXXX has not been created yet -- apparent wraparound" after upgrade from 9.2

От
sean.hope@eroad.com
Дата:
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz
aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDE5NQpMb2dnZWQgYnk6ICAg
ICAgICAgIFNlYW4gSG9wZQpFbWFpbCBhZGRyZXNzOiAgICAgIHNlYW4uaG9w
ZUBlcm9hZC5jb20KUG9zdGdyZVNRTCB2ZXJzaW9uOiA5LjQuNwpPcGVyYXRp
bmcgc3lzdGVtOiAgIFVidW50dSAxMi4wNCBMVFMKRGVzY3JpcHRpb246ICAg
ICAgICAKCkhpLA0KDQpXZSdyZSBoYXZpbmcgaXNzdWVzIHdpdGggJ011bHRp
WGFjdElkIHh4eHh4eHh4eCBoYXMgbm90IGJlZW4gY3JlYXRlZCB5ZXQgLS0K
YXBwYXJlbnQgd3JhcGFyb3VuZCcgbG9nIG1lc3NhZ2VzIHVwb24gdGFibGUg
dmFjdXVtcy4gVGhlIHZhY3V1bXMgYXJlCmNhbmNlbGxlZCBieSB0aGUgc3lz
dGVtIGFmdGVyIHRoaXMgbWVzc2FnZSBpcyBpc3N1ZWQuDQoNClRoZXNlIG1l
c3NhZ2VzIGhhdmUgb25seSBiZWVuIHNlZW4gc2luY2UgdXBncmFkaW5nIGZy
b20gOS4yLnggdG8gOS40LjcgdmlhCnBnX3VwZ3JhZGUgYWJvdXQgYSBtb250
aCBhZ28uIFRoaXMgaXMgYSBsYXJnZSAvIGhpZ2ggdHJhbnNhY3Rpb24gcmF0
ZSAoMTEyMAp0cHMpIGRhdGFiYXNlLiBXZSBhcmUgdW5hYmxlIHRvIHZhY3V1
bSB0aGUgYWZmZWN0ZWQgdGFibGVzICg1IG9mIHRoZW0pCmN1cnJlbnRseSBk
dWUgdG8gdGhpcyBpc3N1ZS4NCg0KQW55IGhlbHAgbW9zdCB3ZWxjb21lLg0K
DQpSZWdhcmRzLA0KU2VhbgoK
>>>>> "sean" == sean hope <sean.hope@eroad.com> writes:

 sean> The following bug has been logged on the website:
 sean> Bug reference:      14195
 sean> Logged by:          Sean Hope
 sean> Email address:      sean.hope@eroad.com
 sean> PostgreSQL version: 9.4.7
 sean> Operating system:   Ubuntu 12.04 LTS
 sean> Description:

 sean> Hi,

 sean> We're having issues with 'MultiXactId xxxxxxxxx has not been
 sean> created yet -- apparent wraparound' log messages upon table
 sean> vacuums. The vacuums are cancelled by the system after this
 sean> message is issued.

 sean> These messages have only been seen since upgrading from 9.2.x to
 sean> 9.4.7 via pg_upgrade about a month ago. This is a large / high
 sean> transaction rate (1120 tps) database. We are unable to vacuum the
 sean> affected tables (5 of them) currently due to this issue.

This one came up on the IRC channel and I did some analysis there.

What it seems to boil down to is that, faced with a tuple with a
pre-upgrade mxid, we get this call path:

vacuum calls heap_prepare_freeze_tuple

heap_prepare_freeze_tuple calls FreezeMultiXactId
 (because HEAP_XMAX_IS_MULTI)

FreezeMultiXactId checks if the mxid is older than the cutoff, but
because this is a pre-upgrade mxid it can be literally anything, and in
this case it's in the future not the past, so that check is skipped

FreezeMultiXactId calls GetMultiXactIdMembers with allow_old=true

GetMultiXactIdMembers checks for wraparound, but while it downgrades the
error to DEBUG1 if allow_old is true and the mxid is in the past, it
unconditionally errors if the mxid is in the future.

From pageinspect output this is one of the failing tuples:

blk: 291;
lp: 35;
lp_off: 7992;
lp_flags: 1;
lp_len: 196;
t_xmin: 2;
t_xmax: 197529992;
t_field3: 21;
t_ctid: (291,35);
t_infomask2: 32791;
t_infomask: 14723;  (0x3983)
t_hoff: 32;
bits: "111111111111111111111100000000000000000000000000000000000000000000000000";

(note: the pageinspect output was all absolutely consistent, there were
no signs of any data corruption of any kind)

from pg_controldata:

Latest checkpoint's NextMultiXactId:  3125330305
Latest checkpoint's NextMultiOffset:  2217410182
Latest checkpoint's oldestMultiXid:   2741572175

--
Andrew (irc:RhodiumToad)
Andrew Gierth wrote:

> This one came up on the IRC channel and I did some analysis there.
>
> What it seems to boil down to is that, faced with a tuple with a
> pre-upgrade mxid, we get this call path:
>
> vacuum calls heap_prepare_freeze_tuple
>
> heap_prepare_freeze_tuple calls FreezeMultiXactId
>  (because HEAP_XMAX_IS_MULTI)
>
> FreezeMultiXactId checks if the mxid is older than the cutoff, but
> because this is a pre-upgrade mxid it can be literally anything, and in
> this case it's in the future not the past, so that check is skipped
>
> FreezeMultiXactId calls GetMultiXactIdMembers with allow_old=true
>
> GetMultiXactIdMembers checks for wraparound, but while it downgrades the
> error to DEBUG1 if allow_old is true and the mxid is in the past, it
> unconditionally errors if the mxid is in the future.

I was in the middle of writing up this in another thread, and your
analysis in spot on.

https://www.postgresql.org/message-id/20160615233427.GA18976@alvherre.pgsql

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
sean.hope@eroad.com wrote:

> Hi,
>
> We're having issues with 'MultiXactId xxxxxxxxx has not been created yet --
> apparent wraparound' log messages upon table vacuums. The vacuums are
> cancelled by the system after this message is issued.
>
> These messages have only been seen since upgrading from 9.2.x to 9.4.7 via
> pg_upgrade about a month ago. This is a large / high transaction rate (1120
> tps) database. We are unable to vacuum the affected tables (5 of them)
> currently due to this issue.

This is fixed now, finally.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=61b24fef89a9f7e381e6d83029b63e062d32812b
The other thread I linked to earlier has more discussion.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services