Обсуждение: ProcessStandbyHSFeedbackMessage can make global xmin go backwards

Поиск
Список
Период
Сортировка

ProcessStandbyHSFeedbackMessage can make global xmin go backwards

От
Tom Lane
Дата:
ProcessStandbyHSFeedbackMessage has a race condition: it thinks it can
call GetOldestXmin and then the result will politely hold still while
it considers what to do next.  But in fact, whoever has the oldest xmin
could exit their transaction, allowing the global minimum to advance.
If a VACUUM process then inspects the ProcArray, it could compute an
oldest xmin that is newer than the value that
ProcessStandbyHSFeedbackMessage installs just afterwards.  So much
for keeping the data the standby wanted.

AFAICS we have to do all the logic about choosing the new value for
MyProc->xmin while holding ProcArrayLock, which IMO means that it should
go into a function in procarray.c.  The fact that walsender.c is taking
ProcArrayLock and whacking MyProc->xmin around is already a horrid
violation of modularity, even if it weren't incorrect.

Also, it seems like using GetOldestXmin is quite wrong here anyway; we
don't really want a result modified by vacuum_defer_cleanup_age do we?
It looks to me like that would result in vacuum_defer_cleanup_age being
applied twice: the walsender process sets its xmin the defer age into
the past, and then subsequent calls of GetOldestXmin compute a result
that is the defer age further back.  IOW this is an independent
mechanism that also results in the computed global xmin going backwards.

(Now that we have a standby feedback mechanism, I'm a bit tempted to
propose getting rid of vacuum_defer_cleanup_age altogether, rather than
hacking things to avoid the above.)
        regards, tom lane


Re: ProcessStandbyHSFeedbackMessage can make global xmin go backwards

От
Tom Lane
Дата:
I wrote:
> ProcessStandbyHSFeedbackMessage has a race condition: it thinks it can
> call GetOldestXmin and then the result will politely hold still while
> it considers what to do next.  But in fact, whoever has the oldest xmin
> could exit their transaction, allowing the global minimum to advance.
> If a VACUUM process then inspects the ProcArray, it could compute an
> oldest xmin that is newer than the value that
> ProcessStandbyHSFeedbackMessage installs just afterwards.  So much
> for keeping the data the standby wanted.

Actually, it's worse than that.  Clamping the walsender's xmin to
GetOldestXmin() is useless, and if it weren't useless, this code would
be completely wrong even discounting the bugs I pointed out already.

Consider what it is we're trying to do here: we'd like to prevent
VACUUMs on the master from deleting dead rows with xmax newer than the
xmin the standby is requesting.  Setting the walsender's xmin will only
affect VACUUMs launched after that moment; anything that's already
running will have already determined its threshold xmin, and perhaps
already removed rows.  You can't retroactively fix that.  So clamping
the walsender's xmin to GetOldestXmin doesn't actually do anything for
data integrity; it only ensures that the value of GetOldestXmin doesn't
go backwards on successive calls.  But that's possible anyway, as the
comments for that function already note.

What's worse is that the only thing that is prevented from going
backwards is the value of GetOldestXmin with allDbs = true.  But that's
only used by vacuums on shared catalogs.  If we wanted to prevent
the values seen by vacuums on non-shared tables from going backwards,
we'd have to do some much-more-complex calculation that identified
the largest value of GetOldestXmin with allDbs = false, across all
databases.  And that would *still* be wrong, because then the walsender
would only be protecting data that is newer than the newest such value,
which might allow data in other databases to be reclaimed too early.
You could only really make this work by maintaining per-database xmins,
which the standby isn't sending and there's no place to put into the
walsender's ProcArray entry either.

So I've concluded that there's just no point in the GetOldestXmin
clamping, and we should just apply the xmin value we get from the
standby if it passes basic sanity checks (the epoch test), and hope that
we're not too late to prevent loss of the data the standby wanted.

This line of reasoning also suggests that it's a pretty bad idea for the
walsender to invalidate its existing xmin if it gets a transiently bogus
(out of epoch) value from the standby.  I think it should sit on the
xmin it has, instead, to avoid potential loss of needed data.
        regards, tom lane


Re: ProcessStandbyHSFeedbackMessage can make global xmin go backwards

От
Alvaro Herrera
Дата:
Excerpts from Tom Lane's message of jue oct 20 19:20:19 -0300 2011:

> So I've concluded that there's just no point in the GetOldestXmin
> clamping, and we should just apply the xmin value we get from the
> standby if it passes basic sanity checks (the epoch test), and hope that
> we're not too late to prevent loss of the data the standby wanted.

I am happy that the HS patch has introduced uses for the xid epoch.  I
had to use them for multixact truncation in the FK locks patch, and was
nervous because I wasn't seeing any user in core code other than the
txid functions; this lack of callers, added to the comments in
GetNextXidAndEpoch ("we do not support such things"), made me feel a bit
uncomfortable about using it.  I'm happy to have this problem out of my
mind now.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: ProcessStandbyHSFeedbackMessage can make global xmin go backwards

От
Merlin Moncure
Дата:
On Wed, Oct 19, 2011 at 6:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ProcessStandbyHSFeedbackMessage has a race condition: it thinks it can
> call GetOldestXmin and then the result will politely hold still while
> it considers what to do next.  But in fact, whoever has the oldest xmin
> could exit their transaction, allowing the global minimum to advance.
> If a VACUUM process then inspects the ProcArray, it could compute an
> oldest xmin that is newer than the value that
> ProcessStandbyHSFeedbackMessage installs just afterwards.  So much
> for keeping the data the standby wanted.
>
> AFAICS we have to do all the logic about choosing the new value for
> MyProc->xmin while holding ProcArrayLock, which IMO means that it should
> go into a function in procarray.c.  The fact that walsender.c is taking
> ProcArrayLock and whacking MyProc->xmin around is already a horrid
> violation of modularity, even if it weren't incorrect.
>
> Also, it seems like using GetOldestXmin is quite wrong here anyway; we
> don't really want a result modified by vacuum_defer_cleanup_age do we?
> It looks to me like that would result in vacuum_defer_cleanup_age being
> applied twice: the walsender process sets its xmin the defer age into
> the past, and then subsequent calls of GetOldestXmin compute a result
> that is the defer age further back.  IOW this is an independent
> mechanism that also results in the computed global xmin going backwards.
>
> (Now that we have a standby feedback mechanism, I'm a bit tempted to
> propose getting rid of vacuum_defer_cleanup_age altogether, rather than
> hacking things to avoid the above.)

curious: are these bugs in production, and what would be the user
visible symptoms of seeing them in the wild?

merlin


Re: ProcessStandbyHSFeedbackMessage can make global xmin go backwards

От
Tom Lane
Дата:
Merlin Moncure <mmoncure@gmail.com> writes:
> On Wed, Oct 19, 2011 at 6:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ProcessStandbyHSFeedbackMessage has a race condition: it thinks it can
>> call GetOldestXmin and then the result will politely hold still while
>> it considers what to do next.

> curious: are these bugs in production, and what would be the user
> visible symptoms of seeing them in the wild?

There's no bug so far as data integrity on the master goes.  The risk
is that you'd see queries failing with replication conflicts on a
hot-standby slave even though you thought you'd protected them by
setting hot_standby_feedback = on.  That would happen if any rows
actually got vacuumed despite the standby's attempt to set an xmin
that would protect them.  This is possible anyway at walsender
startup, but I think the logic errors made it more probable.
        regards, tom lane