Обсуждение: Hot Standby conflict resolution handling

Поиск
Список
Период
Сортировка

Hot Standby conflict resolution handling

От
Pavan Deolasee
Дата:
<br />I was trying some simple queries on a Hot Standby with streaming replication.<br /><br />On standby, I do
this:<br/>postgres=# begin transaction isolation level repeatable read;<br />BEGIN<br />postgres=# explain verbose
selectcount(b) from test WHERE a > 100000;<br /><br clear="all" />On master, I insert a bunch of tuples in the table
andrun VACUUM ANALYZE.<br />postgres=# INSERT INTO test VALUES (generate_series(110001,120000), 'foo', 1);<br />INSERT
010000<br />postgres=# VACUUM ANALYZE test;<br /> VACUUM<br /><br />After max_standby_streaming_delay, the standby
startscancelling the queries. I get an error like this on the standby:<br />postgres=# explain verbose select count(b)
fromtest WHERE a > 100000;<br />FATAL:  terminating connection due to conflict with recovery<br /> DETAIL:  User
querymight have needed to see row versions that must be removed.<br />HINT:  In a moment you should be able to
reconnectto the database and repeat your command.<br />server closed the connection unexpectedly<br />     This
probablymeans the server terminated abnormally<br />    before or while processing the request.<br />The connection to
theserver was lost. Attempting reset: Succeeded.<br /><br />So I've couple questions/concerns here<br /><br />1. Why to
throwa FATAL error here ? A plain ERROR should be enough to abort the transaction. There are four places in
ProcessInterrupts()where we throw these kind of errors and three of them are FATAL.<br /><br />2911             if
(DoingCommandRead)<br/> 2912                 ereport(FATAL,<br />2913                        
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br/>2914                          errmsg("terminating connection due to
conflictwith recovery"),<br /> 2915                          errdetail_recovery_conflict(),<br />2916                 
errhint("Ina moment you should be able to reconnect to the"<br />2917                          " database and repeat
yourcommand.")));<br /> 2918             else         <br />2919                 ereport(ERROR,<br
/>2920                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br />2921                  errmsg("canceling
statementdue to conflict with recovery"),<br /> 2922                          errdetail_recovery_conflict()));<br /><br
/>Inthis particular test case, the backend is DoingCommandRead and that forces a FATAL error. I'm not sure why is that
required. And even if its necessary, IMHO we should add a comment explaining that. In the other two places where we
throwFATAL, one looks legitimate, but I'm not sure about the other.<br /><br />2836         else if
(RecoveryConflictPending&& RecoveryConflictRetryable)<br />2837         {<br />2838            
pgstat_report_recovery_conflict(RecoveryConflictReason);<br/>2839             ereport(FATAL,<br />
2840                    (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br />2841               errmsg("terminating
connectiondue to conflict with recovery"),<br />2842                      errdetail_recovery_conflict()));<br />
2843        }<br />2844         else if (RecoveryConflictPending)<br />2845         {<br />2846             /*
Currentlythere is only one non-retryable recovery conflict */<br />2847             Assert(RecoveryConflictReason ==
PROCSIG_RECOVERY_CONFLICT_DATABASE);<br/> 2848             pgstat_report_recovery_conflict(RecoveryConflictReason);<br
/>2849            ereport(FATAL,<br />2850                     (errcode(ERRCODE_DATABASE_DROPPED),<br
/>2851              errmsg("terminating connection due to conflict with recovery"),<br /> 2852                     
errdetail_recovery_conflict()));<br/>2853         }<br /><br />AFAICS the first of these should be ereport(ERROR).
Otherwiseirrespective of whether RecoveryConflictRetryable is true or false, we will always ereport(FATAL).<br /><br
/>2.For my test, the error message itself looks wrong because I did not actually remove any rows on the master. VACUUM
probablymarked a bunch of pages as all-visible and that should have triggered a conflict on the standby in order to
supportindex-only scans. IMHO we should  improve the error message to avoid any confusion. Or we can add a new
ProcSignalReasonto differentiate between a cancel due to clean up vs visibilitymap_set() operation.<br /><br />BTW, my
currentset up is using 9.2.1, but I don't see any code changes in the master. So I would assume the issues will exist
theretoo.<br /><br />Thanks,<br />Pavan<br /><br />-- <br />Pavan Deolasee<br /><a
href="http://www.linkedin.com/in/pavandeolasee"target="_blank">http://www.linkedin.com/in/pavandeolasee</a><br /> 

Re: Hot Standby conflict resolution handling

От
Andres Freund
Дата:
Hi,

On 2012-12-04 12:30:43 +0530, Pavan Deolasee wrote:
> I was trying some simple queries on a Hot Standby with streaming
> replication.
>
> On standby, I do this:
> postgres=# begin transaction isolation level repeatable read;
> BEGIN
> postgres=# explain verbose select count(b) from test WHERE a > 100000;
>
> On master, I insert a bunch of tuples in the table and run VACUUM ANALYZE.
> postgres=# INSERT INTO test VALUES (generate_series(110001,120000), 'foo',
> 1);
> INSERT 0 10000
> postgres=# VACUUM ANALYZE test;
> VACUUM
>
> After max_standby_streaming_delay, the standby starts cancelling the
> queries. I get an error like this on the standby:
> postgres=# explain verbose select count(b) from test WHERE a > 100000;
> FATAL:  terminating connection due to conflict with recovery
> DETAIL:  User query might have needed to see row versions that must be
> removed.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> So I've couple questions/concerns here
>
> 1. Why to throw a FATAL error here ? A plain ERROR should be enough to
> abort the transaction. There are four places in ProcessInterrupts() where
> we throw these kind of errors and three of them are FATAL.

The problem here is that were in IDLE IN TRANSACTION in this case. Which
currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
anything).

There are two problems making this non-trivial. For one, while we're in
IDLE IN TXN the client doesn't expect a response on a protocol level, so
we can't simply ereport() at that time.
For another, when were in IDLE IN TXN we're potentially inside openssl
so we can't jump out of there anyway because that would quite likely
corrupt the internal state of openssl.

I tried to fix this before (c.f. "Idle in transaction cancellation" or
similar) but while I had some kind of fix for the first issue (i saved
the error and reported it later when the protocol state allows it) I
missed the jumping out of openssl bit. I think its not that hard to
solve though. I remember having something preliminary but I never had
the time to finish it. If I remember correctly the trick was to set
openssl into non-blocking mode temporarily and return to the caller
inside be-secure.c:my_sock_read.
At that location ProcessInterrupts can run safely, error out silently,
and reraise the error once were in the right protocol state.

> 2836         else if (RecoveryConflictPending && RecoveryConflictRetryable)
> 2837         {
> 2838             pgstat_report_recovery_conflict(RecoveryConflictReason);
> 2839             ereport(FATAL,
> 2840                     (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> 2841               errmsg("terminating connection due to conflict with
> recovery"),
> 2842                      errdetail_recovery_conflict()));
> 2843         }
> 2844         else if (RecoveryConflictPending)
> 2845         {
> 2846             /* Currently there is only one non-retryable recovery
> conflict */
> 2847             Assert(RecoveryConflictReason ==
> PROCSIG_RECOVERY_CONFLICT_DATABASE);
> 2848             pgstat_report_recovery_conflict(RecoveryConflictReason);
> 2849             ereport(FATAL,
> 2850                     (errcode(ERRCODE_DATABASE_DROPPED),
> 2851               errmsg("terminating connection due to conflict with
> recovery"),
> 2852                      errdetail_recovery_conflict()));
> 2853         }
>
> AFAICS the first of these should be ereport(ERROR). Otherwise irrespective
> of whether RecoveryConflictRetryable is true or false, we will always
> ereport(FATAL).

Which is fine, because were below if (ProcDiePending). Note there's a
separate path for QueryCancelPending. We go on to kill connections once
the normal conflict handling has tried several times.

> 2. For my test, the error message itself looks wrong because I did not
> actually remove any rows on the master. VACUUM probably marked a bunch of
> pages as all-visible and that should have triggered a conflict on the
> standby in order to support index-only scans. IMHO we should  improve the
> error message to avoid any confusion. Or we can add a new ProcSignalReason
> to differentiate between a cancel due to clean up vs visibilitymap_set()
> operation.

I think we desparately need to improve *all* of these message with
significantly more detail (cause for cancellation, relation, current
xid, conflicting xid, current/last query).

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Hot Standby conflict resolution handling

От
Pavan Deolasee
Дата:


On Tue, Dec 4, 2012 at 1:44 PM, Andres Freund <andres@2ndquadrant.com> wrote:

>
> After max_standby_streaming_delay, the standby starts cancelling the
> queries. I get an error like this on the standby:
> postgres=# explain verbose select count(b) from test WHERE a > 100000;
> FATAL:  terminating connection due to conflict with recovery
> DETAIL:  User query might have needed to see row versions that must be
> removed.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> So I've couple questions/concerns here
>
> 1. Why to throw a FATAL error here ? A plain ERROR should be enough to
> abort the transaction. There are four places in ProcessInterrupts() where
> we throw these kind of errors and three of them are FATAL.

The problem here is that were in IDLE IN TRANSACTION in this case. Which
currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
anything).

There are two problems making this non-trivial. For one, while we're in
IDLE IN TXN the client doesn't expect a response on a protocol level, so
we can't simply ereport() at that time.
For another, when were in IDLE IN TXN we're potentially inside openssl
so we can't jump out of there anyway because that would quite likely
corrupt the internal state of openssl.

I tried to fix this before (c.f. "Idle in transaction cancellation" or
similar) but while I had some kind of fix for the first issue (i saved
the error and reported it later when the protocol state allows it) I
missed the jumping out of openssl bit. I think its not that hard to
solve though. I remember having something preliminary but I never had
the time to finish it. If I remember correctly the trick was to set
openssl into non-blocking mode temporarily and return to the caller
inside be-secure.c:my_sock_read.

Thanks Andres. I also read the original thread and I now understand why we are using FATAL here, at least until we have a better solution. Obviously the connection reset is no good either because as someone commented in the original discussion, I thought that I'm seeing a server crash while it was not.

 

>
> AFAICS the first of these should be ereport(ERROR). Otherwise irrespective
> of whether RecoveryConflictRetryable is true or false, we will always
> ereport(FATAL).

Which is fine, because were below if (ProcDiePending). Note there's a
separate path for QueryCancelPending. We go on to kill connections once
the normal conflict handling has tried several times.


Ok. Understood.I now see that every path below if (ProcDiePending) will call FATAL, albeit with different error codes. That explains the current code.

 

I think we desparately need to improve *all* of these message with
significantly more detail (cause for cancellation, relation, current
xid, conflicting xid, current/last query).


I agree.

Thanks,
Pavan 


--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Re: Hot Standby conflict resolution handling

От
Pavan Deolasee
Дата:
On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


Thanks Andres. I also read the original thread and I now understand why we are using FATAL here, at least until we have a better solution. Obviously the connection reset is no good either because as someone commented in the original discussion, I thought that I'm seeing a server crash while it was not.


How about attached comment to be added at the appropriate place ? I've extracted this mostly from Tom's explanation in the original thread. 

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Вложения

Re: Hot Standby conflict resolution handling

От
Stephen Frost
Дата:
Pavan,

* Pavan Deolasee (pavan.deolasee@gmail.com) wrote:
> On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee <pavan.deolasee@gmail.com>wrote:
> > Thanks Andres. I also read the original thread and I now understand why we
> > are using FATAL here, at least until we have a better solution. Obviously
> > the connection reset is no good either because as someone commented in the
> > original discussion, I thought that I'm seeing a server crash while it was
> > not.
>
> How about attached comment to be added at the appropriate place ? I've
> extracted this mostly from Tom's explanation in the original thread.

I was hoping to see an update to the actual error messages returned in
this patch..  I agree that it's good to add the comments but that
doesn't do anything to help the user out in this case.

Regarding the actual comment, here's the wording that I'd use:

-----------------------
If we are in DoingCommandRead state, we can't use ereport(ERROR)
because we can't long jumps in this state.  If we attempt to
longjmps in this state, we not only risk breaking protocol at
our level, but also risk leaving openssl in an inconsistent
state, either violating the ssl protocol or having its internal
state trashed because it was interrupted while in the middle of
changing that state.

Currently, the only option is to promote ERROR to FATAL until
we figure out a way to handle errors more effectively while in
this state.
-----------------------

If you agree with that wording update, can you update the patch?
Thanks,
    Stephen

Re: Hot Standby conflict resolution handling

От
Abhijit Menon-Sen
Дата:
At 2012-12-29 14:23:45 -0500, sfrost@snowman.net wrote:
>
> Regarding the actual comment, here's the wording that I'd use:

Sorry for nitpicking, but "we can't long jumps" made me cringe.
Here's a slightly more condensed version:

    /*
     * We can't use ereport(ERROR) here, because any longjmps
     * in DoingCommandRead state run the risk of violating our
     * protocol or the SSL protocol, by interrupting OpenSSL in
     * the middle of changing its internal state.
     *
     * Currently, the only option is to promote ERROR to FATAL
     * until we figure out a better way to handle errors in this
     * state.
     */

Patch along these lines attached, which also removes trailing
whitespace from the original patch.

-- Abhijit

Вложения

Re: Hot Standby conflict resolution handling

От
Pavan Deolasee
Дата:


On Thu, Jan 17, 2013 at 9:26 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
At 2012-12-29 14:23:45 -0500, sfrost@snowman.net wrote:
>
> Regarding the actual comment, here's the wording that I'd use:

Sorry for nitpicking, but "we can't long jumps" made me cringe.
Here's a slightly more condensed version:

    /*
     * We can't use ereport(ERROR) here, because any longjmps
     * in DoingCommandRead state run the risk of violating our
     * protocol or the SSL protocol, by interrupting OpenSSL in
     * the middle of changing its internal state.
     *
     * Currently, the only option is to promote ERROR to FATAL
     * until we figure out a better way to handle errors in this
     * state.
     */

Patch along these lines attached, which also removes trailing
whitespace from the original patch.


Thanks Stephen and Abhijit for improving the comments. I like this wording. So +1 from my side. Abhijit, do you want to add the patch and change the CF status appropriately ?

Thanks,
Pavan
 
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Re: Hot Standby conflict resolution handling

От
Tom Lane
Дата:
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
> Sorry for nitpicking, but "we can't long jumps" made me cringe.

Agreed :-(

> Here's a slightly more condensed version:

I find this still not quite right, because where it's placed, it applies
to both the DoingCommandRead and !DoingCommandRead branches of the
if/else statement.  The wording would be okay if placed inside the first
if branch, but I think visually that would look ugly.  I'm inclined to
suggest wording it like
* If we're in DoingCommandRead state, we can't use ereport(ERROR),* because any longjmp would risk interrupting OpenSSL
operations*and thus confusing that library and/or violating wire protocol.
 

plus your second para as-is.

But having said that ... are we sure this code is not actually broken?
ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
cannot safely attempt to send an error message to the client either;
but ereport(FATAL) will try exactly that.
        regards, tom lane



Re: Hot Standby conflict resolution handling

От
Pavan Deolasee
Дата:


On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But having said that ... are we sure this code is not actually broken?

I'm not.
 
ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
cannot safely attempt to send an error message to the client either;
but ereport(FATAL) will try exactly that.

I thought since FATAL will force the backend to exit, we don't care much about corrupted OpenSSL state. I even thought that's why we raise ERROR to FATAL so that the backend can start in a clean state. But clearly I'm missing a point here because you don't think that way.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Re: Hot Standby conflict resolution handling

От
Andres Freund
Дата:
On 2013-01-17 01:38:31 -0500, Tom Lane wrote:
> But having said that ... are we sure this code is not actually broken?
> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
> cannot safely attempt to send an error message to the client either;
> but ereport(FATAL) will try exactly that.

You're absolutely right.

ISTM, to fix it we would have to either provide a COMERROR like facility
for FATAL errors or just remove the requirement of raising an error
exactly there.
If I remember what I tried before correctly the latter seems to involve
setting openssl into nonblocking mode and check for the saved error in
the EINTR loop in be-secure and re-raise the error from there.

Do we want to backport either - there hasn't been any report that I
could link to it right now, but on the other hand it could possibly
cause rather ugly problems (data leakage, segfaults, code execution
aren't all that improbable)?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Hot Standby conflict resolution handling

От
Tom Lane
Дата:
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
>> cannot safely attempt to send an error message to the client either;
>> but ereport(FATAL) will try exactly that.

> I thought since FATAL will force the backend to exit, we don't care much
> about corrupted OpenSSL state. I even thought that's why we raise ERROR to
> FATAL so that the backend can start in a clean state. But clearly I'm
> missing a point here because you don't think that way.

If we were to simply exit(1), leaving the kernel to close the client
socket, it'd be safe enough because control would never have returned to
OpenSSL.  But this code doesn't do that.  What we're looking at is that
we've interrupted OpenSSL at some arbitrary point, and now we're going
to make fresh calls to it to try to pump the FATAL error message out to
the client.  It seems fairly unlikely that that's safe.  I'm not sure
I credit Andres' worry of arbitrary code execution, but I do fear that
OpenSSL could get confused to the point of freezing up, or even more
likely that it would transmit garbage to the client, which rather
defeats the purpose.

Don't see a nice fix.  The COMMERROR approach (ie, don't try to send
anything to the client, only the log) is not nice at all since the
client would get the impression that the server crashed.  On the other
hand, anything else requires waiting till we get control back from
OpenSSL, which might be a long time, and meanwhile we're still holding
locks that prevent WAL recovery from proceeding.
        regards, tom lane



Re: Hot Standby conflict resolution handling

От
Andres Freund
Дата:
On 2013-01-17 10:19:23 -0500, Tom Lane wrote:
> Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> > On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
> >> cannot safely attempt to send an error message to the client either;
> >> but ereport(FATAL) will try exactly that.
> 
> > I thought since FATAL will force the backend to exit, we don't care much
> > about corrupted OpenSSL state. I even thought that's why we raise ERROR to
> > FATAL so that the backend can start in a clean state. But clearly I'm
> > missing a point here because you don't think that way.
> 
> If we were to simply exit(1), leaving the kernel to close the client
> socket, it'd be safe enough because control would never have returned to
> OpenSSL.  But this code doesn't do that.  What we're looking at is that
> we've interrupted OpenSSL at some arbitrary point, and now we're going
> to make fresh calls to it to try to pump the FATAL error message out to
> the client.  It seems fairly unlikely that that's safe.  I'm not sure
> I credit Andres' worry of arbitrary code execution, but I do fear that
> OpenSSL could get confused to the point of freezing up, or even more
> likely that it would transmit garbage to the client, which rather
> defeats the purpose.

I don't think its likely either, I seem to remember it copying arround
function pointers though, so it seems possible with some bad luck.

> Don't see a nice fix.  The COMMERROR approach (ie, don't try to send
> anything to the client, only the log) is not nice at all since the
> client would get the impression that the server crashed.  On the other
> hand, anything else requires waiting till we get control back from
> OpenSSL, which might be a long time, and meanwhile we're still holding
> locks that prevent WAL recovery from proceeding.

I think we can make openssl return pretty much immediately if we assume
recv() can reliably interrupted by signals, possibly by setting the
socket to nonblocking in the signal handler.
We just need to tell openssl not to retry immediately and we should be
fine. Given that quite some people use openssl with nonblocking sockets,
that code path should be reasonably safe.

That still requires ugliness around saving the error and reraising it
after returning from openssl though...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services