Re: termination of backend waiting for sync rep generates a junk log message

From Robert Haas
Subject Re: termination of backend waiting for sync rep generates a junk log message
Date
Msg-id CA+TgmoYffyDe6Ar+i85HWYzJ+U2eGmTjdsNjoAaBYUF9P8Ez1w@mail.gmail.com
In response to termination of backend waiting for sync rep generates a junk log message  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Mon, Oct 24, 2011 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> as it seems to me that any
>> client that is paranoid enough to care about sync rep had better
>> already be handling the case of a connection loss during commit.
>
> Agreed, but that is a problem that by definition we can't help with.
> Also, the issue with connection loss is that you really can't know
> whether your transaction got committed without reconnecting and looking
> for evidence.  There is no reason at all to inject such uncertainty
> into the cancel-SyncRepWaitForLSN case.  We know the transaction got
> committed,

I disagree.  The whole point of synchronous replication is that the
user is worried about the case where the primary goes away just after
the commit is acknowledged to the client.  Consider the following
scenario: Someone has determined that the master can't be reached from 90% of
the corporate Internet, but the synchronous standby, which is
naturally on another network, still has connectivity.  So they log
into the master and perform a fast shutdown.  When they reconnect, the
connection pooler (or other mechanism) redirects their connection to
the standby, which has since been promoted.  ISTM that the client had
darn well better go search for hard evidence about the transaction
state.

>> But I think that throwing an ERROR is likely to cause a LOT of client
>> breakage, even if you have some special (human-invisible?) flag that
>> indicates that you don't really mean it.  If we must do something
>> other than simulating a server disconnect, letting the command
>> completion message go through and annotating it with a NOTICE or
>> WARNING seems preferable.
>
> I think you're thinking narrowly of the SyncRepWaitForLSN case.  What
> I'm trying to point out is that there's a boatload of post-commit code
> which is capable of sometimes throwing errors, and that's not ever
> going to go away completely.
>
> It might be that it'd work to deal with this by reducing the reported
> strength of all such cases from ERROR to WARNING.  Not sure that that's
> a good idea, but it might work.

It's hard to be sure that a systematic approach will work.  For
example, if we fail to nuke a memory context for some reason, it
wouldn't be utterly crazy to just ignore the problem and try to
soldier on.  We've probably leaked some memory, but oh well.  If we've
failed to release a heavyweight lock we had better call
LockReleaseAll() somehow, but the details of what gets sent to the
client are negotiable and a WARNING is probably fine.  On the other
hand, if we experienced some failure that affects our ability to make
the transaction globally visible (like we wrote the commit record but
then fail trying to acquire ProcArrayLock to clear our xmin), it's
hard to believe that anything other than PANIC is enough.
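
A minimal sketch of the distinction drawn above, assuming three
illustrative failure kinds. The enum names and the function
post_commit_severity() are hypothetical and exist only for this
example; nothing like them is in the PostgreSQL source. The sketch
only shows that the right response depends on what failed, not that
one blanket level fits every post-commit error.

```c
#include <stdio.h>

/* Hypothetical response levels for this sketch (not PostgreSQL's elevels). */
typedef enum
{
    SEV_IGNORE,                 /* soldier on, e.g. some memory was leaked */
    SEV_WARNING,                /* clean up, then tell the client */
    SEV_PANIC                   /* cannot continue safely */
} Severity;

/* Hypothetical kinds of post-commit failure, taken from the examples above. */
typedef enum
{
    FAIL_CONTEXT_DELETE,        /* could not nuke a memory context */
    FAIL_LOCK_RELEASE,          /* could not release a heavyweight lock */
    FAIL_MAKE_VISIBLE           /* commit record written, but the transaction
                                 * cannot be made globally visible */
} PostCommitFailure;

/* Map each failure kind to the response argued for in the text. */
static Severity
post_commit_severity(PostCommitFailure failure)
{
    switch (failure)
    {
        case FAIL_CONTEXT_DELETE:
            return SEV_IGNORE;
        case FAIL_LOCK_RELEASE:
            return SEV_WARNING;
        case FAIL_MAKE_VISIBLE:
            return SEV_PANIC;
    }
    return SEV_PANIC;           /* unknown failure: assume the worst */
}
```

The fall-through to SEV_PANIC is a design choice for the sketch: a
failure nobody classified is treated as unrecoverable rather than
quietly downgraded.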

Because of that and similar cases elsewhere, including for example
inside the lock manager, I've long been feeling grumpy about this:
    /* Ensure we will have room to remember the lock */
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
        elog(ERROR, "too many LWLocks taken");

It seems to me that the idea that the abort path is going to be able
to recover from that situation is wildly optimistic.  Fortunately, our
coding practices are good enough that I think it never happens anyway,
but if it does it should surely PANIC.
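
A standalone sketch of what that would look like, assuming a limit of
100 for MAX_SIMUL_LWLOCKS (the value is picked for illustration only).
The helpers have_room_for_lwlock() and remember_lwlock() are
hypothetical, and fprintf() plus abort() stand in for elog(PANIC, ...)
so the example compiles outside the backend.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_SIMUL_LWLOCKS 100   /* assumed limit for this sketch */

static int num_held_lwlocks = 0;

/* Return 1 if there is room to remember one more lock, else 0. */
static int
have_room_for_lwlock(void)
{
    return num_held_lwlocks < MAX_SIMUL_LWLOCKS;
}

/*
 * Record that one more lock is held.  If the bookkeeping array is full,
 * give up immediately instead of raising an ordinary error, since the
 * abort path would have to release locks it has no record of.
 */
static void
remember_lwlock(void)
{
    if (!have_room_for_lwlock())
    {
        /* stand-in for elog(PANIC, "too many LWLocks taken") */
        fprintf(stderr, "PANIC: too many LWLocks taken\n");
        abort();
    }
    num_held_lwlocks++;
}
```

In the backend itself the change would presumably amount to swapping
ERROR for PANIC in the existing elog() call.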

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

