Re: termination of backend waiting for sync rep generates a junk log message
| From | Robert Haas |
|---|---|
| Subject | Re: termination of backend waiting for sync rep generates a junk log message |
| Date | |
| Msg-id | CA+TgmoYffyDe6Ar+i85HWYzJ+U2eGmTjdsNjoAaBYUF9P8Ez1w@mail.gmail.com |
| In response to | termination of backend waiting for sync rep generates a junk log message (Fujii Masao <masao.fujii@gmail.com>) |
| List | pgsql-hackers |
On Mon, Oct 24, 2011 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> as it seems to me that any
>> client that is paranoid enough to care about sync rep had better
>> already be handling the case of a connection loss during commit.
>
> Agreed, but that is a problem that by definition we can't help with.
> Also, the issue with connection loss is that you really can't know
> whether your transaction got committed without reconnecting and looking
> for evidence.  There is no reason at all to inject such uncertainty
> into the cancel-SyncRepWaitForLSN case.  We know the transaction got
> committed,

I disagree.  The whole point of synchronous replication is that the user is worried about the case where the primary goes away just after the commit is acknowledged to the client.  Consider the following scenario: someone has determined that the primary can't be reached from 90% of the corporate Internet, but the synchronous standby, which is naturally on another network, still has connectivity.  So they log into the master and perform a fast shutdown.  When they reconnect, the connection pooler (or other mechanism) redirects their connection to the standby, which has since been promoted.  ISTM that the client had darn well better go search for hard evidence about the transaction state.

>> But I think that throwing an ERROR is likely to cause a LOT of client
>> breakage, even if you have some special (human-invisible?) flag that
>> indicates that you don't really mean it.  If we must do something
>> other than simulating a server disconnect, letting the command
>> completion message go through and annotating it with a NOTICE or
>> WARNING seems preferable.
>
> I think you're thinking narrowly of the SyncRepWaitForLSN case.  What
> I'm trying to point out is that there's a boatload of post-commit code
> which is capable of sometimes throwing errors, and that's not ever
> going to go away completely.
> It might be that it'd work to deal with this by reducing the reported
> strength of all such cases from ERROR to WARNING.  Not sure that
> that's a good idea, but it might work.

It's hard to be sure that a systematic approach will work.  For example, if we can't nuke a memory context for some reason, it wouldn't be utterly crazy to just ignore the problem and try to soldier on.  We've probably leaked some memory, but oh well.  If we've failed to release a heavyweight lock we had better call LockReleaseAll() somehow, but the details of what gets sent to the client are negotiable and a WARNING is probably fine.  On the other hand, if we experienced some failure that affects our ability to make the transaction globally visible (like we wrote the commit record but then failed trying to acquire ProcArrayLock to clear our xmin), it's hard to believe that anything other than PANIC is enough.  Because of that and similar cases elsewhere, including for example inside the lock manager, I've long been feeling grumpy about this:

        /* Ensure we will have room to remember the lock */
        if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
                elog(ERROR, "too many LWLocks taken");

It seems to me that the idea that the abort path is going to be able to recover from that situation is wildly optimistic.  Fortunately, our coding practices are good enough that I think it never happens anyway, but if it does it should surely PANIC.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company