Re: Cascading replication: should we detect/prevent cycles?
From | Robert Haas |
---|---|
Subject | Re: Cascading replication: should we detect/prevent cycles? |
Date | |
Msg-id | CA+TgmoZdO4qZyubHv1tXRUiiRT5s9UiM8tRLF78PGs-iL3xJ8Q@mail.gmail.com |
In response to | Re: Cascading replication: should we detect/prevent cycles? (Josh Berkus <josh@agliodbs.com>) |
List | pgsql-hackers |
On Thu, Jan 31, 2013 at 9:48 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 02/01/2013 12:01 PM, Josh Berkus wrote:
>>> If we're going to start installing safeguards against doing stupid
>>> things, there's a long list of scenarios that happen far more
>>> regularly than this ever will and cause far more damage.
>>
>> What's wrong with making it easier for sysadmins to troubleshoot things?
>> Again, I'm not talking about erroring out, I'm talking about logging a
>> warning.
>
> Or to put it another way: Robert, you just did a "nobody wants that" to
> me. I thought you were opposed to such things on this list.

I respectfully disagree. I'm saying that *I* don't want that, which I think is different. To interpret my opposition against saying "nobody wants that" to mean "you can never oppose anything someone else thinks is a good idea" would preclude meaningful dialogue on most of what we talk about here. And clearly there is at least some demand for this feature, because you and Craig Ringer both want it.

So let me try to restate my objection to this specific feature more clearly. I think that we should be careful about warning the user about things that might not actually be mistakes. I'm not aware that we currently issue ANY warnings of that type. When we emit error messages, we sometimes suggest one possible cause of the error, and such messages are clearly labelled as HINT. But we don't, for example, emit a WARNING or ERROR about a DELETE or UPDATE statement that lacks a WHERE clause, even though many people might like to have such a feature. We don't warn a user "hey, float8 is imprecise, consider using numeric" or "hey, numeric is slow, consider using float8" or "setting autovacuum_naptime to an hour is probably dumber than pouring sugar in your gas tank", even though all of those things are true and some people might like to be warned. We only warn or error out when something happens that we are 100% sure is bad. And, in this particular case, it has been suggested that there are legitimate reasons why a replication topology might temporarily involve loops, so I believe this fails that criterion.

Second, we have often discussed the importance of avoiding log spam. Warnings that are likely to be repeated a large number of times when they occur have repeatedly been voted down on those grounds. I believe that objection also applies to this case. It is more appropriate to make information about the status of the system available via some status-inquiry function; for example, if you were to recast this as adding a slave-side function that attempts to return the IP of the current master, or NULL if no master, that would answer this objection (but not necessarily all of the other ones).

Third, we usually apply a criterion that warnings or errors must represent conditions that we can reliably detect; in other words, we typically do not add checks for situations that we will only sometimes be able to identify. And, in this case, it's a little unclear how we would actually identify loops. Presumably, we'd do it by sending a chain of unique per-node identifiers along with the WAL, and looking for your own identifier in the path, but we don't have any sort of unique per-node identifier right now, and how would you create one? If someone shuts down the cluster, duplicates it, and starts up both copies, we want that to work. Any identifier embedded in the cluster by such a process would be duplicated. You could use something like the node IP and port number, which wouldn't have that pitfall, but as we all know, IPs can be duplicated (e.g. due to NAT) so this isn't necessarily reliable either. If you do come up with a suitable unique per-node identifier, then this is fairly simple to make work for streaming replication, but it's tricky to see how to make it work with archiving.

Is that more clear?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
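
As a rough illustration of the third point (this sketch is not part of the original message and does not reflect any existing PostgreSQL mechanism): suppose each node appended an identifier to a path carried alongside the WAL stream and looked for its own identifier in that path. The function names `on_receive` and `walk_chain` and the node identifiers are hypothetical. The example shows why a duplicated identifier, such as one copied along with a cloned cluster, would be indistinguishable from a real loop.

```python
from typing import List, Tuple


def on_receive(node_id: str, path: List[str]) -> Tuple[bool, List[str]]:
    """Handle the identifier path that arrives with the stream.

    Returns (loop_suspected, path_to_forward_downstream).
    """
    if node_id in path:
        # Our own identifier already appears upstream of us.
        return True, path
    return False, path + [node_id]


def walk_chain(node_ids: List[str]) -> bool:
    """Pass a path down a chain of nodes, starting at the origin.

    Reports whether any node along the way suspects a loop.
    """
    path: List[str] = []
    for node_id in node_ids:
        suspected, path = on_receive(node_id, path)
        if suspected:
            return True
    return False


# A genuine cycle: the stream that originated at A comes back to A.
real_loop = walk_chain(["A", "B", "C", "A"])

# No cycle, but the third node is a file-level copy of B and therefore
# carries the same embedded identifier (assumed behavior of such an id).
cloned_chain = walk_chain(["A", "B", "B"])

# An ordinary cascading chain with distinct identifiers.
plain_chain = walk_chain(["A", "B", "C"])

print(real_loop, cloned_chain, plain_chain)
```

Under these assumptions the cloned chain is flagged exactly like the genuine cycle, which is the reliability problem described above. The sketch also only models a streamed path; it says nothing about how a path would travel through a WAL archive.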