Re: Add CREATE support to event triggers
From: Robert Haas
Subject: Re: Add CREATE support to event triggers
Date:
Msg-id: CA+TgmoYMRXhbHQgAWsU2g_Rdw7EpxsYZg3u-a-nYNh5n6Gj94g@mail.gmail.com
In reply to: Re: Add CREATE support to event triggers (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Fri, Nov 14, 2014 at 1:18 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I think it's a good idea to structure independent features in a way that
> other solutions can reuse them. But I sure as hell can't force them to
> use it - especially as there's unfortunately not too much development
> going on in the existing logical replication solutions for postgres.
>
> Btw, I really think there's quite legitimate use cases for this besides
> logical replication solutions - e.g. schema tracking is quite a sensible
> use case.

Well, as I already said, despite my doubts about the general utility of
this feature, I'm willing to see us take it IF we have a testing
framework that will reliably catch bugs, including bugs of omission.
Without that, I'm very confident it's going to be a maintenance
nightmare, and I believe you admitted yourself that that concern was
reasonable.

> Can you please hint at some other workable design? I don't think there
> really is anything else.

I think this really depends on what you mean by "anything else". Any
DDL replication solution is necessarily going to involve the following
steps:

1. Define some set of primitives such that any imaginable DDL operation
   can be expressed as a series of those primitives.
2. Find a way to capture those events as they happen.
3. Serialize them into some wire format and transport that format to
   the replica.
4. Apply them, possibly coordinating in some way with the master so
   that the user's original request fails if the apply fails.

There are meaningful choices at every step.
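The four steps could be sketched, purely as an illustration, as a minimal
event pipeline. All of the function names and the event shape below are
invented for this example; they are not from any proposed patch, and JSON
is just one of the many possible wire formats:

```python
import json

# Step 1: a primitive DDL event -- the unit every operation is broken
# into. Step 2 would produce these as DDL executes (via an event
# trigger, a hook, or core code).
def make_event(command_tag, object_identity, payload):
    """A captured DDL event; 'payload' carries whatever detail the
    chosen level of granularity requires."""
    return {"tag": command_tag, "object": object_identity,
            "payload": payload}

# Step 3: serialize to a wire format (JSON here, but nothing forces
# that choice).
def serialize(events):
    return json.dumps(events)

# Step 4: apply on the replica. A real solution would reconstruct and
# execute SQL, and possibly coordinate with the master (e.g. via 2PC)
# before acknowledging; here we only re-parse and report.
def apply_events(wire):
    applied = []
    for ev in json.loads(wire):
        applied.append((ev["tag"], ev["object"]))
    return applied

events = [make_event("CREATE TABLE", "public.foo",
                     {"columns": ["a int", "b int"]})]
print(apply_events(serialize(events)))
```

Each stage is independently replaceable, which is the point: the choices
at one step do not dictate the choices at the others.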
You're proposing that the primitives should be "anything that can be
expressed as a complete SQL command against a single object" (I think -
what are you going to emit for an ALL IN SCHEMA op - that thing itself,
or a similar operation against each object in the schema?); that the
capture mechanism should be an event trigger that inserts into a queue
table; that the serialization format should be a JSON language designed
to allow reassembly of the corresponding SQL statement; and that the
apply-coordination mechanism should be 2PC. But none of that is written
in stone.

As far as deciding what primitives to use, I think the principal
decision to be made here concerns the appropriate level of granularity.
For example, CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b > 42))
could be emitted as a single event saying that a table was created. But
it could also be emitted as create-table (foo), add-column (foo, a,
int), add-column (foo, b, int), add-column-default (foo, a, 1),
add-check-constraint (foo, b > 42). The appeal of a more fine-grained
set of primitives is that there might be fewer kinds of them, and that
each of them might be simpler; one of the things that makes physical
replication so reliable is that its primitives are very simple and thus
easy to verify.

The possible event-capture mechanisms seem to be either (a) an event
trigger, (b) a hook function in some yet-to-be-defined place, or (c)
core code, which will either (i) write each event to a table, (ii)
write each event directly into WAL, or perhaps (iii) write it someplace
else (a file? an in-memory queue? a network socket?). There are lots of
possible serialization formats.
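To make the granularity trade-off concrete, here is the CREATE TABLE
example above decomposed both ways. The event names are taken from the
text, but the tuple encoding is invented for illustration:

```python
# Coarse-grained: one opaque event for the whole statement.
coarse = [("create-table-full", "foo",
           "CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b > 42))")]

# Fine-grained: the same statement as a series of simple primitives.
fine = [
    ("create-table", "foo"),
    ("add-column", "foo", "a", "int"),
    ("add-column", "foo", "b", "int"),
    ("add-column-default", "foo", "a", "1"),
    ("add-check-constraint", "foo", "b > 42"),
]

# The fine-grained stream has more events, but fewer *kinds* of event,
# and each kind is simple enough to verify in isolation -- the property
# that makes physical replication's primitives easy to trust.
kinds = {ev[0] for ev in fine}
print(len(fine), "events of", len(kinds), "kinds")  # prints: 5 events of 4 kinds
```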
Coordinating with the master could involve 2PC, as you propose; or
trying to somehow validate that a given series of events is a valid
state transformation based on the starting state on the standby before
doing the operation on the master; or the use of a distributed
transaction coordinated by something like PG-XC's global transaction
manager; or you can skip it and hope for the best.

In addition to the decisions above, you can try to prevent failures by
restricting certain changes from happening, or you can let users change
what they like and hope for the best. Different solutions can have
different mechanisms for controlling which objects are under
replication and which are not, or can even allow some individual DDL
statements to opt out of replication while forcing others to
participate. Administratively, solutions can be built to facilitate
easy replication of an entire database to another node, or more
specific applications like sharding, where creating a table on a master
node creates child tables on a bunch of slave nodes - but they're not
all identical, because we're partitioning the data so that only some of
it will be on each node, and thus the constraints and so on will be
different.

BDR has one set of answers to all of these questions, and despite my
worries about a few points here and there, they are not stupid answers.
But they are not the ONLY answers either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company