Re: [HACKERS] Built-in plugin for logical decoding output
| From | Gregory Brail |
|---|---|
| Subject | Re: [HACKERS] Built-in plugin for logical decoding output |
| Date | |
| Msg-id | CAFF4x12pGTq3NFfEn5t9afwJ=ir_8TAhE+LnKUsObh7aW3SdEQ@mail.gmail.com |
| In response to | Re: [HACKERS] Built-in plugin for logical decoding output (Alvaro Hernandez <aht@ongres.com>) |
| Responses | Re: [HACKERS] Built-in plugin for logical decoding output |
| List | pgsql-hackers |
I'm encouraged that pgoutput exists and I'm sorry that I missed it before. I think it's fine as a binary-only format. If someone can write a client for the Postgres wire protocol as documented in Chapter 52 of the docs, then they should have no trouble consuming the output from pgoutput.
However, I can't find any docs for the output format of pgoutput, which is going to make it less likely for people to be able to consume it. Is anyone working on docs? I know that it's a painful process.
I also think that a JSON-format (or configurable format) plugin would make this part of PG much more usable and I'd encourage the community to come up with one.
Finally, since there were some "why didn't you just" questions in the email thread, let me write a little bit about what we were trying to do.
We have a set of data that represents the configuration of some of our customers' systems. (This is for Apigee Edge, which is a software product that represents a small part of Google Cloud, and which was developed long before we joined Google.) We'd like to efficiently and reliably push configuration changes down to our customers' systems, mostly to make it possible for them to run parts of our software stack in their own data centers, with limited or even unreliable network connectivity to the rest of our services. Data replication is a great fit for this problem.
However, we want the downstream software components (the ones that our customers run in their own data centers) to know when various things change, we want those changes delivered in a consistent order, and we want to be able to reliably receive them by having each consumer keep track of where they currently are in the replication scheme. Logical replication is a great fit for this because it enables us to build a list of all the changes to this management data in a consistent order. Once we have that list, it's fairly simple to persist it somewhere and let clients consume it in various ways. (In our case, via an HTTP API that supports long polling. Having all the clients consume a Kafka stream was not an option that we wanted to consider.)
The difference between what we're trying to do and most solutions that use logical replication is that we will have thousands or tens of thousands of clients pulling a list of changes that originated in a single Postgres database. That means that we need to index our own copy of the replication output so that clients can efficiently get changes only to "their" data. Furthermore, it means that we can't do things like create a unique replication slot for each client. Instead, we have a smaller number of servers that replicate from the master, and then those in turn give out lists of changes to other clients.
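As a rough illustration of that design, here is a minimal in-memory sketch of a change list that keeps one consistent ordering, is indexed so each client reads only "its" changes, and lets each client resume from the last position it saw. All names here (`ChangeLog`, `scope`, `changes_since`) are hypothetical and chosen for the example; this is not our actual implementation, and a real service would persist the list and sit behind the long-polling HTTP API.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int       # position in the single, consistently ordered change list
    scope: str     # hypothetical key saying whose data the change belongs to
    payload: dict  # the decoded change itself


class ChangeLog:
    """In-memory stand-in for the persisted, indexed copy of the replication output."""

    def __init__(self):
        self._next_seq = 1
        self._changes = defaultdict(list)  # scope -> [Change], ascending seq
        self._seqs = defaultdict(list)     # scope -> [seq], parallel list for bisect

    def append(self, scope, payload):
        """Record one decoded change in the order it arrived from the slot."""
        seq = self._next_seq
        self._next_seq += 1
        self._changes[scope].append(Change(seq, scope, payload))
        self._seqs[scope].append(seq)
        return seq

    def changes_since(self, scope, last_seq, limit=100):
        """Return up to `limit` changes for `scope` with seq greater than last_seq."""
        start = bisect_right(self._seqs.get(scope, []), last_seq)
        return self._changes.get(scope, [])[start:start + limit]


# One replicating server fills the log; many clients read from it.
log = ChangeLog()
log.append("customer-a", {"op": "insert", "id": 1})
log.append("customer-b", {"op": "insert", "id": 2})
log.append("customer-a", {"op": "update", "id": 1})

# A client keeps its own cursor and asks only for what it has not seen.
cursor = 0
batch = log.changes_since("customer-a", cursor)
cursor = batch[-1].seq if batch else cursor
```

Because the position is tracked by the client rather than by Postgres, adding more clients does not require more replication slots.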
On Mon, Sep 25, 2017 at 9:48 AM, Alvaro Hernandez <aht@ongres.com> wrote:
On 25/09/17 19:39, Petr Jelinek wrote:
Well, test_decoding is not meant for production use anyway, no need for
middleware to support it. The pgoutput is primarily used for internal
replication purposes, which is why we need something with more
interoperability in mind in the first place. The new plugin should still
support publications etc though IMHO.

However, having said that, and while json is a great output format
for interoperability, if there's a discussion on which plugin to include
next, I'd also favor one that has some more compact representation
format (or that supports several formats, not only json).

JSON is indeed great for interoperability, if you want more compact
format, use either pgoutput or write something of your own or do
conversion to something else in your consumer. I don't think postgres
needs to provide 100 different formats out of the box when there is an
API. The JSON output does not have to be extremely chatty either btw.
In my opinion, logical decoding plugins that don't come with core are close to worthless (don't get me wrong):
- They are very unlikely to be installed in managed environments (an area that is growing significantly).
- Like anything that is not in core, they raise concerns among users.
- Distribution and testing are non-trivial: there are many OS/architecture combinations.
Given the above, I believe having a general-purpose output plugin in core is critical to the use of logical decoding. For 9.4-9.6 there is test_decoding, and given that AWS uses it in production, that's kind of fine. For 10 there is at least pgoutput, which could be used (even though it was meant for replication). But if a new, truly general-purpose plugin is to be developed for 11+, I'd say json is not a good choice if it is the only output format it supports. json is too verbose, and replication, if anything, needs performance (it is network heavy, and serialization/deserialization is quite expensive). If one and only one general-purpose plugin is to be developed for 11+, why not make it something that is indeed more general, i.e., that supports high-performance scenarios too?
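To make the verbosity point concrete, here is a small sketch that encodes the same change record once as JSON and once in a fixed binary layout. The record fields and the binary layout are made up for the example; this is not the format of pgoutput or of any existing plugin.

```python
import json
import struct

# Hypothetical change record; field names are assumptions for illustration.
change = {"xid": 1234, "table_oid": 16385, "op": "I", "id": 42, "balance": 1000}

# Text encoding: every message repeats the field names.
as_json = json.dumps(change).encode("utf-8")

# Hypothetical binary layout, network byte order, no padding:
# op (1 byte), xid, table_oid, id (uint32 each), balance (int64).
LAYOUT = "!cIIIq"
as_binary = struct.pack(
    LAYOUT,
    change["op"].encode("ascii"),
    change["xid"],
    change["table_oid"],
    change["id"],
    change["balance"],
)

# The binary form decodes back to the same values.
op, xid, table_oid, row_id, balance = struct.unpack(LAYOUT, as_binary)

print(len(as_json), "bytes as JSON vs", len(as_binary), "bytes as binary")
```

The trade-off is that the binary form is only readable by a consumer that already knows the layout, which is the interoperability cost discussed above.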
Álvaro
--
Alvaro Hernandez
-----------
OnGres