Обсуждение: pglogical_output - a general purpose logical decoding output plugin

Поиск

Список

Период

Сортировка

pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

02 ноября 2015 г., 15:17:36

Hi all

I'd like to submit pglogical_output for inclusion in the 9.6 series as
a contrib.

The output plugin is suitable for a number of uses. It's designed
primarily to supply a data stream to drive a logical replication
client running in another PostgreSQL instance, like the pglogical
client discussed at PGConf.EU 2015. However, the plugin can also be
used to drive other replication solutions, message queues, etc.

This isn't an extension, in that it doesn't actually provide any
extension control file, and you can't CREATE EXTENSION it. It's only
usable via the logical decoding interfaces - at the SQL level, or via
the walsender.

See the README.md and DESIGN.md in the attached patch for details on
the plugin. I will follow up with a summary in a separate mail, along
with a few points I'd value input on or want to focus discussion on.

I anticipate that I'll be following up with a few tweaks, but at this
point the plugin is feature-complete, documented and substantially
ready for inclusion as a contrib.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

0001-Add-contrib-pglogical_output-a-logical-decoding-plug.patch

Re: pglogical_output - a general purpose logical decoding output plugin

От

Andres Freund

Дата:

02 ноября 2015 г., 15:35:18

Hi,

On 2015-11-02 20:17:21 +0800, Craig Ringer wrote:
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.

Cool!

> See the README.md and DESIGN.md in the attached patch for details on
> the plugin. I will follow up with a summary in a separate mail, along
> with a few points I'd value input on or want to focus discussion on.

Sounds to me like at least a portion of this should be in sgml, either
in a separate contrib page or in the logical decoding section.

A quick readthrough didn't have a separate description of the
"sub-protocol" in which changes and such are encoded - I think we're
going to need that.

> I anticipate that I'll be following up with a few tweaks, but at this
> point the plugin is feature-complete, documented and substantially
> ready for inclusion as a contrib.

There's a bunch of changes that are hinted at in the files in various
places. Could you extract the ones you think need to be fixed before
integration see in some central place (or just an email)?

Regards,

Andres

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

02 ноября 2015 г., 15:44:26

On 2 November 2015 at 20:35, Andres Freund <andres@anarazel.de> wrote:
> On 2015-11-02 20:17:21 +0800, Craig Ringer wrote:
>
>> See the README.md and DESIGN.md in the attached patch for details on
>> the plugin. I will follow up with a summary in a separate mail, along
>> with a few points I'd value input on or want to focus discussion on.
>
> Sounds to me like at least a portion of this should be in sgml, either
> in a separate contrib page or in the logical decoding section.

Yes, I think so. Before rewriting to SGML I wanted to solicit opinions
on what should be hidden away in a for-developers README file and what
parts deserve exposure in the public docs.

> A quick readthrough didn't have a separate description of the
> "sub-protocol" in which changes and such are encoded - I think we're
> going to need that.

It didn't quite make the first cut as I have to make a couple of edits
to reflect late changes. I should be able to follow up with that later
today.

The protocol design documentation actually predates the plugin its
self, though it saw a few changes where it became clear something
wouldn't work as envisioned. It's been quite a pleasure starting with
a detailed design, then implementing it.

> There's a bunch of changes that are hinted at in the files in various
> places. Could you extract the ones you think need to be fixed before
> integration see in some central place (or just an email)?

Yep, writing that up at the moment. I didn't want to make the initial
post too verbose.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

02 ноября 2015 г., 16:10:32

On 2 November 2015 at 20:17, Craig Ringer <craig@2ndquadrant.com> wrote:
> Hi all
>
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.

A few points are likely to come up in anything but the most cursory
examination of the patch.

The README alludes to protocol docs that aren't in the tree. A
followup will add them shortly, they just need a few tweaks.

There are pg_regress tests, but they're limited. The output plugin
uses the binary output mode, and pg_regress doesn't play well with
that at all. Timestamps, XIDs, LSNs, etc are embedded in the output.
Also, pglogical its self emits LSNs and timestamps in commit messages.
Some things, like the startup message, are likely to contain variable
data in future too. So we can't easily do a "dumb" comparison of
expected output.

That's why the bulk of the tests in test/ are in Python, using
psycopg2. Python and psycopg2 were used partly because of the
excellent work done by Oleksandr Shulgin at Zalando
(https://github.com/zalando/psycopg2/tree/feature/replication-protocol,
https://github.com/psycopg/psycopg2/pull/322) which means we can
connect to the walsender and consume the replication protocol, rather
than relying only on the SQL interfaces. Both are supported, and only
the SQL interface is used by default. It also means the tests can have
logic to validate the protocol stream, examining it message by message
to ensure it's exactly what's expected. Rather than a diff where two
lines of binary gibberish don't match, you get a specific error. Of
course, I'm aware that the buildfarm animals aren't required to have
Python, let alone a patched psycopg2, so we can't rely on these as
smoketests. That's why the pg_regress tests are there too.

There another extension inside it, in
contrib/pglogical_output/examples/hooks . I'm not sure if this should
be separated out into a separate contrib/ since it's very tightly
coupled to pglogical_output. Its purpose is to expose the hooks from
pglogical_output to SQL, so that they can be implemented by plpgsql or
whatever, instead of having to be C functions. It's not integrated
into pglogical_output proper because I see this as mainly a test and
prototyping facility. It's necessary to have this in order for the
unit tests to cover filtering and hooks, but most practical users will
never want or need it. So I'd rather not integrate it into
pglogical_output proper.

pglogical_output has headers, and it installs them into Pg's include/
tree at install time. This is not something that prior contribs have
done, so there's no policy for it as yet. The reason for doing so is
that the output plugin exposes a hooks API so that it can be reused by
different clients with different needs, rather than being tightly
coupled to just one downstream user. For example, it makes no
assumptions about things like what replication origin names mean -
keeping with the design of replication origins, which just provide
mechanism without policy. That means that the client needs to tell the
output plugin how to filter transactions if it wants to do selective
replication on a node-by-node basis. Similarly, there's no built-in
support for selective replication on a per-table basis, just a hook
you can implement. So clients can provide their own policy for how to
decide what tables to replicate. When we're calling hooks for each and
every row we really want a C function pointer so we can avoid the need
to go through the fmgr each time, and so we can pass a `struct
Relation` and bypass the need for catalog lookups. That sort of thing.

Table metadata is sent for each row. It really needs to be sent once
for each consecutive series of rows for the same table. Some care is
required to make sure it's invalidated and re-sent when the table
structure changes mid-series. So that's a pending change. It's
important for efficiency, but pretty isolated and doesn't make the
plugin less useful otherwise, so I thought it could wait.

Sending the whole old tuple is not yet supported, per the fixme in
pglogical_write_update . It should really be a TODO, since to support
this we really need a way to keep track of replica identity for a
table, but also WAL-log the whole old tuple. (ab)using REPLICA
IDENTITY FULL to log the old tuple means we lose information about
what the real identity key is. So this is more of a wanted future
feature, and I'll change it to a TODO.

I'd like to delay some ERROR messages until after the startup
parameters are sent. That way the client can see more info about the
server's configuration, version, capabilities, etc, and possibly
reconnect with acceptable settings. Because a logical decoding plugin
isn't allowed to generate input during its startup callback, though,
this could mean indefinitely delaying an error until the upstream does
some work that results in a decoding callback. So for now errors on
protocol mismatches, etc, are sent immediately.

Text encoding names are compared byte-wise. They should be looked up
in the catalogs and compared properly. Just not done yet.

I think those are the main points.

-- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

02 ноября 2015 г., 17:36:18

On 2 November 2015 at 20:17, Craig Ringer <craig@2ndquadrant.com> wrote:
> Hi all
>
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.

Here's the protocol documentation discussed in the README. It's
asciidoc at the moment, so it can be formatted into something with
readable tables.

If everyone thinks it's reasonable to document the pglogical protocol
as part of the SGML docs then it can be converted. Since the walsender
protocol is documented in the public SGML docs it probably should be,
it's just a matter of deciding what goes where.

Thanks to Darko Milojković for the asciidoc conversion. All errors are
likely to be my edits.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

pglogical.txt

Re: pglogical_output - a general purpose logical decoding output plugin

От

"Shulgin, Oleksandr"

Дата:

02 ноября 2015 г., 17:41:34

On Mon, Nov 2, 2015 at 1:17 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

Hi all

I'd like to submit pglogical_output for inclusion in the 9.6 series as
a contrib.

Yay, that looks pretty advanced! :-)

Still digesting...

Alex

Re: pglogical_output - a general purpose logical decoding output plugin

От

Jim Nasby

Дата:

02 ноября 2015 г., 21:58:45

On 11/2/15 8:36 AM, Craig Ringer wrote:
> Here's the protocol documentation discussed in the README. It's
> asciidoc at the moment, so it can be formatted into something with
> readable tables.

Is this by chance up on github? It'd be easier to read the final output 
there than the raw asciidoctor. ;)

Great work on this!
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

03 ноября 2015 г., 11:41:44

On 3 November 2015 at 02:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 11/2/15 8:36 AM, Craig Ringer wrote:
>>
>> Here's the protocol documentation discussed in the README. It's
>> asciidoc at the moment, so it can be formatted into something with
>> readable tables.
>
>
> Is this by chance up on github? It'd be easier to read the final output
> there than the raw asciidoctor. ;)

Not yet, no. Should be able to format it to PDF, HTML, etc if needed though.

Depending on the consensus here, I'm expecting the protocol docs will
likely get turned into a plaintext formatted README in the source
tree, or into SGML docs.

The rest are in rather more readable Markdown form at this point,
again pending opinions on where they should live - in the public SGML
docs or in-tree READMEs.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

03 ноября 2015 г., 11:46:03

On 3 November 2015 at 16:41, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 3 November 2015 at 02:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 11/2/15 8:36 AM, Craig Ringer wrote:
>>>
>>> Here's the protocol documentation discussed in the README. It's
>>> asciidoc at the moment, so it can be formatted into something with
>>> readable tables.
>>
>>
>> Is this by chance up on github? It'd be easier to read the final output
>> there than the raw asciidoctor. ;)
>
> Not yet, no. Should be able to format it to PDF, HTML, etc if needed though.

In fact, I'll just put a PDF up shortly.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

12 ноября 2015 г., 10:56:47

On 3 November 2015 at 02:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 11/2/15 8:36 AM, Craig Ringer wrote:
Here's the protocol documentation discussed in the README. It's
asciidoc at the moment, so it can be formatted into something with
readable tables.

Is this by chance up on github? It'd be easier to read the final output there than the raw asciidoctor. ;)

HTML for the protocol documentation attached.

The docs are being converted to SGML at the moment.

I'll be pushing an update of pglogical_output soon. Patch in a followup mail.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

protocol.html

Re: pglogical_output - a general purpose logical decoding output plugin

От

Amit Langote

Дата:

12 ноября 2015 г., 12:38:38

On 2015/11/02 23:36, Craig Ringer wrote:
> On 2 November 2015 at 20:17, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Hi all
>>
>> I'd like to submit pglogical_output for inclusion in the 9.6 series as
>> a contrib.
> 
> Here's the protocol documentation discussed in the README. It's
> asciidoc at the moment, so it can be formatted into something with
> readable tables.

Kudos! From someone who doesn't really read wire protocol docs a lot, this
was such an enlightening read.

Thanks,
Amit

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

12 ноября 2015 г., 17:24:13

Hi all

Here's an updated pglogical_output patch.

Selected changes since v1:

- add json protocol output support

- fix encoding comparisons to use parsed encoding not string name

- import protocol documentation

- significantly expand pg_regress tests

- move pglogical_output_plhooks to a top-level contrib

- remove 9.4 compatibility

Full diff since last version at https://github.com/2ndQuadrant/postgres/compare/pglogical-output-v1...2ndQuadrant:pglogical-output-v2

Notably, it now has support for a basic json-format text-based protocol as well as the main binary protocol. Mostly for debugging, testing and diagnostics.

I've used the newly added json support in pglogical_output to significantly expand the pg_regress tests, since the json mode can support text-mode decoding in the SQL interface to logical decoding, where the lsn, xid, etc can be skipped.

I've removed the Python based test framework. Test coverage in-tree is reduced as a result, with no coverage of the details of the binary protocol, no coverage of walsender based use, etc. I'll have a look at whether it'll be practical to convert the tests to Perl code driving pg_recvlogical as a co-process but don't think evaluation should wait on this.

I've also removed the 9.4 backward compatibility code from the version submitted for 9.6.

Docs conversion to SGML is still pending/WIP.

I've been unable to find a reasonable way to send the startup message before raising an ERROR when there's an issue with parameter/protocol negotiation. Advice/suggestions appreciated. The main issue is that the setup callback can't write to the protocol stream since in walsender mode the walsender isn't ready for writes yet. (Otherwise we could just write a record with InvalidXLogRecPtr, etc). Delaying the startup msg to the first begin callback works, as is done now. But delaying an error would involve allowing the begin callback to complete, then ERRORing at the *next* callback, which could be anything. Very ugly for what should be an uncommon case. So for now ERRORs are immediate, even though that makes negotiation much more difficult. Ideas welcomed.

(Cc'd a few people who expressed interest)

Вложения

0001-pglogical_output-v2.patch

Re: pglogical_output - a general purpose logical decoding output plugin

От

Peter Eisentraut

Дата:

16 ноября 2015 г., 04:57:55

On 11/2/15 7:17 AM, Craig Ringer wrote:
> The output plugin is suitable for a number of uses. It's designed
> primarily to supply a data stream to drive a logical replication
> client running in another PostgreSQL instance, like the pglogical
> client discussed at PGConf.EU 2015.

So where is that client?

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

16 ноября 2015 г., 05:10:48

On 16 November 2015 at 09:57, Peter Eisentraut <peter_e@gmx.net> wrote:

On 11/2/15 7:17 AM, Craig Ringer wrote:
> The output plugin is suitable for a number of uses. It's designed
> primarily to supply a data stream to drive a logical replication
> client running in another PostgreSQL instance, like the pglogical
> client discussed at PGConf.EU 2015.

So where is that client?

Not finished baking yet - in particular, the catalogs and UI are still in flux. The time scale for getting that out is in the order of a few weeks.

The output plugin stands alone to a fair degree, especially with the json output support. Comments would be greatly appreciated, especially from people who're involved in replication, are currently using triggers to feed data to external systems, etc.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

18 ноября 2015 г., 22:43:05

W dniu 12.11.2015, czw o godzinie 22∶23 +0800, użytkownik Craig Ringer
napisał:
> Hi all
>
> Here's an updated pglogical_output patch.
>
> Selected changes since v1:
>
>     - add json protocol output support
>     - fix encoding comparisons to use parsed encoding not string name
>     - import protocol documentation
>     - significantly expand pg_regress tests
>     - move pglogical_output_plhooks to a top-level contrib
>     - remove 9.4 compatibility

I've just skimmed through the patch.
As you removed 9.4 compatibility - are mentions of 9.4 and 9.5 
compatibility needed in README.md and README.plhooks?
It's not much text, but I'm not sure whether they shouldn't be
removed for 9.6-targeting change.

--
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

20 ноября 2015 г., 06:16:41

On 19 November 2015 at 03:42, Tomasz Rybak <tomasz.rybak@post.pl> wrote:

I've just skimmed through the patch.
As you removed 9.4 compatibility - are mentions of 9.4 and 9.5
compatibility needed in README.md and README.plhooks?

Thanks. Removed.

Here's a v3 patch. Changes since v2:

- fix stray braces in commit json output

- significantly expand pg_regress tests

- quote all json keys

- validate all json in tests

- decode json startup message to a table

- filter out mutable keys in json startup message

- troubleshooting section of README covers using json

- documentation copy editing

Full patch against master attached. Diff since v2 at

https://github.com/2ndQuadrant/postgres/compare/pglogical-output-v2...2ndQuadrant:pglogical-output-v3

While the downstream client using the binary protocol is still too WIP, the json support is useful for all sorts of apps. It has all the same support for hooks and filtering, origin information, etc. I'd love to hear from anyone who's trying it out.

SGML conversion of docs still WIP.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

0001-Add-contrib-pglogical_output-a-logical-decoding-plug.patch

Re: pglogical_output - a general purpose logical decoding output plugin

От

Petr Jelinek

Дата:

02 декабря 2015 г., 13:38:35

Hi,

I can't really do huge review considering I wrote half of the code, but 
I have couple of things I noticed.

First, I wonder if it would be useful to mention somewhere, even if it's 
only here in the mailing list how can the protocol be extended in 
non-breaking way in future for transaction streaming if we ever get 
that. I am mainly asking for this because the protocol does not 
currently send xid for every change as it's not necessary, but for 
streaming it will be. I know of couple of ways myself but I think they 
should be described publicly.

The other thing is that I think we don't need the "forward_changesets" 
parameter which currently decides if to forward changes that didn't 
originate on local node. There is already hook for origin filtering 
which provides same functionality in more flexible way so it seems 
redundant to also have special boolean option for it.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

07 декабря 2015 г., 06:10:26

On 2 December 2015 at 18:38, Petr Jelinek <petr@2ndquadrant.com> wrote:

First, I wonder if it would be useful to mention somewhere, even if it's only here in the mailing list how can the protocol be extended in non-breaking way in future for transaction streaming if we ever get that.

Good point.

I'll address that in the DESIGN.md in the next rev.

Separately, it's looking like xact streaming is possibly more complex than I hoped due to cache invalidation issues, but I haven't been able to fully understand the problem yet.

The other thing is that I think we don't need the "forward_changesets" parameter which currently decides if to forward changes that didn't originate on local node. There is already hook for origin filtering which provides same functionality in more flexible way so it seems redundant to also have special boolean option for it.

Removed, change pushed.

Also pushed a change to expose the decoded row data to row filter hooks.

I won't cut a v4 for this, as I'm working on merging the SGML-ified docs and will do a v4 with that and the above readme change once that's done.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Michael Paquier

Дата:

22 декабря 2015 г., 10:18:07

On Mon, Dec 7, 2015 at 12:10 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Removed, change pushed.
>
> Also pushed a change to expose the decoded row data to row filter hooks.
>
> I won't cut a v4 for this, as I'm working on merging the SGML-ified docs and
> will do a v4 with that and the above readme change once that's done.

Patch is moved to next CF, you seem to be still working on it..
-- 
Michael

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

22 декабря 2015 г., 12:55:59

On 22 December 2015 at 15:17, Michael Paquier <michael.paquier@gmail.com> wrote:

On Mon, Dec 7, 2015 at 12:10 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Removed, change pushed.
>
> Also pushed a change to expose the decoded row data to row filter hooks.
>
> I won't cut a v4 for this, as I'm working on merging the SGML-ified docs and
> will do a v4 with that and the above readme change once that's done.

Patch is moved to next CF, you seem to be still working on it.

Thanks.

Other than SGML-ified docs it's ready. The docs are a hard pre-req of course. In any case most people appear to be waiting for the downstream before looking at it at all, so bumping it makes sense.

I'm a touch frustrated by that, as a large part of the point of submitting the output plugin separately and in advance of the downstream was to get attention for it separately, as its own entity. A lot of effort has been put into making this usable for more than just a data source for pglogical's replication tools. In retrospect naming it pglogical_output was probably unwise.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Robert Haas

Дата:

22 декабря 2015 г., 20:37:41

On Tue, Dec 22, 2015 at 4:55 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I'm a touch frustrated by that, as a large part of the point of submitting
> the output plugin separately and in advance of the downstream was to get
> attention for it separately, as its own entity. A lot of effort has been put
> into making this usable for more than just a data source for pglogical's
> replication tools. In retrospect naming it pglogical_output was probably
> unwise.

It's not too late to rename it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

03 января 2016 г., 21:22:11

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

Applies cleanly on current master (b416c0bb622ce5d33fdbec3bbce00451132f10ec).

Builds without any problems on current Debian unstable (am64 arch, GCC 5.3.1-4, glibc 2.21-6)
There are 2 errors in tests, but they also occur on trunk build.
parallel group (20 tests): json_encoding combocid portals_p2 advisory_lock tsdicts xmlmap equivclass guc
functional_depsdependency json select_views cluster tsearch window jsonb indirect_toast bitmapops foreign_key
foreign_data select_views ... FAILED portals_p2 ... ok

parallel group (2 tests): create_view create_index create_index ... FAILED create_view
...ok

README.md:
+Only one database is replicated, rather than the whole PostgreSQL install. A
[...]
+Unlike block-level ("physical") streaming replication, the change stream from
+the `pglogical` output plugin is compatible across different PostgreSQL
+versions and can even be consumed by non-PostgreSQL clients.
+
+Because logical decoding is used, only the changed rows are sent on the wire.
+There's no index change data, no vacuum activity, etc transmitted.
+
+The use of a replication slot means that the change stream is reliable and
+crash-safe. If

Isn't it feature of logical replication, not this particular plugin?
I'm not sure whether all that text needs to be repeated here.
OTOH this is good summary - so maybe just add links to base
documentation about replication, logical replication, slots, etc.

+# Usage
+
+The overall flow of client/server interaction is:
This part (flow) belongs to DESIGN.md, not to usage.

+* [Client issues `IDENTIFY_SYSTEM`
Is the [ needed here?

protocol.txt
Contains wrapped lines, but also very long lines. While long
lines make sense for tables, they should not occur in paragraphs, e.g. in
+== Arguments client supplies to output plugin
and following ones. It looks like parts of this file were formatted, and parts not.

In summary, documentation is mostly OK, and it describe plugin quite nicely.
The only thing I haven't fully understood is COPY-related - so it might be
good to extend a bit. And how COPY relates to JSON output format?

pglogical_output_plhooks.c
+#if PG_VERSION_NUM >= 90500
+ username = GetUserNameFromId(GetUserId(), false);
+#else
+ username = GetUserNameFromId(GetUserId());
+#endif

Is it needed? I mean - will tris module compiled for 9.4 (or earlier)
versions, or will it be 9.5 (or even 9.6)+ only?

pglogical_output.c
+ /*
+ * Will we forward changesets? We have to if we're on 9.4;
+ * otherwise honour the client's request.
+ */
+ if (PG_VERSION_NUM/100 == 904)
+ {
+ /*
+ * 9.4 unconditionally forwards changesets due to lack of
+ * replication origins, and it can't ever send origin info
+ * for the same reason.
+ */

Similar case. In mail from 2015-11-12 (path v2) you removed v9.4 compatibility,
so I'm not sure whether checking for 9.4 or 9.5 makes any sense now.

This review focuses mostly on documentation, but I went through both
documentation and code. I understood most of the code (and it makes
sense after some cosideration :-) ), but I'm not proficient in PostgreSQL
to be fully sure that there are no hidden bugs.
At the same time - I haven't seen problems and suspicious fragments of code,
so after fixing mentioned problems it should go to the commiter.

Best regards.

The new status of this patch is: Waiting on Author

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

"Shulgin, Oleksandr"

Дата:

04 января 2016 г., 20:25:33

On Sun, Jan 3, 2016 at 7:21 PM, Tomasz Rybak <tomasz.rybak@post.pl> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

Applies cleanly on current master (b416c0bb622ce5d33fdbec3bbce00451132f10ec).

Builds without any problems on current Debian unstable (am64 arch, GCC 5.3.1-4, glibc 2.21-6)

A make from an external build dir fails on install, suggested fix:

install: all

$(MKDIR_P) '$(DESTDIR)$(includedir)'/pglogical_output

- $(INSTALL_DATA) pglogical_output/hooks.h '$(DESTDIR)$(includedir)'/pglogical_output

+ $(INSTALL_DATA) $(top_srcdir)/contrib/pglogical_output/pglogical_output/hooks.h '$(DESTDIR)$(includedir)'/pglogical_output

Alex

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Alvaro Herrera

Дата:

04 января 2016 г., 21:17:59

Shulgin, Oleksandr wrote:

> A make from an external build dir fails on install, suggested fix:
> 
> install: all
> $(MKDIR_P) '$(DESTDIR)$(includedir)'/pglogical_output
> - $(INSTALL_DATA) pglogical_output/hooks.h
> '$(DESTDIR)$(includedir)'/pglogical_output
> + $(INSTALL_DATA)
> $(top_srcdir)/contrib/pglogical_output/pglogical_output/hooks.h
> '$(DESTDIR)$(includedir)'/pglogical_output

Actually you should be able to use $(srcdir)/hooks.h ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Peter Eisentraut

Дата:

06 января 2016 г., 20:17:43

On 12/22/15 4:55 AM, Craig Ringer wrote:
> I'm a touch frustrated by that, as a large part of the point of
> submitting the output plugin separately and in advance of the downstream
> was to get attention for it separately, as its own entity. A lot of
> effort has been put into making this usable for more than just a data
> source for pglogical's replication tools.

I can't imagine that there is a lot of interest in a replication tool
where you only get one side of it, no matter how well-designed or
general it is.  Ultimately, what people will want to do with this is
replicate things, not muse about its design aspects.  So if we're going
to ship a replication solution in PostgreSQL core, we should ship all
the pieces that make the whole system work.

Also, I think there are two kinds of general systems: common core, and
all possible features.  A common core approach could probably be made
acceptable with the argument that anyone will probably want to do things
this way, so we might as well implement it once and give it to people.
In a way, the logical decoding interface is the common core, as we
currently understand it.  But this submission clearly has a lot of
features beyond just the basics, and we could probably go through them
one by one and ask, why do we need this bit?  So that kind of system
will be very hard to review as a standalone submission.

Re: pglogical_output - a general purpose logical decoding output plugin

От

Greg Stark

Дата:

06 января 2016 г., 21:33:32

On Wed, Jan 6, 2016 at 5:17 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> I can't imagine that there is a lot of interest in a replication tool
> where you only get one side of it, no matter how well-designed or
> general it is.

Well I do have another purpose in mind myself so I do appreciate it
being available now and separately.

However you also need to keep in mind that any of these other purposes
will be more or less equally large projects as logical replication.
There's no particular reason to expect one to be able to start up
today and provide feedback faster than the replication code that's
already been under development for ages. I haven't even started on my
pet project and probably won't until February. And I haven't even
thought through the details of it so I don't even know if it'll be a
matter of a few weeks or months or more.

The one project that does seem like it should be fairly fast to get
going and provide a relatively easy way to test the APIs separately
would be an auditing tool. I saw one go by but didn't look into
whether it used logical decoding or another mechanism. One based on
logical decoding does seem like it would let you verify that, for
example, the api gave the right information to filter effectively and
store meta information to index the audit records effectively.

-- 
greg

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

07 января 2016 г., 10:50:53

On 7 January 2016 at 01:17, Peter Eisentraut <peter_e@gmx.net> wrote:

On 12/22/15 4:55 AM, Craig Ringer wrote:
> I'm a touch frustrated by that, as a large part of the point of
> submitting the output plugin separately and in advance of the downstream
> was to get attention for it separately, as its own entity. A lot of
> effort has been put into making this usable for more than just a data
> source for pglogical's replication tools.

I can't imagine that there is a lot of interest in a replication tool
where you only get one side of it, no matter how well-designed or
general it is.

Well, the other part was posted most of a week ago.

http://www.postgresql.org/message-id/5685BB86.5010901@2ndquadrant.com

... but this isn't just about replication. At least, not just to another PostgreSQL instance. This plugin is designed to be general enough to use for replication to other DBMSes (via appropriate receivers), to replace trigger-based data collection in existing replication systems, for use in audit data collection, etc.

Want to get a stream of data out of PostgreSQL in a consistent, simple way, without having to add triggers or otherwise interfere with the origin database? That's the purpose of this plugin, and it doesn't care in the slightest what the receiver wants to do with that data. It's been designed to be usable separately from pglogical downstream and - before the Python tests were rejected in discussions on this list - was tested using a test suite completely separate to the pglogical downstream using psycopg2 to make sure no unintended interdependencies got introduced.

You can do way more than that with the output plugin but you have to write your own downstream/receiver for the desired purpose, since using a downstream based on bgworkers and SPI won't make any sense outside PostgreSQL.

If you just want a canned product to use, see the pglogical post above for the downstream code.

Ultimately, what people will want to do with this is
replicate things, not muse about its design aspects. So if we're going

to ship a replication solution in PostgreSQL core, we should ship all
the pieces that make the whole system work.

I don't buy that argument. Doesn't that mean logical decoding shouldn't have been accepted? Or the initial patches for parallel query? Or any number of other things that're part of incremental development solutions?

(This also seems to contradict what you then argue below, that the proposed feature is too broad and does too much.)

I'd be happy to see both parts go in, but I'm frustrated that nobody's willing to see beyond "replicate from one Pg to another Pg" and see all the other things you can do. Want to replicate to Oracle / MS-SQL / etc? This will help a lot and solve a significant part of the problem for you. Want to stream data to append-only audit logs? Ditto. But nope, it's all about PostgreSQL to PostgreSQL.

Please try to look further into what client applications can do with this directly. I already know it meets the needs of the pglogical downstream. What I was hoping to achieve with posting the output plugin earlier was to get some thought going about what *else* it'd be good for.

Again: pglogical is posted now (it just took longer than expected to get ready) and I'll be happy to see both it and the output plugin included. I just urge people to look at the output plugin as more than a tightly coupled component of pglogical.

Maybe some quality name bikeshedding for the output plugin would help ;)

Also, I think there are two kinds of general systems: common core, and
all possible features. A common core approach could probably be made
acceptable with the argument that anyone will probably want to do things
this way, so we might as well implement it once and give it to people.

That's what we're going for here. Extensible, something people can build on and use.

In a way, the logical decoding interface is the common core, as we
currently understand it. But this submission clearly has a lot of
features beyond just the basics

Really? What would you cut? What's beyond the basics here? What basics are you thinking of, i.e what set of requirements are you working towards / needs are you seeking to meet?

We cut this to the bone to produce a minimum viable logical replication solution. Especially the output plugin.

Cut the hook interfaces for row and xact filtering? You lose the ability to use replication origins, crippling functionality, and for no real gain in simplicity.

Remove JSON support? That's what most people are actually likely to want to use when using the output plugin directly, and it's important for debugging/tracing/diagnostics. It's a separate feature, to be sure, but it's also a pretty trivial addition.

and we could probably go through them
one by one and ask, why do we need this bit? So that kind of system
will be very hard to review as a standalone submission.

Again, I disagree. I think you're looking at this way too narrowly.

I find it quite funny, actually. Here we go and produce something that's a nice re-usable component that other people can use in their products and solutions ... and all anyone does is complain that the other part required to use it as a canned product isn't posted yet (though it is now). But with BDR all anyone ever does is complain that it's too tightly coupled to the needs of a single product and the features extracted from it, like replication origins, should be more generic and general purpose so other people can use them in their products too. Which is it going to be?

It would be helpful if you could take a step back and describe what *you* think logical replication for PostgreSQL should look like. You clearly have a picture in mind of what it should be, what requirements it satisfies, etc. If you're going to argue based on that it'd be very helpful to describe it. I might've missed some important points you've seen and you might've overlooked issues I've seen.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

07 января 2016 г., 10:58:39

On 7 January 2016 at 02:32, Greg Stark <stark@mit.edu> wrote:

However you also need to keep in mind that any of these other purposes
will be more or less equally large projects as logical replication.
There's no particular reason to expect one to be able to start up
today and provide feedback faster than the replication code that's
already been under development for ages.

Good point. Though it's been out there for a while now.

The one project that does seem like it should be fairly fast to get
going and provide a relatively easy way to test the APIs separately
would be an auditing tool.

Yep. The json stream is very simple to consume for that purpose too, providing pre-jsonified data from the upstream so you don't have to deal with all the mess of maintaining matching type and table definitions on the downstream to construct HeapTuple s, etc.

You generally don't want to write audit logs into a relation of the same form as the original table plus some extra columns. It's nicer to work with, sure, but it means you're almost totally locked in to your original table definition. You can add new nullable columns but that's about it, since there's no way to retroactively construct audit data for already logged entries. So I think the json output mode is more interesting for auditing.

An interesting hiccup for audit is that AFAIK nothing in WAL associates user identity information with a change, so we can't decode and send info on which user performed which xact or changed a given tuple. SET ROLE and SECURITY DEFINER mean there isn't a 1:1 mapping of xact to user either, and info would probably have to be tuple level. Fun. The WAL messages patch would be helpful for this sort of thing, allowing apps to use triggers to add arbitrary info to xlog; XLOG_NOOP could help too, just in an uglier way with more room for confusion between unrelated products using it.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

07 января 2016 г., 11:44:23

Here's v5 of the pglogical patch.

Changes:

* Implement relation metadata caching

* Add the relmeta_cache_size parameter for cache control

* Add an extension to get version information

* Create the pglogical_output header directory on install

* Restore 9.4 compatibility (it's small)

* Allow row filter hooks to see details of the changed tuple

* Remove forward_changesets from pglogical_output (use a hook if you want this functionality)

I'm not sure if 9.4 compat will be desirable or not. It's handy to avoid needing a separate backported version, but also confusing to do a PGXS build within a 9.6 tree against 9.4...

Вложения

pglogical-output-v5.patch

Re: pglogical_output - a general purpose logical decoding output plugin

От

Jarred Ward

Дата:

07 января 2016 г., 20:28:38

I didn't receive a response on the bugs mailing list for the following bug, so I was hoping we could triage to someone with more familiarity with Postgres internals than I to fix.

This ticket seems like folks who are invested in logical decoding.

The attached script is a simple workload that logical decoding is unable to decode. It causes an unrecoverable crash in the logical decoder with 'ERROR: subxact logged without previous toplevel record'.

On Thu, Jan 7, 2016 at 12:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Here's v5 of the pglogical patch.

Changes:

* Implement relation metadata caching
* Add the relmeta_cache_size parameter for cache control
* Add an extension to get version information
* Create the pglogical_output header directory on install
* Restore 9.4 compatibility (it's small)
* Allow row filter hooks to see details of the changed tuple
* Remove forward_changesets from pglogical_output (use a hook if you want this functionality)

I'm not sure if 9.4 compat will be desirable or not. It's handy to avoid needing a separate backported version, but also confusing to do a PGXS build within a 9.6 tree against 9.4...

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

bug.sql

Re: pglogical_output - a general purpose logical decoding output plugin

От

Andres Freund

Дата:

07 января 2016 г., 20:45:37

On 2016-01-07 09:28:29 -0800, Jarred Ward wrote:
> I didn't receive a response on the bugs mailing list for the following bug,
> so I was hoping we could triage to someone with more familiarity with
> Postgres internals than I to fix.

Please don't post to unrelated threads, that just confuses things.

Andres

PS: Yes, I do plan to look at that issue at some point.

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

19 января 2016 г., 00:47:26

W dniu 07.01.2016, czw o godzinie 15∶50 +0800, użytkownik Craig Ringer
napisał:
> On 7 January 2016 at 01:17, Peter Eisentraut <peter_e@gmx.net> wrote:
> > On 12/22/15 4:55 AM, Craig Ringer wrote:
> > > I'm a touch frustrated by that, as a large part of the point of
> > > submitting the output plugin separately and in advance of the
> > downstream
> > > was to get attention for it separately, as its own entity. A lot
> > of
> > > effort has been put into making this usable for more than just a
> > data
> > > source for pglogical's replication tools.
> >

Maybe chosen name was not the best one - I assumed from the very 
eginning that it's replication solution and not something separate.

> > I can't imagine that there is a lot of interest in a replication
> > tool
> > where you only get one side of it, no matter how well-designed or
> > general it is.
> Well, the other part was posted most of a week ago.
>
> http://www.postgresql.org/message-id/5685BB86.5010901@2ndquadrant.com
>
> ... but this isn't just about replication. At least, not just to
> another PostgreSQL instance. This plugin is designed to be general
> enough to use for replication to other DBMSes (via appropriate
> receivers), to replace trigger-based data collection in existing
> replication systems, for use in audit data collection, etc.
>
> Want to get a stream of data out of PostgreSQL in a consistent,
> simple way, without having to add triggers or otherwise interfere
> with the origin database? That's the purpose of this plugin, and it
> doesn't care in the slightest what the receiver wants to do with that
> data. It's been designed to be usable separately from pglogical
> downstream and - before the Python tests were rejected in discussions
> on this list - was tested using a test suite completely separate to
> the pglogical downstream using psycopg2 to make sure no unintended
> interdependencies got introduced.
>
> You can do way more than that with the output plugin but you have to
> write your own downstream/receiver for the desired purpose, since
> using a downstream based on bgworkers and SPI won't make any sense
> outside PostgreSQL.
>

Put those 3 paragraphs into README.md - and this is not a joke.
This is very good rationale behind this plugin; for now README
starts with link to documentation describing logical decoding
and the second paragraph talks about replication.
So when replication (and only it) is in README, it should be
no wonder that people (only - or mostly) think about replication.

Maybe we should think about changing the name to something like
logical_decoder or logical_streamer, to divorce this plugin
from pglogical? Currently even name suggests tight coupling - and
in other way than it should be. pglogical depends on this plugin,
not the other way around.

> If you just want a canned product to use, see the pglogical post
> above for the downstream code.
>
>  
> > Ultimately, what people will want to do with this is
> > replicate things, not muse about its design aspects. So if we're
> > going
> >  to ship a replication solution in PostgreSQL core, we should ship
> > all
> > the pieces that make the whole system work.
> I don't buy that argument. Doesn't that mean logical decoding
> shouldn't have been accepted? Or the initial patches for parallel
> query? Or any number of other things that're part of incremental
> development solutions?
>
> (This also seems to contradict what you then argue below, that the
> proposed feature is too broad and does too much.)
>
> I'd be happy to see both parts go in, but I'm frustrated that
> nobody's willing to see beyond "replicate from one Pg to another Pg"
> and see all the other things you can do. Want to replicate to Oracle
> / MS-SQL / etc? This will help a lot and solve a significant part of
> the problem for you. Want to stream data to append-only audit logs?
> Ditto. But nope, it's all about PostgreSQL to PostgreSQL.
>
> Please try to look further into what client applications can do with
> this directly. I already know it meets the needs of the pglogical
> downstream. What I was hoping to achieve with posting the output
> plugin earlier was to get some thought going about what *else* it'd
> be good for.
>
> Again: pglogical is posted now (it just took longer than expected to
> get ready) and I'll be happy to see both it and the output plugin
> included. I just urge people to look at the output plugin as more
> than a tightly coupled component of pglogical.
>
> Maybe some quality name bikeshedding for the output plugin would help
> ;)
>
> > Also, I think there are two kinds of general systems: common core,
> > and
> > all possible features.  A common core approach could probably be
> > made
> > acceptable with the argument that anyone will probably want to do
> > things
> > this way, so we might as well implement it once and give it to
> > people.
> That's what we're going for here. Extensible, something people can
> build on and use.
>  
> >  In a way, the logical decoding interface is the common core, as we
> > currently understand it.  But this submission clearly has a lot of
> > features beyond just the basics
> Really? What would you cut? What's beyond the basics here? What
> basics are you thinking of, i.e what set of requirements are you
> working towards / needs are you seeking to meet?
>
> We cut this to the bone to produce a minimum viable logical
> replication solution. Especially the output plugin.
>
> Cut the hook interfaces for row and xact filtering? You lose the
> ability to use replication origins, crippling functionality, and for
> no real gain in simplicity.
>
> Remove JSON support? That's what most people are actually likely to
> want to use when using the output plugin directly, and it's important
> for debugging/tracing/diagnostics. It's a separate feature, to be
> sure, but it's also a pretty trivial addition.
>  
> >  and we could probably go through them
> > one by one and ask, why do we need this bit?  So that kind of
> > system
> > will be very hard to review as a standalone submission.
> >
> Again, I disagree. I think you're looking at this way too narrowly.
>
> I find it quite funny, actually. Here we go and produce something
> that's a nice re-usable component that other people can use in their
> products and solutions ... and all anyone does is complain that the
> other part required to use it as a canned product isn't posted yet
> (though it is now). But with BDR all anyone ever does is complain
> that it's too tightly coupled to the needs of a single product and
> the features extracted from it, like replication origins, should be
> more generic and general purpose so other people can use them in
> their products too. Which is it going to be?
>
> It would be helpful if you could take a step back and describe what
> *you* think logical replication for PostgreSQL should look like. You
> clearly have a picture in mind of what it should be, what
> requirements it satisfies, etc. If you're going to argue based on
> that it'd be very helpful to describe it. I might've missed some
> important points you've seen and you might've overlooked issues I've
> seen. 
>

This is rather long, but I do not want to cut to much, because
it shows slight problem with workflow in PostgreSQL community.
I'm writing as someone trying to increase my involvement,
not fully outsider, but not yet feeling fully belonging.

I'll try to explain what I mean taking this patch as example.
It started
as part of pglogical replication, but you split
it to ease review. But
this origin shows - in name, in comments,
in README. It's good example
of scratching itch - but without
connection to others, or maybe without
wider picture.
Don't get me wrong - having code to discuss is much
better
than just bikeshedding what we'd like to have.
And changes in v5
(like caching and passing tuples to hooks)
give hope for some work.
OTOH,
by looking at parallel queries, maybe once it lands
in repository it'll
get more attention?

I can feel your frustration. Coming to community without own
itch to scratch is also a bit frustrating - I do not know where
to start, what needs the most attention. I can see that
commitfests are in dire need for reviewers, so I started with
them. But at the same time I can only check whether code looks
correct, applies cleanly, whether it compiles, whether
tests pass.

I do not see bigger picture - and also cannot see emails with
discussion about long- or mid-term direction or vision.
It makes harder to feel that it matters and to decide
which patch look at.
Both communities I feel attached to (Debian and PostgreSQL)
differ from many highly visible FLOSS projects that they
do not have one backing company, nor benevolent dictator.
It gives them freedom to pursue different goals without
risk of disrupting power structure, but at the same time
it make it harder to connect the dots and see how project
is doing.

OK, we went quite far away from review. I do not have closing
remarks - only that I hope to provide better review by the weekend.

And let's discuss name - I do not fully like pglogical_decoding.

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

19 января 2016 г., 00:48:08

I just quickly went through patch v5.
It's rather big patch, on (or beyond) my knowledge of PostgreSQL to perform high-quality review. But during this week
I'lltry to send reviews of parts of the code, as going through it in one sitting seems impossible.
 

One proposed change - update copyright to 2016.

i'd also propose to change of pglogical_output_control.in:
comment = 'general purpose logical decoding plugin'
to something like "general-purpoer plugin decoding and generating stream of logical changes"

We might also think about changing name of plugin to something resembling "logical_streaming_decoder" or even
"logical_streamer"

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

19 января 2016 г., 17:26:56

On 19 January 2016 at 05:47, Tomasz Rybak <tomasz.rybak@post.pl> wrote:

I just quickly went through patch v5.
It's rather big patch, on (or beyond) my knowledge of PostgreSQL to perform high-quality review. But during this week I'll try to send reviews of parts of the code, as going through it in one sitting seems impossible.

One proposed change - update copyright to 2016.

i'd also propose to change of pglogical_output_control.in:
comment = 'general purpose logical decoding plugin'
to something like "general-purpoer plugin decoding and generating stream of logical changes"

We might also think about changing name of plugin to something resembling "logical_streaming_decoder" or even "logical_streamer"

I'm open to ideas there but I'd want some degree of consensus before undertaking the changes required.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

20 января 2016 г., 01:23:06

The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

First part of code review (for about 1/3rd of code):
pglogical_output.h:

+ /* Protocol capabilities */
+ #define PGLOGICAL_PROTO_VERSION_NUM 1
+ #define PGLOGICAL_PROTO_MIN_VERSION_NUM 1
Is this protocol version number and minimal recognized version number,
or major and minor version number? I assume that PGLOGICAL_PROTO_VERSION_NUM
is current protocol version (as in config max_proto_version and
min_proto_version). But why we have MIN_VERSION and not MAX_VERSION?

From code in pglogical_output.c (lines 215-225 it looks like
PGLOGICAL_PROTO_VERSION_NUM is maximum, and PGLOGICAL_PROTO_MIN_VERSION_NUM
is treated as minimal protocol version number.

I can see that those constants are exported in pglogical_infofuncs.c and
pglogical.sql, but I do not understand omission of MAX.


+ typedef struct HookFuncName
+ typedef struct PGLogicalTupleData
I haven't found those used anything, and they are not mentioned in
documentation. Are those structures needed?


+ pglogical_config.c:
+         switch(get_param_key(elem->defname))
+         {
+             val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_UINT32);
Why do we need this line here? All cases contain some variant of
val = get_param_value(elem, false, TYPE);
as first line after "case PARAM_*:" so this line should be removed.

+     val = get_param(options, "startup_params_format", false, false,
+                     OUTPUT_PARAM_TYPE_UINT32, &found);
+ 
+     params_format = DatumGetUInt32(val);
Shouldn't we check "found" here? We work with val (which is Datum(0)) - won't
it throw SIGFAULT, or similar?
Additionally - I haven't found any case where code would use "found"
passed from get_param() - so as it's not used it might be removed.


pglogical_output.c:

+         elog(DEBUG1, "Binary mode rejected: Server and client endian sizeof(Datum) mismatch");
+         return false;
+     }
+ 
+     if (data->client_binary_sizeofint != 0
+         && data->client_binary_sizeofint != sizeof(int))
+     {
+         elog(DEBUG1, "Binary mode rejected: Server and client endian sizeof(int) mismatch");
+         return false;
+     }
+ 
+     if (data->client_binary_sizeoflong != 0
+         && data->client_binary_sizeoflong != sizeof(long))
+     {
+         elog(DEBUG1, "Binary mode rejected: Server and client endian sizeof(long) mismatch");
Isn't "endian" here case of copy-paste from first error?
Error messages should rather look like:    elog(DEBUG1, "Binary mode rejected: Server and client sizeof(Datum)
mismatch");

+ static void pg_decode_shutdown(LogicalDecodingContext * ctx)
In pg_decode_startup we create main memory context, and create hooks memory
context. In pg_decode_shutdown we delete hooks memory context but not main
context. Is this OK, or should we also add:
MemoryContextDelete(data->context);

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

20 января 2016 г., 08:54:17

On 20 January 2016 at 06:23, Tomasz Rybak <tomasz.rybak@post.pl> wrote:

The following review has been posted through the commitfest application:

Thanks!

+ /* Protocol capabilities */
+ #define PGLOGICAL_PROTO_VERSION_NUM 1
+ #define PGLOGICAL_PROTO_MIN_VERSION_NUM 1
Is this protocol version number and minimal recognized version number,
or major and minor version number? I assume that PGLOGICAL_PROTO_VERSION_NUM
is current protocol version (as in config max_proto_version and
min_proto_version). But why we have MIN_VERSION and not MAX_VERSION?

>From code in pglogical_output.c (lines 215-225 it looks like
PGLOGICAL_PROTO_VERSION_NUM is maximum, and PGLOGICAL_PROTO_MIN_VERSION_NUM
is treated as minimal protocol version number.

I can see that those constants are exported in pglogical_infofuncs.c and
pglogical.sql, but I do not understand omission of MAX.

Thanks for stopping to think about this. It's one of the areas I really want to get right but I'm not totally confident of.

The idea here is that we want downwards compatibility as far as possible and maintainable but we can't really be upwards compatible for breaking protocol revisions. So the output plugin's native protocol version is inherently the max protocol version and we don't need a separate MAX.

The downstream connects and declares to the upstream "I speak protocol 2 through 3". The upstream sees this and replies "I speak protocol 1 through 2. We have protocol 2 in common so I will use that." Or alternately replies with an error "I only speak protocol 1 so we have no protocol in common". This is done via the initial parameters passed by the downstream to the logical decoding plugin and then via the startup reply message that's the first message on the logical decoding stream.

We can't do a better handshake than this because the underlying walsender protocol and output plugin API only gives us one chance to send free-form information to the output plugin and it has to do so before the output plugin can send anything to the downstream.

As much as possible I want to avoid ever needing to do a protocol bump anyway, since it'll involve adding conditionals and duplication. That's why the protocol tries so hard to be extensible and rely on declared capabilities rather than protocol version bumps. But I'd rather plan for it than be unable to ever do it in future.

Much (all?) of this is discussed in the protocol docs. I should probably double check that and add a comment that refers to them there.

+ typedef struct HookFuncName

Thanks. That's residue from the prior implementation of hooks, which used direct pg_proc lookups and cached the FmgrInfo in a dynahash. It's no longer required now that we're using a single hook entry point and direct C function calls. Dead code, removed.

+ typedef struct PGLogicalTupleData
I haven't found those used anything, and they are not mentioned in
documentation. Are those structures needed?

That snuck in from the pglogical downstream during the split into a separate tree. It's dead code as far as pglogical_output is concerned. Removed.

+ pglogical_config.c:
+ switch(get_param_key(elem->defname))
+ {
+ val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_UINT32);

Why do we need this line here? All cases contain some variant of
val = get_param_value(elem, false, TYPE);
as first line after "case PARAM_*:" so this line should be removed.

Correct. That seems to be an escapee editing error. Thanks, removed.

+ val = get_param(options, "startup_params_format", false, false,
+ OUTPUT_PARAM_TYPE_UINT32, &found);
+
+ params_format = DatumGetUInt32(val);
Shouldn't we check "found" here? We work with val (which is Datum(0)) - won't
it throw SIGFAULT, or similar?

get_param is called with missing_ok=false so it ERRORs and can never return !found . In any case it'd return (Datum)0 so we'd just get (uint32)0 not a crash.

To make this clearer I've changed get_param so it supports NULL as a value for found.

Additionally - I haven't found any case where code would use "found"
passed from get_param() - so as it's not used it might be removed.

Probably, but I expect it to be useful later. It was used before a restructure of how params get read. I don't mind removing it if you feel strongly about it, but it'll probably just land up being added back at some point.

+ elog(DEBUG1, "Binary mode rejected: Server and client endian sizeof(long) mismatch");
Isn't "endian" here case of copy-paste from first error?

Yes, it is. Thanks.

+ static void pg_decode_shutdown(LogicalDecodingContext * ctx)
In pg_decode_startup we create main memory context, and create hooks memory
context. In pg_decode_shutdown we delete hooks memory context but not main
context. Is this OK, or should we also add:
MemoryContextDelete(data->context);

Good catch. I think a better fix is to make it a child of the logical decoding context so it's deleted automatically. It's actually unnecessary to delete data->hooks_mctxt here for the same reason.

Amended.

I've also patched the tests to handle the failure to fail on an incorrect startup_params_format .

Changes pushed to https://github.com/2ndquadrant/postgres/tree/dev/pglogical-output

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

21 января 2016 г., 00:16:55

W dniu 20.01.2016, śro o godzinie 13∶54 +0800, użytkownik Craig Ringer
napisał:
> On 20 January 2016 at 06:23, Tomasz Rybak <tomasz.rybak@post.pl>
> wrote:
> > The following review has been posted through the commitfest
> > application:
> >
> Thanks!
>  
> >  
> > + /* Protocol capabilities */
> > + #define PGLOGICAL_PROTO_VERSION_NUM 1
> > + #define PGLOGICAL_PROTO_MIN_VERSION_NUM 1
> > Is this protocol version number and minimal recognized version
> > number,
> > or major and minor version number? I assume that
> > PGLOGICAL_PROTO_VERSION_NUM
> > is current protocol version (as in config max_proto_version and
> > min_proto_version). But why we have MIN_VERSION and not
> > MAX_VERSION?
> >
> > From code in pglogical_output.c (lines 215-225 it looks like
> > PGLOGICAL_PROTO_VERSION_NUM is maximum, and
> > PGLOGICAL_PROTO_MIN_VERSION_NUM
> > is treated as minimal protocol version number.
> >
> > I can see that those constants are exported in
> > pglogical_infofuncs.c and
> > pglogical.sql, but I do not understand omission of MAX.
> Thanks for stopping to think about this. It's one of the areas I
> really want to get right but I'm not totally confident of.
>
> The idea here is that we want downwards compatibility as far as
> possible and maintainable but we can't really be upwards compatible
> for breaking protocol revisions. So the output plugin's native
> protocol version is inherently the max protocol version and we don't
> need a separate MAX.
>
> The downstream connects and declares to the upstream "I speak
> protocol 2 through 3". The upstream sees this and replies "I speak
> protocol 1 through 2. We have protocol 2 in common so I will use
> that." Or alternately replies with an error "I only speak protocol 1
> so we have no protocol in common". This is done via the initial
> parameters passed by the downstream to the logical decoding plugin
> and then via the startup reply message that's the first message on
> the logical decoding stream.
>
> We can't do a better handshake than this because the underlying
> walsender protocol and output plugin API only gives us one chance to
> send free-form information to the output plugin and it has to do so
> before the output plugin can send anything to the downstream.
>
> As much as possible I want to avoid ever needing to do a protocol
> bump anyway, since it'll involve adding conditionals and duplication.
> That's why the protocol tries so hard to be extensible and rely on
> declared capabilities rather than protocol version bumps. But I'd
> rather plan for it than be unable to ever do it in future.
>
> Much (all?) of this is discussed in the protocol docs. I should
> probably double check that and add a comment that refers to them
> there.
>

Thanks for explanation. I'll think about it more and try to propose
something for this.

Best regards.

--
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

21 января 2016 г., 01:24:08

The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

I revied more files:

pglogical_proto_native.c

+         pq_sendbyte(out, 'N');        /* column name block follows */
+         attname = NameStr(att->attname);
+         len = strlen(attname) + 1;
+         pq_sendint(out, len, 2);
+         pq_sendbytes(out, attname, len); /* data */
Identifier names are limited to 63 - so why we're sending 2 bytes here?
I do not have hard feelings about this - just curiousity. For:
+     pq_sendbyte(out, nspnamelen);        /* schema name length */
+     pq_sendbytes(out, nspname, nspnamelen);

+     pq_sendbyte(out, relnamelen);        /* table name length */
+     pq_sendbytes(out, relname, relnamelen);
schema and relation name we send 1 byte, for attribute 2. Strange, bit inconsistent.

+     pq_sendbyte(out, 'B');        /* BEGIN */
+ 
+     /* send the flags field its self */
+     pq_sendbyte(out, flags);
Comment: "flags field its self"? Shouldn't be "... itself"?
Similarly in write_origin; write_insert just says:
+     /* send the flags field */
+     pq_sendbyte(out, flags);

Speaking about flags - in most cases they are 0; only for attributes
we might have:
+             flags |= IS_REPLICA_IDENTITY;
I assume that flags field is put into protocol for future needs?


+     /* FIXME support whole tuple (O tuple type) */
+     if (oldtuple != NULL)
+     {
+         pq_sendbyte(out, 'K');    /* old key follows */
+         pglogical_write_tuple(out, data, rel, oldtuple);
+     }
I don't fully understand. We are sending whole old tuple here,
so this FIXME should be more about supporting sending just keys.
But then comment "old key follows" is not true. Or am I missing
something here?
Similarly for write_delete.

+     pq_sendbyte(out, 'S');    /* message type field */
+     pq_sendbyte(out, 1);     /* startup message version */
For now protocol is 1, but for code readability it might be better
to change this line to:
+     pq_sendbyte(out, PGLOGICAL_PROTO_VERSION_NUM);     /* startup message version */


Just for the sake of avoiding code repetition:
+     for (i = 0; i < desc->natts; i++)
+     {
+         if (desc->attrs[i]->attisdropped)
+             continue;
+         nliveatts++;
+     }
+     pq_sendint(out, nliveatts, 2);
The exact same code is in write_tuple and write_attrs. I don't know what's
policy for refactoring, but this might be extracted into separate function.

+                 else if (att->attlen == -1)
+                 {
+                     char *data = DatumGetPointer(values[i]);
+ 
+                     /* send indirect datums inline */
+                     if (VARATT_IS_EXTERNAL_INDIRECT(values[i]))
+                     {
+                         struct varatt_indirect redirect;
+                         VARATT_EXTERNAL_GET_POINTER(redirect, data);
+                         data = (char *) redirect.pointer;
+                     }
I really don't like this. We have function parameter "data" and now
are creating new variable with the same name. It might lead to confusion
and some long debugging sessions. Please change the name of this variable.
Maybe attr_data would be OK? Or outputbytes, like below, for case 'b' and default?


pglogical_proto_json.c

+         appendStringInfo(out, ", \"origin_lsn\":\"%X/%X\"",
+             (uint32)(txn->origin_lsn >> 32), (uint32)(txn->origin_lsn));
I remember there was discussion on *-hackers recently about %X/%X; I'll
try to find it and check whether it's according to final conclusion.


pglogical_relmetacache.c

First. In pglogical_output.c, in pg_decode_startup we are calling
init_relmetacache. I haven't found call to destroy_relmetacache
and comment says that it must be called at backend shutdown.
Is it guaranteed? Or will cache get freed with its context?

+     /* Find cached function info, creating if not found */
+     hentry = (struct PGLRelMetaCacheEntry*) hash_search(RelMetaCache,
+                                          (void *)(&RelationGetRelid(rel)),
+                                          HASH_ENTER, &found);
+ 
+     if (!found)
+     {
+         Assert(hentry->relid = RelationGetRelid(rel));
+         hentry->is_cached = false;
+         hentry->api_private = NULL;
+     }
+ 
+     Assert(hentry != NULL);
Shouldn't Assert be just after calling hash_search? We're (if !found)
dereferencing hentry and only after checking whether it's not NULL.

I haven't found relevance of relmeta_cache_size attribute.
It's set to value coming from client - but then the only thing that matters
is whether it is 0 or not. I'll try to reread DESIGN.md session about cache,
but for now (quite late and I'm quite tired) it is not clear on the first reading.

Best regards.

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

21 января 2016 г., 04:04:41

On 21 January 2016 at 06:23, Tomasz Rybak <tomasz.rybak@post.pl> wrote:

I reviewed more files:

Thanks.

Can you try to put more whitespace between items? It can be hard to follow at the moment.

pglogical_proto_native.c

+ pq_sendbyte(out, 'N'); /* column name block follows */
+ attname = NameStr(att->attname);
+ len = strlen(attname) + 1;
+ pq_sendint(out, len, 2);
+ pq_sendbytes(out, attname, len); /* data */
Identifier names are limited to 63 - so why we're sending 2 bytes here?

Good question. It should be one byte. I'll need to amend the protocol for that, but I don't think that's a problem this early on.

+ pq_sendbyte(out, 'B'); /* BEGIN */
+
+ /* send the flags field its self */
+ pq_sendbyte(out, flags);
Comment: "flags field its self"? Shouldn't be "... itself"?
Similarly in write_origin; write_insert just says:

itself is an abbreviation of its self.

+ /* send the flags field */
+ pq_sendbyte(out, flags);

Speaking about flags - in most cases they are 0; only for attributes
we might have:

+ flags |= IS_REPLICA_IDENTITY;
I assume that flags field is put into protocol for future needs?

Correct. The protocol specifies (in most places; need to double check all sites) that the lower 4 bits are reserved and must be treated as an ERROR if set. The high 4 bits must be ignored if set and not recognised. That gives us some room to wiggle without bumping the protocol version incompatibly, and lets us use capabilities (via client-supplied parameters) to add extra information on the wire.

+ /* FIXME support whole tuple (O tuple type) */
+ if (oldtuple != NULL)
+ {
+ pq_sendbyte(out, 'K'); /* old key follows */
+ pglogical_write_tuple(out, data, rel, oldtuple);
+ }
I don't fully understand. We are sending whole old tuple here,
so this FIXME should be more about supporting sending just keys.
But then comment "old key follows" is not true. Or am I missing
something here?

In wal_level=logical the tuple that's written is an abbreviated tuple containing data only for the REPLICA IDENTITY fields.

Ideally we'd also be able to support sending the _whole_ old tuple, but this would require the ability to log the whole old tuple in WAL when logging a DELETE or UPDATE into WAL. This isn't so much a FIXME as a logical decoding limitation and wishlist item; I'll amend to that effect.

+ pq_sendbyte(out, 'S'); /* message type field */
+ pq_sendbyte(out, 1); /* startup message version */
For now protocol is 1, but for code readability it might be better
to change this line to:
+ pq_sendbyte(out, PGLOGICAL_PROTO_VERSION_NUM); /* startup message version */

The startup message format isn't the same as the protocol version. Hopefully we'll never have to change it. The reason it's specified is so that if we ever do bump it a decoding plugin can recognise an old client and fall back. Maybe it's BC overkill but I'd kick myself for not doing it if we ever decided to (say) add support for structured json startup options from the client. Working on BDR has taught me that there's no such thing as too much consideration for cross-version compat and negotiation in replication.

I'm happy to create a new define for that an comment to this effect.

Just for the sake of avoiding code repetition:
+ for (i = 0; i < desc->natts; i++)
+ {
+ if (desc->attrs[i]->attisdropped)
+ continue;
+ nliveatts++;
+ }
+ pq_sendint(out, nliveatts, 2);

The exact same code is in write_tuple and write_attrs. I don't know what's
policy for refactoring, but this might be extracted into separate function.

Seems trivial enough not to care, but probably can.

+ else if (att->attlen == -1)
+ {
+ char *data = DatumGetPointer(values[i]);
+
+ /* send indirect datums inline */
+ if (VARATT_IS_EXTERNAL_INDIRECT(values[i]))
+ {
+ struct varatt_indirect redirect;
+ VARATT_EXTERNAL_GET_POINTER(redirect, data);
+ data = (char *) redirect.pointer;
+ }

I really don't like this. We have function parameter "data" and now
are creating new variable with the same name.

I agree. Good catch.

I don't much like the use of 'data' as the param name for the plugin private data and am quite inclined to change that instead, to plugin_private or something.

pglogical_proto_json.c

+ appendStringInfo(out, ", \"origin_lsn\":\"%X/%X\"",
+ (uint32)(txn->origin_lsn >> 32), (uint32)(txn->origin_lsn));
I remember there was discussion on *-hackers recently about %X/%X; I'll
try to find it and check whether it's according to final conclusion.

Thanks.

pglogical_relmetacache.c

First. In pglogical_output.c, in pg_decode_startup we are calling
init_relmetacache. I haven't found call to destroy_relmetacache
and comment says that it must be called at backend shutdown.
Is it guaranteed? Or will cache get freed with its context?

Hm, ok. It must never be called before shutdown, but it's not necessary to call at all. I'll amend it. It's present just in case future Valgrind support wants to call it to explicitly clean up memory.

I'll take a closer look there. I think I changed the lifetime of the relmetacache after figuring out a better way to handle the cache invalidations and it's possible I missed an update in the comments.

+ /* Find cached function info, creating if not found */
+ hentry = (struct PGLRelMetaCacheEntry*) hash_search(RelMetaCache,
+ (void *)(&RelationGetRelid(rel)),
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ Assert(hentry->relid = RelationGetRelid(rel));
+ hentry->is_cached = false;
+ hentry->api_private = NULL;
+ }
+
+ Assert(hentry != NULL);

Shouldn't Assert be just after calling hash_search? We're (if !found)
dereferencing hentry and only after checking whether it's not NULL.

Yeah. It's more of an exit condition, stating "this function never returns non-null hentry" but moving it up is fine. Can do.

I haven't found relevance of relmeta_cache_size attribute.
It's set to value coming from client - but then the only thing that matters
is whether it is 0 or not. I'll try to reread DESIGN.md session about cache,
but for now (quite late and I'm quite tired) it is not clear on the first reading.

It's the first part of a feature. You can turn the relation metadata cache off, set it to unlimited, or (in future, not implemented yet) set a bounded size where a LRU is used to evict the oldest entry.

The LRU approach is more complex since we have to track the LRU and do evictions as well as the cache map its self. Efficiently. I expect this to be a 1.1 or later feature. Having the param as an int now means we don't later land up having to have two params, one to enable the cache and another to bound its size, so once it's fully implemented it'll (IMO) be clearer what's going on.

Thanks again for the detailed code examination. I find it hard to read familiar code closely for review since I see what I expect to see. So it's really, really useful.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Robert Haas

Дата:

21 января 2016 г., 19:37:34

On Wed, Jan 20, 2016 at 8:04 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> itself is an abbreviation of its self.

I do not think this is true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Robert Haas

Дата:

21 января 2016 г., 19:38:09

On Wed, Jan 20, 2016 at 12:54 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> The idea here is that we want downwards compatibility as far as possible and
> maintainable but we can't really be upwards compatible for breaking protocol
> revisions. So the output plugin's native protocol version is inherently the
> max protocol version and we don't need a separate MAX.

That idea seems right to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

22 января 2016 г., 01:14:09

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Documentation - although I haven't yet went through protocol documentation:

README.md

+ data stream. The output stream is designed to be compact and fast to decode,
+ and the plugin supports upstream filtering of data so that only the required
+ information is sent.

plugin supports upstream filtering of data through hooks so that ...

+ subset of that database may be selected for replication, currently based on
+ table and on replication origin. Filtering by a WHERE clause can be supported
+ easily in future.

Is this filtering by table and replication origin implemented? I haven't
noticed it in source.

+ other daemon is required. It's accumulated using

Stream of changes is accumulated...

+ [the `CREATE_REPLICATION_SLOT ... LOGICAL ...` or `START_REPLICATION SLOT ... LOGICAL ...`
commands](http://www.postgresql.org/docs/current/static/logicaldecoding-walsender.html)to start streaming changes. (It
canalso be used via

+ [SQL level functions](http://www.postgresql.org/docs/current/static/logicaldecoding-sql.html)
+ over a non-replication connection, but this is mainly for debugging purposes)

Replication slot can also be configured (causing output plugin to be loaded) via [SQL level functions]...

+ The overall flow of client/server interaction is:

The overall flow of client/server interaction is as follows:

+ * Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL 'pglogical'` if it's setting up for the first time

* Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL 'pglogical'` to setup replication if it's connecting for the
firsttime

+ Details are in the replication protocol docs.

Add link to file with protocol documentation.

+ If your application creates its own slots on first use and hasn't previously
+ connected to this database on this system you'll need to create a replication
+ slot. This keeps track of the client's replay state even while it's disconnected.

If your application hasn't previously connected to this database on this system
it'll need to create and configure replication slot which keeps track of the
client's replay state even while it's disconnected.

+ `pglogical`'s output plugin now sends a continuous series of `CopyData`

As this is separate plugin, use 'pglogical_output' plugin now sends...
(not only here but also in few other places).

+ All hooks are called in their own memory context, which lasts for the duration

All hooks are called in separate hook memory context, which lasts for the duration...

+ + switched to a longer lived memory context like TopMemoryContext. Memory allocated
+ + in the hook context will be automatically when the decoding session shuts down.

...will be automatically freed when the decoding...

DDL for global object changes must be synchronized via some external means.

Just:
Global object changes must be synchronized via some external means.

+ determine why an error occurs in a downstream, since you can examine a
+ json-ified representation of the xact. It's necessary to supply a minimal

since you can examine a transaction in json (and not binary) format. It's necessary

+ discard up to, as identifed by LSN (log sequence number). See

identified

+ Once you've peeked the stream and know the LSN you want to discard up to, you
+ can use `pg_logical_slot_peek_changes`, specifying an `upto_lsn`, to consume

Shouldn't it be pg_logical_slot_get_changes? get_changes consumes changes,
peek_changes leaves them in the stream. Especially as example below
points that we need to use get_changes.

+ tp to but not including that point, i.e. that will be the
+ point at which replay resumes.

IMO it's better to introduce new sentence:
that point. This will be the point at which replay resumes.

DESIGN.md:

+ attnos don't necessarily correspond. The column names might, and their ordering
+ might even be the same, but any column drop or column type change will result

The column names and their ordering might even be the same...

README.pglogical_output_plhooks:

+ Note that pglogical
+ must already be installed so that its headers can be found.

Note that pglogical_output must already...

+ Arguments are the oid of the affected relation, and the change type: 'I'nsert,
+ 'U'pdate or 'D'elete. There is no way to access the change data - columns changed,
+ new values, etc.

Is it true (no way to access change data)? You added passing change
to C hooks; from looking at code it looks like it's true, but I want to be sure.

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

22 января 2016 г., 06:06:10

On 22 January 2016 at 06:13, Tomasz Rybak <tomasz.rybak@post.pl> wrote:

+ data stream. The output stream is designed to be compact and fast to decode,
+ and the plugin supports upstream filtering of data so that only the required
+ information is sent.

plugin supports upstream filtering of data through hooks so that ...

Ok.

+ subset of that database may be selected for replication, currently based on
+ table and on replication origin. Filtering by a WHERE clause can be supported
+ easily in future.

Is this filtering by table and replication origin implemented? I haven't
noticed it in source.

That's what the hooks are for.

+ other daemon is required. It's accumulated using

Stream of changes is accumulated...

Ok.

+ [the `CREATE_REPLICATION_SLOT ... LOGICAL ...` or `START_REPLICATION SLOT ... LOGICAL ...` commands](http://www.postgresql.org/docs/current/static/logicaldecoding-walsender.html) to start streaming changes. (It can also be used via
+ [SQL level functions](http://www.postgresql.org/docs/current/static/logicaldecoding-sql.html)
+ over a non-replication connection, but this is mainly for debugging purposes)

Replication slot can also be configured (causing output plugin to be loaded) via [SQL level functions]...

Covered in the next section. Or at least it is in the SGML docs conversion I'm still trying to finish off..

+ * Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL 'pglogical'` if it's setting up for the first time

* Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL 'pglogical'` to setup replication if it's connecting for the first time

I disagree. It's entirely possible to do your slot creation/setup manually or via something else, re-use a slot first created by another node, etc. Slot creation is part of client setup, not so much connection.

+ Details are in the replication protocol docs.

Add link to file with protocol documentation.

Is done in SGML conversion.

+ If your application creates its own slots on first use and hasn't previously
+ connected to this database on this system you'll need to create a replication
+ slot. This keeps track of the client's replay state even while it's disconnected.

If your application hasn't previously connected to this database on this system
it'll need to create and configure replication slot which keeps track of the
client's replay state even while it's disconnected.

As above, I don't quite agree.

+ `pglogical`'s output plugin now sends a continuous series of `CopyData`

As this is separate plugin, use 'pglogical_output' plugin now sends...
(not only here but also in few other places).

Thanks. Done.

+ All hooks are called in their own memory context, which lasts for the duration

All hooks are called in separate hook memory context, which lasts for the duration...

I don't see the difference, but OK.

Simplified:

> All hooks are called in a memory context that lasts ...

+ + switched to a longer lived memory context like TopMemoryContext. Memory allocated
+ + in the hook context will be automatically when the decoding session shuts down.

...will be automatically freed when the decoding...

Fixed, thanks.

DDL for global object changes must be synchronized via some external means.

Just:
Global object changes must be synchronized via some external means.

Agree, done.

+ determine why an error occurs in a downstream, since you can examine a
+ json-ified representation of the xact. It's necessary to supply a minimal

since you can examine a transaction in json (and not binary) format. It's necessary

Ok, done.

+ Once you've peeked the stream and know the LSN you want to discard up to, you
+ can use `pg_logical_slot_peek_changes`, specifying an `upto_lsn`, to consume

Shouldn't it be pg_logical_slot_get_changes? get_changes consumes changes,
peek_changes leaves them in the stream. Especially as example below
points that we need to use get_changes.

Yes. It should. Whoops. Thanks.

DESIGN.md:

+ attnos don't necessarily correspond. The column names might, and their ordering
+ might even be the same, but any column drop or column type change will result

The column names and their ordering might even be the same...

I disagree, that has a different meaning. It's also not really user-facing docs so I'm not too worried about being quite as readable.

README.pglogical_output_plhooks:

+ Note that pglogical
+ must already be installed so that its headers can be found.

Note that pglogical_output must already...

Thanks.

+ Arguments are the oid of the affected relation, and the change type: 'I'nsert,
+ 'U'pdate or 'D'elete. There is no way to access the change data - columns changed,
+ new values, etc.

Is it true (no way to access change data)? You added passing change
to C hooks; from looking at code it looks like it's true, but I want to be sure.

While the change data is now passed to the C hook, there's no attempt to expose it via PL/PgSQL. So yeah, that's still true.

Thanks again for this. Sorry it's taken so long to get the SGML docs converted. Too many other things on.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

24 января 2016 г., 17:33:43

I'm merging all your emails for sake of easier discussion.
I also cut all fragments that do not require response.

W dniu 22.01.2016, pią o godzinie 11∶06 +0800, użytkownik Craig Ringer
napisał:
> > We might also think about changing name of plugin to something
> > resembling "logical_streaming_decoder" or even "logical_streamer"
> > 
> I'm open to ideas there but I'd want some degree of consensus before
> undertaking the changes required. 

I know that it'd require much changes (both in this and pglogical 
plugin), and thus don't want to press for name change.
On one hand - changing of name might be good to avoid tight mental
coupling between pglogical_output and pglogica. At the same time - it's
much work, and I cannot think of any short and nice name, so 
pglogical_output might stay IMO.

 
> >  + subset of that database may be selected for replication,
> > currently based on
> > + table and on replication origin. Filtering by a WHERE clause can
> > be supported
> > + easily in future.
> >
> > Is this filtering by table and replication origin implemented? I
> > haven't
> > noticed it in source.
> That's what the hooks are for.
>

Current documentation suggests that replicating only selected is 
already available:
+ A subset of that database may be selected for replication, currently
+ based on table and on replication origin.

"currently based on table and on replication origin" means to me that
current state of plugin allows for just chosing which tables
to replicate. I'd see something like:

"A subset of that database might be selected for replication, e.g.
only chosen tables or changes from particular origin, in custom hook"

to convey that user needs to provide hook for filtering.

>  
> > + [the `CREATE_REPLICATION_SLOT ... LOGICAL ...` or
> > `START_REPLICATION SLOT ... LOGICAL ...` commands](http://www.postg
> > resql.org/docs/current/static/logicaldecoding-walsender.html) to
> > start streaming changes. (It can also be used via
> > + [SQL level functions](http://www.postgresql.org/docs/current/stat
> > ic/logicaldecoding-sql.html)
> > + over a non-replication connection, but this is mainly for
> > debugging purposes)
> >
> > Replication slot can also be configured (causing output plugin to
> > be loaded) via [SQL level functions]...
> Covered in the next section. Or at least it is in the SGML docs
> conversion I'm still trying to finish off..

OK, then I'll wait for the final version to review that.

>  
> > + * Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL
> > 'pglogical'` if it's setting up for the first time
> >
> > * Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL
> > 'pglogical'` to setup replication if it's connecting for the first
> > time
> I disagree. It's entirely possible to do your slot creation/setup
> manually or via something else, re-use a slot first created by
> another node, etc. Slot creation is part of client setup, not so much
> connection.

I'd propose then:
' * Client issues "CREATE_REPLICATION_SLOT ..." if the replication
was not configured earler, e.g. during previous connection, or manually
via [SQL functions | link to documentation]"


> > + If your application creates its own slots on first use and hasn't
> > previously
> > + connected to this database on this system you'll need to create a
> > replication
> > + slot. This keeps track of the client's replay state even while
> > it's disconnected.
> >
> > If your application hasn't previously connected to this database on
> > this system
> > it'll need to create and configure replication slot which keeps
> > track of the
> > client's replay state even while it's disconnected.
> As above, I don't quite agree.
>  

"If your application hasn't previously connedted to this database on
this system, and the replication slot was not configured through other
means (e.g. manually using [SQL functions | URL ] then you'll need
to create and configure replication slot ..."



> > 
> >  DESIGN.md:
> >
> > + attnos don't necessarily correspond. The column names might, and
> > their ordering
> > + might even be the same, but any column drop or column type change
> > will result
> >
> > The column names and their ordering might even be the same...
> I disagree, that has a different meaning. It's also not really user-
> facing docs so I'm not too worried about being quite as readable.
>

I do not try to change meaning but fix grammar. I'd like to have here
either more or less commas. So either:

+ The column names (and their ordering) might ...

or:

+ The column names, and their ordering, might ...



> > Is it true (no way to access change data)? You added passing change
> > to C hooks; from looking at code it looks like it's true, but I
> > want to be sure.
> While the change data is now passed to the C hook, there's no attempt
> to expose it via PL/PgSQL. So yeah, that's still true.
>

Thanks for confirming.

Speaking about flags - in most cases they are 0; only for
> > attributes
> > we might have: 
> >  +                       flags |= IS_REPLICA_IDENTITY;
> > I assume that flags field is put into protocol for future needs?
> Correct. The protocol specifies (in most places; need to double check
> all sites) that the lower 4 bits are reserved and must be treated as
> an ERROR if set. The high 4 bits must be ignored if set and not
> recognised. That gives us some room to wiggle without bumping the
> protocol version incompatibly, and lets us use capabilities (via
> client-supplied parameters) to add extra information on the wire.
> 

Thanks for explanation. You're right - protocol.txt explains that
quite nicely.


+       /* FIXME support whole tuple (O tuple type) */
> > +       if (oldtuple != NULL)
> > +       {
> > +               pq_sendbyte(out, 'K');  /* old key follows */
> > +               pglogical_write_tuple(out, data, rel, oldtuple);
> > +       }
> > I don't fully understand. We are sending whole old tuple here,
> > so this FIXME should be more about supporting sending just keys.
> > But then comment "old key follows" is not true. Or am I missing
> > something here?
> In wal_level=logical the tuple that's written is an abbreviated tuple
> containing data only for the REPLICA IDENTITY fields.
> 

I didn't know that - thanks for explanation.


 
> > +       pq_sendbyte(out, 'S');  /* message type field */
> > +       pq_sendbyte(out, 1);    /* startup message version */
> > For now protocol is 1, but for code readability it might be better
> > to change this line to:
> > +       pq_sendbyte(out, PGLOGICAL_PROTO_VERSION_NUM);  /* startup
> > message version */
> The startup message format isn't the same as the protocol version.
> Hopefully we'll never have to change it. The reason it's specified is
> so that if we ever do bump it a decoding plugin can recognise an old
> client and fall back. Maybe it's BC overkill but I'd kick myself for
> not doing it if we ever decided to (say) add support for structured
> json startup options from the client. Working on BDR has taught me
> that there's no such thing as too much consideration for cross-
> version compat and negotiation in replication.

Then I'd suggest adding named constant for this - so startup message
version is not mistaken for protocol version. Something like:

+ #define PGLOGICA_STARTUP_MESSAGE_VERSION_NUM 1

This way it is obvious that "1" and "1" do not mean that same.

Just for the sake of avoiding code repetition:
> > +       for (i = 0; i < desc->natts; i++)
> > +       {
> > +               if (desc->attrs[i]->attisdropped)
> > +                       continue;
> > +               nliveatts++;
> > +       }
> > +       pq_sendint(out, nliveatts, 2); 
> >  
> >  The exact same code is in write_tuple and write_attrs. I don't
> > know what's
> > policy for refactoring, but this might be extracted into separate
> > function.
> Seems trivial enough not to care, but probably can.

IMO such code (computing number of live attributes in relation) should
be used often enough to deserve own function - isn't such function
available in PostgreSQL?



> > I really don't like this. We have function parameter "data" and
> > now
> > are creating new variable with the same name.
> I agree. Good catch.
> 
> I don't much like the use of 'data' as the param name for the plugin
> private data and am quite inclined to change that instead, to
> plugin_private or something.


Maybe rename function parameter to plugin_data or pluging_private_data?


+       /* Find cached function info, creating if not found */
> > +       hentry = (struct PGLRelMetaCacheEntry*)
> > hash_search(RelMetaCache,
> > +                                                                 
> >               (void *)(&RelationGetRelid(rel)),
> > +                                                                 
> >               HASH_ENTER, &found);
> > +
> > +       if (!found)
> > +       {
> > +               Assert(hentry->relid = RelationGetRelid(rel));
> > +               hentry->is_cached = false;
> > +               hentry->api_private = NULL;
> > +       }
> > +
> > +       Assert(hentry != NULL); 
> >  
> >  Shouldn't Assert be just after calling hash_search? We're (if
> > !found)
> > dereferencing hentry and only after checking whether it's not NULL.
> Yeah. It's more of an exit condition, stating "this function never
> returns non-null hentry" but moving it up is fine. Can do.

Please do so. It's more about conveying intent and increaing code
readability - so anyone knows that hentry should not be NULL
when returned by hash_search, and not some time later.


> Thanks again for this. Sorry it's taken so long to get the SGML docs
> converted. Too many other things on. 

No problem, I know how it is.

Best regards.

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

Re: pglogical_output - a general purpose logical decoding output plugin

От

Tomasz Rybak

Дата:

24 января 2016 г., 18:14:08

The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Final part of review:
protocol.txt

+|origin_identifier|signed char[origin_identifier_length]|An origin identifier of arbitrary,
upstream-application-definedstructure. _Should_ be text in the same encoding as the upstream database. NULL-terminated.
_Should_be 7-bit ASCII.
 

Does it need NULL-termination when previous field contains length of origin_identifier?
Similarly for relation metadata message.



+ metadata message. All consecutive row messages must currently have the same
+ relidentifier. (_Later extensions to add metadata caching will relax these
+ requirements for clients that advertise caching support; see the documentation
+ on metadata messages for more detail_).

Shouldn't this be changed as metadata cache is implemented?




+ |relidentifier|uint32|relidentifier that matches the table metadata message sent for this row.
+ (_Not present in BDR, which sends nspname and relname instead_)

and

+ |natts|uint16|Number of fields sent in this tuple part.
+ (_Present in BDR, but meaning significantly different here)_

Is BDR mention relevant here? It was not mentioned anywhere else, and now appears
ex machina.



Long quote - but required.


+ ==== Tuple fields
+ 
+ |===
+ |Tuple type|signed char|Identifies the kind of tuple being sent.
+ 
+ |tupleformat|signed char|‘**T**’ (0x54)
+ |natts|uint16|Number of fields sent in this tuple part.
+ (_Present in BDR, but meaning significantly different here)_
+ |[tuple field values]|[composite]|
+ |===
+ 
+ ===== Tuple tupleformat compatibility
+ 
+ Unrecognised _tupleformat_ kinds are a protocol error for the downstream.
+ 
+ ==== Tuple field value fields
+ 
+ These message parts describe individual fields within a tuple.
+ 
+ There are two kinds of tuple value fields, abbreviated and full. Which is being
+ read is determined based on the first field, _kind_.
+ 
+ Abbreviated tuple value fields are nothing but the message kind:
+ 
+ |===
+ |*Message*|*Type/Size*|*Notes*
+ 
+ |kind|signed char| * ‘**n**’ull (0x6e) field
+ |===
+ 
+ Full tuple value fields have a length and datum:
+ 
+ |===
+ |*Message*|*Type/Size*|*Notes*
+ 
+ |kind|signed char| * ‘**i**’nternal binary (0x62) field
+ |length|int4|Only defined for kind = i\|b\|t
+ |data|[length]|Data in a format defined by the table metadata and column _kind_.
+ |===
+ 
+ ===== Tuple field values kind compatibility
+ 
+ Unrecognised field _kind_ values are a protocol error for the downstream. The
+ downstream may not continue processing the protocol stream after this
+ point**.**
+ 
+ The upstream may not send ‘**i**’nternal or ‘**b**’inary format values to the
+ downstream without the downstream negotiating acceptance of such values. The
+ downstream will also generally negotiate to receive type information to use to
+ decode the values. See the section on startup parameters and the startup
+ message for details.

I do not fully get it.
For each tuple we are supposed to have "Tuple type" (which is kind?). Does it
mean that T1 might be sent using "i" kind and T2 sent using "b" kind?
At the same tme we have kind "n" (null) - but it belongs to field level
(one field might be null, not entire tuple).

In other words - do we have "i" and then "T" and then number of attributes,
or "T', then number of attributes, then "i" or "b" or "n" for each of attributes?

Also - description of "b" seems missing.






+ Before sending changed rows for a relation, a metadata message for the relation
+ must be sent so the downstream knows the namespace, table name, column names,
+ optional column types, etc. A relidentifier field, an arbitrary numeric value
+ unique for that relation on that upstream connection, maps the metadata to
+ following rows.
+ 
+ A client should not assume that relation metadata will be followed immediately
+ (or at all) by rows, since future changes may lead to metadata messages being
+ delivered at other times. Metadata messages may arrive during or between
+ transactions.
+ 
+ The upstream may not assume that the downstream retains more metadata than the
+ one most recent table metadata message. This applies across all tables, so a
+ client is permitted to discard metadata for table x when getting metadata for
+ table y. The upstream must send a new metadata message before sending rows for
+ a different table, even if that metadata was already sent in the same session
+ or even same transaction. _This requirement will later be weakened by the
+ addition of client metadata caching, which will be advertised to the upstream
+ with an output plugin parameter._

This needs reworking while metadata caching is supported




+ |Message type|signed char|‘**S**’ (0x53) - startup
+ |Startup message version|uint8|Value is always “1”.

Value is "1" for the current plugin version. It is represented in code as
PGLOGICAL_STARTUP_MESSAGE_VERSION_NUM.


+ |startup_params_format|int8|1|The format version of this startup parameter set. Always the digit 1 (0x31), null
terminated.

int8 suggests binary value, and here we are sending ASCII which is NULL-terminated.
It's a bit inconsistent.


Please wrap long lines in the last part of protocol.txt file, starting with
"Arguments client supplies to output plugn" section.


Also - please put 3 paragraphs from your email from 2016-01-07 15:50
(staring with "but this isn't just about replication") into README.
This is really good rationale for this plugin existence.



Best regards.

The new status of this patch is: Waiting on Author

Re: Re: pglogical_output - a general purpose logical decoding output plugin

От

Andres Freund

Дата:

24 января 2016 г., 18:25:07

On 2016-01-18 21:47:27 +0000, Tomasz Rybak wrote:
> We might also think about changing name of plugin to something resembling "logical_streaming_decoder" or even
"logical_streamer"

FWIW, I find those proposals unconvincing. Not that pglogical_output is
grand, but "streaming decoder" or "logical_streamer" aren't even
correct. And output plugin isn't a "decoder" or a "streamer".

Re: pglogical_output - a general purpose logical decoding output plugin

От

Andres Freund

Дата:

29 января 2016 г., 13:16:46

Hi,

so, I'm reviewing the output of:
> git diff $(git merge-base upstream/master 2ndq/dev/pglogical-output)..2ndq/dev/pglogical-output
> diff --git a/contrib/Makefile b/contrib/Makefile
> index bd251f6..028fd9a 100644
> --- a/contrib/Makefile
> +++ b/contrib/Makefile
> @@ -35,6 +35,8 @@ SUBDIRS = \
>          pg_stat_statements \
>          pg_trgm        \
>          pgcrypto    \
> +        pglogical_output \
> +        pglogical_output_plhooks \

I'm doubtful we want these plhooks. You aren't allowed to access normal
(non user catalog) tables in output plugins. That seems too much to
expose to plpgsql function imo.

> +++ b/contrib/pglogical_output/README.md

I don't think we've markdown in postgres so far - so let's just keep the
current content and remove the .md :P

> +==== Table metadata header
> +
> +|===
> +|*Message*|*Type/Size*|*Notes*
> +
> +|Message type|signed char|Literal ‘**R**’ (0x52)
> +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not recognised.
> +|relidentifier|uint32|Arbitrary relation id, unique for this upstream. In practice this will probably be the
upstreamtable’s oid, but the downstream can’t assume anything.
 
> +|nspnamelength|uint8|Length of namespace name
> +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> +|relnamelength|uint8|Length of relation name
> +|relname|char[relname]|Relation name (null terminated)
> +|attrs block|signed char|Literal: ‘**A**’ (0x41)
> +|natts|uint16|number of attributes
> +|[fields]|[composite]|Sequence of ‘natts’ column metadata blocks, each of which begins with a column delimiter
followedby zero or more column metadata blocks, each with the same column metadata block header.
 

That's a fairly high overhead. Hm.


> +== JSON protocol
> +
> +If `proto_format` is set to `json` then the output plugin will emit JSON
> +instead of the custom binary protocol. JSON support is intended mainly for
> +debugging and diagnostics.
> +

I'm fairly strongly opposed to including two formats in one output
plugin. I think the demand for being able to look into the binary
protocol should instead be satisfied by having a function that "expands"
the binary data returned into something easier to understand.

> + * Copyright (c) 2012-2015, PostgreSQL Global Development Group

2016 ;)


> +            case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> +                val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_UINT32);
> +                data->client_binary_basetypes_major_version = DatumGetUInt32(val);
> +                break;

Why is the major version tied to basetypes (by name)? Seem more
generally useful.


> +            case PARAM_RELMETA_CACHE_SIZE:
> +                val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_INT32);
> +                data->client_relmeta_cache_size = DatumGetInt32(val);
> +                break;

I'm not convinced this a) should be optional b) should have a size
limit. Please argue for that choice. And how the client should e.g. know
about evictions in that cache.



> --- /dev/null
> +++ b/contrib/pglogical_output/pglogical_config.h
> @@ -0,0 +1,55 @@
> +#ifndef PG_LOGICAL_CONFIG_H
> +#define PG_LOGICAL_CONFIG_H
> +
> +#ifndef PG_VERSION_NUM
> +#error <postgres.h> must be included first
> +#endif

Huh?

> +#include "nodes/pg_list.h"
> +#include "pglogical_output.h"
> +
> +inline static bool
> +server_float4_byval(void)
> +{
> +#ifdef USE_FLOAT4_BYVAL
> +    return true;
> +#else
> +    return false;
> +#endif
> +}
> +
> +inline static bool
> +server_float8_byval(void)
> +{
> +#ifdef USE_FLOAT8_BYVAL
> +    return true;
> +#else
> +    return false;
> +#endif
> +}
> +
> +inline static bool
> +server_integer_datetimes(void)
> +{
> +#ifdef USE_INTEGER_DATETIMES
> +    return true;
> +#else
> +    return false;
> +#endif
> +}
> +
> +inline static bool
> +server_bigendian(void)
> +{
> +#ifdef WORDS_BIGENDIAN
> +    return true;
> +#else
> +    return false;
> +#endif
> +}

Not convinced these should exists, and even moreso exposed in a header.

> +/*
> + * Returns Oid of the hooks function specified in funcname.
> + *
> + * Error is thrown if function doesn't exist or doen't return correct datatype
> + * or is volatile.
> + */
> +static Oid
> +get_hooks_function_oid(List *funcname)
> +{
> +    Oid            funcid;
> +    Oid            funcargtypes[1];
> +
> +    funcargtypes[0] = INTERNALOID;
> +
> +    /* find the the function */
> +    funcid = LookupFuncName(funcname, 1, funcargtypes, false);
> +
> +    /* Validate that the function returns void */
> +    if (get_func_rettype(funcid) != VOIDOID)
> +    {
> +        ereport(ERROR,
> +                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                 errmsg("function %s must return void",
> +                        NameListToString(funcname))));
> +    }

Hm, this seems easy to poke holes into. I mean you later use it like:

> +    if (data->hooks_setup_funcname != NIL)
> +    {
> +        hooks_func = get_hooks_function_oid(data->hooks_setup_funcname);
> +
> +        old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> +        (void) OidFunctionCall1(hooks_func, PointerGetDatum(&data->hooks));
> +        MemoryContextSwitchTo(old_ctxt);

e.g. you basically assume the function the does something reasonable
with those types. Why don't we instead create a 'plogical_hooks' return
type, and have the function return that?

> +    if (func_volatile(funcid) == PROVOLATILE_VOLATILE)
> +    {
> +        ereport(ERROR,
> +                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                 errmsg("function %s must not be VOLATILE",
> +                        NameListToString(funcname))));
> +    }

Hm, not sure what that's supposed to achieve. You could argue for
requiring the function to be immutable (i.e. not stable or volatile),
but I'm not sure what that'd achieve.

> +        old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> +        (void) (*data->hooks.startup_hook)(&args);
> +        MemoryContextSwitchTo(old_ctxt);

What is the hooks memory contexts intended to achieve? It's apparently
never reset. Normally output plugin calbacks are called in more
shortlived memory contexts, for good reason, to avoid leaks....


> +bool
> +call_row_filter_hook(PGLogicalOutputData *data, ReorderBufferTXN *txn,
> +        Relation rel, ReorderBufferChange *change)
> +{
> +    struct  PGLogicalRowFilterArgs hook_args;
> +    MemoryContext old_ctxt;
> +    bool ret = true;
> +
> +    if (data->hooks.row_filter_hook != NULL)
> +    {
> +        hook_args.change_type = change->action;
> +        hook_args.private_data = data->hooks.hooks_private_data;
> +        hook_args.changed_rel = rel;
> +        hook_args.change = change;
> +
> +        elog(DEBUG3, "calling pglogical row filter hook");
> +
> +        old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> +        ret = (*data->hooks.row_filter_hook)(&hook_args);

Why aren't we passing txn to the filter? ISTM it'd be better to
basically reuse/extend the signature by the the original change
callback.


> +/* These must be available to pg_dlsym() */

No the following don't? And they aren't, since they're static functions?
_PG_init and _PG_output_plugin_init need to, but that's it.


> +/*
> + * COMMIT callback
> + */
> +void
> +pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> +                     XLogRecPtr commit_lsn)
> +{

Missing static's?



> +/*
> + * Relation metadata invalidation, for when a relcache invalidation
> + * means that we need to resend table metadata to the client.
> + */
> +static void
> +relmeta_cache_callback(Datum arg, Oid relid)
> + {
> +    /*
> +     * We can be called after decoding session teardown becaues the
> +     * relcache callback isn't cleared. In that case there's no action
> +     * to take.
> +     */
> +    if (RelMetaCache == NULL)
> +        return;
> +
> +    /*
> +     * Nobody keeps pointers to entries in this hash table around so
> +     * it's safe to directly HASH_REMOVE the entries as soon as they are
> +     * invalidated. Finding them and flagging them invalid then removing
> +     * them lazily might save some memory churn for tables that get
> +     * repeatedly invalidated and re-sent, but it dodesn't seem worth
> +     * doing.
> +     *
> +     * Getting invalidations for relations that aren't in the table is
> +     * entirely normal, since there's no way to unregister for an
> +     * invalidation event. So we don't care if it's found or not.
> +     */
> +    (void) hash_search(RelMetaCache, &relid, HASH_REMOVE, NULL);
> + }

So, I don't buy this, like at all. The cache entry is passed to
functions, while we call output functions and such. Which in turn can
cause cache invalidations to be processed.

> +struct PGLRelMetaCacheEntry
> +{
> +    Oid relid;
> +    /* Does the client have this relation cached? */
> +    bool is_cached;
> +    /* Field for API plugin use, must be alloc'd in decoding context */
> +    void *api_private;
> +};

I don't see how api_private can safely be used. At the very least it
needs a lot more documentation about memory lifetime rules and
such. Afaics we'd just forever leak memory atm.

Greetings,

Andres Freund

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

31 января 2016 г., 00:09:43

On 29 January 2016 at 18:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

so, I'm reviewing the output of:

Thankyou very much for the review.

> + pglogical_output_plhooks \

I'm doubtful we want these plhooks. You aren't allowed to access normal
(non user catalog) tables in output plugins. That seems too much to
expose to plpgsql function imo.

You're right. We've got no way to make sure the user sticks to things that're reasonably safe.

The intent of that module was to allow people to write row and origin filters in plpgsql, to serve as an example of how to implement hooks, and to provide a tool usable in testing pglogical_output from pg_regress.

An example can be in C, since it's not safe to do it in plpgsql as you noted. A few toy functions will be sufficient for test use.

As for allowing users to flexibly filter, I'm stating to think that hooks in pglogical_output aren't really the best long term option. They're needed for now, but for 9.7+ we should look at whether it's practical to separate "what gets forwarded" policy from the mechanics of how it gets sent. pglogical_output currently just farms part of the logical decoding hook out to its own hooks, but it wouldn't have to do that if logical decoding let you plug in policy on what you send separately to how you send it. Either via catalogs or plugin functions separate to the output plugin.

(Kinda off topic, though, and I think we need the hooks for now, just not the plpgsql implementation).

> +++ b/contrib/pglogical_output/README.md

I don't think we've markdown in postgres so far - so let's just keep the
current content and remove the .md

I'm halfway through turning it all into SGML anyway. I just got sidetracked by other work. I'd be just as happy to leave it as markdown but figured SGML would likely be required.

> +==== Table metadata header
> +
> +|===
> +|*Message*|*Type/Size*|*Notes*
> +
> +|Message type|signed char|Literal ‘**R**’ (0x52)
> +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not recognised.
> +|relidentifier|uint32|Arbitrary relation id, unique for this upstream. In practice this will probably be the upstream table’s oid, but the downstream can’t assume anything.
> +|nspnamelength|uint8|Length of namespace name
> +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> +|relnamelength|uint8|Length of relation name
> +|relname|char[relname]|Relation name (null terminated)
> +|attrs block|signed char|Literal: ‘**A**’ (0x41)
> +|natts|uint16|number of attributes
> +|[fields]|[composite]|Sequence of ‘natts’ column metadata blocks, each of which begins with a column delimiter followed by zero or more column metadata blocks, each with the same column metadata block header.

That's a fairly high overhead. Hm.

Yeah, and it shows in Oleksandr's measurements. However, that's a metadata message that is sent only pretty infrequently if you enable relation metadata caching. Which is really necessary to get reasonable performance on anything but the simplest workloads, and is only optional because it makes it easier to write and test a client without it first.

> +== JSON protocol
> +
> +If `proto_format` is set to `json` then the output plugin will emit JSON
> +instead of the custom binary protocol. JSON support is intended mainly for
> +debugging and diagnostics.
> +

I'm fairly strongly opposed to including two formats in one output
plugin. I think the demand for being able to look into the binary
protocol should instead be satisfied by having a function that "expands"
the binary data returned into something easier to understand.

Per our discussion yesterday I think I agree with you on that now.

My thinking is that we should patch pg_recvlogical to be able to load a decoder plugin. Then extract the protocol decoding support from pglogical into a separately usable library that can be loaded by pg_recvlogical, pglogical downstream, and by SQL-level debug/test helper functions.

pg_recvlogical won't be able to decode binary or internal format field values, but you can simply not request that they be sent.

> + case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> + val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_UINT32);
> + data->client_binary_basetypes_major_version = DatumGetUInt32(val);
> + break;

Why is the major version tied to basetypes (by name)? Seem more
generally useful.

I found naming that param rather awkward.

The idea is that we can rely on the Pg major version only for types defined in core. It's mostly safe for built-in extensions in that few if any people ever upgrade them, but it's not strictly correct even there. Most of them (hstore, etc) don't expose their own versions so it's hard to know what to do about them.

What I want(ed?) to do is let a downstream enumerate the extensions it has and the version(s) it knows they're compatible with for send/recv and internal formats. But designing that was going to be hairy in the time available, and I think falling back on text representation for contribs and extensions is the safest choice. A solid way to express an extension compatibility matrix seemed rather too much to bite off as well as everything else.

> + case PARAM_RELMETA_CACHE_SIZE:
> + val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_INT32);
> + data->client_relmeta_cache_size = DatumGetInt32(val);
> + break;

I'm not convinced this a) should be optional b) should have a size
limit. Please argue for that choice. And how the client should e.g. know
about evictions in that cache.

I don't think metadata every row makes sense. So it shouldn't be optional in that sense. It was optional mostly because the downstream didn't initially support it and I thought it'd be useful to allow people to write simpler clients.

The cache logic is really quite simple though, so I'm happy enough to make it required.

Re the size limit, my thinking was that there are (unfortunately) real world apps that create and drop tables in production, that have tens of thousands or more schemas that each have hundreds or more tables, etc. They're awful and I wish people wouldn't do that, but they do. Rather than expecting some unknown and unbounded cache size being able to limit the number of tables for which the downstream is required to retain metadata seemed possibly useful. I'm far from wedded to the idea as it requires the upstream to maintain a metadata cache LRU (which it doesn't yet) and send cache eviction messages to the downstream. Cutting that whole idea out would simplify things.

> --- /dev/null
> +++ b/contrib/pglogical_output/pglogical_config.h
> @@ -0,0 +1,55 @@
> +#ifndef PG_LOGICAL_CONFIG_H
> +#define PG_LOGICAL_CONFIG_H
> +
> +#ifndef PG_VERSION_NUM
> +#error <postgres.h> must be included first
> +#endif

Huh?

It's a stray that should've been part of pglogical/compat.h (which will presumably get cut for inclusion in contrib anyway).

> +#include "nodes/pg_list.h"
> +#include "pglogical_output.h"
> +
> +inline static bool
> +server_float4_byval(void)
> +{
> +#ifdef USE_FLOAT4_BYVAL
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_float8_byval(void)
> +{
> +#ifdef USE_FLOAT8_BYVAL
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_integer_datetimes(void)
> +{
> +#ifdef USE_INTEGER_DATETIMES
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_bigendian(void)
> +{
> +#ifdef WORDS_BIGENDIAN
> + return true;
> +#else
> + return false;
> +#endif
> +}

Not convinced these should exists, and even moreso exposed in a header.

Yeah. The info's needed in both pglogical_config.c and pglogical_output.c but that can probably be avoided.

> +/*
> + * Returns Oid of the hooks function specified in funcname.
> + *
> + * Error is thrown if function doesn't exist or doen't return correct datatype
> + * or is volatile.
> + */
> +static Oid
> +get_hooks_function_oid(List *funcname)
> +{
> + Oid funcid;
> + Oid funcargtypes[1];
> +
> + funcargtypes[0] = INTERNALOID;
> +
> + /* find the the function */
> + funcid = LookupFuncName(funcname, 1, funcargtypes, false);
> +
> + /* Validate that the function returns void */
> + if (get_func_rettype(funcid) != VOIDOID)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("function %s must return void",
> + NameListToString(funcname))));
> + }

Hm, this seems easy to poke holes into. I mean you later use it like:

> + if (data->hooks_setup_funcname != NIL)
> + {
> + hooks_func = get_hooks_function_oid(data->hooks_setup_funcname);
> +
> + old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> + (void) OidFunctionCall1(hooks_func, PointerGetDatum(&data->hooks));
> + MemoryContextSwitchTo(old_ctxt);

e.g. you basically assume the function the does something reasonable
with those types. Why don't we instead create a 'plogical_hooks' return
type, and have the function return that?

I want to avoid requiring any extension to be loaded for the output plugin, to just have it provide the mechanism for transporting data changes and not start adding types, user catalogs, etc.

That's still OK if the signature in pg_proc is 'returns internal', I just don't want anything visible from SQL.

My other reasons for this approach have been obsoleted now that we're installing the pglogical/hooks.h header in Pg's server includes. Originally I was afraid extensions would have to make a *copy* of the hooks struct definition in their own headers. In that case we'd be able to add entries only strictly at the end of the struct and could only use it by palloc0'ing it and passing a pointer. Of course, I didn't think of the case where the hook defining extension had the bigger of the two definitions...

So yeah. Happy to just return 'internal', DatumGetPointer it and cast to a pointer to our hooks struct.

> + if (func_volatile(funcid) == PROVOLATILE_VOLATILE)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("function %s must not be VOLATILE",
> + NameListToString(funcname))));
> + }

Hm, not sure what that's supposed to achieve. You could argue for
requiring the function to be immutable (i.e. not stable or volatile),
but I'm not sure what that'd achieve.

It's a stupid holdover from the function's use elsewhere, and entirely my fault.

> + old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> + (void) (*data->hooks.startup_hook)(&args);
> + MemoryContextSwitchTo(old_ctxt);

What is the hooks memory contexts intended to achieve? It's apparently
never reset. Normally output plugin calbacks are called in more
shortlived memory contexts, for good reason, to avoid leaks....

Mainly to make memory allocated by and used by hooks more visible when debugging, rather than showing it conflated with the main logical decoding context, while still giving hooks somewhere to store state that they may need across the decoding session.

Not going to argue strongly for it. Just seemed like something I'd regret not having when trying to figure out why the decoding context was massive for no apparent reason later...

> +bool
> +call_row_filter_hook(PGLogicalOutputData *data, ReorderBufferTXN *txn,
> + Relation rel, ReorderBufferChange *change)
> +{
> + struct PGLogicalRowFilterArgs hook_args;
> + MemoryContext old_ctxt;
> + bool ret = true;
> +
> + if (data->hooks.row_filter_hook != NULL)
> + {
> + hook_args.change_type = change->action;
> + hook_args.private_data = data->hooks.hooks_private_data;
> + hook_args.changed_rel = rel;
> + hook_args.change = change;
> +
> + elog(DEBUG3, "calling pglogical row filter hook");
> +
> + old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> + ret = (*data->hooks.row_filter_hook)(&hook_args);

Why aren't we passing txn to the filter? ISTM it'd be better to
basically reuse/extend the signature by the the original change
callback.

Yeah, probably.

I somewhat wish the original callbacks used struct arguments. Otherwise you land up with fun #ifdef's when supporting multiple Pg versions and it's hard to add new params. Quite annoying when dealing with extensions. OTOH it's presumably faster to use the usual C calling convention.

> +/* These must be available to pg_dlsym() */

No the following don't? And they aren't, since they're static functions?
_PG_init and _PG_output_plugin_init need to, but that's it.

Yeah. Total thinko.

> +/*
> + * COMMIT callback
> + */
> +void
> +pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> + XLogRecPtr commit_lsn)
> +{

Missing static's?

Yes.

> + /*
> + * Nobody keeps pointers to entries in this hash table around so
> + * it's safe to directly HASH_REMOVE the entries as soon as they are
> + * invalidated. Finding them and flagging them invalid then removing
> + * them lazily might save some memory churn for tables that get
> + * repeatedly invalidated and re-sent, but it dodesn't seem worth
> + * doing.
> + *
> + * Getting invalidations for relations that aren't in the table is
> + * entirely normal, since there's no way to unregister for an
> + * invalidation event. So we don't care if it's found or not.
> + */
> + (void) hash_search(RelMetaCache, &relid, HASH_REMOVE, NULL);
> + }

So, I don't buy this, like at all. The cache entry is passed to
functions, while we call output functions and such. Which in turn can
cause cache invalidations to be processed.

That was a misunderstanding on my part. I hadn't realised that cache invalidations could be fired so easily by apparently innocuous actions, and had assumed incorrectly that we'd get invalidations only outside the scope of a decoding callback, not during one.

Clearly need to fix this with the usual invalid flag based approach. Which in turn makes me agree with your proposal yesterday about adding a generic mechanism for extensions to register their interest in invalidations on tables, attach data to them, and not worry about the details of managing the hash table correctly etc.

> +struct PGLRelMetaCacheEntry
> +{
> + Oid relid;
> + /* Does the client have this relation cached? */
> + bool is_cached;
> + /* Field for API plugin use, must be alloc'd in decoding context */
> + void *api_private;
> +};

I don't see how api_private can safely be used. At the very least it
needs a lot more documentation about memory lifetime rules and
such. Afaics we'd just forever leak memory atm.

Yeah. It's pretty much just wrong, and since I don't have a compelling reason for it I'm happy enough for it to just go away. Doing it correctly would pretty much require a callback to be registered for freeing it, and ... meh.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Michael Paquier

Дата:

01 февраля 2016 г., 07:53:58

On Thu, Jan 7, 2016 at 4:50 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 7 January 2016 at 01:17, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 12/22/15 4:55 AM, Craig Ringer wrote:
>> and we could probably go through them
>> one by one and ask, why do we need this bit?  So that kind of system
>> will be very hard to review as a standalone submission.
>
> Again, I disagree. I think you're looking at this way too narrowly.
>
> I find it quite funny, actually. Here we go and produce something that's a
> nice re-usable component that other people can use in their products and
> solutions ... and all anyone does is complain that the other part required
> to use it as a canned product isn't posted yet (though it is now). But with
> BDR all anyone ever does is complain that it's too tightly coupled to the
> needs of a single product and the features extracted from it, like
> replication origins, should be more generic and general purpose so other
> people can use them in their products too. Which is it going to be?

As far as I can see, this patch is leveraging the current
infrastructure in core, logical decoding to convert the data obtained
as a set JSON blobs via a custom protocol. Its maintenance load looks
minimal, that's at least a good thing.

> It would be helpful if you could take a step back and describe what *you*
> think logical replication for PostgreSQL should look like. You clearly have
> a picture in mind of what it should be, what requirements it satisfies, etc.
> If you're going to argue based on that it'd be very helpful to describe it.
> I might've missed some important points you've seen and you might've
> overlooked issues I've seen.

Personally, if I would put a limit of what should be in-core, or what
should be logical replication from the core prospective, that would be
just to give to potential consumers (understand plugins here) of this
binary data (be it pg_ddl_command or what the logical decoding context
offers) a way to access it and then to allow those plugins to change
this binary data into something that is suited to it, and have simple
tools and example to test those things without relying on anything
external. test_decoding and test_ddl_deparse cover that already. What
I find a bit disturbing regarding this patch is: why would a JSON
representation be able to cover all kinds of needs? Aren't other
replication solution going to have their own data format and their own
protocol with different requirements?

Considering that this module could have a happier life if managed independently.
-- 
Michael

Re: pglogical_output - a general purpose logical decoding output plugin

От

Andres Freund

Дата:

14 марта 2016 г., 23:48:15

On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> On 29 January 2016 at 18:16, Andres Freund <andres@anarazel.de> wrote:
> 
> > Hi,
> >
> > so, I'm reviewing the output of:
> >
> 
> Thankyou very much for the review.

Afaics you've not posted an updated version of this? Any chance you
could?

Greetings,

Andres Freund

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

15 марта 2016 г., 07:45:26

On 15 March 2016 at 04:48, Andres Freund <andres@anarazel.de> wrote:

On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> On 29 January 2016 at 18:16, Andres Freund <andres@anarazel.de> wrote:
>
> > Hi,
> >
> > so, I'm reviewing the output of:
> >
>
> Thankyou very much for the review.

Afaics you've not posted an updated version of this? Any chance you
could?

I'll try to get to it soon, yeah. I have been focusing on things that cannot exist as extensions, especially timeline following for logical slots and failover slots.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pglogical_output - a general purpose logical decoding output plugin

От

Craig Ringer

Дата:

29 марта 2016 г., 10:37:37

On 15 March 2016 at 04:48, Andres Freund <andres@anarazel.de> wrote:

On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> On 29 January 2016 at 18:16, Andres Freund <andres@anarazel.de> wrote:
>
> > Hi,
> >
> > so, I'm reviewing the output of:
> >
>
> Thankyou very much for the review.

Afaics you've not posted an updated version of this? Any chance you
could?

Here's v5.

It still needs json support to be removed and the plpgsql hooks replaced, but the rest should be sorted out.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

0001-Timeline-following-for-logical-decoding.patch

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: pglogical_output - a general purpose logical decoding output plugin

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения