Обсуждение: Options OUTPUT_PLUGIN_* controlling format are confusing (Was: Misleading error message in logical decoding)
Hi all, As there has been a dump of CATALOG_VERSION_NO on REL9_4_STABLE recently, I am coming back to the options OUTPUT_PLUGIN_* that are rather confusing for developers of decoder plugins. In short, when extracting changes from a replication slot, a decoder plugin is free to set one option that influences the output that is fetched by a logical receiver: OUTPUT_PLUGIN_TEXTUAL_OUTPUT or OUTPUT_PLUGIN_BINARY_OUTPUT. The interesting point about this format option is that it does not directly influence the changes generated by an output plugin: its value only has effect on the set of functions pg_logical_[get|peek]_[binary_|]changes that can be used by a non-replication connection to get individual changes from a repslot. So a receiver fetching changes using PQgetCopyData is not really impacted by this format option. Even better it is even possible to use a custom option that is part of output_plugin_options to switch the OUTPUT_PLUGIN_* value (cf option force-binary in contrib/test_decoding). My point is: logical decoding is presenting in its infrastructure API a format option that could be added as a custom option in a decoder plugin. Isn't that orthogonal? To illustrate this argument here is an example: we could remove force-binary in test_decoding and replace it by a option that allows the user to directly choose the output format, like format=[binary|textual] and do the conversion in the plugin. In the same manner, the functions pg_logical_[get|peek]_binary_changes are equivalent to their cousin pg_logical_[get|peek]_changes casted to bytea. I am raising this point before the 9.4 ship sails, thinking long-term and to faciliate the maintenance of existing code. Attached is a patch that simplifies the current logical decoding API regarding that. Regards, -- Michael
Вложения
Hi, On 2014-09-18 02:14:48 -0500, Michael Paquier wrote: > As there has been a dump of CATALOG_VERSION_NO on REL9_4_STABLE > recently, I am coming back to the options OUTPUT_PLUGIN_* that are > rather confusing for developers of decoder plugins. In short, when > extracting changes from a replication slot, a decoder plugin is free > to set one option that influences the output that is fetched by a > logical receiver: OUTPUT_PLUGIN_TEXTUAL_OUTPUT or > OUTPUT_PLUGIN_BINARY_OUTPUT. The interesting point about this format > option is that it does not directly influence the changes generated by > an output plugin: its value only has effect on the set of functions > pg_logical_[get|peek]_[binary_|]changes that can be used by a > non-replication connection to get individual changes from a repslot. > > So a receiver fetching changes using PQgetCopyData is not really > impacted by this format option. Even better it is even possible to use > a custom option that is part of output_plugin_options to switch the > OUTPUT_PLUGIN_* value (cf option force-binary in > contrib/test_decoding). > > My point is: logical decoding is presenting in its infrastructure API > a format option that could be added as a custom option in a decoder > plugin. Isn't that orthogonal? To illustrate this argument here is an > example: we could remove force-binary in test_decoding and replace it > by a option that allows the user to directly choose the output format, > like format=[binary|textual] and do the conversion in the plugin. In > the same manner, the functions pg_logical_[get|peek]_binary_changes > are equivalent to their cousin pg_logical_[get|peek]_changes casted to > bytea. The point is that operating with byteas on SQL level is freaking painful. > I am raising this point before the 9.4 ship sails, thinking long-term > and to faciliate the maintenance of existing code. Attached is a patch > that simplifies the current logical decoding API regarding that. Sorry, -1. Improving the docs here is on my roadmap, but I don't see the benefit of this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote: > The point is that operating with byteas on SQL level is freaking > painful. An example perhaps? I fail to see why it is related to the fact that a user could simply use that to fetch changes in bytea from a slot: select data::bytea from pg_logical_slot_get_changes('foo', NULL, NULL); That's perhaps not more simple than using the binary functions, which actually explain their existence because of the textual/binary format context with OUTPUT_PLUGIN_*, but that's possible. It is as well possible to pay quite a lot with custom data types and casts to transform textual changes (not to mention that a plugin could send out bytea changes by itself through the COPY string). Looking at the code, the only difference between textual and binary is an assertion check using the database encoding, something unlikely to be triggered on production systems btw. >> I am raising this point before the 9.4 ship sails, thinking long-term >> and to faciliate the maintenance of existing code. Attached is a patch >> that simplifies the current logical decoding API regarding that. > > Sorry, -1. > > Improving the docs here is on my roadmap, but I don't see the benefit of > this. Well, there is no actual benefit. That's only a logical reasoning to keep the interface as simple as possible without impacting the feature range. Enforcing an option with arbitrary values that only has effect for non-replication connections is unintuitive for a plugin structure that is dedicated for both non-replication and replication protocols (using replication protocol offers more flexibility, so I'd expect it to be more widely used than the SQL interface btw). If docs are improved, it should at least be clearly stated what is the benefit of one value to the other when fetching changes using the get/peek functions. -- Michael
On 2014-09-18 08:57:26 -0500, Michael Paquier wrote: > On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > The point is that operating with byteas on SQL level is freaking > > painful. > An example perhaps? I fail to see why it is related to the fact that a > user could simply use that to fetch changes in bytea from a slot: > select data::bytea from pg_logical_slot_get_changes('foo', NULL, NULL); Uh. Because that'd mean that pg_logical_slot_get_changes() would return text, that's not actually text? I.e. it'll contain 0 bytes and things outside the server's encoding? Not good. > That's perhaps not more simple than using the binary functions, which > actually explain their existence because of the textual/binary format > context with OUTPUT_PLUGIN_*, but that's possible. It is as well > possible to pay quite a lot with custom data types and casts to > transform textual changes (not to mention that a plugin could send out > bytea changes by itself through the COPY string). Looking at the code, > the only difference between textual and binary is an assertion check > using the database encoding, something unlikely to be triggered on > production systems btw. No. There's a *SIGNIFICANT* additional difference. If you use the *non binary* get/peek_changes on a output plugin which has declared (and we normally believe C code) that it produces binary output you will get an error. Again: The output plugin *declares* whether it returns binary data (i.e. data that's not in the server's encoding and/or possibly contains 0 bytes.). The textual get/peek_changes cannot accept changes from plugins declaring that. The binary variant can accept changes regardless of that. I have the feeling that you have some fundamentally different understanding of this in your head and aren't really following my explanations. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
> On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Improving the docs here is on my roadmap, but I don't see the benefit of >> this. Btw, I sent a couple of weeks back a patch that was an attempt to improve this portion of the docs, feel free to have a look if that helps :) http://www.postgresql.org/message-id/CAB7nPqQDxcHNEBhG71nL6jDGQji9m=FQEoOKs1_YLBFSts4Zsw@mail.gmail.com -- Michael
On 2014-09-18 09:13:48 -0500, Michael Paquier wrote: > > On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Improving the docs here is on my roadmap, but I don't see the benefit of > >> this. > Btw, I sent a couple of weeks back a patch that was an attempt to > improve this portion of the docs, feel free to have a look if that > helps :) > http://www.postgresql.org/message-id/CAB7nPqQDxcHNEBhG71nL6jDGQji9m=FQEoOKs1_YLBFSts4Zsw@mail.gmail.com Uh, I know - I commented extensively on that thread. Your doc patch is wrong, so I won't apply it ;) and also can't really use it as a basis... I'd understood CAB7nPqRCoCrrf84vHrvdRk8+9RGZbRfBkWMri9_79TFkmz41Fg@mail.gmail.com in that thread to mean we were on one page. But the message referenced by you above (sent later) and this discussion seems to indicate that that's not the case. Do you see the difference between what your doc patch states and the explanation I've given nearby in this thread? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 18, 2014 at 9:23 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-09-18 09:13:48 -0500, Michael Paquier wrote: >> > On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> >> Improving the docs here is on my roadmap, but I don't see the benefit of >> >> this. >> Btw, I sent a couple of weeks back a patch that was an attempt to >> improve this portion of the docs, feel free to have a look if that >> helps :) >> http://www.postgresql.org/message-id/CAB7nPqQDxcHNEBhG71nL6jDGQji9m=FQEoOKs1_YLBFSts4Zsw@mail.gmail.com > > Uh, I know - I commented extensively on that thread. Your doc patch is > wrong, so I won't apply it ;) and also can't really use it as a basis... Fine. No problem. > I'd understood > CAB7nPqRCoCrrf84vHrvdRk8+9RGZbRfBkWMri9_79TFkmz41Fg@mail.gmail.com in > that thread to mean we were on one page. But the message referenced by > you above (sent later) and this discussion seems to indicate that that's > not the case. > Do you see the difference between what your doc patch states and the > explanation I've given nearby in this thread? Perhaps that's the lack of documentation... Putting on the plugin developer hat (if there's one), things are not clear for users. And the thought that a mandatory option value within the plugin that is only linked to the SQL interface is somewhat non-intuitive. It would have been perhaps more interesting to allow users to specify a list of data types that are allowed as output instead. I guess that's just food for thoughts though... -- Michael
On 2014-09-18 09:50:38 -0500, Michael Paquier wrote: > > Do you see the difference between what your doc patch states and the > > explanation I've given nearby in this thread? > Perhaps that's the lack of documentation... Man. I've explained it to you about three times. The previous attempts at doing so didn't seem to help. If my explanations don't explain it so you can understand it adding them to the docs won't change a thing. That's why I ask whether you see the difference? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-09-18 09:50:38 -0500, Michael Paquier wrote: >> > Do you see the difference between what your doc patch states and the >> > explanation I've given nearby in this thread? >> Perhaps that's the lack of documentation... > > Man. I've explained it to you about three times. The previous attempts > at doing so didn't seem to help. If my explanations don't explain it so > you can understand it adding them to the docs won't change a thing. > That's why I ask whether you see the difference? Urg sorry for the misunderstanding. The patch stated that this parameter only influences the output of the SQL functions while it defines if "the output plugin requires the output method to support binary data"? -- Michael
On Thu, Sep 18, 2014 at 11:21 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> On 2014-09-18 09:50:38 -0500, Michael Paquier wrote: >>> > Do you see the difference between what your doc patch states and the >>> > explanation I've given nearby in this thread? >>> Perhaps that's the lack of documentation... >> >> Man. I've explained it to you about three times. The previous attempts >> at doing so didn't seem to help. If my explanations don't explain it so >> you can understand it adding them to the docs won't change a thing. >> That's why I ask whether you see the difference? > Urg sorry for the misunderstanding. The patch stated that this > parameter only influences the output of the SQL functions while it > defines if "the output plugin requires the output method to support > binary data"? I'm not sure exactly what that sentence means. But here's the point: every series of bytes is a valid bytea, except maybe if it's over 1GB and runs afould of MaxAllocSize. But a series of bytes is only a valid text datum if it's a valid sequence of characters according to the database encoding. We like to think of text as being an arbitrary series of bytes, but it isn't. It can't contain any \0 bytes, and it can't contain anything that's invalid in the database encoding. bytea isn't subject to either of those restrictions. So if we were going to have one universal output format for output plugins, it would have to be bytea because that, really, can be anything. We could make users convert from that to text or whatever they like. But that's unappealing for several reasons: bytea output is printed as unreadable hexademical garbage, and encoding conversions are expensive. So what we do instead is provide a text output method and a binary output method. That way, plugins that want to return binary data are free to do so, and output methods that are happy to return text can *declare* that what they return is legal text - and then we just assume that to be true, and need not do an encoding conversion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Sep 23, 2014 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 18, 2014 at 11:21 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote: >>> On 2014-09-18 09:50:38 -0500, Michael Paquier wrote: >>>> > Do you see the difference between what your doc patch states and the >>>> > explanation I've given nearby in this thread? >>>> Perhaps that's the lack of documentation... >>> >>> Man. I've explained it to you about three times. The previous attempts >>> at doing so didn't seem to help. If my explanations don't explain it so >>> you can understand it adding them to the docs won't change a thing. >>> That's why I ask whether you see the difference? >> Urg sorry for the misunderstanding. The patch stated that this >> parameter only influences the output of the SQL functions while it >> defines if "the output plugin requires the output method to support >> binary data"? > > I'm not sure exactly what that sentence means. > > But here's the point: every series of bytes is a valid bytea, except > maybe if it's over 1GB and runs afould of MaxAllocSize. But a series > of bytes is only a valid text datum if it's a valid sequence of > characters according to the database encoding. We like to think of > text as being an arbitrary series of bytes, but it isn't. It can't > contain any \0 bytes, and it can't contain anything that's invalid in > the database encoding. bytea isn't subject to either of those > restrictions. > > So if we were going to have one universal output format for output > plugins, it would have to be bytea because that, really, can be > anything. We could make users convert from that to text or whatever > they like. But that's unappealing for several reasons: bytea output > is printed as unreadable hexademical garbage, and encoding conversions > are expensive. So what we do instead is provide a text output method > and a binary output method. That way, plugins that want to return > binary data are free to do so, and output methods that are happy to > return text can *declare* that what they return is legal text - and > then we just assume that to be true, and need not do an encoding > conversion. Aha, thanks. That's all clear then! -- Michael
On 2014-09-23 12:40:25 +0900, Michael Paquier wrote: > On Tue, Sep 23, 2014 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 18, 2014 at 11:21 AM, Michael Paquier > > <michael.paquier@gmail.com> wrote: > >> On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >>> On 2014-09-18 09:50:38 -0500, Michael Paquier wrote: > >>>> > Do you see the difference between what your doc patch states and the > >>>> > explanation I've given nearby in this thread? > >>>> Perhaps that's the lack of documentation... > >>> > >>> Man. I've explained it to you about three times. The previous attempts > >>> at doing so didn't seem to help. If my explanations don't explain it so > >>> you can understand it adding them to the docs won't change a thing. > >>> That's why I ask whether you see the difference? > >> Urg sorry for the misunderstanding. The patch stated that this > >> parameter only influences the output of the SQL functions while it > >> defines if "the output plugin requires the output method to support > >> binary data"? > > > > I'm not sure exactly what that sentence means. > > > > But here's the point: every series of bytes is a valid bytea, except > > maybe if it's over 1GB and runs afould of MaxAllocSize. But a series > > of bytes is only a valid text datum if it's a valid sequence of > > characters according to the database encoding. We like to think of > > text as being an arbitrary series of bytes, but it isn't. It can't > > contain any \0 bytes, and it can't contain anything that's invalid in > > the database encoding. bytea isn't subject to either of those > > restrictions. > > > > So if we were going to have one universal output format for output > > plugins, it would have to be bytea because that, really, can be > > anything. We could make users convert from that to text or whatever > > they like. But that's unappealing for several reasons: bytea output > > is printed as unreadable hexademical garbage, and encoding conversions > > are expensive. So what we do instead is provide a text output method > > and a binary output method. That way, plugins that want to return > > binary data are free to do so, and output methods that are happy to > > return text can *declare* that what they return is legal text - and > > then we just assume that to be true, and need not do an encoding > > conversion. > Aha, thanks. That's all clear then! What about the attached patch then? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Mon, Sep 29, 2014 at 7:50 PM, Andres Freund <andres@2ndquadrant.com> wrote:
+ Can only be used on slots using a output plugin supporting textualWhat about the attached patch then?
Thanks for this update. This looks good. Here are a couple of small comments:
1) This sentence is correct English, but I don't recall seeing in the docs such a formulation:
+ output.
I'd rather rewrite 'It can only be used' or 'this function can only be used'. I imagine that you could add a reference to logicaldecoding-output-mode as well.
2) s/a output/an output/g
3) The formulation here seems vague as a plugin that generates textual output can call pg_logical_slot_peek_binary_changes as well:
- except that changes are returned as <type>bytea</type>.
+ except that changes are returned as <type>bytea</type> and that it can
+ be used on slots using output plugins that only support binary output.
+ except that changes are returned as <type>bytea</type> and that it can
+ be used on slots using output plugins that only support binary output.
4) What about reformulating the following:
+ so a <type>text</> can contain it. This is checked in assertion enabled
+ builds.
+ so a <type>text</> can contain it. This is checked in assertion enabled
+ builds.
"This is checked in builds with assertions enabled."
5) Better to add a newline between two portions sect2.
+ </sect2>
<sect2 id="logicaldecoding-output-plugin-callbacks">
<sect2 id="logicaldecoding-output-plugin-callbacks">
Regards,
--
Michael
Michael
On 2014-09-29 21:48:08 +0900, Michael Paquier wrote: > On Mon, Sep 29, 2014 at 7:50 PM, Andres Freund <andres@2ndquadrant.com> > wrote: > > > What about the attached patch then? > > > > Thanks for this update. This looks good. Here are a couple of small > comments: > 1) This sentence is correct English, but I don't recall seeing in the docs > such a formulation: > + Can only be used on slots using a output plugin supporting textual > + output. > I'd rather rewrite 'It can only be used' or 'this function can only be > used'. I imagine that you could add a reference to > logicaldecoding-output-mode as well. Hm. > 2) s/a output/an output/g > 3) The formulation here seems vague as a plugin that generates textual > output can call pg_logical_slot_peek_binary_changes as well: I've commented on this before: an output plugin doesn't call pg_logical_slot_peek_binary_changes - if at all it's the other way round. > - except that changes are returned as <type>bytea</type>. > + except that changes are returned as <type>bytea</type> and that it > can > + be used on slots using output plugins that only support binary > output. Imo that's pretty much implied because it references the !binary version. But I guess it doesn't hurt to be explicit. How about: " ... on output plugins using any form of output, including binary."? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Sep 29, 2014 at 9:03 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> - except that changes are returned as <type>bytea</type>. >> + except that changes are returned as <type>bytea</type> and that it >> can >> + be used on slots using output plugins that only support binary >> output. > > Imo that's pretty much implied because it references the !binary > version. But I guess it doesn't hurt to be explicit. How about: > " ... on output plugins using any form of output, including binary."? I think you should just leave it alone. There's no problem with what it says there right now. It goes without saying that if the plugin can only return bytea, then you have to use the bytea-returning function to get it. If it's not clear that such plugins might exist, that needs to be clarified better elsewhere, not here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 29, 2014 at 10:03 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> 3) The formulation here seems vague as a plugin that generates textual
> output can call pg_logical_slot_peek_binary_changes as well:
I've commented on this before: an output plugin doesn't call
pg_logical_slot_peek_binary_changes - if at all it's the other way
round.
Sorry. I am sure you got what I wanted to say: "a user can as well use pg_logical_slot_peek_binary_changes to retrieve changes from a replication slot that uses an plugin generating textual output" :)
--
Michael
--
Michael
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Tue, Sep 30, 2014 at 12:06 AM, Robert Haas<span dir="ltr"><<a href="mailto:robertmhaas@gmail.com" target="_blank">robertmhaas@gmail.com</a>></span> wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">OnMon, Sep 29, 2014 at 9:03 AM, Andres Freund <<a href="mailto:andres@2ndquadrant.com">andres@2ndquadrant.com</a>>wrote:<br /> >> - except that changes arereturned as <type>bytea</type>.<br /> >> + except that changes are returned as <type>bytea</type>and that it<br /> >> can<br /> >> + be used on slots using output pluginsthat only support binary<br /> >> output.<br /> ><br /> > Imo that's pretty much implied because it referencesthe !binary<br /> > version. But I guess it doesn't hurt to be explicit. How about:<br /> > " ... on outputplugins using any form of output, including binary."?<br /><br /></span>I think you should just leave it alone. There'sno problem with what<br /> it says there right now. It goes without saying that if the plugin<br /> can only returnbytea, then you have to use the bytea-returning<br /> function to get it. If it's not clear that such plugins mightexist,<br /> that needs to be clarified better elsewhere, not here.<span class="HOEnZb"><font color="#888888"><br /></font></span></blockquote></div>Yes,not modifying the current text would be fine.<br />-- <br />Michael<br /></div></div>