Обсуждение: Options OUTPUT_PLUGIN_* controlling format are confusing (Was: Misleading error message in logical decoding)

Поиск
Список
Период
Сортировка

Options OUTPUT_PLUGIN_* controlling format are confusing (Was: Misleading error message in logical decoding)

От
Michael Paquier
Дата:
Hi all,

As there has been a dump of CATALOG_VERSION_NO on REL9_4_STABLE
recently, I am coming back to the options OUTPUT_PLUGIN_* that are
rather confusing for developers of decoder plugins. In short, when
extracting changes from a replication slot, a decoder plugin is free
to set one option that influences the output that is fetched by a
logical receiver: OUTPUT_PLUGIN_TEXTUAL_OUTPUT or
OUTPUT_PLUGIN_BINARY_OUTPUT. The interesting point about this format
option is that it does not directly influence the changes generated by
an output plugin: its value only has effect on the set of functions
pg_logical_[get|peek]_[binary_|]changes that can be used by a
non-replication connection to get individual changes from a repslot.

So a receiver fetching changes using PQgetCopyData is not really
impacted by this format option. Even better it is even possible to use
a custom option that is part of output_plugin_options to switch the
OUTPUT_PLUGIN_* value (cf option force-binary in
contrib/test_decoding).

My point is: logical decoding is presenting in its infrastructure API
a format option that could be added as a custom option in a decoder
plugin. Isn't that orthogonal? To illustrate this argument here is an
example: we could remove force-binary in test_decoding and replace it
by a option that allows the user to directly choose the output format,
like format=[binary|textual] and do the conversion in the plugin. In
the same manner, the functions pg_logical_[get|peek]_binary_changes
are equivalent to their cousin pg_logical_[get|peek]_changes casted to
bytea.

I am raising this point before the 9.4 ship sails, thinking long-term
and to faciliate the maintenance of existing code. Attached is a patch
that simplifies the current logical decoding API regarding that.
Regards,
--
Michael

Вложения
Hi,

On 2014-09-18 02:14:48 -0500, Michael Paquier wrote:
> As there has been a dump of CATALOG_VERSION_NO on REL9_4_STABLE
> recently, I am coming back to the options OUTPUT_PLUGIN_* that are
> rather confusing for developers of decoder plugins. In short, when
> extracting changes from a replication slot, a decoder plugin is free
> to set one option that influences the output that is fetched by a
> logical receiver: OUTPUT_PLUGIN_TEXTUAL_OUTPUT or
> OUTPUT_PLUGIN_BINARY_OUTPUT. The interesting point about this format
> option is that it does not directly influence the changes generated by
> an output plugin: its value only has effect on the set of functions
> pg_logical_[get|peek]_[binary_|]changes that can be used by a
> non-replication connection to get individual changes from a repslot.
> 
> So a receiver fetching changes using PQgetCopyData is not really
> impacted by this format option. Even better it is even possible to use
> a custom option that is part of output_plugin_options to switch the
> OUTPUT_PLUGIN_* value (cf option force-binary in
> contrib/test_decoding).
> 
> My point is: logical decoding is presenting in its infrastructure API
> a format option that could be added as a custom option in a decoder
> plugin. Isn't that orthogonal? To illustrate this argument here is an
> example: we could remove force-binary in test_decoding and replace it
> by a option that allows the user to directly choose the output format,
> like format=[binary|textual] and do the conversion in the plugin. In
> the same manner, the functions pg_logical_[get|peek]_binary_changes
> are equivalent to their cousin pg_logical_[get|peek]_changes casted to
> bytea.

The point is that operating with byteas on SQL level is freaking
painful.

> I am raising this point before the 9.4 ship sails, thinking long-term
> and to faciliate the maintenance of existing code. Attached is a patch
> that simplifies the current logical decoding API regarding that.

Sorry, -1.

Improving the docs here is on my roadmap, but I don't see the benefit of
this.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> The point is that operating with byteas on SQL level is freaking
> painful.
An example perhaps? I fail to see why it is related to the fact that a
user could simply use that to fetch changes in bytea from a slot:
select data::bytea from pg_logical_slot_get_changes('foo', NULL, NULL);
That's perhaps not more simple than using the binary functions, which
actually explain their existence because of the textual/binary format
context with OUTPUT_PLUGIN_*, but that's possible. It is as well
possible to pay quite a lot with custom data types and casts to
transform textual changes (not to mention that a plugin could send out
bytea changes by itself through the COPY string). Looking at the code,
the only difference between textual and binary is an assertion check
using the database encoding, something unlikely to be triggered on
production systems btw.

>> I am raising this point before the 9.4 ship sails, thinking long-term
>> and to faciliate the maintenance of existing code. Attached is a patch
>> that simplifies the current logical decoding API regarding that.
>
> Sorry, -1.
>
> Improving the docs here is on my roadmap, but I don't see the benefit of
> this.
Well, there is no actual benefit. That's only a logical reasoning to
keep the interface as simple as possible without impacting the feature
range. Enforcing an option with arbitrary values that only has effect
for non-replication connections is unintuitive for a plugin structure
that is dedicated for both non-replication and replication protocols
(using replication protocol offers more flexibility, so I'd expect it
to be more widely used than the SQL interface btw). If docs are
improved, it should at least be clearly stated what is the benefit of
one value to the other when fetching changes using the get/peek
functions.
-- 
Michael



On 2014-09-18 08:57:26 -0500, Michael Paquier wrote:
> On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The point is that operating with byteas on SQL level is freaking
> > painful.
> An example perhaps? I fail to see why it is related to the fact that a
> user could simply use that to fetch changes in bytea from a slot:
> select data::bytea from pg_logical_slot_get_changes('foo', NULL, NULL);

Uh. Because that'd mean that pg_logical_slot_get_changes() would return
text, that's not actually text? I.e. it'll contain 0 bytes and things
outside the server's encoding? Not good.

> That's perhaps not more simple than using the binary functions, which
> actually explain their existence because of the textual/binary format
> context with OUTPUT_PLUGIN_*, but that's possible. It is as well
> possible to pay quite a lot with custom data types and casts to
> transform textual changes (not to mention that a plugin could send out
> bytea changes by itself through the COPY string). Looking at the code,
> the only difference between textual and binary is an assertion check
> using the database encoding, something unlikely to be triggered on
> production systems btw.

No. There's a *SIGNIFICANT* additional difference. If you use the *non
binary* get/peek_changes on a output plugin which has declared (and we
normally believe C code) that it produces binary output you will get an
error.

Again:
The output plugin *declares* whether it returns binary data (i.e. data
that's not in the server's encoding and/or possibly contains 0
bytes.). The textual get/peek_changes cannot accept changes from plugins
declaring that. The binary variant can accept changes regardless of
that.

I have the feeling that you have some fundamentally different
understanding of this in your head and aren't really following my
explanations.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



> On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Improving the docs here is on my roadmap, but I don't see the benefit of
>> this.
Btw, I sent a couple of weeks back a patch that was an attempt to
improve this portion of the docs, feel free to have a look if that
helps :)
http://www.postgresql.org/message-id/CAB7nPqQDxcHNEBhG71nL6jDGQji9m=FQEoOKs1_YLBFSts4Zsw@mail.gmail.com
-- 
Michael



On 2014-09-18 09:13:48 -0500, Michael Paquier wrote:
> > On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Improving the docs here is on my roadmap, but I don't see the benefit of
> >> this.
> Btw, I sent a couple of weeks back a patch that was an attempt to
> improve this portion of the docs, feel free to have a look if that
> helps :)
> http://www.postgresql.org/message-id/CAB7nPqQDxcHNEBhG71nL6jDGQji9m=FQEoOKs1_YLBFSts4Zsw@mail.gmail.com

Uh, I know - I commented extensively on that thread. Your doc patch is
wrong, so I won't apply it ;) and also can't really use it as a basis...

I'd understood
CAB7nPqRCoCrrf84vHrvdRk8+9RGZbRfBkWMri9_79TFkmz41Fg@mail.gmail.com in
that thread to mean we were on one page. But the message referenced by
you above (sent later) and this discussion seems to indicate that that's
not the case.

Do you see the difference between what your doc patch states and the
explanation I've given nearby in this thread?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Thu, Sep 18, 2014 at 9:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-09-18 09:13:48 -0500, Michael Paquier wrote:
>> > On Thu, Sep 18, 2014 at 2:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> >> Improving the docs here is on my roadmap, but I don't see the benefit of
>> >> this.
>> Btw, I sent a couple of weeks back a patch that was an attempt to
>> improve this portion of the docs, feel free to have a look if that
>> helps :)
>> http://www.postgresql.org/message-id/CAB7nPqQDxcHNEBhG71nL6jDGQji9m=FQEoOKs1_YLBFSts4Zsw@mail.gmail.com
>
> Uh, I know - I commented extensively on that thread. Your doc patch is
> wrong, so I won't apply it ;) and also can't really use it as a basis...
Fine. No problem.

> I'd understood
> CAB7nPqRCoCrrf84vHrvdRk8+9RGZbRfBkWMri9_79TFkmz41Fg@mail.gmail.com in
> that thread to mean we were on one page. But the message referenced by
> you above (sent later) and this discussion seems to indicate that that's
> not the case.

> Do you see the difference between what your doc patch states and the
> explanation I've given nearby in this thread?
Perhaps that's the lack of documentation... Putting on the plugin
developer hat (if there's one), things are not clear for users. And
the thought that a mandatory option value within the plugin that is
only linked to the SQL interface is somewhat non-intuitive. It would
have been perhaps more interesting to allow users to specify a list of
data types that are allowed as output instead. I guess that's just
food for thoughts though...
-- 
Michael



On 2014-09-18 09:50:38 -0500, Michael Paquier wrote:
> > Do you see the difference between what your doc patch states and the
> > explanation I've given nearby in this thread?
> Perhaps that's the lack of documentation...

Man. I've explained it to you about three times. The previous attempts
at doing so didn't seem to help. If my explanations don't explain it so
you can understand it adding them to the docs won't change a thing.
That's why I ask whether you see the difference?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-09-18 09:50:38 -0500, Michael Paquier wrote:
>> > Do you see the difference between what your doc patch states and the
>> > explanation I've given nearby in this thread?
>> Perhaps that's the lack of documentation...
>
> Man. I've explained it to you about three times. The previous attempts
> at doing so didn't seem to help. If my explanations don't explain it so
> you can understand it adding them to the docs won't change a thing.
> That's why I ask whether you see the difference?
Urg sorry for the misunderstanding. The patch stated that this
parameter only influences the output of the SQL functions while it
defines if "the output plugin requires the output method to support
binary data"?
-- 
Michael



On Thu, Sep 18, 2014 at 11:21 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On 2014-09-18 09:50:38 -0500, Michael Paquier wrote:
>>> > Do you see the difference between what your doc patch states and the
>>> > explanation I've given nearby in this thread?
>>> Perhaps that's the lack of documentation...
>>
>> Man. I've explained it to you about three times. The previous attempts
>> at doing so didn't seem to help. If my explanations don't explain it so
>> you can understand it adding them to the docs won't change a thing.
>> That's why I ask whether you see the difference?
> Urg sorry for the misunderstanding. The patch stated that this
> parameter only influences the output of the SQL functions while it
> defines if "the output plugin requires the output method to support
> binary data"?

I'm not sure exactly what that sentence means.

But here's the point: every series of bytes is a valid bytea, except
maybe if it's over 1GB and runs afould of MaxAllocSize.  But a series
of bytes is only a valid text datum if it's a valid sequence of
characters according to the database encoding.  We like to think of
text as being an arbitrary series of bytes, but it isn't.  It can't
contain any \0 bytes, and it can't contain anything that's invalid in
the database encoding.  bytea isn't subject to either of those
restrictions.

So if we were going to have one universal output format for output
plugins, it would have to be bytea because that, really, can be
anything.  We could make users convert from that to text or whatever
they like.  But that's unappealing for several reasons: bytea output
is printed as unreadable hexademical garbage, and encoding conversions
are expensive.  So what we do instead is provide a text output method
and a binary output method.  That way, plugins that want to return
binary data are free to do so, and output methods that are happy to
return text can *declare* that what they return is legal text - and
then we just assume that to be true, and need not do an encoding
conversion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, Sep 23, 2014 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 18, 2014 at 11:21 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> On 2014-09-18 09:50:38 -0500, Michael Paquier wrote:
>>>> > Do you see the difference between what your doc patch states and the
>>>> > explanation I've given nearby in this thread?
>>>> Perhaps that's the lack of documentation...
>>>
>>> Man. I've explained it to you about three times. The previous attempts
>>> at doing so didn't seem to help. If my explanations don't explain it so
>>> you can understand it adding them to the docs won't change a thing.
>>> That's why I ask whether you see the difference?
>> Urg sorry for the misunderstanding. The patch stated that this
>> parameter only influences the output of the SQL functions while it
>> defines if "the output plugin requires the output method to support
>> binary data"?
>
> I'm not sure exactly what that sentence means.
>
> But here's the point: every series of bytes is a valid bytea, except
> maybe if it's over 1GB and runs afould of MaxAllocSize.  But a series
> of bytes is only a valid text datum if it's a valid sequence of
> characters according to the database encoding.  We like to think of
> text as being an arbitrary series of bytes, but it isn't.  It can't
> contain any \0 bytes, and it can't contain anything that's invalid in
> the database encoding.  bytea isn't subject to either of those
> restrictions.
>
> So if we were going to have one universal output format for output
> plugins, it would have to be bytea because that, really, can be
> anything.  We could make users convert from that to text or whatever
> they like.  But that's unappealing for several reasons: bytea output
> is printed as unreadable hexademical garbage, and encoding conversions
> are expensive.  So what we do instead is provide a text output method
> and a binary output method.  That way, plugins that want to return
> binary data are free to do so, and output methods that are happy to
> return text can *declare* that what they return is legal text - and
> then we just assume that to be true, and need not do an encoding
> conversion.
Aha, thanks. That's all clear then!
-- 
Michael



On 2014-09-23 12:40:25 +0900, Michael Paquier wrote:
> On Tue, Sep 23, 2014 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Sep 18, 2014 at 11:21 AM, Michael Paquier
> > <michael.paquier@gmail.com> wrote:
> >> On Thu, Sep 18, 2014 at 9:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >>> On 2014-09-18 09:50:38 -0500, Michael Paquier wrote:
> >>>> > Do you see the difference between what your doc patch states and the
> >>>> > explanation I've given nearby in this thread?
> >>>> Perhaps that's the lack of documentation...
> >>>
> >>> Man. I've explained it to you about three times. The previous attempts
> >>> at doing so didn't seem to help. If my explanations don't explain it so
> >>> you can understand it adding them to the docs won't change a thing.
> >>> That's why I ask whether you see the difference?
> >> Urg sorry for the misunderstanding. The patch stated that this
> >> parameter only influences the output of the SQL functions while it
> >> defines if "the output plugin requires the output method to support
> >> binary data"?
> >
> > I'm not sure exactly what that sentence means.
> >
> > But here's the point: every series of bytes is a valid bytea, except
> > maybe if it's over 1GB and runs afould of MaxAllocSize.  But a series
> > of bytes is only a valid text datum if it's a valid sequence of
> > characters according to the database encoding.  We like to think of
> > text as being an arbitrary series of bytes, but it isn't.  It can't
> > contain any \0 bytes, and it can't contain anything that's invalid in
> > the database encoding.  bytea isn't subject to either of those
> > restrictions.
> >
> > So if we were going to have one universal output format for output
> > plugins, it would have to be bytea because that, really, can be
> > anything.  We could make users convert from that to text or whatever
> > they like.  But that's unappealing for several reasons: bytea output
> > is printed as unreadable hexademical garbage, and encoding conversions
> > are expensive.  So what we do instead is provide a text output method
> > and a binary output method.  That way, plugins that want to return
> > binary data are free to do so, and output methods that are happy to
> > return text can *declare* that what they return is legal text - and
> > then we just assume that to be true, and need not do an encoding
> > conversion.
> Aha, thanks. That's all clear then!

What about the attached patch then?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения


On Mon, Sep 29, 2014 at 7:50 PM, Andres Freund <andres@2ndquadrant.com> wrote:
What about the attached patch then?

Thanks for this update. This looks good. Here are a couple of small comments:
1) This sentence is correct English, but I don't recall seeing in the docs such a formulation:
+        Can only be used on slots using a output plugin supporting textual
+        output.
I'd rather rewrite 'It can only be used' or 'this function can only be used'. I imagine that you could add a reference to logicaldecoding-output-mode as well.
2) s/a output/an output/g
3) The formulation here seems vague as a plugin that generates textual output can call pg_logical_slot_peek_binary_changes as well:
-        except that changes are returned as <type>bytea</type>.
+        except that changes are returned as <type>bytea</type> and that it can
+        be used on slots using output plugins that only support binary output.
4) What about reformulating the following:
+     so a <type>text</> can contain it. This is checked in assertion enabled
+     builds.
"This is checked in builds with assertions enabled."
5) Better to add a newline between two portions sect2.
+   </sect2>
    <sect2 id="logicaldecoding-output-plugin-callbacks">

Regards,
--
Michael
On 2014-09-29 21:48:08 +0900, Michael Paquier wrote:
> On Mon, Sep 29, 2014 at 7:50 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> 
> > What about the attached patch then?
> >
> 
> Thanks for this update. This looks good. Here are a couple of small
> comments:
> 1) This sentence is correct English, but I don't recall seeing in the docs
> such a formulation:
> +        Can only be used on slots using a output plugin supporting textual
> +        output.
> I'd rather rewrite 'It can only be used' or 'this function can only be
> used'. I imagine that you could add a reference to
> logicaldecoding-output-mode as well.

Hm.

> 2) s/a output/an output/g

> 3) The formulation here seems vague as a plugin that generates textual
> output can call pg_logical_slot_peek_binary_changes as well:

I've commented on this before: an output plugin doesn't call
pg_logical_slot_peek_binary_changes - if at all it's the other way
round.

> -        except that changes are returned as <type>bytea</type>.
> +        except that changes are returned as <type>bytea</type> and that it
> can
> +        be used on slots using output plugins that only support binary
> output.

Imo that's pretty much implied because it references the !binary
version. But I guess it doesn't hurt to be explicit. How about:
" ... on output plugins using any form of output, including binary."?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Mon, Sep 29, 2014 at 9:03 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> -        except that changes are returned as <type>bytea</type>.
>> +        except that changes are returned as <type>bytea</type> and that it
>> can
>> +        be used on slots using output plugins that only support binary
>> output.
>
> Imo that's pretty much implied because it references the !binary
> version. But I guess it doesn't hurt to be explicit. How about:
> " ... on output plugins using any form of output, including binary."?

I think you should just leave it alone.  There's no problem with what
it says there right now.  It goes without saying that if the plugin
can only return bytea, then you have to use the bytea-returning
function to get it.  If it's not clear that such plugins might exist,
that needs to be clarified better elsewhere, not here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





On Mon, Sep 29, 2014 at 10:03 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> 3) The formulation here seems vague as a plugin that generates textual
> output can call pg_logical_slot_peek_binary_changes as well:

I've commented on this before: an output plugin doesn't call
pg_logical_slot_peek_binary_changes - if at all it's the other way
round.
 
Sorry. I am sure you got what I wanted to say: "a user can as well use pg_logical_slot_peek_binary_changes to retrieve changes from a replication slot that uses an plugin generating textual output" :)
--
Michael
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Tue, Sep 30, 2014 at 12:06 AM, Robert
Haas<span dir="ltr"><<a href="mailto:robertmhaas@gmail.com" target="_blank">robertmhaas@gmail.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">OnMon, Sep 29, 2014 at 9:03 AM, Andres Freund <<a
href="mailto:andres@2ndquadrant.com">andres@2ndquadrant.com</a>>wrote:<br /> >> -        except that changes
arereturned as <type>bytea</type>.<br /> >> +        except that changes are returned as
<type>bytea</type>and that it<br /> >> can<br /> >> +        be used on slots using output
pluginsthat only support binary<br /> >> output.<br /> ><br /> > Imo that's pretty much implied because it
referencesthe !binary<br /> > version. But I guess it doesn't hurt to be explicit. How about:<br /> > " ... on
outputplugins using any form of output, including binary."?<br /><br /></span>I think you should just leave it alone. 
There'sno problem with what<br /> it says there right now.  It goes without saying that if the plugin<br /> can only
returnbytea, then you have to use the bytea-returning<br /> function to get it.  If it's not clear that such plugins
mightexist,<br /> that needs to be clarified better elsewhere, not here.<span class="HOEnZb"><font color="#888888"><br
/></font></span></blockquote></div>Yes,not modifying the current text would be fine.<br />-- <br />Michael<br
/></div></div>