Thread: Fix XML handling with DOCTYPE
Hi all,
I'm investigating the issue I reported here: https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
As Tom Lane mentioned there, the docs (8.13) indicate xmloption = CONTENT should accept all valid XML. At this time, XML with a DOCTYPE declaration is not accepted with this setting even though it is considered valid XML. I'd like to work on a patch to address this issue and make it work as advertised.
I traced the source of the error to line ~1500 in /src/backend/utils/adt/xml.c
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, utf8string + count, NULL);
It looks like it is xmlParseBalancedChunkMemory from libxml that fails when there's a DOCTYPE in the XML data. My assumption is that the DOCTYPE declaration makes the XML not well-balanced. From the libxml documentation:
This function returns:
0 if the chunk is well balanced, -1 in case of args problem and the parser error code otherwise
I see xmlParseBalancedChunkMemoryRecover that might provide the functionality needed. That function returns:
0 if the chunk is well balanced, -1 in case of args problem and the parser error code otherwise. In case recover is set to 1, the nodelist will not be empty even if the parsed chunk is not well balanced, assuming the parsing succeeded to some extent.
I haven't yet tested whether this parses data with a DOCTYPE successfully. If it does, I don't think it would be difficult to update the check on res_code to not fail. I'm also assuming libxml returns a distinct error code that would let us differentiate this case from other errors, but I couldn't find those codes quickly. The current check is this:
if (res_code != 0 || xmlerrcxt->err_occurred)
Does this sound reasonable? Have I missed some major aspect? If this is on the right track I can work on creating a patch to move this forward.
Thanks,
On 03/16/19 16:10, Ryan Lambert wrote:
> As Tom Lane mentioned there, the docs (8.13) indicate xmloption = CONTENT
> should accept all valid XML. At this time, XML with a DOCTYPE declaration
> is not accepted with this setting even though it is considered valid XML.

Hello Ryan,

A patch for your issue is currently registered in the 2019-03 commitfest[1]. If it attracts somebody to review it before the end of the month, it might make it into PG v12. It is the xml-content-2006-2.patch found on the email thread [2]. (The other patch found there is associated documentation fixes, and also needs to be reviewed.)

Further conversation should probably be on that email thread so that it stays associated with the commitfest entry.

Thanks for your interest in the issue!

Regards,
Chapman Flack

[1] https://commitfest.postgresql.org/22/1872/
[2] https://www.postgresql.org/message-id/flat/5C81F8C0.6090901@anastigmatix.net
Ryan Lambert <ryan@rustprooflabs.com> writes:
> I'm investigating the issue I reported here:
> https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
> I'd like to work on a patch to address this issue and make it work as
> advertised.

Good idea, because it doesn't seem like anybody else cares ...

> I see xmlParseBalancedChunkMemoryRecover that might provide the
> functionality needed.

TBH, our experience with libxml has not been so positive that I'd think adding dependencies on new parts of its API would be a good plan.

Experimenting with different inputs, it seems like removing the "<!DOCTYPE ...>" tag is enough to make it work. So what I'm wondering about is writing something like parse_xml_decl() to skip over that.

Bear in mind though that I know next to zip about XML. There may be some good reason why we don't want to strip off the !DOCTYPE part from what libxml sees.

			regards, tom lane
Chapman Flack <chap@anastigmatix.net> writes:
> A patch for your issue is currently registered in the 2019-03 commitfest[1].

Oh! I apologize for saying nobody was working on this issue.

Taking a very quick look at your patch, though, I dunno --- it seems like it adds a boatload of new assumptions about libxml's data structures and error-reporting behavior. I'm sure it works for you, but will it work across a wide range of libxml versions?

What do you think of the idea I just posted about parsing off the DOCTYPE thing for ourselves, and not letting libxml see it?

			regards, tom lane
On 03/16/19 16:42, Tom Lane wrote:
> Ryan Lambert <ryan@rustprooflabs.com> writes:
>> I'm investigating the issue I reported here:
>> https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
>> I'd like to work on a patch to address this issue and make it work as
>> advertised.
>
> Good idea, because it doesn't seem like anybody else cares ...

ahem
On 03/16/19 16:55, Tom Lane wrote:
> What do you think of the idea I just posted about parsing off the DOCTYPE
> thing for ourselves, and not letting libxml see it?

The principled way of doing that would be to pre-parse to find a DOCTYPE, and if there is one, leave it there and parse the input as we do for 'document'. Per XML, if there is a DOCTYPE, the document must satisfy the 'document' syntax requirements, and per SQL/XML:2006-and-later, 'content' is a proper superset of 'document', so if we were asked for 'content' and can successfully parse it as 'document', we're good, and if we see a DOCTYPE and yet it incurs a parse error as 'document', well, that's what needed to happen.

The DOCTYPE can appear arbitrarily far in, but the only things that can precede it are the XML decl, whitespace, XML comments, and XML processing instructions. None of those things nest, so the preceding stuff makes a regular language, and a regular expression that matches any amount of that stuff ending in <!DOCTYPE would be enough to detect that the parse should be shunted off to get 'document' treatment.

The patch I submitted essentially relies on libxml to do that same parsing up to that same point and detect the error, so it would add no upfront cost in the majority of cases that aren't this corner one. But keeping a little compiled regex around and testing the input with that would hardly break the bank, either.

Regards,
-Chap
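[A pre-scan of that regular prolog language could be sketched in plain C along these lines. This is purely illustrative — the function name and simplifications are mine, not from any patch; in particular it treats the XML declaration as just another <? ... ?> unit and leaves all real validation to libxml:]

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a DOCTYPE declaration begins the "document part" of the
 * input, i.e. is preceded only by whitespace, comments, and <? ... ?>
 * units (the XML declaration and processing instructions).  Nothing here
 * validates anything; malformed prologs are left for libxml to reject.
 */
static bool
prolog_has_doctype(const char *p)
{
    for (;;)
    {
        while (isspace((unsigned char) *p))
            p++;

        if (strncmp(p, "<!--", 4) == 0)     /* skip a comment */
        {
            const char *end = strstr(p + 4, "-->");

            if (end == NULL)
                return false;               /* malformed; let libxml complain */
            p = end + 3;
        }
        else if (strncmp(p, "<?", 2) == 0)  /* skip XML decl or a PI */
        {
            const char *end = strstr(p + 2, "?>");

            if (end == NULL)
                return false;
            p = end + 2;
        }
        else
            return strncmp(p, "<!DOCTYPE", 9) == 0;
    }
}
```

Anything other than those three kinds of prolog item ends the scan immediately, so on typical inputs the answer comes within the first few characters, as described above.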
Chapman Flack <chap@anastigmatix.net> writes:
> On 03/16/19 16:55, Tom Lane wrote:
>> What do you think of the idea I just posted about parsing off the DOCTYPE
>> thing for ourselves, and not letting libxml see it?

> The principled way of doing that would be to pre-parse to find a DOCTYPE,
> and if there is one, leave it there and parse the input as we do for
> 'document'. Per XML, if there is a DOCTYPE, the document must satisfy
> the 'document' syntax requirements, and per SQL/XML:2006-and-later,
> 'content' is a proper superset of 'document', so if we were asked for
> 'content' and can successfully parse it as 'document', we're good,
> and if we see a DOCTYPE and yet it incurs a parse error as 'document',
> well, that's what needed to happen.

Hm, so, maybe just

(1) always try to parse as document. If successful, we're done.

(2) otherwise, if allowed by xmloption, try to parse using our current logic for the CONTENT case.

This avoids adding any new assumptions about how libxml acts, which is what I was hoping to achieve.

One interesting question is which error to report if both (1) and (2) fail.

			regards, tom lane
On 03/16/19 17:21, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> On 03/16/19 16:55, Tom Lane wrote:
>>> What do you think of the idea I just posted about parsing off the DOCTYPE
>>> thing for ourselves, and not letting libxml see it?
>
>> The principled way of doing that would be to pre-parse to find a DOCTYPE,
>> and if there is one, leave it there and parse the input as we do for
>> 'document'. Per XML, if there is a DOCTYPE, the document must satisfy
>> the 'document' syntax requirements, and per SQL/XML:2006-and-later,
>> 'content' is a proper superset of 'document', so if we were asked for
>> 'content' and can successfully parse it as 'document', we're good,
>> and if we see a DOCTYPE and yet it incurs a parse error as 'document',
>> well, that's what needed to happen.
>
> Hm, so, maybe just
>
> (1) always try to parse as document. If successful, we're done.
>
> (2) otherwise, if allowed by xmloption, try to parse using our
> current logic for the CONTENT case.

What I don't like about that is that (a) the input could be arbitrarily long and complex to parse (not that you can't imagine a database populated with lots of short little XML snippets, but at the same time, a query could quite plausibly deal in yooge ones), and (b), step (1) could fail at the last byte of the input, followed by total reparsing as (2).

I think the safer structure is clearly that of the current patch, modulo whether the "has a DOCTYPE" test is done by libxml itself (with the assumptions you don't like) or by a pre-scan.

So the current structure is:

    restart:
      asked for document?
        parse as document, or fail
      else asked for content:
        parse as content
        failed?
          because DOCTYPE? restart as if document
          else fail

and a pre-scan structure could be very similar:

    restart:
      asked for document?
        parse as document, or fail
      else asked for content:
        pre-scan finds DOCTYPE?
          restart as if document
        else parse as content, or fail

The pre-scan is a simple linear search and will ordinarily say yes or no within a couple dozen characters--you could *have* an input with 20k of leading whitespace and comments, but it's hardly the norm. Just trying to parse as 'document' first could easily parse a large fraction of the input before discovering it's followed by something that can't follow a document element.

Regards,
-Chap
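[The pre-scan dispatch structure can be sketched in C with toy stand-ins for the actual parse steps. Every name here is illustrative, not from the actual patch; the stubs deliberately trivialize the parsing so that only the control flow carries the idea:]

```c
#include <stdbool.h>
#include <string.h>

typedef enum { XMLOPTION_DOCUMENT, XMLOPTION_CONTENT } XmlOptionType;

/*
 * Toy stand-ins for the real libxml-based parse steps: here "document"
 * just means the input starts with '<', and "content" accepts anything.
 */
static bool parse_as_document(const char *s) { return s[0] == '<'; }
static bool parse_as_content(const char *s)  { (void) s; return true; }

/*
 * Crude stand-in for the pre-scan; the real one would also skip
 * whitespace, comments, and PIs preceding the DOCTYPE.
 */
static bool
prolog_has_doctype(const char *s)
{
    return strncmp(s, "<!DOCTYPE", 9) == 0;
}

static bool
parse_xml(const char *s, XmlOptionType xmloption)
{
    if (xmloption == XMLOPTION_DOCUMENT)
        return parse_as_document(s);    /* parse as document, or fail */

    /*
     * Asked for content: a DOCTYPE means the input must satisfy the
     * 'document' production, which SQL/XML:2006 counts as content too,
     * so restart as if document.
     */
    if (prolog_has_doctype(s))
        return parse_as_document(s);

    return parse_as_content(s);         /* parse as content, or fail */
}
```

Note that the CONTENT branch never parses twice: the cheap pre-scan decides up front which single parse to run.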
Thank you both! I had glanced at that item in the commitfest but didn't notice it would fix this issue.
I'll try to test/review this before the end of the month, much better than starting from scratch myself. A quick glance at the patch looks logical and looks like it should work for my use case.
Thanks,
Ryan Lambert
On 03/16/19 18:33, Chapman Flack wrote:
> The pre-scan is a simple linear search and will ordinarily say yes or no
> within a couple dozen characters--you could *have* an input with 20k of
> leading whitespace and comments, but it's hardly the norm. Just trying to

If the available regexp functions want to start by munging the entire input into a pg_wchar array, then it may be better to implement the pre-scan as open code, the same way parse_xml_decl() is already implemented.

Given that parse_xml_decl() already covers the first optional thing that can precede the doctype, the remaining scan routine would only need to recognize comments, PIs, and whitespace. That would be pretty straightforward.

Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
> On 03/16/19 17:21, Tom Lane wrote:
>> Hm, so, maybe just
>>
>> (1) always try to parse as document. If successful, we're done.
>>
>> (2) otherwise, if allowed by xmloption, try to parse using our
>> current logic for the CONTENT case.

> What I don't like about that is that (a) the input could be
> arbitrarily long and complex to parse (not that you can't imagine
> a database populated with lots of short little XML snippets, but
> at the same time, a query could quite plausibly deal in yooge ones),
> and (b), step (1) could fail at the last byte of the input, followed
> by total reparsing as (2).

That doesn't seem particularly likely to me: based on what's been said here, I'd expect parsing with the wrong expectation to usually fail near the start of the input. In any case, the other patch also requires repeat parsing, no? It's just doing that in a different set of cases.

The reason I'm pressing you for a simpler patch is that dump/reload failures are pretty bad, so ideally we'd find a fix that we're comfortable with back-patching into the released branches. Personally I would never dare to back-patch the proposed patch: it's too complex, so it's not real clear that it doesn't have unwanted side-effects, and it's not at all certain that there aren't libxml version dependencies in it. (Maybe another committer with more familiarity with libxml would evaluate the risks differently, but I doubt it.) But I think that something close to what I sketched above would pass muster as safe-to-backpatch.

			regards, tom lane
On 03/17/19 11:45, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> On 03/16/19 17:21, Tom Lane wrote:
>>> (1) always try to parse as document. If successful, we're done.
>>> (2) otherwise, if allowed by xmloption, try to parse using our
>
>> What I don't like about that is that (a) the input could be
>> arbitrarily long and complex to parse (not that you can't imagine
>> a database populated with lots of short little XML snippets, but
>> at the same time, a query could quite plausibly deal in yooge ones),
>> and (b), step (1) could fail at the last byte of the input, followed
>> by total reparsing as (2).
>
> That doesn't seem particularly likely to me: based on what's been
> said here, I'd expect parsing with the wrong expectation to usually
> fail near the start of the input. In any case, the other patch
> also requires repeat parsing, no? It's just doing that in a different
> set of cases.

I'll do up a version with the open-coded prescan I proposed last night.

Whether parsing with the wrong expectation is likely to fail near the start of the input depends on both the input and the expectation. If your expectation is DOCUMENT and the input is CONTENT, it's possible for the determining difference to be something that follows the first element, and a first element can be (and often is) nearly all of the input.

What I was doing in the patch is the reverse: parsing with the expectation of CONTENT to see if a DTD gets tripped over. It isn't allowed for an element to precede a DTD, so that approach can be expected to fail fast if the other branch needs to be taken. But a quick pre-scan for the same thing would have the same property, without the libxml dependencies that bother you here. Watch this space.

Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
> What I was doing in the patch is the reverse: parsing with the expectation
> of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
> element to precede a DTD, so that approach can be expected to fail fast
> if the other branch needs to be taken.

Ah, right. I don't have any problem with trying the CONTENT approach before the DOCUMENT approach rather than vice-versa. What I was concerned about was adding a lot of assumptions about exactly how libxml would report the failure. IMO a maximally-safe patch would just rearrange things we're already doing without adding new things.

> But a quick pre-scan for the same thing would have the same property,
> without the libxml dependencies that bother you here. Watch this space.

Do we need a pre-scan at all?

			regards, tom lane
On 03/17/19 13:16, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> What I was doing in the patch is the reverse: parsing with the expectation
>> of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
>> element to precede a DTD, so that approach can be expected to fail fast
>> if the other branch needs to be taken.
>
> Ah, right. I don't have any problem with trying the CONTENT approach
> before the DOCUMENT approach rather than vice-versa. What I was concerned
> about was adding a lot of assumptions about exactly how libxml would
> report the failure. IMO a maximally-safe patch would just rearrange
> things we're already doing without adding new things.
>
>> But a quick pre-scan for the same thing would have the same property,
>> without the libxml dependencies that bother you here. Watch this space.
>
> Do we need a pre-scan at all?

Without it, we double the time to a failure result in every case that should actually fail, as well as in this one corner case that we want to see succeed, and the question you posed earlier about which error message to return becomes thornier.

If the query asked for CONTENT, any error result should be one you could get when parsing as CONTENT. If we switch and try parsing as DOCUMENT _because the input is claiming to have the form of a DOCUMENT_, then it's defensible to return errors explaining why it's not a DOCUMENT ... but not in the general case of just throwing DOCUMENT at it any time CONTENT parse fails.

Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
> On 03/17/19 13:16, Tom Lane wrote:
>> Do we need a pre-scan at all?

> Without it, we double the time to a failure result in every case that
> should actually fail, as well as in this one corner case that we want to
> see succeed, and the question you posed earlier about which error message
> to return becomes thornier.

I have absolutely zero concern about whether it takes twice as long to detect bad input; nobody should be sending bad input if they're concerned about performance. (The costs of the ensuing transaction abort would likely dwarf xml_in's runtime in any case.) Besides, with what we're talking about doing here, (1) the extra runtime is consumed only in cases that would fail up to now, so nobody can complain about a performance regression; (2) doing a pre-scan *would* be a performance regression for cases that work today; not a large one we hope, but still...

The error message issue is indeed a concern, but I don't see why it's complicated if you agree that

> If the query asked for CONTENT, any error result should be one you could get
> when parsing as CONTENT.

That just requires us to save the first error message and be sure to issue that one not the DOCUMENT one, no? That's what we'd want to do from a backwards-compatibility standpoint anyhow, since that's the error message wording you'd get with today's code.

			regards, tom lane
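[The save-the-first-error idea can be sketched with toy parsers standing in for the libxml calls. All names, failure conditions, and message texts here are made up for illustration; only the save-and-reissue flow is the point:]

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy parsers standing in for the libxml-based ones; each "fails" by
 * writing a message into errbuf and returning false.
 */
static bool
toy_parse_content(const char *s, char *errbuf, size_t len)
{
    if (strstr(s, "<!DOCTYPE") != NULL)
    {
        snprintf(errbuf, len, "invalid XML content");
        return false;
    }
    return true;
}

static bool
toy_parse_document(const char *s, char *errbuf, size_t len)
{
    if (s[0] != '<')
    {
        snprintf(errbuf, len, "invalid XML document");
        return false;
    }
    return true;
}

/*
 * Try CONTENT first; on failure retry as DOCUMENT.  If both fail,
 * report the saved CONTENT error -- the wording a caller would have
 * seen before any retry logic existed.
 */
static bool
parse_content_with_retry(const char *s, char *errbuf, size_t len)
{
    char        first_err[256];

    if (toy_parse_content(s, first_err, sizeof first_err))
        return true;
    if (toy_parse_document(s, errbuf, len))
        return true;
    snprintf(errbuf, len, "%s", first_err);
    return false;
}
```

The copy into a local buffer is the easy part here; as the following message notes, the hard part with real libxml is that its errors are structured data, not a ready-made string.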
On 03/17/19 15:06, Tom Lane wrote:
> The error message issue is indeed a concern, but I don't see why it's
> complicated if you agree that
>
>> If the query asked for CONTENT, any error result should be one you could get
>> when parsing as CONTENT.
>
> That just requires us to save the first error message and be sure to issue
> that one not the DOCUMENT one, no?

I confess I haven't looked hard yet at how to do that. The way errors come out of libxml is more involved than "here's a message", so there's a choice of (a) try to copy off that struct in a way that's sure to survive re-executing the parser, and then use the copy, or (b) generate a message right away from the structured information and save that, and I guess b might not be too bad; a might not be too bad, or it might, and slide right back into the kind of libxml-behavior-assumptions you're wanting to avoid.

Meanwhile, here is a patch on the lines I proposed earlier, with a pre-check. Any performance hit that it could entail (which I'd really expect to be de minimis, though I haven't benchmarked) ought to be compensated by the strlen I changed to strnlen in parse_xml_decl (as there's really no need to run off and count the whole rest of the input just to know if 1, 2, 3, or 4 bytes are available to decode a UTF-8 char).

... and, yes, I know that could be an independent patch, and then the performance effect here should be measured from there. But it was near what I was doing anyway, so I included it here.

Attaching both still-outstanding patches (this one and docfix) so the CF app doesn't lose one.

Regards,
-Chap
There might be too many different email threads on this with patches, but in case it went under the radar, xml-content-2006-3.patch appeared in my previous message on this thread[1].

It is based on a simple pre-check of the prefix of the input, determining which form of parse to apply. That may or may not be simpler than parse-once-save-error-parse-again-report-first-error, but IMV it's a more direct solution and clearer (the logic is clearly about "how do I determine the way this input should be parsed?" which is the problem on the table, rather than "how should I save and regurgitate this libxml error?" which turns the problem on the table to a different one).

I decided, for a first point of reference, to wear the green eyeshade and write a pre-check that exactly implements the applicable rules. That gives a starting point for simplifications that are probably safe.

For example, a bunch of lines at the end have to do with verifying the content inside of a processing-instruction, after finding where it ends. We could reasonably decide that, for the purpose of skipping it, knowing where it ends is enough, as libxml will parse it next and report any errors anyway. That would slightly violate my intention of sending input to (the parser that wasn't asked for) /only/ when it's completely clear (from the prefix we've seen) that that's where it should go. The relaxed version could do that in completely-clear cases and cases with an invalid PI ahead of what looks like a DTD. But you'd pretty much expect both parsers to produce the same message for a bad PI anyway.

That made me just want to try it now, and--surprise!--the messages from libxml are not the same. So maybe I would lean to keeping the green-eyeshade rules in the test, if you can stomach them, but I would understand taking them out.

Regards,
-Chap

[1] https://www.postgresql.org/message-id/5C8ECAA4.3090301@anastigmatix.net
Chapman Flack <chap@anastigmatix.net> writes:
> I decided, for a first point of reference, to wear the green eyeshade and
> write a pre-check that exactly implements the applicable rules. That gives
> a starting point for simplifications that are probably safe.
> For example, a bunch of lines at the end have to do with verifying the
> content inside of a processing-instruction, after finding where it ends.
> We could reasonably decide that, for the purpose of skipping it, knowing
> where it ends is enough, as libxml will parse it next and report any errors
> anyway.

Yeah, I did not like that code too much, particularly not all the magic Unicode-code-point numbers. I removed that, made some other changes to bring the patch more in line with PG coding style, and pushed it.

> That made me just want to try it now, and--surprise!--the messages from
> libxml are not the same. So maybe I would lean to keeping the green-eyeshade
> rules in the test, if you can stomach them, but I would understand taking
> them out.

I doubt anyone will care too much about whether error messages for bad XML input are exactly like what they were before; and even if someone does, I doubt that these extra tests would be enough to ensure that the messages don't change. You're not really validating that the input is something that libxml would accept, unless its processing of XML PIs is far stupider than I would expect it to be.

			regards, tom lane
On 03/23/19 16:59, Tom Lane wrote:
> Unicode-code-point numbers. I removed that, made some other changes to
> bring the patch more in line with PG coding style, and pushed it.

Thanks! It looks good. I'm content with the extra PI checking being gone. The magic Unicode-code-point numbers come straight from the XML standard; I couldn't make that stuff up. :)

> You're not really validating that the input
> is something that libxml would accept, unless its processing of XML PIs
> is far stupider than I would expect it to be.

Out of curiosity, what further processing would you expect libxml to do? XML parsers are supposed to be transparent PI-preservers, except in the rare case of seeing a PI that actually means something to the embedding application, which isn't going to be the case for a database simply implementing an XML data type. The standard literally requires that the target must be a NAME, and can't match [Xx][Mm][Ll], and if there's whitespace and anything after that, there can't be an embedded ?> ... and that's it.

Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
> On 03/23/19 16:59, Tom Lane wrote:
>> You're not really validating that the input
>> is something that libxml would accept, unless its processing of XML PIs
>> is far stupider than I would expect it to be.

> Out of curiosity, what further processing would you expect libxml to do?

Hm, I'd have thought it'd try to parse the arguments to some extent, but maybe not. Does everybody reimplement attribute parsing for themselves when using PIs?

			regards, tom lane
On 03/23/19 18:22, Tom Lane wrote:
>> Out of curiosity, what further processing would you expect libxml to do?
>
> Hm, I'd have thought it'd try to parse the arguments to some extent,
> but maybe not. Does everybody reimplement attribute parsing for
> themselves when using PIs?

Yeah, the content of a PI (whatever's after the target name) is left all to be defined by whatever XML-using application might care about that PI. It could have an attribute=value syntax inspired by XML elements, or some other form entirely, but there'd just better not be any ?> in it.

Regards,
-Chap
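[The PI rules summarized above — target is a Name, must not match [Xx][Mm][Ll], everything after it is opaque up to the first ?> — can be sketched as a check of a PI's interior. This is a hypothetical helper, not from any patch, and it uses an ASCII-only approximation of the XML Name production, which really allows far more characters:]

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>            /* strncasecmp */

/*
 * Check a PI's interior -- the text between "<?" and the first "?>"
 * (so by construction no "?>" can appear in it).  The target must be
 * a Name not matching [Xx][Mm][Ll]; whatever follows whitespace after
 * the target is opaque to the XML parser.
 */
static bool
pi_is_wellformed(const char *pi)
{
    size_t      tlen = 0;

    /* Name must start with a letter, '_' or ':' (ASCII approximation) */
    if (!(isalpha((unsigned char) pi[0]) || pi[0] == '_' || pi[0] == ':'))
        return false;
    while (pi[tlen] != '\0' &&
           (isalnum((unsigned char) pi[tlen]) ||
            strchr("_:.-", pi[tlen]) != NULL))
        tlen++;

    /* a target matching (('X'|'x')('M'|'m')('L'|'l')) is reserved */
    if (tlen == 3 && strncasecmp(pi, "xml", 3) == 0)
        return false;

    /* after the target: either nothing, or whitespace then opaque content */
    return pi[tlen] == '\0' || isspace((unsigned char) pi[tlen]);
}
```

Note that targets merely *starting* with "xml", such as the standard xml-stylesheet PI, are allowed; only the bare three-letter target is reserved.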
I am unable to get latest patches I found [1] to apply cleanly to current branches. It's possible I missed the latest patches so if I'm using the wrong ones please let me know. I tried against master, 11.2 stable and the 11.2 tag with similar results. It's quite possible it's user error on my end, I am new to this process but didn't have issues with the previous patches when I tested those a couple weeks ago.
Ryan Lambert
On 03/24/19 21:04, Ryan Lambert wrote:
> I am unable to get latest patches I found [1] to apply cleanly to current
> branches. It's possible I missed the latest patches so if I'm using the
> wrong ones please let me know. I tried against master, 11.2 stable and the
> 11.2 tag with similar results.

Tom pushed the content-with-DOCTYPE patch, it's now included in master, REL_11_STABLE, REL_10_STABLE, REL9_6_STABLE, REL9_5_STABLE, and REL9_4_STABLE.

The only patch that's left to be reviewed and applied is the documentation fix, latest in [1]. If you were interested in giving a review opinion on some XML documentation.

Regards,
-Chap

[1] https://www.postgresql.org/message-id/5C96DBB5.2080103@anastigmatix.net
Chapman Flack <chap@anastigmatix.net> writes:
> On 03/24/19 21:04, Ryan Lambert wrote:
>> I am unable to get latest patches I found [1] to apply cleanly to current
>> branches. It's possible I missed the latest patches so if I'm using the
>> wrong ones please let me know. I tried against master, 11.2 stable and the
>> 11.2 tag with similar results.

> Tom pushed the content-with-DOCTYPE patch, it's now included in master,
> REL_11_STABLE, REL_10_STABLE, REL9_6_STABLE, REL9_5_STABLE, and
> REL9_4_STABLE.

Right. If you want to test (and please do!) you could grab the relevant branch tip from our git repo, or download a nightly snapshot tarball from https://www.postgresql.org/ftp/snapshot/

			regards, tom lane
Perfect, thank you! I do remember seeing that message now, but hadn't understood what it really meant.
I will test later today. Thanks
Ryan
The following review has been posted through the commitfest application:

    make installcheck-world: tested, passed
    Implements feature:      tested, passed
    Spec compliant:          not tested
    Documentation:           not tested

I tested the master branch (commit 8edd0e7), REL_11_STABLE (commit 24df866) and REL9_6_STABLE (commit 5097368) and verified functionality. This patch fixes the bug I had reported [1] previously. With this in the stable branches, is it safe to assume this will be included with the next minor releases? Thanks for your work on this!!

Ryan

[1] https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org

The new status of this patch is: Ready for Committer
On 03/25/19 18:03, Ryan Lambert wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: not tested

Hi,

Thanks for the review! Yes, that part of this commitfest entry has been committed already and will appear in the next minor releases of those branches.

That leaves only one patch in this commitfest entry that is still in need of review, namely the update to the documentation. If you happened to feel moved to look over a documentation patch, that would be what this CF entry most needs in the waning days of the commitfest.

There seem to be community members reluctant to review it because of not feeling sufficiently expert in XML to scrutinize every technical detail, but there are other valuable angles for documentation review. (And the reason there *is* a documentation patch is the plentiful room for improvement in the documentation that's already there, so as far as reviewing goes, the old yarn about the two guys, the running shoes, and the bear comes to mind.)

I can supply pointers to specs, etc., for anyone who does see some technical details in the patch and has questions about them.

Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
> Thanks for the review! Yes, that part of this commitfest entry has been
> committed already and will appear in the next minor releases of those
> branches.

Indeed, thanks for verifying that this fixes your problem.

> That leaves only one patch in this commitfest entry that is still in
> need of review, namely the update to the documentation.

Yeah. Since it *is* in need of review, I changed the CF entry's state back to Needs Review.

regards, tom lane
Thanks for putting up with a new reviewer!
--with-libxml is the PostgreSQL configure option to make it use libxml2.
The very web page http://xmlsoft.org/index.html says "The XML C parser
and toolkit of Gnome: libxml" and is all about libxml2.
So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?
I think leaving it as libxml makes sense with all that. Good point that --with-libxml is used to build so I think staying with that works and is consistent. I agree that having this point included does clarify the how and why of the limitations of this implementation.
I also over-parenthesize so I'm used to looking for that in my own writing. The full sentences were the ones that seemed excessive to me, I think the others are ok and I won't nit-pick either way on those (unless you want me to!).
If you agree, I should go through and fix my nodesets to be node-sets.
Yes, I like node-sets better, especially knowing it conforms to the spec's language.
Thanks,
Ryan Lambert
On Tue, Mar 26, 2019 at 11:05 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 03/26/19 21:39, Ryan Lambert wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: tested, passed
Thanks for the review!
> I have two recommendations for features.sgml. You state:
>
>> relies on the libxml library
>
> Should this be clarified as the libxml2 library? That's what I installed
> to build postgres from source (Ubuntu 16/18). If it is the libxml library
> and the "2" is irrelevant
That's a good catch. I'm not actually sure whether there is any "libxml"
library that isn't libxml2. Maybe there was once and nobody admits to
hanging out with it. Most Google hits on "libxml" seem to be modules
that have libxml in their names and libxml2 as their actual dependency.
Perl XML::LibXML: "This module is an interface to libxml2"
Haskell libxml: "Binding to libxml2"
libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
for the GNOME Libxml2 ..."
--with-libxml is the PostgreSQL configure option to make it use libxml2.
The very web page http://xmlsoft.org/index.html says "The XML C parser
and toolkit of Gnome: libxml" and is all about libxml2.
So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?
On 03/26/19 23:52, Tom Lane wrote:
> Do we need to mention that at all? If you're not building from source,
> it doesn't seem very interesting ... but maybe I'm missing some reason
> why end users would care.
The three places I've mentioned it were the ones where I thought users
might care:
- why are we stuck at XPath 1.0? It's what we get from the library we use.
- in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
a sequence type and you have order control. Observable behavior from
libxml2 (and you could certainly want to know this) is you get things out
in document order, whether that's what you wanted or not, even though
this is undocumented, and even counter-documented[1], libxml2 behavior.
So it's an example of something you would fundamentally like to know,
where the only available answer depends precariously on the library
we happen to be using.
- which limits in our implementation are inherent to the library, and
which are just current limits in our embedding of it? (Maybe this is
right at the border of what a user would care to know, but I know it's
a question that crosses my mind when I bonk into a limit I wasn't
expecting.)
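The second point above — that libxml2 observably hands back node-set members in document order, whatever order the query might suggest — can be sketched in Python. This uses the standard-library ElementTree as a stand-in (it is not libxml2, but it too happens to yield matches in document order, which makes the behavior easy to see):

```python
import xml.etree.ElementTree as ET

# A small document whose element order differs from any "query order".
doc = ET.fromstring("<root><b/><a/><c/><a/></root>")

# Select every <a> and <c> element.  As with libxml2's node-sets, the
# matches come back in document order -- a, c, a -- regardless of how
# the selection is phrased.
hits = [el.tag for el in doc.iter() if el.tag in ("a", "c")]
print(hits)  # ['a', 'c', 'a']
```

The point being: nothing in the selection says "a before c"; document position alone determines what you observe.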
> There are a few places where the parenthesis around a block of text
> seem unnecessary.
)blush( that's a long-standing wart in my writing ... seems I often think
in parentheses, then look and say "those aren't needed" and take them out,
only sometimes I don't.
I skimmed just now and found a few instances of parenthesized whole
sentence: the one you quoted, and some (if argument is null, the result
is null), and (No rows will be produced if ....). Shall I deparenthesize
them all? Did you have other instances in mind?
> It seems you are standardizing from "node set" to "nodeset", is that
> the preferred nomenclature or preference?
Another good catch. I remember consciously making a last pass to get them
all consistent, and I wanted them consistent with the spec, and I see now
I messed up.
XPath 1.0 [2] has zero instances of "nodeset", two of "node set" and about
six dozen of "node-set". The only appearances of "node set" without the
hyphen are in a heading and its ToC entry. The stuff under that heading
consistently uses node-set. It seems that's the XPath 1.0 term for sure.
When I made my consistency pass, I must have been looking too recently
in libxml2 C source, rather than the spec.
On 03/26/19 23:52, Tom Lane wrote:
> That seemed a bit jargon-y to me too. If that's standard terminology
> in the XML world, maybe it's fine; but I'd have stuck with "node set".
It really was my intention (though I flubbed it) to use XPath 1.0's term
for XPath 1.0's concept; in my doc philosophy, that gives readers
the most breadcrumbs to follow for the rest of the details if they want
them. "Node set" might be some sort of squishy expository concept I'm
using, but node-set is a thing, in a spec.
If you agree, I should go through and fix my nodesets to be node-sets.
I do think the terminology matters here, especially because of the
differences between what you can do with a node-set (XPath 1.0 thing)
and with a sequence (XPath 2.0+, XQuery, SQL/XML thing).
Let me know what you'd like best on these points and I'll revise the patch.
Regards,
-Chap
[1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
in no particular order"
[2] https://www.w3.org/TR/1999/REC-xpath-19991116/
On 2019-Mar-27, Chapman Flack wrote:
> On 03/26/19 21:39, Ryan Lambert wrote:
> > Should this be clarified as the libxml2 library? That's what I installed
> > to build postgres from source (Ubuntu 16/18). If it is the libxml library
> > and the "2" is irrelevant
>
> That's a good catch. I'm not actually sure whether there is any "libxml"
> library that isn't libxml2. Maybe there was once and nobody admits to
> hanging out with it. Most Google hits on "libxml" seem to be modules
> that have libxml in their names and libxml2 as their actual dependency.
>
> Perl XML::LibXML: "This module is an interface to libxml2"
> Haskell libxml: "Binding to libxml2"
> libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
> for the GNOME Libxml2 ..."
>
> --with-libxml is the PostgreSQL configure option to make it use libxml2.
>
> The very web page http://xmlsoft.org/index.html says "The XML C parser
> and toolkit of Gnome: libxml" and is all about libxml2.
>
> So I think I was unsure what convention to follow, and threw up my hands
> and went with libxml. I could just as easily throw them up and go with
> libxml2. Which do you think would be better?

Daniel Veillard actually had libxml version 1 in that repository (mostly of GNOME provenance, it seems, put together during some W3C meeting in 1998). The version number changed to 2 sometime during year 2000. Version 1 was mostly abandoned at that point, and for some reason everyone keeps using "libxml2" as the name as though it were a different thing from "libxml". I suppose the latter name is just too generic, or they wanted to differentiate from the old (probably API-incompatible) code.

https://gitlab.gnome.org/GNOME/libxml2/tree/LIB_XML_1_BRANCH

Everyone calls it "libxml2" nowadays. Let's just use that and avoid any possible confusion. If some libxml3 emerges one day, it's quite likely we'll need to revise much more than our docs in order to use it.
> On 03/26/19 23:52, Tom Lane wrote:
> > Do we need to mention that at all? If you're not building from source,
> > it doesn't seem very interesting ... but maybe I'm missing some reason
> > why end users would care.
>
> The three places I've mentioned it were the ones where I thought users
> might care:

These seem relevant details.

> If you agree, I should go through and fix my nodesets to be node-sets.

+1

> [1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
> in no particular order"

What this means is "we don't guarantee any specific order". It's like a query without ORDER BY: you may currently always get document order, but if you upgrade the library one day, it's quite possible to get the nodes in another order and you'll not get a refund. So you (the user) should not rely on the order, or at least be mindful that it may change in the future.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
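Álvaro's "query without ORDER BY" analogy suggests the defensive pattern: if a particular order matters, request it explicitly rather than leaning on the library's current traversal. A minimal sketch, again using Python's standard-library ElementTree rather than libxml2 itself:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<r><item n='2'/><item n='3'/><item n='1'/></r>")

# findall() happens to return matches in document order, but -- as with
# libxml2's node-sets -- that order is not a contract.  If order matters,
# impose it explicitly, the moral equivalent of adding ORDER BY.
items = sorted(doc.findall("item"), key=lambda e: int(e.get("n")))
print([e.get("n") for e in items])  # ['1', '2', '3']
```

The same idea applies on the SQL side: wrap the XML-producing query in an outer SELECT with ORDER BY on something recoverable from the data, instead of trusting the node-set's iteration order.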
On 3/27/19 9:31 AM, Alvaro Herrera wrote:
> Everyone calls it "libxml2" nowadays. Let's just use that and avoid any
> possible confusion. If some libxml3 emerges one day, it's quite likely
> we'll need to revise much more than our docs in order to use it.

That's persuasive to me. I'll change the references to say libxml2 and let a committer serve as tiebreaker.

>> [1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
>> in no particular order"
>
> What this means is "we don't guarantee any specific order". It's like a
> query without ORDER BY: you may currently always get document order, but
> if you upgrade the library one day, it's quite possible to get the nodes
> in another order and you'll not get a refund. So you (the user) should
> not rely on the order, or at least be mindful that it may change in the
> future.

Exactly. I called the behavior "counter-documented" to distinguish this from the usual "undocumented" case, where you notice that a library is behaving in a way you like, but its docs are utterly silent on the matter, so you know you're going out on a limb to count on what you've noticed. In this case, you can notice the handy behavior but the doc *comes right out and disclaims it*, so if you count on it, you're going out on a limb that has no bark left and looks punky.

And yet it seems worthwhile to mention how the library does in fact seem to behave, because you might well be in the situation of porting code over from SQL/XML:2006+ or XQuery or XPath 2+, or those are the languages you've learned, so you may have order assumptions you've made, and be surprised that XPath 1 doesn't let you make them. At least we can say "in a pinch, if you don't mind standing on this punky limb here, you may be able to use the code you've got without having to refactor every XMLTABLE() or xpath() into something wrapped in an outer SQL query with ORDER BY. You just don't get your money back if a later library upgrade changes the order."
The wiki page remembers[1] that I had tried some pretty gnarly XPath 1 queries to see if I could make libxml2 return things in a different order, but no, got document order every time.

Regards,
-Chap

[1] https://www.postgresql.org/message-id/5C465A65.4030305%40anastigmatix.net
Hi,

xml-functions-type-docfix-5.patch attached, with node-sets instead of nodesets, libxml2 instead of libxml, and parenthesized full sentences now au naturel.

I ended up turning the formerly-parenthesized note about libxml2's node-set ordering into a DocBook <note>: there is really something parenthetical about it, with the official statement of node-set element ordering being that there is none, and the description of what the library happens to do being of possible interest, but set apart, with the necessary caveats about relying on it.

Spotted and fixed a couple more typos in the process.

Regards,
-Chap
On 03/27/19 19:07, Chapman Flack wrote:
> xml-functions-type-docfix-5.patch attached, with node-sets instead of
> nodesets, libxml2 instead of libxml, and parenthesized full sentences
> now au naturel.
>
> I ended up turning the formerly-parenthesized note about libxml2's
> node-set ordering into a DocBook <note>: there is really something
> parenthetical about it, with the official statement of node-set
> element ordering being that there is none, and the description of
> what the library happens to do being of possible interest, but set
> apart, with the necessary caveats about relying on it.

I have just suffered a giant sinking feeling upon re-reading this sentence in our XMLTABLE doc:

    A column marked FOR ORDINALITY will be populated with row numbers
    matching the order in which the output rows appeared in the original
    input XML document.

I've been skimming right over it all this time, and that right there is a glaring built-in reliance on the observable-but-disclaimed iteration order of a libxml2 node-set.

I'm a bit unsure what any clarifying language should even say.

Regards,
-Chap
On 03/27/19 19:27, Chapman Flack wrote:
> A column marked FOR ORDINALITY will be populated with row numbers
> matching the order in which the output rows appeared in the original
> input XML document.
>
> I've been skimming right over it all this time, and that right there is
> a glaring built-in reliance on the observable-but-disclaimed iteration
> order of a libxml2 node-set.

So, xml-functions-type-docfix-6.patch.

I changed that language to say "populated with row numbers, starting with 1, in the order of nodes retrieved from the row_expression's result node-set."

That's not such a terrible thing to have to say; in fact, it's the *correct* description for the standard, XQuery-based, XMLTABLE (where the language gives you control of the result sequence's order).

I followed that with a short note saying that since XPath 1.0 doesn't specify that order, relying on it is implementation-dependent, and linked to the existing Appendix D discussion.

I would have liked to link directly to the <listitem>, but of course <xref> doesn't know what to call that, so I linked to the <sect3> instead.

Regards,
-Chap
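The revised wording — row numbers starting with 1, in the order nodes are retrieved from the row_expression's result — has a simple operational reading. A toy model of the FOR ORDINALITY numbering (this is an illustration in Python's standard-library ElementTree, not PostgreSQL's implementation):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<rows><row id='x'/><row id='y'/><row id='z'/></rows>")

# FOR ORDINALITY, roughly: number the output rows 1, 2, 3, ... in the
# order nodes come out of the row_expression's result -- here, the
# order findall() happens to yield them.
table = [(ordinality, row.get("id"))
         for ordinality, row in enumerate(doc.findall("row"), start=1)]
print(table)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

Whether that numbering matches document order is exactly the implementation-dependent point the accompanying note flags: under XPath 1.0 the retrieval order itself is unspecified.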
I applied and reviewed xml-functions-type-docfix-6.patch. Looks good to me.
I like the standardization (e.g. libxml2, node-set) and I didn't catch any spots that used the other versions. I agree that the <note> is appropriate for that block.
It also looks like you incorporated Alvaro's feedback about sorting, or the lack thereof.
Let me know if there's anything else I can do to help get this accepted. Thanks,
Ryan
On Thu, Mar 28, 2019 at 5:45 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 03/27/19 19:27, Chapman Flack wrote:
> A column marked FOR ORDINALITY will be populated with row numbers
> matching the order in which the output rows appeared in the original
> input XML document.
>
> I've been skimming right over it all this time, and that right there is
> a glaring built-in reliance on the observable-but-disclaimed iteration
> order of a libxml2 node-set.
So, xml-functions-type-docfix-6.patch.
I changed that language to say "populated with row numbers, starting
with 1, in the order of nodes retrieved from the row_expression's
result node-set."
That's not such a terrible thing to have to say; in fact, it's the
*correct* description for the standard, XQuery-based, XMLTABLE (where
the language gives you control of the result sequence's order).
I followed that with a short note saying since XPath 1.0 doesn't
specify that order, relying on it is implementation-dependent, and
linked to the existing Appendix D discussion.
I would have liked to link directly to the <listitem>, but of course
<xref> doesn't know what to call that, so I linked to the <sect3>
instead.
Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
> So, xml-functions-type-docfix-6.patch.

Pushed with some light(?) copy-editing.

I believe this closes out everything discussed in

https://commitfest.postgresql.org/22/1872/

but I haven't gone through all three threads in detail. Please confirm whether that CF entry can be closed or not.

regards, tom lane
On 4/1/19 4:22 PM, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> So, xml-functions-type-docfix-6.patch.
>
> Pushed with some light(?) copy-editing.
>
> I believe this closes out everything discussed in
>
> https://commitfest.postgresql.org/22/1872/
>
> but I haven't gone through all three threads in detail.
> Please confirm whether that CF entry can be closed or not.

I think that does wrap up everything in the CF entry. Thanks! And thanks for the copy-edits; they do read better than what I came up with.

When I get a moment, I'll update the PostgreSQL vs. SQL/XML wiki page to reflect the things that were fixed.

Regards,
-Chap
On 2019-Apr-01, Chapman Flack wrote:
> When I get a moment, I'll update the PostgreSQL vs. SQL/XML wiki page
> to reflect the things that were fixed.

I think there were some outright bugs in the docs, at least for XMLTABLE, that maybe we should backpatch. If you have the energy to cherry-pick a minimal doc update to 10/11, I offer to back-patch it.

Thanks everyone for taking care of this!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/01/19 17:34, Alvaro Herrera wrote:
> I think there were some outright bugs in the docs, at least for
> XMLTABLE, that maybe we should backpatch. If you have the energy to
> cherry-pick a minimal doc update to 10/11, I offer to back-patch it.

I'll see what I can do. There's breathing room for that after the end of the CF, right?

It seems to me that the conformance-appendix part is worth using, along with all of the clarifications in datatype.sgml and func.sgml except the ones clarifying fixed behavior, where the behavior fix wasn't backpatched. That'll be where the cherry-picking effort lies.

Regards,
-Chap
On 04/01/19 17:34, Alvaro Herrera wrote:
> I think there were some outright bugs in the docs, at least for
> XMLTABLE, that maybe we should backpatch. If you have the energy to
> cherry-pick a minimal doc update to 10/11, I offer to back-patch it.

I don't know if this fits your intention for "minimal". What I've done is taken the doc commit made by Tom for 12 (12d46a), then revised it so it describes the unfixed behavior for the bugs whose fixes weren't backpatched to 11 or 10.

I don't know if it's too late to get in the upcoming minor releases, but maybe it can, if it looks ok, or the next ones, if that's too rushed.

11.patch applies cleanly to 11, 10.patch to 10. I've confirmed the 11 docs build successfully, but without sgml tools, I haven't confirmed that for 10.

Regards,
-Chap
On 2019-Aug-03, Chapman Flack wrote:
> I don't know if it's too late to get in the upcoming minor releases,
> but maybe it can, if it looks ok, or the next ones, if that's too rushed.

Hmm, I'm travelling back home from a conference this weekend, so yeah, I think it would be rushed for me to handle for the upcoming set. But I can look at it before the *next* set.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro,

On 08/03/19 12:15, Alvaro Herrera wrote:
>> I don't know if it's too late to get in the upcoming minor releases,
>> but maybe it can, if it looks ok, or the next ones, if that's too rushed.
>
> Hmm, I'm travelling back home from a conference the weekend, so yeah I
> think it would be rushed for me to handle for the upcoming set. But I
> can look at it before the *next* set.

Are these on your radar to maybe backpatch in this round of activity?

The latest patches I did for 11 and 10 are in
https://www.postgresql.org/message-id/5D45A44F.8010803%40anastigmatix.net

Cheers,
-Chap
Hi Chapman,

On 2019-Sep-05, Chapman Flack wrote:
> Are these on your radar to maybe backpatch in this round of activity?
>
> The latest patches I did for 11 and 10 are in
> https://www.postgresql.org/message-id/5D45A44F.8010803%40anastigmatix.net

Thanks! I just pushed these to those branches. I think we're finally done with these. Many thanks for your persistence.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services