Обсуждение: WIP - xmlvalidate implementation from TODO list

Поиск
Список
Период
Сортировка

WIP - xmlvalidate implementation from TODO list

От
Marcos Magueta
Дата:
Hello!

I am likely one of the few people still using XML, but I noticed XSD schema validation is still a TODO on postgresql, which I have some personal use cases for.

In this patch I attempt to implement XMLVALIDATE with the already in use libxml following a version I got my hands on of the SQL/XML standard of 2016.

In short, I had to add the ACCORDING word to comply with it and completely ignored the version that was already in the src that fetches arbitrary schemas (it refers to validations of dtds, which is another more troublesome TODO).

I had problems running the regression tests on my machine, so I could only test the feature by spawning a modified instance of postgresql and issuing queries through psql, therefore I am marking it as WIP. If anyone can assert the tests pass, I would be glad.

Also, this is my first patch, so I might have not followed standard practices as best as I could, so please pay particular attention to that on review. 

Cheers,
Marcos Magueta.
Вложения

Re: WIP - xmlvalidate implementation from TODO list

От
Kirill Reshke
Дата:
On Sun, 7 Dec 2025 at 04:38, Marcos Magueta <maguetamarcos@gmail.com> wrote:
>
> Hello!
>
> I am likely one of the few people still using XML, but I noticed XSD schema validation is still a TODO on postgresql,
whichI have some personal use cases for. 
>
> In this patch I attempt to implement XMLVALIDATE with the already in use libxml following a version I got my hands on
ofthe SQL/XML standard of 2016. 
>
> In short, I had to add the ACCORDING word to comply with it and completely ignored the version that was already in
thesrc that fetches arbitrary schemas (it refers to validations of dtds, which is another more troublesome TODO). 
>
> I had problems running the regression tests on my machine, so I could only test the feature by spawning a modified
instanceof postgresql and issuing queries through psql, therefore I am marking it as WIP. If anyone can assert the
testspass, I would be glad. 
>
> Also, this is my first patch, so I might have not followed standard practices as best as I could, so please pay
particularattention to that on review. 
>
> Cheers,
> Marcos Magueta.

HI!

1)
> + // Default case since nothing got returned
> + // out of the normal path for validation calls to libxml

PostgreSQL uses /**/ comments style.

2)
XML regression test suite fails, see attached. By the way, what are
your issues with running `make check` ?

3)
By the way, in [0] we have this

`
The function in PostgreSQL produces an “unimplemented” error, because
PostgreSQL does not have any implementation of the mechanism assumed
in the standard for registering schemas in advance, which is necessary
to address the security implications of a function that could refer to
schemas from arbitrary locations.
`

How does your patch resolve this? I did not find any change in this area

4)
Also I want to mention that without a doc, the patch is not in a
commitable shape.

5) I am a bit surprised by this grammar rule:

>  XMLVALIDATE '(' document_or_content a_expr ACCORDING TO XMLSCHEMA a_expr ')'

this allow a wide class of expressions accepted by parser, like

`SELECT xmlvalidate(DOCUMENT  (select oid from pg_class) ACCORDING TO
XMLSCHEMA (select 32)) AS is_valid FROM xml_validation_test;`

Is this expected? a_expr is way more than string constants and column
references..  If yes, the regression test that you added, does not
test this..


p.s. I failed to find in google SQL/XML standard of 2016. So, I cannot
double-check if this feature is compliant with it...

[0] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards

--
Best regards,
Kirill Reshke

Вложения

Re: WIP - xmlvalidate implementation from TODO list

От
Marcos Magueta
Дата:
Thank you for your kind review!

Before I continue with the implementation, I would like to address
your concerns and discuss it further and see if it's worth carrying
on.

1) Will do!

2) The issue is the following:
```
./../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. --bindir=     --dlpath=. --max-concurrent-tests=20  --schedule=./parallel_schedule  
# +++ regress check in src/test/regress +++
# initializing database system by copying initdb template
# could not exec "sh": No such file or directory
Bail out!# postmaster failed, examine
```

This is likely due to my setup on Nix. If any command assumes paths on
conventional *nix I am often in trouble. I am checking that with a
friend, but any insights are welcome. The out file I generated was
while being completely blindfolded.

By the way, the diff you sent is assuming global paths for some
reason, so I couldn't apply it without manually changing them.

3) From what I understand, that refers to ISO/IEC 9075-14:2016, chapter and section 11.6, page 245:
 - Parts of the grammar that reference an URI: <XML valid according to what> ::= <XML valid according to URI> | <XML valid according to identifier>
 - NO NAMESPACE, which unspecifies the qualification but can access something through a LOCATI ON
 - ID which should allow access to a registered schema
So they amount to:
  ACCORDING TO XMLSCHEMA URI <uri> [LOCATION <uri>]
  ACCORDING TO XMLSCHEMA NO NAMESPACE [LOCATION <uri>]
  ACCORDING TO XMLSCHEMA ID <registered_schema_name>

What I did is rely on the protection mechanisms that are already
implemented to just side-step the issue of arbitrary retrieval.

`PgXmlErrorContext * pg_xml_init(PgXmlStrictness strictness)` starts
the xml error context preventing any attempt by libxml2 to load an
external entity (DTD, XSD from URL, local file, etc.) returns an empty
string instead. Check around the line 1420 of xml.c:

```
/*
 * Also, install an entity loader to prevent unwanted fetches of external
 * files and URLs.
 */
errcxt->saved_entityfunc = xmlGetExternalEntityLoader();
xmlSetExternalEntityLoader(xmlPgEntityLoader);
```

So since I am relying on a TEXT for the schema, there should be no
issues of that sort. It does however cut part of the grammar that
handles locations, which is part of the standard, and would require
this feature to be much bigger in scope...

4) I suppose you refer to doc/src/sgml/func/func-xml.sgml. Will do

5) Hmm my intent was to simply handle TEXT on the xmlschema portion,
so the expr rule on that side is indeed an oversight. Now about the
first argument, that is just following the pattern already specified
in other xml functions, like XMLPARSE and XMLSERIALIZE, which have the
same <XML Value Expression> specified in the grammar. So that might
have alraedy diverged from the standard a while back...

This is currently grammatically valid, for example:
```
select xmlparse(DOCUMENT  (select oid from pg_class limit 1));
ERROR:  invalid XML document
DETAIL:  line 1: Start tag expected, '<' not found
2619
^
```

As a summary, it does not fully implement schemas as first-class
objects, as that would require extra parts of the grammar specified in
the standard, so I capitulated to use schemas as provided text. That's
already safe in my understanding given the shielding in place. If we
are to implement the rest, I think other serious concerns would arise,
like role management, how to store schemas, etc. And when it comes
to that, is it worth all the trouble just for xml? I would like this
feature and I think the solution of relying on text is decent, since
the cost of complying 100% seems very high for low returns.

Em dom., 7 de dez. de 2025 às 04:34, Kirill Reshke <reshkekirill@gmail.com> escreveu:
On Sun, 7 Dec 2025 at 04:38, Marcos Magueta <maguetamarcos@gmail.com> wrote:
>
> Hello!
>
> I am likely one of the few people still using XML, but I noticed XSD schema validation is still a TODO on postgresql, which I have some personal use cases for.
>
> In this patch I attempt to implement XMLVALIDATE with the already in use libxml following a version I got my hands on of the SQL/XML standard of 2016.
>
> In short, I had to add the ACCORDING word to comply with it and completely ignored the version that was already in the src that fetches arbitrary schemas (it refers to validations of dtds, which is another more troublesome TODO).
>
> I had problems running the regression tests on my machine, so I could only test the feature by spawning a modified instance of postgresql and issuing queries through psql, therefore I am marking it as WIP. If anyone can assert the tests pass, I would be glad.
>
> Also, this is my first patch, so I might have not followed standard practices as best as I could, so please pay particular attention to that on review.
>
> Cheers,
> Marcos Magueta.

HI!

1)
> + // Default case since nothing got returned
> + // out of the normal path for validation calls to libxml

PostgreSQL uses /**/ comments style.

2)
XML regression test suite fails, see attached. By the way, what are
your issues with running `make check` ?

3)
By the way, in [0] we have this

`
The function in PostgreSQL produces an “unimplemented” error, because
PostgreSQL does not have any implementation of the mechanism assumed
in the standard for registering schemas in advance, which is necessary
to address the security implications of a function that could refer to
schemas from arbitrary locations.
`

How does your patch resolve this? I did not find any change in this area

4)
Also I want to mention that without a doc, the patch is not in a
commitable shape.

5) I am a bit surprised by this grammar rule:

>  XMLVALIDATE '(' document_or_content a_expr ACCORDING TO XMLSCHEMA a_expr ')'

this allow a wide class of expressions accepted by parser, like

`SELECT xmlvalidate(DOCUMENT  (select oid from pg_class) ACCORDING TO
XMLSCHEMA (select 32)) AS is_valid FROM xml_validation_test;`

Is this expected? a_expr is way more than string constants and column
references..  If yes, the regression test that you added, does not
test this..


p.s. I failed to find in google SQL/XML standard of 2016. So, I cannot
double-check if this feature is compliant with it...

[0] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards

--
Best regards,
Kirill Reshke

Re: WIP - xmlvalidate implementation from TODO list

От
Marcos Magueta
Дата:
Hello again!

I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:

> Prevent access to external files/URLs via XML entity references.

> xml_parse() would attempt to fetch external files or URLs as needed to
> resolve DTD and entity references in an XML value, thus allowing
> unprivileged database users to attempt to fetch data with the privileges
> of the database server.  While the external data wouldn't get returned
> directly to the user, portions of it could be exposed in error messages
> if the data didn't parse as valid XML; and in any case the mere ability
> to check existence of a file might be useful to an attacker.
>
> The ideal solution to this would still allow fetching of references that
> are listed in the host system's XML catalogs, so that documents can be
> validated according to installed DTDs.  However, doing that with the
> available libxml2 APIs appears complex and error-prone, so we're not going
> to risk it in a security patch that necessarily hasn't gotten wide review.
> So this patch merely shuts off all access, causing any external fetch to
> silently expand to an empty string.  A future patch may improve this.

With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.

I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.

At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.
Вложения

Re: WIP - xmlvalidate implementation from TODO list

От
Marcos Magueta
Дата:
Hello again!

Is there any interest in this? I understand PostgreSQL has bigger fish to fry, but I would like to at least know; in case this was just forgotten.

Regards!

Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <maguetamarcos@gmail.com> escreveu:
Hello again!

I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:

> Prevent access to external files/URLs via XML entity references.

> xml_parse() would attempt to fetch external files or URLs as needed to
> resolve DTD and entity references in an XML value, thus allowing
> unprivileged database users to attempt to fetch data with the privileges
> of the database server.  While the external data wouldn't get returned
> directly to the user, portions of it could be exposed in error messages
> if the data didn't parse as valid XML; and in any case the mere ability
> to check existence of a file might be useful to an attacker.
>
> The ideal solution to this would still allow fetching of references that
> are listed in the host system's XML catalogs, so that documents can be
> validated according to installed DTDs.  However, doing that with the
> available libxml2 APIs appears complex and error-prone, so we're not going
> to risk it in a security patch that necessarily hasn't gotten wide review.
> So this patch merely shuts off all access, causing any external fetch to
> silently expand to an empty string.  A future patch may improve this.

With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.

I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.

At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.

Re: WIP - xmlvalidate implementation from TODO list

От
Kirill Reshke
Дата:


On Thu, 1 Jan 2026, 01:27 Marcos Magueta, <maguetamarcos@gmail.com> wrote:
Hello again!

Is there any interest in this? I understand PostgreSQL has bigger fish to fry, but I would like to at least know; in case this was just forgotten.

Regards!

Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <maguetamarcos@gmail.com> escreveu:
Hello again!

I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:

> Prevent access to external files/URLs via XML entity references.

> xml_parse() would attempt to fetch external files or URLs as needed to
> resolve DTD and entity references in an XML value, thus allowing
> unprivileged database users to attempt to fetch data with the privileges
> of the database server.  While the external data wouldn't get returned
> directly to the user, portions of it could be exposed in error messages
> if the data didn't parse as valid XML; and in any case the mere ability
> to check existence of a file might be useful to an attacker.
>
> The ideal solution to this would still allow fetching of references that
> are listed in the host system's XML catalogs, so that documents can be
> validated according to installed DTDs.  However, doing that with the
> available libxml2 APIs appears complex and error-prone, so we're not going
> to risk it in a security patch that necessarily hasn't gotten wide review.
> So this patch merely shuts off all access, causing any external fetch to
> silently expand to an empty string.  A future patch may improve this.

With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.

I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.

At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.


Hi!
First of all, please do not top post  🙏 . Use down-posting.

About general interest in feature - I suspect that we as a community generally interested in implementing items from TODO list. This feature also increases SQL standard compatibility. But I am myself not a big SQL/XML user, so I can only give limited review here. I also did not have much time last month. I will try to find my cycles to give another look here. 

Re: WIP - xmlvalidate implementation from TODO list

От
Marcos Magueta
Дата:


On 1 Jan 2026, at 05:25, Kirill Reshke <reshkekirill@gmail.com> wrote:



On Thu, 1 Jan 2026, 01:27 Marcos Magueta, <maguetamarcos@gmail.com> wrote:
Hello again!

Is there any interest in this? I understand PostgreSQL has bigger fish to fry, but I would like to at least know; in case this was just forgotten.

Regards!

Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <maguetamarcos@gmail.com> escreveu:
Hello again!

I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:

> Prevent access to external files/URLs via XML entity references.

> xml_parse() would attempt to fetch external files or URLs as needed to
> resolve DTD and entity references in an XML value, thus allowing
> unprivileged database users to attempt to fetch data with the privileges
> of the database server.  While the external data wouldn't get returned
> directly to the user, portions of it could be exposed in error messages
> if the data didn't parse as valid XML; and in any case the mere ability
> to check existence of a file might be useful to an attacker.
>
> The ideal solution to this would still allow fetching of references that
> are listed in the host system's XML catalogs, so that documents can be
> validated according to installed DTDs.  However, doing that with the
> available libxml2 APIs appears complex and error-prone, so we're not going
> to risk it in a security patch that necessarily hasn't gotten wide review.
> So this patch merely shuts off all access, causing any external fetch to
> silently expand to an empty string.  A future patch may improve this.

With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.

I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.

At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.


Hi!
First of all, please do not top post  🙏 . Use down-posting.

About general interest in feature - I suspect that we as a community generally interested in implementing items from TODO list. This feature also increases SQL standard compatibility. But I am myself not a big SQL/XML user, so I can only give limited review here. I also did not have much time last month. I will try to find my cycles to give another look here. 

Thank you very much for reaching back. Sorry about the bad e-mail etiquette, hopefully it’s corrected now.

About the patch, let me know if you find the time to review!

Thanks once again!

Re: WIP - xmlvalidate implementation from TODO list

От
Kirill Reshke
Дата:
On Fri, 2 Jan 2026 at 23:07, Marcos Magueta <maguetamarcos@gmail.com> wrote:
>
>
>
> On 1 Jan 2026, at 05:25, Kirill Reshke <reshkekirill@gmail.com> wrote:
>
>
>
> On Thu, 1 Jan 2026, 01:27 Marcos Magueta, <maguetamarcos@gmail.com> wrote:
>>
>> Hello again!
>>
>> Is there any interest in this? I understand PostgreSQL has bigger fish to fry, but I would like to at least know; in
casethis was just forgotten. 
>>
>> Regards!
>>
>> Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <maguetamarcos@gmail.com> escreveu:
>>>
>>> Hello again!
>>>
>>> I took some time to actually finish this feature. I think the answers
>>> for the previous questions are now clearer. I checked the
>>> initialization and the protections are indeed in place since commit
>>> a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:
>>>
>>> > Prevent access to external files/URLs via XML entity references.
>>>
>>> > xml_parse() would attempt to fetch external files or URLs as needed to
>>> > resolve DTD and entity references in an XML value, thus allowing
>>> > unprivileged database users to attempt to fetch data with the privileges
>>> > of the database server.  While the external data wouldn't get returned
>>> > directly to the user, portions of it could be exposed in error messages
>>> > if the data didn't parse as valid XML; and in any case the mere ability
>>> > to check existence of a file might be useful to an attacker.
>>> >
>>> > The ideal solution to this would still allow fetching of references that
>>> > are listed in the host system's XML catalogs, so that documents can be
>>> > validated according to installed DTDs.  However, doing that with the
>>> > available libxml2 APIs appears complex and error-prone, so we're not going
>>> > to risk it in a security patch that necessarily hasn't gotten wide review.
>>> > So this patch merely shuts off all access, causing any external fetch to
>>> > silently expand to an empty string.  A future patch may improve this.
>>>
>>> With that, the obvious affordance on the xmlvalidate implementation
>>> was to not rely on external schema sources on the host
>>> catalog. Therefore the implementation relies solely on expressions
>>> that necessarily evaluate to a schema in plain text.
>>>
>>> I added the requested documentation and a bunch of tests for each
>>> scenario. I would appreciate another round of reviews whenever someone
>>> has the time and patience.
>>>
>>> At last, to nourish the curiosity: I had issues with make check, as
>>> stated above on the e-mail thread. These got resolved when I changed
>>> `execl` to `execlp` on `pg_regress.c`. I of course did not commit
>>> such, but more people I know have had the very same issue while
>>> relying on immutable package managers.
>
>
>
> Hi!
> First of all, please do not top post  🙏 . Use down-posting.
>
> About general interest in feature - I suspect that we as a community generally interested in implementing items from
TODOlist. This feature also increases SQL standard compatibility. But I am myself not a big SQL/XML user, so I can only
givelimited review here. I also did not have much time last month. I will try to find my cycles to give another look
here.
>
>
> Thank you very much for reaching back. Sorry about the bad e-mail etiquette, hopefully it’s corrected now.
>
> About the patch, let me know if you find the time to review!
>
> Thanks once again!
>

I registered this thread in the commitfest application[0] to get more
attention from the community and also CF tests status.
It was already too late for january commitfest (PG-4), so this patch
is on PG19-Final.
As I understand you do not have any account, so please create one and
add yourself as a reviewer.

[0] https://commitfest.postgresql.org/patch/6372/

--
Best regards,
Kirill Reshke



Re: WIP - xmlvalidate implementation from TODO list

От
Andrey Borodin
Дата:

> On 2 Jan 2026, at 23:07, Marcos Magueta <maguetamarcos@gmail.com> wrote:
>
> About the patch, let me know if you find the time to review!

I was looking to review something on commitfest and decided to look into this patch.

Unfortunately, I cannot verify adherence to SQL standard. But I'll take it as granted, grammar changes are minimal.

I'm not a big XML user, but definitely there are a lot of use cases. E.g. If someone want to check whole database
againstnew schema - this feature would be invaluable. I can think of many different use cases. But I heard some
complaintsabout libxml. I'm not sure, but maybe at some point we would like to get rid of it? [0] 

The patch fails regression tests on Windows. See [1]. Regression taken from [2]
Meaningfull part is:
diff --strip-trailing-cr -U3 C:/cirrus/src/test/regress/expected/xml_1.out
C:/cirrus/build/testrun/regress/regress/results/xml.out
--- C:/cirrus/src/test/regress/expected/xml_1.out 2026-01-03 19:13:07.092850000 +0000
+++ C:/cirrus/build/testrun/regress/regress/results/xml.out 2026-01-03 19:17:23.497562500 +0000
@@ -1496,3 +1496,278 @@
LINE 1: SELECT xmltext('x'|| '<P>73</P>'::xml || .42 || true || 'j':...
^
DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlvalidate(DOCUMENT '<person><name>John</name><age>30</age></person>'
+ ACCORDING TO XMLSCHEMA '<?xml version="1.0"?>
....

So you need to update src/test/regress/expected/xml_1.out for systems without libxml.

You use PG_TRY(); and return from that block. I found no other cases of returning without PG_END_TRY(), looks like it
isnot supported. 

xmloption is document_or_content. But xmlvalidate_text_schema() always validates as a document. IDK, maybe it's
correct,or maybe it works by accident. 

+#else
+    NO_XML_SUPPORT();
+    return NULL;
+#endif
This NULL is returned from bool function xmlvalidate_text_schema(). I know it's unreachable, but let's return false or
true.

Also, single-line comments worth converting to our usual C comments. The patch could benefit from pgindent.

That's all what I could find for now. Thanks for working on this!


Best regards, Andrey Borodin.


[0] https://www.postgresql.org/message-id/flat/aUK8aBluNzMZTatU%40momjian.us
[1] https://api.cirrus-ci.com/v1/artifact/task/5580601438240768/testrun/build/testrun/regress/regress/regression.diffs
[2] https://cirrus-ci.com/task/5580601438240768


Re: WIP - xmlvalidate implementation from TODO list

От
Jim Jones
Дата:
Hi!

On 04/01/2026 10:46, Andrey Borodin wrote:
> So you need to update src/test/regress/expected/xml_1.out for systems without libxml.

+1

There is also a xml_2.out. I don't really remember why it exists, but in
your case copying the changes from xml.out will most likely do the trick.

I am wondering whether passing the XML schema in every query is really
the right approach here. My main concern is that making the schema fully
user-controlled at execution time could lead to unwanted CPU or memory
usage -- not to mention the usability aspect of it. A catalog-level
schema registration mechanism would IMHO be a better fit, and it's also
supported by the SQL/XML standard:

"11.5 <XML valid according to clause>

Format

...
<XML valid according to identifier> ::= ID <registered XML schema name>
...

Syntax Rules

1) If <XML valid according to identifier> is specified, then the
<registered XML schema name> shall identify a registered XML schema RXS...
"

DB2 handles this similarly via REGISTER XMLSCHEMA.[1]

Thanks for working on this!

Best, Jim

[1] https://www.ibm.com/docs/en/db2/11.5.x?topic=commands-register-xmlschema




Re: WIP - xmlvalidate implementation from TODO list

От
Marcos Magueta
Дата:
Thank you all for the careful review!

I'll go through the topics to fix the test and code changes today, but I have a couple of questions about a catalog.

If we were to implement a catalog, I believe it would be either copying an insert to a specified relation (created on demand) or to something in the catalog, like pg_xmlschema. That could be a realistic change I could work on. But what about the privilege level and file fetch support? I believe it's not really an issue if the user is sufficiently privileged, so should it mirror COPY FROM? I haven't seen its implementation, but I suppose it already has security checks at the user privilege level. A valid alternative to not deal with privileges and to leave the same restrictions already in place to fetch arbitrary extensions to a specified schema; in that way we are just moving the schema definition to another command before being invoked and ignoring if it has any references outside of the plain text specified (therefore, not using file://, like IBM, just text).

Surprisingly, the standard (I only have the 2016 here) leaves a great room for freedom on how to implement the registration. It just specifies what it should have:

An XML namespace NS contained in a registered XML Schema is non-deterministic if NS contains a global
element declaration schema component that is non-deterministic.
A registered XML Schema is non-deterministic if it contains a non-deterministic XML namespace.
A registered XML Schema is described by a registered XML Schema descriptor. A registered XML Schema
descriptor includes:
— The target namespace URI of the registered XML Schema.
— The schema location URI of the registered XML Schema.
— The <registered XML Schema name> of the registered XML Schema.
— An indication of whether the registered XML Schema is permanently registered.
— An indication of whether the registered XML Schema is non-deterministic.
— An unordered collection of the namespaces defined by the registered XML Schema (the target namespace
is one of these namespaces).
— For each namespace defined by the registered XML Schema, an unordered collection of the global element
declaration schema components in that namespace, with an indication for each global element declaration
schema component whether that global element declaration schema component is non-deterministic.
NOTE 9 — Without Feature X161, “Advanced Information Schema for registered XML Schemas”, information whether an XML
Schema is deterministic, information about the collection of namespaces defined in that XML Schema, and, for each such namespace
information about the global element declaration schema components in that namespace, is not available in the XML_SCHEMAS,
XML_SCHEMA_NAMESPACES, and XML_SCHEMA_ELEMENTS views.
A registered XML Schema is identified by its <registered XML Schema name>.

I am tempted to go with a pg_xmlschema definition on the catalog and an interface like the one IBM has, but still restricting file access. Dealing with the security problems for that sounds excruciating. Any opinions?

Regards, Magueta.

Re: WIP - xmlvalidate implementation from TODO list

От
Jim Jones
Дата:
Hi Marcos

On 05.01.26 18:49, Marcos Magueta wrote:
> I am tempted to go with a pg_xmlschema definition on the catalog and an
> interface like the one IBM has, but still restricting file access.
> Dealing with the security problems for that sounds excruciating. Any
> opinions?


Perhaps we need to first agree on some foundational aspects such as
design and syntax before going deeper into the code.

== return type ==

Your proposal returns a boolean, but the SQL/XML standard specifies
otherwise. In 6.21 <XML validate>:

"General Rule 8)
The result of <XML validate> is R."

where R is constructed as an XQuery sequence of nodes:

"General Rule 7)
Let R be an XQuery sequence enumerated by Rⱼ, 1 ≤ j ≤ N."

This may sound surprising at first glance, but it enables useful
patterns such as:

INSERT INTO t (c)
VALUES (XMLVALIDATE(x ACCORDING TO XMLSCHEMA s));

SELECT XMLSERIALIZE(
  XMLVALIDATE(x ACCORDING TO XMLSCHEMA s)
  AS text
);

In this model, validation failure is signaled via an error condition,
not by returning false.

== registered XML schemas ==

AFAICT the standard does not mandate any particular syntax for
registering XML schemas, so we are not required to implement REGISTER
XMLSCHEMA. Also, registered XML schemas must also be manageable objects,
which should be reflected in the proposed syntax. For example:

CREATE XMLSCHEMA foo AS '... XSD text ...';
CREATE XMLSCHEMA foo FROM file;

DROP XMLSCHEMA foo;

ALTER XMLSCHEMA foo RENAME TO bar;
ALTER XMLSCHEMA foo OWNER TO u;
ALTER XMLSCHEMA foo ADD '... new value ...';

and so on...

== permissions ==

Schema registration and usage should be privilege-controlled, for
example via dedicated roles:

GRANT pg_read_xmlschemas TO u;
GRANT pg_write_xmlschemas TO u;
...

After we have the XML schema management aspects figured out, we can move
on to XMLVALIDATE itself.

These are just my opinions. Let's also hear what the other reviewers
have to say before you start working on a v2.

Best, Jim



Re: WIP - xmlvalidate implementation from TODO list

От
Marcos Magueta
Дата:
Hey Jim!

On 06.01.26, Jim Jones <jim.jones@uni-muenster.de> wrote:
> The result of <XML validate> is R.
That was an oversight on my behalf, I had a hard time understanding the standard, but now the validation of DOCUMENT and CONTENT being accepted makes more sense.

The current patch has some issues.

>  xmloption is document_or_content. But xmlvalidate_text_schema() always validates as a document.
As Andrey noticed, we should indeed support both a document and content. Which entails into an iterative validation (for each node provided) on content mode, so I should likely add the xmloption back. The fact it worked with the example I created was actually luck.

Also, I am not sure if some variables used inside of the PG_TRY are memory safe -- notice that none right now is set to volatile, despite being accessed in different parts of the block; other functions in xml.c do handle such correctly it seems (like xml_parse).

About the syntax proposal by Jim, I have no problems with complying to it. It does increase considerably the scope from what I originally intended, but that's the price to have something actually nice.

I can think of several useful extensions we could consider in a further implementation:

Schema Dependencies/Imports
CREATE XMLSCHEMA base AS '...';
CREATE XMLSCHEMA extended
  IMPORTS base
  AS '...';

Schema Versioning
CREATE XMLSCHEMA patient VERSION '1.0' AS '...';
CREATE XMLSCHEMA patient VERSION '2.0' AS '...';
XMLVALIDATE(doc ACCORDING TO XMLSCHEMA patient VERSION '2.0')

Custom Error Messages
CREATE XMLSCHEMA patient
  AS '...'
  ERROR MESSAGE 'Patient record does not match schema v2.0';

Schema inference from samples (if the lib supports it, that is)
CREATE XMLSCHEMA patient
    INFER FROM (SELECT data FROM patient_samples);

And much more, but perhaps that's already too ambitious for a first version.

I'll wait for the others to ring their bells.

Regards, Magueta.