Обсуждение: XML schemas and PG column names

Поиск
Список
Период
Сортировка

XML schemas and PG column names

От
Andrew Dunstan
Дата:
I'm doing some work with the output of query_to_xml_and_xmlschema(). The 
output is a bit unfortunate in my opinion when the column names in the 
query are not legal XML names. We quite reasonably translate the names 
so that legal XML names result, but we don't actually put the original 
name anywhere in the output, meaning that the end processor has some 
work to do to recover the original. Here are two snippets from the 
output when the column names are "z" and "25 %ile":
    <xsd:complexType name="RowType">      <xsd:sequence>        <xsd:element name="z" type="INTEGER"
nillable="true"></xsd:element>       <xsd:element name="_x0032_5_x0020__x0025_ile" type="INTEGER"
nillable="true"></xsd:element>     </xsd:sequence>    </xsd:complexType>
 
    <row>      <z>1</z>      <_x0032_5_x0020__x0025_ile>2</_x0032_5_x0020__x0025_ile>    </row>

Of course, we can recover the original name by using something like perl 
to do operations like this:
   $column_name =~ s/_x([[:xdigit:]]{4})_/pack("U",hex($1))/ge;

but that's ugly and not as simply available in many XSL processors (I am 
using XSL to transform the XML.)

I propose that we annotate the schema section RowType elements with the 
original names, so we would have something like this in the schema section:
   <xsd:complexType name="RowType"   xmlns:pg="http://www.postgresql.org/schemas/column-names">      <xsd:sequence>
  <xsd:element name="z" type="INTEGER" nillable="true">                 <xsd:annotation>                <xsd:appinfo>
                <pg:column-name>z</pg:column-name>               </xsd:appinfo>            </xsd:annotation>*   *
</xsd:element>       <xsd:element name="_x0032_5_x0020__x0025_ile" type="INTEGER"   nillable="true">
<xsd:annotation>                   <xsd:appinfo>                        <pg:column-name>25 %ile</pg:column-name>
          </xsd:appinfo>            *    *</xsd:annotation>       </xsd:element>      </xsd:sequence>
</xsd:complexType>

While it might be a bit longwinded, it's not going to add to the per-row 
output, just the schema section.

Thoughts?

cheers

andrew




Re: XML schemas and PG column names

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> I propose that we annotate the schema section RowType elements with the 
> original names, so we would have something like this in the schema section:

1. Is that legal per the SQL/XML spec?

2. What happens when the column name contains characters that would have
to be escaped, such as "<" --- haven't you just replaced one de-escaping
problem with another?
        regards, tom lane


Re: XML schemas and PG column names

От
Andrew Dunstan
Дата:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I propose that we annotate the schema section RowType elements with the 
>> original names, so we would have something like this in the schema section:
>>     
>  
> 1. Is that legal per the SQL/XML spec?
>   

It is certainly legal per XML and XSD specs, and the SQL/XML spec has 
annotations using appinfo elements. It would be rather surprising if the 
SQL/XML spec forbade annotations such as I propose. The spec is 
mind-bogglingly impenetrable, though. Perhaps Peter or Nicholas might know.

> 2. What happens when the column name contains characters that would have
> to be escaped, such as "<" --- haven't you just replaced one de-escaping
> problem with another?
>
>             
>   

No. say the name is "foo & bar < baz". Then the annotation would be:
   <pg:column-name>foo & bar < baz</pg:column-name>

But the difference is that the XML processor will automatically unescape 
this value (and re-escape it on output if necessary). The user won't 
have to do anything (or shouldn't if their XML processor is worth 
anything at all). So in a stylesheet, I'd be able to do something like:
   <xsl:for-each select="//[complexType[@name="RowType"]//pg:column-name">      <th><xsl:value-of select="." /></th>
</xsl:for-each>

and it would just Do The Right Thing. (If we didn't want the output 
re-escaped, say when the otuput format was not XML or HTML, we could 
make it do that too).

cheers

andrew



Re: XML schemas and PG column names

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> 2. What happens when the column name contains characters that would have
>> to be escaped, such as "<" --- haven't you just replaced one de-escaping
>> problem with another?

> But the difference is that the XML processor will automatically unescape 
> this value (and re-escape it on output if necessary). The user won't 
> have to do anything (or shouldn't if their XML processor is worth 
> anything at all).

OK, so your argument is that this is a standard escaping rule and the
one in the SQL standard is, um, not standard.  I wonder why the SQL
committee felt compelled to invent their own, then?
        regards, tom lane


Re: XML schemas and PG column names

От
Andrew Dunstan
Дата:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> 2. What happens when the column name contains characters that would have
>>> to be escaped, such as "<" --- haven't you just replaced one de-escaping
>>> problem with another?
>>>       
>
>   
>> But the difference is that the XML processor will automatically unescape 
>> this value (and re-escape it on output if necessary). The user won't 
>> have to do anything (or shouldn't if their XML processor is worth 
>> anything at all).
>>     
>
> OK, so your argument is that this is a standard escaping rule and the
> one in the SQL standard is, um, not standard.  I wonder why the SQL
> committee felt compelled to invent their own, then?
>
>             
>   

They are two different things. An XML-escaped text value is by no means 
necessarily a legal XML tag name (e.g. an XML name can't have spaces). 
Possibly what they really did wrong was to try to map SQL column names 
to XML tags at all. It might have been better to do something like:
   <column name="foo & bar">some value</column>

instead of what we produce, which I assume is in the standard:
   <foo_x0020__x0026__x0020_bar>somevalue</foo_x0020__x0026__x0020_bar>

which I think is just plain ugly. OTOH, then it would have been far 
harder (maybe impossible) to create an XML schema for such a mechanism, 
so I assume that's why they did it this way.

Anyway, It would be nice to have a way of providing the non-mangled 
names - I think what I have suggested should meet the case.

cheers

andrew



Re: XML schemas and PG column names

От
Peter Eisentraut
Дата:
On lör, 2009-12-12 at 11:51 -0500, Andrew Dunstan wrote:
> It is certainly legal per XML and XSD specs, and the SQL/XML spec has 
> annotations using appinfo elements. It would be rather surprising if
> the 
> SQL/XML spec forbade annotations such as I propose. The spec is 
> mind-bogglingly impenetrable, though. Perhaps Peter or Nicholas might
> know.

I think we can of course add our own annotations.  It would be good to
go through the SQL/XML standard document and check what style they use
for their annotations so that we can structure and name ours similarly
and have room for future work, in case someone also wants annotations
for table names, schema names, etc. (Or was that part of your project as
well?)



Re: XML schemas and PG column names

От
Andrew Dunstan
Дата:

Peter Eisentraut wrote:
> On lör, 2009-12-12 at 11:51 -0500, Andrew Dunstan wrote:
>   
>> It is certainly legal per XML and XSD specs, and the SQL/XML spec has 
>> annotations using appinfo elements. It would be rather surprising if
>> the 
>> SQL/XML spec forbade annotations such as I propose. The spec is 
>> mind-bogglingly impenetrable, though. Perhaps Peter or Nicholas might
>> know.
>>     
>
> I think we can of course add our own annotations.  It would be good to
> go through the SQL/XML standard document and check what style they use
> for their annotations so that we can structure and name ours similarly
> and have room for future work, in case someone also wants annotations
> for table names, schema names, etc. (Or was that part of your project as
> well?)
>
>   

Well, the standard has an element specifically for annotations 
concerning certain objects: sqlxml:sqlname. However, I am not sure if it 
can be used in this context. Can you try reading the standard 
(<http://www.sqlx.org/SQL-XML-documents/5FCD-14-XML-2004-07.pdf>) and 
tell me? :-) If it's not comprehended by the standard then we should at 
least use a different namespace, and probably a different element name. 
The style can be made to match, though.

I certainly think we could do more of this, although the column names 
are what matter to me right now. We can also do it bit by bit.

cheers

andrew