Merlin Moncure wrote:
> On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus@comcast.net> wrote:
>> One of the problem with shredding XML is that it is very kludgy to get a
>> scalar value back from xpath. The xpath function always returns an array of
>> XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
>
> I just happen to be dealing with XML right now as well and my initial
> thought is that your suggestion doesn't buy you a whole lot: the root
> problem IMO is not dealing with what xpath gives you but that there is
> no DOMish representation of the xml document for you to query. You
> have to continually feed the entire document to xpath which is
> absolutely not scalable (if it works the way I think it does --
> haven't looked at the code).
No typically you'll only be passing the xml for a single "row" so what
we end up doing in Postgres typically looks something like this:
SELECT xmlvalue('/row/@id', bitesizexml)::int AS id, xmlvalue('/row/@lat', bitesizexml)::numeric AS lat,
xmlvalue('/row/@lon',bitesizexml)::numeric, xmlvalue('/row/comment', bitesizexml) AS cmt
FROM ( SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml
) sub
So only the one call has to work with the entire document. All the calls
to xmlvalue are passed a much smaller node to work with.
> xpath is great for simple things but it's too texty and you need a> more robust API to handle documents for serious
parsingon the> backend. In the short term i'd advise doing work in another pl like> perl.
This is basically the method used for Oracle too until they provided
XMLTable functionality. They had a function xmlsequence that basically
did the unnest(xpath()) part. Hopefully we'll get xmltable support soon.
Scott