Re: XML and postgres
От | John Gray |
---|---|
Тема | Re: XML and postgres |
Дата | |
Msg-id | 1054506966.1456.25.camel@adzuki.azuli.co.uk обсуждение исходный текст |
Ответ на | XML and postgres (Oleg Bartunov <oleg@sai.msu.su>) |
Список | pgsql-hackers |
On Fri, 2003-05-30 at 17:06, Oleg Bartunov wrote: > Hello, > > Is there interest to storing and indexed access methods for xml in > postgresql ? While I don't use xml in my applications but I see Yes. But my GiST skills were never up to it! > possible directions to develop contrib module with indexed access methods > to xml-like data type. We have already contrib/ltree for tree-like > structures and recently we developed (not released yet) hstore module, > which implements hash data type like in perl with indexed AM to keys, values. > Motivation for this modules is need to store data with weak structure > (semi-structured data), i.e. we have several obligatory fields and a bunch > of optional data. Obligatory fields could be stored as usual, while > for optional columns we use special data type - hstore, which serves as > a storage of (key,value) pairs. There are could be many (key,value) pairs and > hstore provides AM to them. We've realized that combination of > ltree, hstore could be used for xml. This is a possible way to approach it, but because my interest has come from the direction of doing XPath queries, I'm particularly interested in how you can efficiently index XML documents for use with XPath. The arrival of expressional indexes is good because you can now optimise select article_xml from table where xpath(article_xml,'/article/author[@surname]') = 'Smith'; which is great for single-valued expressions, but it's quite possible to have a multi-valued result, for which we want a different operator: select article_xml from table where xpath(article_xml,'/article/author[@surname]') =# 'Smith'; which returns the case where one of the values is Smith. Now, this is the case that interests me for indexing - how feasible is an index that records the fact that xpath(...) as above is multi-valued and has (say) three entries, and records them in the appropriate places in the index. 2. We could alternatively say "We want to optimise XPath queries" - and produce an index 'tree' which enumerated possible paths. From a root node, we would have an edge labelled 'descendant' to 'article' and so on (the XPath spec describes this model of document paths in great detail. Obviously, not all queries are rooted to the top of the hierarchy, which means that you might end up with an index with 'bypass' nodes - making it not a binary tree (in other words, I'm not sure it could actually work...). This is all speculative stuff from the perspective of someone who tried to get to grips with GiST and struggled! Obviously this kind of index would require some integration with XPath query support - which at present is (in contrib/xml) just passed over to libxml to do. In test cases it's proved to be sufficiently fast even doing full parsing, and I suspect that the index technique above could help. Why Xpath? Well, it seems to be getting adopted more widely (it is also part of XSLT) and it represents a fairly easy syntax for navigating hierarchical XML documents. > > We have no spare time to elaborate this, so if someone could work on > this, we could provide hstore module and help with developing. > > Oh, I sympathise completely. I have been able to do a little more work on contrib/xml for a company in my spare time (work which I hope we'll be able to release back) but my day job has no XPath in it now! (I am now a broadcast engineer, rather than an IT consultant...) Regards John -- John Gray <jgray@azuli.co.uk>
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Francisco Figueiredo Jr."Дата:
Сообщение: initdb problems handling super username with spaces.