Multi-entry indexes (with a view to XPath queries)
От | John Gray |
---|---|
Тема | Multi-entry indexes (with a view to XPath queries) |
Дата | |
Msg-id | io28h9.nk1.ln@adzuki обсуждение исходный текст |
Ответы |
Re: Multi-entry indexes (with a view to XPath queries)
|
Список | pgsql-hackers |
Firstly, I appreciate this may be a hare-brained scheme, but I've been thinking about indexes in which the tuple pointer is not unique. The reason for my interest is storing XML documents in text fields in the database. (It could also help with particular kinds of full-text search?) I would like to be able to construct indexes on a collection of XML documents, based on the "value" of certain "fields" within the document. (In jargon terms, producing an index whose key is the CDATA content of a particular XML element). This could tie in with the Xpath and XQuery proposals A simplified example (from an archaeological site classification system): <site><name>Glebe Farm, Long Itchington</name><location scheme="osgb">SU41793684</location><feature> <type>Agricultural:StockControl</type> <date scheme="code">med</date></feature><feature> <type>Unassigned:Ditch</type> <size type="depth" unit="m">1.5</size></feature> </site> I'd like to produce an index on feature types so that I could type (roughly): SELECT siteid, xpath(doc,'//site/name'), xpath(doc,'//site/location') FROM documents WHERE xpath(doc,'feature/type') = 'Agricultural: Stock Control'; [create table documents (integer siteid, text doc)] Obviously I need to write a basic XML parser that can support such an xpath function, but it would also be good to index by the results of that function-i.e. to have an index containing feature type values. As each document could have any number of these instances, the number of index tuples would differ from the number of heap tuples. As far as I can see, there is no particular reason why a btree index could not be used[*]. However, vacuum.c makes assertions about number of index tuples == number of heap tuples. I realise this is a useful consistency check, but would it be possible to have a field in pg_index (indnoidentity?) that indicates that a given index hasn't got a 1:1 index:heap relationship. I have tried the approach of decomposing documents into cdata, element and attribute tables, and I can use joins to extract a list of feature types etc. (and could use triggers to update this) but the idea of not having to parse a document to enter it into the database and not requiring application logic to reconstruct it again seems a potential win for a system which might store complex documents but usually searches on limited criteria.
В списке pgsql-hackers по дате отправления: