Is using the xml column type a good idea for queries that need to be fast?

Поиск

Список

Период

Сортировка

От	Denis Papathanasiou
Тема	Is using the xml column type a good idea for queries that need to be fast?
Дата	22 февраля 2013 г. 20:21:00
Msg-id	CAEpnaGwHPBU1G-PCihgVvP+xRLtaPMGgG4aQJRb=rnics2BmTQ@mail.gmail.com обсуждение исходный текст
Список	pgsql-sql

Дерево обсуждения

I was inspired by this article to store xml documents I want to query
in a single column:
http://robots.thoughtbot.com/post/13829210385/the-durable-document-store-you-didnt-know-you-had-but

But I'm finding it difficult to get the query plan to recognize or use
the indexes efficiently; I'm using the indexes he suggests for each
xpath value.

In my case they look like this:

CREATE INDEX ind_entity_name ON forms
USING GIN ( CAST(xpath('//primaryEntity/entityName/text()', data) as TEXT[]) );

CREATE INDEX ind_prior_entity_name ON forms
USING GIN ( CAST(xpath('//primaryEntity/entityPreviousNameList/value/text()',
data) as TEXT[]) );

The first xpath -- '//primaryEntity/entityName/text()' -- has only one
possible value, so when I query it like this, the explain seems to
show that the index is being used (the forms' table has 7,085 rows):

mydb=> explain select id from forms where
(xpath('//primaryEntity/entityName/text()', data))[1]::text ~*
'^banc';                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------Seq
Scanon forms  (cost=0.00..770.27 rows=11 width=16)  Filter: (((xpath('//primaryEntity/entityName/text()'::text, data,
 
'{}'::text[]))[1])::text ~* '^banc'::text)
(2 rows)

But the second xpath --
'//primaryEntity/entityPreviousNameList/value/text()' -- points to
multiple possible values within the xml document, so to make sure that
I search them all, I have to write the query like this; in this case,
it seems the index is *not* being used and it's doing a full table
scan:

mydb=> explain select entity.id from (select id,
xpath(''//primaryEntity/entityPreviousNameList/value/text()', data),
generate_subscripts(xpath(''//primaryEntity/entityPreviousNameList/value/text()',
data), 1) as s1 from forms) as entity where entity.xpath[s1]::text ~*
'^banc';                           QUERY PLAN
------------------------------------------------------------------Subquery Scan on entity  (cost=0.00..876.55 rows=11
width=16) Filter: ((entity.xpath[entity.s1])::text ~* '^banc'::text)  ->  Seq Scan on forms  (cost=0.00..787.99
rows=7085width=48)
 
(3 rows)

So is there a way to create an index which will help the performance
of this type of query?

Or is there a way of writing the second query differently that will
used the existing index better?

Or should I just normalize the data from the documents I want to query
into other tables, and use simpler joins?

В списке pgsql-sql по дате отправления:

Предыдущее

От: Russell Keane
Дата: 22 февраля 2013 г., 14:18:51
Сообщение: Re: need help

Следующее

От: denero team
Дата: 22 февраля 2013 г., 21:30:08
Сообщение: Re: need help

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Is using the xml column type a good idea for queries that need to be fast?

Предыдущее

Следующее