Обсуждение: Slow select times on select with xpath

Поиск
Список
Период
Сортировка

Slow select times on select with xpath

От
astro77
Дата:
I've got a table set up with an XML field that I would like to search on with
2.5 million records. The xml are serialized objects from my application
which are too complex to break out into separate tables. I'm trying to run a
query similar to this:

    SELECT  serialized_object as outVal
     from object  where
    (
    array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
serialized_object,
             ARRAY
             [
             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']

             ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'

    )
    limit 1000;

I've also set up an index on the xpath query like this...

CREATE INDEX concurrently
idx_object_nodeid
ON
object
USING
btree(

 cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
             ARRAY
             [
             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']

             ])as text[])
);

The query takes around 30 minutes to complete with or without the index in
place and does not cache the query. Additionally the EXPLAIN say that the
index is not being used. I've looked everywhere but can't seem to find solid
info on how to achieve this. Any ideas would be greatly appreciated.
--
View this message in context: http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25259351.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Slow select times on select with xpath

От
"Kevin Grittner"
Дата:
astro77 <astro_coder@yahoo.com> wrote:

> I've got a table set up with an XML field that I would like to search
on
> with
> 2.5 million records. The xml are serialized objects from my
application
> which are too complex to break out into separate tables. I'm trying
to run a
> query similar to this:
>
>     SELECT  serialized_object as outVal
>      from object  where
>     (
>
    array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
> serialized_object,
>              ARRAY
>              [
>              ARRAY['a',
'http://schemas.datacontract.org/2004/07/Objects'],
>              ARRAY['b',
'http://schemas.datacontract.org/2004/07/Security']
>
>              ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>
>     )
>     limit 1000;

I would try to minimize how many XML values it had to read, parse, and
search.  The best approach that comes to mind would be to use tsearch2
techniques (with a GIN or GiST index on the tsvector) to identify
which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
to combine that with your xpath search.

-Kevin

Re: Slow select times on select with xpath

От
Robert Haas
Дата:
On Wed, Sep 2, 2009 at 11:04 AM, astro77<astro_coder@yahoo.com> wrote:
>
> I've got a table set up with an XML field that I would like to search on with
> 2.5 million records. The xml are serialized objects from my application
> which are too complex to break out into separate tables. I'm trying to run a
> query similar to this:
>
>        SELECT  serialized_object as outVal
>         from object  where
>        (
>        array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
> serialized_object,
>             ARRAY
>             [
>             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>
>             ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>
>        )
>        limit 1000;
>
> I've also set up an index on the xpath query like this...
>
> CREATE INDEX concurrently
> idx_object_nodeid
> ON
> object
> USING
> btree(
>
>  cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
>             ARRAY
>             [
>             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>
>             ])as text[])
> );
>
> The query takes around 30 minutes to complete with or without the index in
> place and does not cache the query. Additionally the EXPLAIN say that the
> index is not being used. I've looked everywhere but can't seem to find solid
> info on how to achieve this. Any ideas would be greatly appreciated.

Why do you have a cast in the index definition?

...Robert

Re: Slow select times on select with xpath

От
astro77
Дата:
I was receiving an error that an XML field does not support the various
indexes available in postgresql. Is there an example of how to do this
properly?


Robert Haas wrote:
>
> On Wed, Sep 2, 2009 at 11:04 AM, astro77<astro_coder@yahoo.com> wrote:
>>
>> I've got a table set up with an XML field that I would like to search on
>> with
>> 2.5 million records. The xml are serialized objects from my application
>> which are too complex to break out into separate tables. I'm trying to
>> run a
>> query similar to this:
>>
>>        SELECT  serialized_object as outVal
>>         from object  where
>>        (
>>      
>>  array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
>> serialized_object,
>>             ARRAY
>>             [
>>             ARRAY['a',
>> 'http://schemas.datacontract.org/2004/07/Objects'],
>>             ARRAY['b',
>> 'http://schemas.datacontract.org/2004/07/Security']
>>
>>             ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>>
>>        )
>>        limit 1000;
>>
>> I've also set up an index on the xpath query like this...
>>
>> CREATE INDEX concurrently
>> idx_object_nodeid
>> ON
>> object
>> USING
>> btree(
>>
>>  cast(xpath('/a:root/a:Identification/b:ObjectId/text()',
>> serialized_object,
>>             ARRAY
>>             [
>>             ARRAY['a',
>> 'http://schemas.datacontract.org/2004/07/Objects'],
>>             ARRAY['b',
>> 'http://schemas.datacontract.org/2004/07/Security']
>>
>>             ])as text[])
>> );
>>
>> The query takes around 30 minutes to complete with or without the index
>> in
>> place and does not cache the query. Additionally the EXPLAIN say that the
>> index is not being used. I've looked everywhere but can't seem to find
>> solid
>> info on how to achieve this. Any ideas would be greatly appreciated.
>
> Why do you have a cast in the index definition?
>
> ...Robert
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
>

--
View this message in context: http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25283175.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Slow select times on select with xpath

От
Robert Haas
Дата:
On Thu, Sep 3, 2009 at 4:06 PM, astro77<astro_coder@yahoo.com> wrote:
> I was receiving an error that an XML field does not support the various
> indexes available in postgresql.

Please post what happens when you try.

> Is there an example of how to do this
> properly?

Not sure.

...Robert

Re: Slow select times on select with xpath

От
astro77
Дата:
CREATE INDEX CONCURRENTLY idx_serializedxml
   ON "object" (serialized_object ASC NULLS LAST);

yields the error:
ERROR: data type xml has no default operator class for access method "btree"

The same error occurs when I try to use the other access methods as well.


On Thu, Sep 3, 2009 at 4:06 PM, astro77<astro_coder@yahoo.com> wrote:
> I was receiving an error that an XML field does not support the various
> indexes available in postgresql.

Please post what happens when you try.


--
View this message in context: http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25530433.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Slow select times on select with xpath

От
astro77
Дата:
Thanks Kevin. I thought about using tsearch2 but I need to be able to select
exact values on other numerical queries and cannot use "contains" queries.
It's got to be fast so I cannot have lots of records returned and have to do
secondary processing on the xml for the records which contain the exact
value I'm looking for. This is one of the reasons I moved from using Lucene
for searching. I hope this makes sense.


Kevin Grittner wrote:
>  wrote:
>
>
> I would try to minimize how many XML values it had to read, parse, and
> search.  The best approach that comes to mind would be to use tsearch2
> techniques (with a GIN or GiST index on the tsvector) to identify
> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
> to combine that with your xpath search.
>
> -Kevin
>
>
>

--
View this message in context: http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25530439.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Slow select times on select with xpath

От
astro77
Дата:
As a follow-up, when I try to create the index like this...

CREATE INDEX concurrently
idx_object_nodeid2
 ON
 object
 USING
 btree(
 xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
             ARRAY
             [
             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
             ])
 ) ;

The index begins to build but fails after about 90 seconds with this error:

ERROR: could not identify a comparison function for type xml
SQL state: 42883



Robert Haas wrote:
>
> On Thu, Sep 3, 2009 at 4:06 PM, astro77<astro_coder@yahoo.com> wrote:
>> I was receiving an error that an XML field does not support the various
>> indexes available in postgresql.
>
> Please post what happens when you try.
>
>> Is there an example of how to do this
>> properly?
>
> Not sure.
>
> ...Robert
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
>

--
View this message in context: http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25530455.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Slow select times on select with xpath

От
Ron Mayer
Дата:
astro77 wrote:
> Thanks Kevin. I thought about using tsearch2 but I need to be able to select
> exact values on other numerical queries and cannot use "contains" queries.

You might be able to make use of a custom parser for tsearch2 that creates
something like a single "word" for xml fragments like <whatever>1</whatever>
which would let you quickly find exact matches for those words/phrases.

> It's got to be fast so I cannot have lots of records returned and have to do
> secondary processing on the xml for the records which contain the exact
> value I'm looking for. This is one of the reasons I moved from using Lucene
> for searching. I hope this makes sense.
>
>
> Kevin Grittner wrote:
>>  wrote:
>>
>>
>> I would try to minimize how many XML values it had to read, parse, and
>> search.  The best approach that comes to mind would be to use tsearch2
>> techniques (with a GIN or GiST index on the tsvector) to identify
>> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
>> to combine that with your xpath search.
>>
>> -Kevin
>>
>>
>>
>


Re: Slow select times on select with xpath

От
Dimitri Fontaine
Дата:
astro77 <astro_coder@yahoo.com> writes:
> Kevin Grittner wrote:
>> I would try to minimize how many XML values it had to read, parse, and
>> search.  The best approach that comes to mind would be to use tsearch2
>> techniques (with a GIN or GiST index on the tsvector) to identify
>> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
>> to combine that with your xpath search.
>
> Thanks Kevin. I thought about using tsearch2 but I need to be able to select
> exact values on other numerical queries and cannot use "contains" queries.
> It's got to be fast so I cannot have lots of records returned and have to do
> secondary processing on the xml for the records which contain the exact
> value I'm looking for. This is one of the reasons I moved from using Lucene
> for searching. I hope this makes sense.

I think he meant something following this skeleton:

  SELECT ...
    FROM ( SELECT ...
             FROM ...
            WHERE /* insert preliminary filtering here */
         )

   WHERE /* insert xpath related filtering here */

Hopefully you have a preliminary filtering available that's restrictive
enough for the xpath filtering to only have to check few rows. Kevin
proposes that this preliminary filtering be based on Tsearch with an
adequate index (GiST for data changing a lot, GIN for pretty static
set).

As you can see the two-steps filtering can be done in a single SQL query.

Regards,
--
dim