Обсуждение: Slow search.. quite clueless

Поиск
Список
Период
Сортировка

Slow search.. quite clueless

От
Yonatan Ben-Nes
Дата:
Hi all,

Im building a site where the users can search for products with up to 4
diffrent keywords which all MUST match to each product which found as a
result to the search.

I got 2 tables (which are relevant to the issue :)), one is the product
table (5 million rows) and the other is the keyword table which hold the
keywords of each product (60 million rows).

The scheme of the tables is as follows:

                       Table "public.product"
            Column           |     Type      |      Modifiers
----------------------------+---------------+---------------------
  product_id                 | text          | not null
  product_name               | text          | not null
  retail_price               | numeric(10,2) | not null
  etc...
Indexes:
     "product_product_id_key" UNIQUE, btree (product_id)

          Table "public.keyword"
    Column    |     Type      | Modifiers
-------------+---------------+-----------
  product_id  | text          | not null
  keyword     | text          | not null
Indexes:
     "keyword_keyword" btree (keyword)

The best query which I succeded to do till now is adding the keyword
table for each keyword searched for example if someone search for "belt"
& "black" & "pants" it will create the following query:

poweraise.com=# EXPLAIN ANALYZE SELECT

product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price

FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword
t2 USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13;

        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
time=969.798..1520.354 rows=6 loops=1)
    ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578)
(actual time=969.794..1520.337 rows=6 loops=1)
          Hash Cond: ("outer".product_id = "inner".product_id)
          ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309
width=612) (actual time=82.266..1474.018 rows=156 loops=1)
                ->  Hash Join  (cost=18867.07..2581181.09 rows=55309
width=34) (actual time=82.170..1462.104 rows=156 loops=1)
                      Hash Cond: ("outer".product_id = "inner".product_id)
                      ->  Index Scan using keyword_keyword on keyword t2
  (cost=0.00..331244.43 rows=140771 width=17) (actual
time=0.033..1307.167 rows=109007 loops=1)
                            Index Cond: (keyword = 'black'::text)
                      ->  Hash  (cost=18851.23..18851.23 rows=6337
width=17) (actual time=16.145..16.145 rows=0 loops=1)
                            ->  Index Scan using keyword_keyword on
keyword t1  (cost=0.00..18851.23 rows=6337 width=17) (actual
time=0.067..11.050 rows=3294 loops=1)
                                  Index Cond: (keyword = 'belt'::text)
                ->  Index Scan using product_product_id_key on product
(cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
loops=156)
                      Index Cond: (product.product_id = "outer".product_id)
          ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
time=42.863..42.863 rows=0 loops=1)
                ->  Index Scan using keyword_keyword on keyword t3
(cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
rows=3932 loops=1)
                      Index Cond: (keyword = 'pants'::text)
  Total runtime: 1521.441 ms
(17 rows)

Sometimes the query work fast even for 3 keywords but that doesnt help
me if at other times it take ages....

Now to find a result for 1 keyword its really flying so I also tried to
make 3 queries and do INTERSECT between them but it was found out to be
extremly slow...

Whats make this query slow as far as I understand is all the merging
between the results of each table... I tried to divide the keyword table
into lots of keywords table which each hold keywords which start only
with a specific letter, it did improve the speeds but not in a real
significant way.. tried clusters,indexes,SET STATISTICS,WITHOUT OIDS on
the keyword table and what not.. im quite clueless...

Actually I even started to look on other solutions and maybe you can say
something about them also.. maybe they can help me:
1. Omega (From the Xapian project) - http://www.xapian.org/
2. mnoGoSearch - http://www.mnogosearch.org/doc.html
3. Swish-e - http://swish-e.org/index.html

To add on everything I want at the end to be able to ORDER BY the
results like order the product by price, but im less concerned about
that cause I saw that with cluster I can do it without any extra overhead.

Thanks alot in advance,
Yonatan Ben-Nes


Re: Slow search.. quite clueless

От
Oleg Bartunov
Дата:
contrib/tsearch2 ( http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
might works for you. It might because performance depends on
cardinality of your keywords.

      Oleg
On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:

> Hi all,
>
> Im building a site where the users can search for products with up to 4
> diffrent keywords which all MUST match to each product which found as a
> result to the search.
>
> I got 2 tables (which are relevant to the issue :)), one is the product table
> (5 million rows) and the other is the keyword table which hold the keywords
> of each product (60 million rows).
>
> The scheme of the tables is as follows:
>
>                      Table "public.product"
>           Column           |     Type      |      Modifiers
> ----------------------------+---------------+---------------------
> product_id                 | text          | not null
> product_name               | text          | not null
> retail_price               | numeric(10,2) | not null
> etc...
> Indexes:
>    "product_product_id_key" UNIQUE, btree (product_id)
>
>         Table "public.keyword"
>   Column    |     Type      | Modifiers
> -------------+---------------+-----------
> product_id  | text          | not null
> keyword     | text          | not null
> Indexes:
>    "keyword_keyword" btree (keyword)
>
> The best query which I succeded to do till now is adding the keyword table
> for each keyword searched for example if someone search for "belt" & "black"
> & "pants" it will create the following query:
>
> poweraise.com=# EXPLAIN ANALYZE SELECT
>
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price

> FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword t2
> USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
> t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13;
>
>       QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
> time=969.798..1520.354 rows=6 loops=1)
>   ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578) (actual
> time=969.794..1520.337 rows=6 loops=1)
>         Hash Cond: ("outer".product_id = "inner".product_id)
>         ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309 width=612)
> (actual time=82.266..1474.018 rows=156 loops=1)
>               ->  Hash Join  (cost=18867.07..2581181.09 rows=55309 width=34)
> (actual time=82.170..1462.104 rows=156 loops=1)
>                     Hash Cond: ("outer".product_id = "inner".product_id)
>                     ->  Index Scan using keyword_keyword on keyword t2
> (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167
> rows=109007 loops=1)
>                           Index Cond: (keyword = 'black'::text)
>                     ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17)
> (actual time=16.145..16.145 rows=0 loops=1)
>                           ->  Index Scan using keyword_keyword on keyword t1
> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050 rows=3294
> loops=1)
>                                 Index Cond: (keyword = 'belt'::text)
>               ->  Index Scan using product_product_id_key on product
> (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
> loops=156)
>                     Index Cond: (product.product_id = "outer".product_id)
>         ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
> time=42.863..42.863 rows=0 loops=1)
>               ->  Index Scan using keyword_keyword on keyword t3
> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120 rows=3932
> loops=1)
>                     Index Cond: (keyword = 'pants'::text)
> Total runtime: 1521.441 ms
> (17 rows)
>
> Sometimes the query work fast even for 3 keywords but that doesnt help me if
> at other times it take ages....
>
> Now to find a result for 1 keyword its really flying so I also tried to make
> 3 queries and do INTERSECT between them but it was found out to be extremly
> slow...
>
> Whats make this query slow as far as I understand is all the merging between
> the results of each table... I tried to divide the keyword table into lots of
> keywords table which each hold keywords which start only with a specific
> letter, it did improve the speeds but not in a real significant way.. tried
> clusters,indexes,SET STATISTICS,WITHOUT OIDS on the keyword table and what
> not.. im quite clueless...
>
> Actually I even started to look on other solutions and maybe you can say
> something about them also.. maybe they can help me:
> 1. Omega (From the Xapian project) - http://www.xapian.org/
> 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
> 3. Swish-e - http://swish-e.org/index.html
>
> To add on everything I want at the end to be able to ORDER BY the results
> like order the product by price, but im less concerned about that cause I saw
> that with cluster I can do it without any extra overhead.
>
> Thanks alot in advance,
> Yonatan Ben-Nes
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>

      Regards,
          Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Slow search.. quite clueless

От
Alex Turner
Дата:
I"m by no means an expert on this, and perhaps someone with more knowledge can help, but it looks to me like the planner estimate and the actual cost are significantly different which to me means that an analyze is required, or/and increase the stats on these tables would be usefull.  Also I'm wondering if you can avoid the dereference oid lookup by created the index as keyword,product_id instead of just keyword.

Alex Turner
NetEconomist

On 9/20/05, Oleg Bartunov <oleg@sai.msu.su> wrote:
contrib/tsearch2 ( http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
might works for you. It might because performance depends on
cardinality of your keywords.

        Oleg
On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:

> Hi all,
>
> Im building a site where the users can search for products with up to 4
> diffrent keywords which all MUST match to each product which found as a
> result to the search.
>
> I got 2 tables (which are relevant to the issue :)), one is the product table
> (5 million rows) and the other is the keyword table which hold the keywords
> of each product (60 million rows).
>
> The scheme of the tables is as follows:
>
>                      Table "public.product"
>           Column           |     Type      |      Modifiers
> ----------------------------+---------------+---------------------
> product_id                 | text          | not null
> product_name               | text          | not null
> retail_price               | numeric(10,2) | not null
> etc...
> Indexes:
>    "product_product_id_key" UNIQUE, btree (product_id)
>
>         Table "public.keyword "
>   Column    |     Type      | Modifiers
> -------------+---------------+-----------
> product_id  | text          | not null
> keyword     | text          | not null
> Indexes:
>    "keyword_keyword" btree (keyword)
>
> The best query which I succeded to do till now is adding the keyword table
> for each keyword searched for example if someone search for "belt" & "black"
> & "pants" it will create the following query:
>
> poweraise.com=# EXPLAIN ANALYZE SELECT
> product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price
> FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword t2
> USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
> t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13;
>
>       QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
> time=969.798..1520.354 rows=6 loops=1)
>   ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578) (actual
> time=969.794..1520.337 rows=6 loops=1)
>         Hash Cond: ("outer".product_id = "inner".product_id)
>         ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309 width=612)
> (actual time=82.266..1474.018 rows=156 loops=1)
>               ->  Hash Join  (cost=18867.07..2581181.09 rows=55309 width=34)
> (actual time=82.170..1462.104 rows=156 loops=1)
>                     Hash Cond: ("outer".product_id = "inner".product_id)
>                     ->  Index Scan using keyword_keyword on keyword t2
> (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167
> rows=109007 loops=1)
>                           Index Cond: (keyword = 'black'::text)
>                     ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17)
> (actual time=16.145..16.145 rows=0 loops=1)
>                           ->  Index Scan using keyword_keyword on keyword t1
> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050 rows=3294
> loops=1)
>                                 Index Cond: (keyword = 'belt'::text)
>               ->  Index Scan using product_product_id_key on product
> (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
> loops=156)
>                     Index Cond: (product.product_id = "outer".product_id)
>         ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
> time=42.863..42.863 rows=0 loops=1)
>               ->  Index Scan using keyword_keyword on keyword t3
> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120 rows=3932
> loops=1)
>                     Index Cond: (keyword = 'pants'::text)
> Total runtime: 1521.441 ms
> (17 rows)
>
> Sometimes the query work fast even for 3 keywords but that doesnt help me if
> at other times it take ages....
>
> Now to find a result for 1 keyword its really flying so I also tried to make
> 3 queries and do INTERSECT between them but it was found out to be extremly
> slow...
>
> Whats make this query slow as far as I understand is all the merging between
> the results of each table... I tried to divide the keyword table into lots of
> keywords table which each hold keywords which start only with a specific
> letter, it did improve the speeds but not in a real significant way.. tried
> clusters,indexes,SET STATISTICS,WITHOUT OIDS on the keyword table and what
> not.. im quite clueless...
>
> Actually I even started to look on other solutions and maybe you can say
> something about them also.. maybe they can help me:
> 1. Omega (From the Xapian project) - http://www.xapian.org/
> 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
> 3. Swish-e - http://swish-e.org/index.html
>
> To add on everything I want at the end to be able to ORDER BY the results
> like order the product by price, but im less concerned about that cause I saw
> that with cluster I can do it without any extra overhead.
>
> Thanks alot in advance,
> Yonatan Ben-Nes
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>

        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Slow search.. quite clueless

От
Philip Hallstrom
Дата:
> contrib/tsearch2 ( http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
> might works for you. It might because performance depends on cardinality of
> your keywords.

Seconded.  We use tsearch2 to earch about 40,000 rows containing
manufacturer, brand, and product name and it returns a result almost
instantly.  Before when we did normal SQL "manufacture LIKE ..., etc." it
would take 20-30 seconds.

One thing to check is the english.stop file which contains words to skip
(i, a, the, etc.).  In our case we removed almost all of them since one of
our products is "7 up" (the drink) and it would remove "up".  Made it
really hard to pull up 7 up in the results :)

-philip

>
>     Oleg
> On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:
>
>> Hi all,
>>
>> Im building a site where the users can search for products with up to 4
>> diffrent keywords which all MUST match to each product which found as a
>> result to the search.
>>
>> I got 2 tables (which are relevant to the issue :)), one is the product
>> table (5 million rows) and the other is the keyword table which hold the
>> keywords of each product (60 million rows).
>>
>> The scheme of the tables is as follows:
>>
>>                      Table "public.product"
>>           Column           |     Type      |      Modifiers
>> ----------------------------+---------------+---------------------
>> product_id                 | text          | not null
>> product_name               | text          | not null
>> retail_price               | numeric(10,2) | not null
>> etc...
>> Indexes:
>>    "product_product_id_key" UNIQUE, btree (product_id)
>>
>>         Table "public.keyword"
>>   Column    |     Type      | Modifiers
>> -------------+---------------+-----------
>> product_id  | text          | not null
>> keyword     | text          | not null
>> Indexes:
>>    "keyword_keyword" btree (keyword)
>>
>> The best query which I succeded to do till now is adding the keyword table
>> for each keyword searched for example if someone search for "belt" &
>> "black" & "pants" it will create the following query:
>>
>> poweraise.com=# EXPLAIN ANALYZE SELECT
>>
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price

>> FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword t2
>> USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
>> t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13;
>>
>>       QUERY PLAN
>>
>>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
>> time=969.798..1520.354 rows=6 loops=1)
>>   ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578) (actual
>> time=969.794..1520.337 rows=6 loops=1)
>>         Hash Cond: ("outer".product_id = "inner".product_id)
>>         ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309 width=612)
>> (actual time=82.266..1474.018 rows=156 loops=1)
>>               ->  Hash Join  (cost=18867.07..2581181.09 rows=55309
>> width=34) (actual time=82.170..1462.104 rows=156 loops=1)
>>                     Hash Cond: ("outer".product_id = "inner".product_id)
>>                     ->  Index Scan using keyword_keyword on keyword t2
>> (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167
>> rows=109007 loops=1)
>>                           Index Cond: (keyword = 'black'::text)
>>                     ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17)
>> (actual time=16.145..16.145 rows=0 loops=1)
>>                           ->  Index Scan using keyword_keyword on keyword
>> t1 (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050
>> rows=3294 loops=1)
>>                                 Index Cond: (keyword = 'belt'::text)
>>               ->  Index Scan using product_product_id_key on product
>> (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
>> loops=156)
>>                     Index Cond: (product.product_id = "outer".product_id)
>>         ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
>> time=42.863..42.863 rows=0 loops=1)
>>               ->  Index Scan using keyword_keyword on keyword t3
>> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
>> rows=3932 loops=1)
>>                     Index Cond: (keyword = 'pants'::text)
>> Total runtime: 1521.441 ms
>> (17 rows)
>>
>> Sometimes the query work fast even for 3 keywords but that doesnt help me
>> if at other times it take ages....
>>
>> Now to find a result for 1 keyword its really flying so I also tried to
>> make 3 queries and do INTERSECT between them but it was found out to be
>> extremly slow...
>>
>> Whats make this query slow as far as I understand is all the merging
>> between the results of each table... I tried to divide the keyword table
>> into lots of keywords table which each hold keywords which start only with
>> a specific letter, it did improve the speeds but not in a real significant
>> way.. tried clusters,indexes,SET STATISTICS,WITHOUT OIDS on the keyword
>> table and what not.. im quite clueless...
>>
>> Actually I even started to look on other solutions and maybe you can say
>> something about them also.. maybe they can help me:
>> 1. Omega (From the Xapian project) - http://www.xapian.org/
>> 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
>> 3. Swish-e - http://swish-e.org/index.html
>>
>> To add on everything I want at the end to be able to ORDER BY the results
>> like order the product by price, but im less concerned about that cause I
>> saw that with cluster I can do it without any extra overhead.
>>
>> Thanks alot in advance,
>> Yonatan Ben-Nes
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>      message can get through to the mailing list cleanly
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>      choose an index scan if your joining column's datatypes do not
>      match
>

Re: Slow search.. quite clueless

От
Oleg Bartunov
Дата:
On Tue, 20 Sep 2005, Philip Hallstrom wrote:

>> contrib/tsearch2 ( http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
>> )
>> might works for you. It might because performance depends on cardinality of
>> your keywords.
>
> Seconded.  We use tsearch2 to earch about 40,000 rows containing
> manufacturer, brand, and product name and it returns a result almost
> instantly.  Before when we did normal SQL "manufacture LIKE ..., etc." it
> would take 20-30 seconds.
>
> One thing to check is the english.stop file which contains words to skip (i,
> a, the, etc.).  In our case we removed almost all of them since one of our
> products is "7 up" (the drink) and it would remove "up".  Made it really hard
> to pull up 7 up in the results :)

we have "rewriting query support ( thesauri search)" in our todo
(http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo).


>
> -philip
>
>>
>>     Oleg
>> On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:
>>
>>> Hi all,
>>>
>>> Im building a site where the users can search for products with up to 4
>>> diffrent keywords which all MUST match to each product which found as a
>>> result to the search.
>>>
>>> I got 2 tables (which are relevant to the issue :)), one is the product
>>> table (5 million rows) and the other is the keyword table which hold the
>>> keywords of each product (60 million rows).
>>>
>>> The scheme of the tables is as follows:
>>>
>>>                      Table "public.product"
>>>           Column           |     Type      |      Modifiers
>>> ----------------------------+---------------+---------------------
>>> product_id                 | text          | not null
>>> product_name               | text          | not null
>>> retail_price               | numeric(10,2) | not null
>>> etc...
>>> Indexes:
>>>    "product_product_id_key" UNIQUE, btree (product_id)
>>>
>>>         Table "public.keyword"
>>>   Column    |     Type      | Modifiers
>>> -------------+---------------+-----------
>>> product_id  | text          | not null
>>> keyword     | text          | not null
>>> Indexes:
>>>    "keyword_keyword" btree (keyword)
>>>
>>> The best query which I succeded to do till now is adding the keyword table
>>> for each keyword searched for example if someone search for "belt" &
>>> "black" & "pants" it will create the following query:
>>>
>>> poweraise.com=# EXPLAIN ANALYZE SELECT
>>>
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price

>>> FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword t2
>>> USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
>>> t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13;
>>>
>>>       QUERY PLAN
>>>
>>>
>>>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
>>> time=969.798..1520.354 rows=6 loops=1)
>>>   ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578) (actual
>>> time=969.794..1520.337 rows=6 loops=1)
>>>         Hash Cond: ("outer".product_id = "inner".product_id)
>>>         ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309 width=612)
>>> (actual time=82.266..1474.018 rows=156 loops=1)
>>>               ->  Hash Join  (cost=18867.07..2581181.09 rows=55309
>>> width=34) (actual time=82.170..1462.104 rows=156 loops=1)
>>>                     Hash Cond: ("outer".product_id = "inner".product_id)
>>>                     ->  Index Scan using keyword_keyword on keyword t2
>>> (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167
>>> rows=109007 loops=1)
>>>                           Index Cond: (keyword = 'black'::text)
>>>                     ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17)
>>> (actual time=16.145..16.145 rows=0 loops=1)
>>>                           ->  Index Scan using keyword_keyword on keyword
>>> t1 (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050
>>> rows=3294 loops=1)
>>>                                 Index Cond: (keyword = 'belt'::text)
>>>               ->  Index Scan using product_product_id_key on product
>>> (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
>>> loops=156)
>>>                     Index Cond: (product.product_id = "outer".product_id)
>>>         ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
>>> time=42.863..42.863 rows=0 loops=1)
>>>               ->  Index Scan using keyword_keyword on keyword t3
>>> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
>>> rows=3932 loops=1)
>>>                     Index Cond: (keyword = 'pants'::text)
>>> Total runtime: 1521.441 ms
>>> (17 rows)
>>>
>>> Sometimes the query work fast even for 3 keywords but that doesnt help me
>>> if at other times it take ages....
>>>
>>> Now to find a result for 1 keyword its really flying so I also tried to
>>> make 3 queries and do INTERSECT between them but it was found out to be
>>> extremly slow...
>>>
>>> Whats make this query slow as far as I understand is all the merging
>>> between the results of each table... I tried to divide the keyword table
>>> into lots of keywords table which each hold keywords which start only with
>>> a specific letter, it did improve the speeds but not in a real significant
>>> way.. tried clusters,indexes,SET STATISTICS,WITHOUT OIDS on the keyword
>>> table and what not.. im quite clueless...
>>>
>>> Actually I even started to look on other solutions and maybe you can say
>>> something about them also.. maybe they can help me:
>>> 1. Omega (From the Xapian project) - http://www.xapian.org/
>>> 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
>>> 3. Swish-e - http://swish-e.org/index.html
>>>
>>> To add on everything I want at the end to be able to ORDER BY the results
>>> like order the product by price, but im less concerned about that cause I
>>> saw that with cluster I can do it without any extra overhead.
>>>
>>> Thanks alot in advance,
>>> Yonatan Ben-Nes
>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>>      message can get through to the mailing list cleanly
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>      choose an index scan if your joining column's datatypes do not
>>      match
>>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Slow search.. quite clueless

От
"Olly Betts"
Дата:
Yonatan Ben-Nes wrote:
> Actually I even started to look on other solutions and maybe you can say
> something about them also.. maybe they can help me:
> 1. Omega (From the Xapian project) - http://www.xapian.org/

You could certainly do this with Xapian and Omega.  With only 5
million records it should be very quick.

The easiest approach would be to periodically dump the SQL tables
and build a new Xapian index which reflects the SQL database - you'd
probably want to customise the "dbi2omega" script in the Omega
distribution.  This approach works particularly well if the tables
are updated in a batch fashion (one big weekly update, say).

Alternatively you could hook into whatever updates the SQL database
and get it to make corresponding updates to the Xapian index.  That
has the advantage that they'll always be in step, but is probably
more work to set up.

The main drawback compared to doing everything in SQL is that you'd
have two systems to deal with rather than just one...

Cheers,
    Olly


Re: Slow search.. quite clueless

От
Yonatan Ben-Nes
Дата:
Oleg Bartunov wrote:
> On Tue, 20 Sep 2005, Philip Hallstrom wrote:
>
>>> contrib/tsearch2 (
>>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
>>> might works for you. It might because performance depends on
>>> cardinality of your keywords.
>>
>>
>> Seconded.  We use tsearch2 to earch about 40,000 rows containing
>> manufacturer, brand, and product name and it returns a result almost
>> instantly.  Before when we did normal SQL "manufacture LIKE ..., etc."
>> it would take 20-30 seconds.
>>
>> One thing to check is the english.stop file which contains words to
>> skip (i, a, the, etc.).  In our case we removed almost all of them
>> since one of our products is "7 up" (the drink) and it would remove
>> "up".  Made it really hard to pull up 7 up in the results :)
>
>
> we have "rewriting query support ( thesauri search)" in our todo
> (http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo).
>
>
>>
>> -philip
>>
>>>
>>>     Oleg
>>> On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:
>>>
>>>> Hi all,
>>>>
>>>> Im building a site where the users can search for products with up
>>>> to 4 diffrent keywords which all MUST match to each product which
>>>> found as a result to the search.
>>>>
>>>> I got 2 tables (which are relevant to the issue :)), one is the
>>>> product table (5 million rows) and the other is the keyword table
>>>> which hold the keywords of each product (60 million rows).
>>>>
>>>> The scheme of the tables is as follows:
>>>>
>>>>                      Table "public.product"
>>>>           Column           |     Type      |      Modifiers
>>>> ----------------------------+---------------+---------------------
>>>> product_id                 | text          | not null
>>>> product_name               | text          | not null
>>>> retail_price               | numeric(10,2) | not null
>>>> etc...
>>>> Indexes:
>>>>    "product_product_id_key" UNIQUE, btree (product_id)
>>>>
>>>>         Table "public.keyword"
>>>>   Column    |     Type      | Modifiers
>>>> -------------+---------------+-----------
>>>> product_id  | text          | not null
>>>> keyword     | text          | not null
>>>> Indexes:
>>>>    "keyword_keyword" btree (keyword)
>>>>
>>>> The best query which I succeded to do till now is adding the keyword
>>>> table for each keyword searched for example if someone search for
>>>> "belt" & "black" & "pants" it will create the following query:
>>>>
>>>> poweraise.com=# EXPLAIN ANALYZE SELECT
>>>>
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price

>>>> FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN
>>>> keyword t2 USING(product_id) INNER JOIN keyword t3 USING(product_id)
>>>> WHERE t1.keyword='belt' AND t2.keyword='black' AND
>>>> t3.keyword='pants' LIMIT 13;
>>>>
>>>>       QUERY PLAN
>>>>
>>>>
>>>>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

>>>>
>>>> Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
>>>> time=969.798..1520.354 rows=6 loops=1)
>>>>   ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578)
>>>> (actual time=969.794..1520.337 rows=6 loops=1)
>>>>         Hash Cond: ("outer".product_id = "inner".product_id)
>>>>         ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309
>>>> width=612) (actual time=82.266..1474.018 rows=156 loops=1)
>>>>               ->  Hash Join  (cost=18867.07..2581181.09 rows=55309
>>>> width=34) (actual time=82.170..1462.104 rows=156 loops=1)
>>>>                     Hash Cond: ("outer".product_id =
>>>> "inner".product_id)
>>>>                     ->  Index Scan using keyword_keyword on keyword
>>>> t2 (cost=0.00..331244.43 rows=140771 width=17) (actual
>>>> time=0.033..1307.167 rows=109007 loops=1)
>>>>                           Index Cond: (keyword = 'black'::text)
>>>>                     ->  Hash  (cost=18851.23..18851.23 rows=6337
>>>> width=17) (actual time=16.145..16.145 rows=0 loops=1)
>>>>                           ->  Index Scan using keyword_keyword on
>>>> keyword t1 (cost=0.00..18851.23 rows=6337 width=17) (actual
>>>> time=0.067..11.050 rows=3294 loops=1)
>>>>                                 Index Cond: (keyword = 'belt'::text)
>>>>               ->  Index Scan using product_product_id_key on product
>>>> (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
>>>> loops=156)
>>>>                     Index Cond: (product.product_id =
>>>> "outer".product_id)
>>>>         ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17)
>>>> (actual time=42.863..42.863 rows=0 loops=1)
>>>>               ->  Index Scan using keyword_keyword on keyword t3
>>>> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
>>>> rows=3932 loops=1)
>>>>                     Index Cond: (keyword = 'pants'::text)
>>>> Total runtime: 1521.441 ms
>>>> (17 rows)
>>>>
>>>> Sometimes the query work fast even for 3 keywords but that doesnt
>>>> help me if at other times it take ages....
>>>>
>>>> Now to find a result for 1 keyword its really flying so I also tried
>>>> to make 3 queries and do INTERSECT between them but it was found out
>>>> to be extremly slow...
>>>>
>>>> Whats make this query slow as far as I understand is all the merging
>>>> between the results of each table... I tried to divide the keyword
>>>> table into lots of keywords table which each hold keywords which
>>>> start only with a specific letter, it did improve the speeds but not
>>>> in a real significant way.. tried clusters,indexes,SET
>>>> STATISTICS,WITHOUT OIDS on the keyword table and what not.. im quite
>>>> clueless...
>>>>
>>>> Actually I even started to look on other solutions and maybe you can
>>>> say something about them also.. maybe they can help me:
>>>> 1. Omega (From the Xapian project) - http://www.xapian.org/
>>>> 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
>>>> 3. Swish-e - http://swish-e.org/index.html
>>>>
>>>> To add on everything I want at the end to be able to ORDER BY the
>>>> results like order the product by price, but im less concerned about
>>>> that cause I saw that with cluster I can do it without any extra
>>>> overhead.
>>>>
>>>> Thanks alot in advance,
>>>> Yonatan Ben-Nes
>>>>
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>>>      message can get through to the mailing list cleanly
>>>>
>>>
>>>     Regards,
>>>         Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>> Sternberg Astronomical Institute, Moscow University (Russia)
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(095)939-16-83, +007(095)939-23-83
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>      choose an index scan if your joining column's datatypes do not
>>>      match
>>>
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

Hi all,

First of all sorry for the delay we had a problem with out mail server...

The tsearch2 looks really promising, im starting to work with it now and
ill report what ill find.

And to Alex thanks but I tried already all of the things you recommended
and sadly it didnt help.

Thanks alot for the help everyone!
Yonatan Ben-Nes

Re: Slow search.. quite clueless

От
Yonatan Ben-Nes
Дата:
Olly Betts wrote:
> Yonatan Ben-Nes wrote:
>
>>Actually I even started to look on other solutions and maybe you can say
>>something about them also.. maybe they can help me:
>>1. Omega (From the Xapian project) - http://www.xapian.org/
>
>
> You could certainly do this with Xapian and Omega.  With only 5
> million records it should be very quick.
>
> The easiest approach would be to periodically dump the SQL tables
> and build a new Xapian index which reflects the SQL database - you'd
> probably want to customise the "dbi2omega" script in the Omega
> distribution.  This approach works particularly well if the tables
> are updated in a batch fashion (one big weekly update, say).
>
> Alternatively you could hook into whatever updates the SQL database
> and get it to make corresponding updates to the Xapian index.  That
> has the advantage that they'll always be in step, but is probably
> more work to set up.
>
> The main drawback compared to doing everything in SQL is that you'd
> have two systems to deal with rather than just one...
>
> Cheers,
>     Olly

Ok then ill try this option if the tsearch2 wont work fast enough for me
(hopefully it will :)).

Thanks alot,
    Ben-Nes Yonatan

Re: Slow search.. quite clueless

От
Dawid Kuroczko
Дата:
On 9/20/05, Yonatan Ben-Nes <da@canaan.co.il> wrote:
Hi all,

Im building a site where the users can search for products with up to 4
diffrent keywords which all MUST match to each product which found as a
result to the search.

I got 2 tables (which are relevant to the issue :)), one is the product
table (5 million rows) and the other is the keyword table which hold the
keywords of each product (60 million rows).

The scheme of the tables is as follows:

                       Table "public.product"
            Column           |     Type      |      Modifiers
----------------------------+---------------+---------------------
  product_id                 | text          | not null
  product_name               | text          | not null
  retail_price               | numeric(10,2) | not null
  etc...
Indexes:
     "product_product_id_key" UNIQUE, btree (product_id)

          Table "public.keyword"
    Column    |     Type      | Modifiers
-------------+---------------+-----------
  product_id  | text          | not null
  keyword     | text          | not null
Indexes:
     "keyword_keyword" btree (keyword)

The best query which I succeded to do till now is adding the keyword
table for each keyword searched for example if someone search for "belt"
& "black" & "pants" it will create the following query:

poweraise.com=# EXPLAIN ANALYZE SELECT
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price
FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword
t2 USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13;

        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
time=969.798..1520.354 rows=6 loops=1)
    ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578)
(actual time=969.794..1520.337 rows=6 loops=1)
          Hash Cond: ("outer".product_id = "inner".product_id)
          ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309
width=612) (actual time=82.266..1474.018 rows=156 loops=1)
                ->  Hash Join  (cost=18867.07..2581181.09 rows=55309
width=34) (actual time=82.170..1462.104 rows=156 loops=1)
                      Hash Cond: ("outer".product_id = "inner".product_id)
                      ->  Index Scan using keyword_keyword on keyword t2
  (cost=0.00..331244.43 rows=140771 width=17) (actual
time=0.033..1307.167 rows=109007 loops=1)
                            Index Cond: (keyword = 'black'::text)
                      ->  Hash  (cost=18851.23..18851.23 rows=6337
width=17) (actual time=16.145..16.145 rows=0 loops=1)
                            ->  Index Scan using keyword_keyword on
keyword t1  (cost=0.00..18851.23 rows=6337 width=17) (actual
time=0.067..11.050 rows=3294 loops=1)
                                  Index Cond: (keyword = 'belt'::text)
                ->  Index Scan using product_product_id_key on product
(cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
loops=156)
                      Index Cond: (product.product_id = "outer".product_id)
          ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
time=42.863..42.863 rows=0 loops=1)
                ->  Index Scan using keyword_keyword on keyword t3
(cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
rows=3932 loops=1)
                      Index Cond: (keyword = 'pants'::text)
  Total runtime: 1521.441 ms
(17 rows)

Sometimes the query work fast even for 3 keywords but that doesnt help
me if at other times it take ages....


Hmm, JOIN on a Huge table with LIMIT.  You may be suffering from
the same problem I had:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php

Tom came up with a patch which worked marvellous in my case:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php

Try applying this patch, it may solve your problem!

   Regards,
        Dawid

Re: Slow search.. quite clueless

От
Yonatan Ben-Nes
Дата:
Dawid Kuroczko wrote:
> On 9/20/05, *Yonatan Ben-Nes* <da@canaan.co.il <mailto:da@canaan.co.il>>
> wrote:
>
>     Hi all,
>
>     Im building a site where the users can search for products with up to 4
>     diffrent keywords which all MUST match to each product which found as a
>     result to the search.
>
>     I got 2 tables (which are relevant to the issue :)), one is the product
>     table (5 million rows) and the other is the keyword table which hold the
>     keywords of each product (60 million rows).
>
>     The scheme of the tables is as follows:
>
>                            Table "public.product"
>                 Column           |     Type      |      Modifiers
>     ----------------------------+---------------+---------------------
>       product_id                 | text          | not null
>       product_name               | text          | not null
>       retail_price               | numeric(10,2) | not null
>       etc...
>     Indexes:
>          "product_product_id_key" UNIQUE, btree (product_id)
>
>               Table "public.keyword"
>         Column    |     Type      | Modifiers
>     -------------+---------------+-----------
>       product_id  | text          | not null
>       keyword     | text          | not null
>     Indexes:
>          "keyword_keyword" btree (keyword)
>
>     The best query which I succeded to do till now is adding the keyword
>     table for each keyword searched for example if someone search for "belt"
>     & "black" & "pants" it will create the following query:
>
>     poweraise.com=# EXPLAIN ANALYZE SELECT
>
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price
>     FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword
>     t2 USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
>     t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants'
>     LIMIT 13;
>
>             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>       Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
>     time=969.798..1520.354 rows=6 loops=1)
>         ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578)
>     (actual time=969.794..1520.337 rows=6 loops=1)
>               Hash Cond: ("outer".product_id = "inner".product_id)
>               ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309
>     width=612) (actual time=82.266..1474.018 rows=156 loops=1)
>                     ->  Hash Join  (cost=18867.07..2581181.09 rows=55309
>     width=34) (actual time=82.170..1462.104 rows=156 loops=1)
>                           Hash Cond: ("outer".product_id =
>     "inner".product_id)
>                           ->  Index Scan using keyword_keyword on keyword t2
>       (cost=0.00..331244.43 rows=140771 width=17) (actual
>     time=0.033..1307.167 rows=109007 loops=1)
>                                 Index Cond: (keyword = 'black'::text)
>                           ->  Hash  (cost=18851.23..18851.23 rows=6337
>     width=17) (actual time=16.145..16.145 rows=0 loops=1)
>                                 ->  Index Scan using keyword_keyword on
>     keyword t1  (cost=0.00..18851.23 rows=6337 width=17) (actual
>     time=0.067..11.050 rows=3294 loops=1)
>                                       Index Cond: (keyword = 'belt'::text)
>                     ->  Index Scan using product_product_id_key on product
>     (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
>     loops=156)
>                           Index Cond: (product.product_id =
>     "outer".product_id)
>               ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
>     time=42.863..42.863 rows=0 loops=1)
>                     ->  Index Scan using keyword_keyword on keyword t3
>     (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
>     rows=3932 loops=1)
>                           Index Cond: (keyword = 'pants'::text)
>       Total runtime: 1521.441 ms
>     (17 rows)
>
>     Sometimes the query work fast even for 3 keywords but that doesnt help
>     me if at other times it take ages....
>
>
>
> Hmm, JOIN on a Huge table with LIMIT.  You may be suffering from
> the same problem I had:
>
> http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php
>
> Tom came up with a patch which worked marvellous in my case:
>
> http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php
>
> Try applying this patch, it may solve your problem!
>
>    Regards,
>         Dawid
>

Great then ill check it if the tsearch2 wont work (testing in about 2-3
hours...).

Thanks alot,
    Ben-Nes Yonatan

Re: Slow search.. quite clueless

От
Gábor Farkas
Дата:
Yonatan Ben-Nes wrote:
> Dawid Kuroczko wrote:
>> Hmm, JOIN on a Huge table with LIMIT.  You may be suffering from
>> the same problem I had:
>>
>> http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php
>>
>> Tom came up with a patch which worked marvellous in my case:
>>
>> http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php
>>
>> Try applying this patch, it may solve your problem!
>>
>>    Regards,
>>         Dawid
>>
>
> Great then ill check it if the tsearch2 wont work (testing in about 2-3
> hours...).

please post your results here too (if possible) ;)

i'm very interested in your research (we are also having performance
problems, so i also thought about tsearch2...)

thanks,
gabor

Re: Slow search.. quite clueless

От
Yonatan Ben-Nes
Дата:
Gábor Farkas wrote:
> Yonatan Ben-Nes wrote:
>
>> Dawid Kuroczko wrote:
>>
>>> Hmm, JOIN on a Huge table with LIMIT.  You may be suffering from
>>> the same problem I had:
>>>
>>> http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php
>>>
>>> Tom came up with a patch which worked marvellous in my case:
>>>
>>> http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php
>>>
>>> Try applying this patch, it may solve your problem!
>>>
>>>    Regards,
>>>         Dawid
>>>
>>
>> Great then ill check it if the tsearch2 wont work (testing in about
>> 2-3 hours...).
>
>
> please post your results here too (if possible) ;)
>
> i'm very interested in your research (we are also having performance
> problems, so i also thought about tsearch2...)
>
> thanks,
> gabor

No problem, ill do so :)

Regards,
   Yonatan Ben-Nes

Re: Slow search.. quite clueless

От
Yonatan Ben-Nes
Дата:
Oleg Bartunov wrote:
> contrib/tsearch2 (
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
> might works for you. It might because performance depends on cardinality
> of your keywords.
>
>      Oleg
> On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:
>
>> Hi all,
>>
>> Im building a site where the users can search for products with up to
>> 4 diffrent keywords which all MUST match to each product which found
>> as a result to the search.
>>
>> I got 2 tables (which are relevant to the issue :)), one is the
>> product table (5 million rows) and the other is the keyword table
>> which hold the keywords of each product (60 million rows).
>>
>> The scheme of the tables is as follows:
>>
>>                      Table "public.product"
>>           Column           |     Type      |      Modifiers
>> ----------------------------+---------------+---------------------
>> product_id                 | text          | not null
>> product_name               | text          | not null
>> retail_price               | numeric(10,2) | not null
>> etc...
>> Indexes:
>>    "product_product_id_key" UNIQUE, btree (product_id)
>>
>>         Table "public.keyword"
>>   Column    |     Type      | Modifiers
>> -------------+---------------+-----------
>> product_id  | text          | not null
>> keyword     | text          | not null
>> Indexes:
>>    "keyword_keyword" btree (keyword)
>>
>> The best query which I succeded to do till now is adding the keyword
>> table for each keyword searched for example if someone search for
>> "belt" & "black" & "pants" it will create the following query:
>>
>> poweraise.com=# EXPLAIN ANALYZE SELECT
>>
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price

>> FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN
>> keyword t2 USING(product_id) INNER JOIN keyword t3 USING(product_id)
>> WHERE t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants'
>> LIMIT 13;
>>
>>       QUERY PLAN
>>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

>>
>> Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual
>> time=969.798..1520.354 rows=6 loops=1)
>>   ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578)
>> (actual time=969.794..1520.337 rows=6 loops=1)
>>         Hash Cond: ("outer".product_id = "inner".product_id)
>>         ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309
>> width=612) (actual time=82.266..1474.018 rows=156 loops=1)
>>               ->  Hash Join  (cost=18867.07..2581181.09 rows=55309
>> width=34) (actual time=82.170..1462.104 rows=156 loops=1)
>>                     Hash Cond: ("outer".product_id = "inner".product_id)
>>                     ->  Index Scan using keyword_keyword on keyword t2
>> (cost=0.00..331244.43 rows=140771 width=17) (actual
>> time=0.033..1307.167 rows=109007 loops=1)
>>                           Index Cond: (keyword = 'black'::text)
>>                     ->  Hash  (cost=18851.23..18851.23 rows=6337
>> width=17) (actual time=16.145..16.145 rows=0 loops=1)
>>                           ->  Index Scan using keyword_keyword on
>> keyword t1 (cost=0.00..18851.23 rows=6337 width=17) (actual
>> time=0.067..11.050 rows=3294 loops=1)
>>                                 Index Cond: (keyword = 'belt'::text)
>>               ->  Index Scan using product_product_id_key on product
>> (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
>> loops=156)
>>                     Index Cond: (product.product_id = "outer".product_id)
>>         ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual
>> time=42.863..42.863 rows=0 loops=1)
>>               ->  Index Scan using keyword_keyword on keyword t3
>> (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
>> rows=3932 loops=1)
>>                     Index Cond: (keyword = 'pants'::text)
>> Total runtime: 1521.441 ms
>> (17 rows)
>>
>> Sometimes the query work fast even for 3 keywords but that doesnt help
>> me if at other times it take ages....
>>
>> Now to find a result for 1 keyword its really flying so I also tried
>> to make 3 queries and do INTERSECT between them but it was found out
>> to be extremly slow...
>>
>> Whats make this query slow as far as I understand is all the merging
>> between the results of each table... I tried to divide the keyword
>> table into lots of keywords table which each hold keywords which start
>> only with a specific letter, it did improve the speeds but not in a
>> real significant way.. tried clusters,indexes,SET STATISTICS,WITHOUT
>> OIDS on the keyword table and what not.. im quite clueless...
>>
>> Actually I even started to look on other solutions and maybe you can
>> say something about them also.. maybe they can help me:
>> 1. Omega (From the Xapian project) - http://www.xapian.org/
>> 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
>> 3. Swish-e - http://swish-e.org/index.html
>>
>> To add on everything I want at the end to be able to ORDER BY the
>> results like order the product by price, but im less concerned about
>> that cause I saw that with cluster I can do it without any extra
>> overhead.
>>
>> Thanks alot in advance,
>> Yonatan Ben-Nes
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>      message can get through to the mailing list cleanly
>>
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match

Hi again everyone,

Oleg I tried tsearch2 and happily it does work wonderfully for me
returning results extremly fast and actually its working even better
then I wanted with all of those neat features like: lexem, weight & stop
words.

I got only one problem which is when I want the results to be ordered by
a diffrent field (like print INT field) it takes quite alot of time for
it to do it if the query can return lots of results (for example search
for the word "computer") and thats even if I limit the results.
The best way to improve its speed for such quereies (that I've found...)
is to create an index on the field which I want to order by and using it
CLUSTER the table, after the clustering I drop the the index so it won't
be used when I run queries with ORDER BY on that field, that seem to
improve the speed, if anyone got a better idea ill be glad to hear it.

Anyway thanks alot everyone!
   Ben-Nes Yonatan

Re: Slow search.. quite clueless

От
Oleg Bartunov
Дата:
On Mon, 26 Sep 2005, Yonatan Ben-Nes wrote:

> Hi again everyone,
>
> Oleg I tried tsearch2 and happily it does work wonderfully for me returning
> results extremly fast and actually its working even better then I wanted with
> all of those neat features like: lexem, weight & stop words.
>
> I got only one problem which is when I want the results to be ordered by a
> diffrent field (like print INT field) it takes quite alot of time for it to
> do it if the query can return lots of results (for example search for the
> word "computer") and thats even if I limit the results.
> The best way to improve its speed for such quereies (that I've found...) is
> to create an index on the field which I want to order by and using it CLUSTER
> the table, after the clustering I drop the the index so it won't be used when
> I run queries with ORDER BY on that field, that seem to improve the speed, if
> anyone got a better idea ill be glad to hear it.

what's your actual query ?  have you tried multicolumn index ?


>
> Anyway thanks alot everyone!
>  Ben-Nes Yonatan
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Slow search.. quite clueless

От
Yonatan Ben-Nes
Дата:
Oleg Bartunov wrote:
> On Mon, 26 Sep 2005, Yonatan Ben-Nes wrote:
>
>> Hi again everyone,
>>
>> Oleg I tried tsearch2 and happily it does work wonderfully for me
>> returning results extremly fast and actually its working even better
>> then I wanted with all of those neat features like: lexem, weight &
>> stop words.
>>
>> I got only one problem which is when I want the results to be ordered
>> by a diffrent field (like print INT field) it takes quite alot of time
>> for it to do it if the query can return lots of results (for example
>> search for the word "computer") and thats even if I limit the results.
>> The best way to improve its speed for such quereies (that I've
>> found...) is to create an index on the field which I want to order by
>> and using it CLUSTER the table, after the clustering I drop the the
>> index so it won't be used when I run queries with ORDER BY on that
>> field, that seem to improve the speed, if anyone got a better idea ill
>> be glad to hear it.
>
>
> what's your actual query ?  have you tried multicolumn index ?
>
>
>>
>> Anyway thanks alot everyone!
>>  Ben-Nes Yonatan
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

Hi Oleg,

I can't use a multicolumn index cause I already use on that table the
tsearch2 index, here is the query:

EXPLAIN ANALYZE SELECT product_id,final_price FROM product WHERE
keywords_vector @@ to_tsquery('cat') ORDER BY retail_price LIMIT 13;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=4.02..4.03 rows=1 width=39) (actual time=367.627..367.654
rows=13 loops=1)
    ->  Sort  (cost=4.02..4.03 rows=1 width=39) (actual
time=367.622..367.630 rows=13 loops=1)
          Sort Key: retail_price
          ->  Index Scan using product_keywords_vector_idx on product
(cost=0.00..4.01 rows=1 width=39) (actual time=0.056..276.385 rows=14295
loops=1)
                Index Cond: (keywords_vector @@ '\'cat\''::tsquery)
  Total runtime: 370.916 ms
(6 rows)

Now this is the result after its already at the cache (made such a query
b4), the first time I ran this query it took few seconds...

Thanks as always :),
   Ben-Nes Yonatan