Обсуждение: Using FTI-Search (likely a more general runtime-puzzle)

Поиск

Список

Период

Сортировка

Using FTI-Search (likely a more general runtime-puzzle)

От

"Markus Wollny"

Дата:

24 июля 2002 г., 12:38:00

Hello!

I have implemented the full text index feature from /contrib and so far
the results look quite promising. We use it to index several columns of
an article table. The main column is called 'text'. An example for a
search would be to retrieve all the articles with "grand theft auto" in
them somewhere (not necessarily one string but individual words).

------------------------------------------------------------------------
--------------------
Case 1: Our Good Old Fashioned Search (TM) over the article-table:

ANALYZE select distinct (p1.article_id)
from article p1, article p2, article p3
where lower(p1.text) LIKE '%grand%'
AND lower(p2.text) LIKE '%theft%'
AND lower(p3.text) LIKE '%auto%'
AND p1.article_id=p2.article_id
and p1.article_id=p3.article_id

Unique  (cost=3067.03..3067.04 rows=1 width=12) (actual
time=1455.25..1455.44 rows=30 loops=1)
  ->  Sort  (cost=3067.03..3067.03 rows=1 width=12) (actual
time=1455.24..1455.29 rows=30 loops=1)
        ->  Nested Loop  (cost=1098.52..3067.02 rows=1 width=12) (actual
time=1211.54..1453.82 rows=30 loops=1)
              ->  Hash Join  (cost=1098.52..2197.47 rows=6 width=8)
(actual time=1121.32..1309.23 rows=30 loops=1)
                    ->  Seq Scan on article p2  (cost=0.00..1098.43
rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1)
                    ->  Hash  (cost=1098.43..1098.43 rows=35 width=4)
(actual time=766.21..766.21 rows=0 loops=1)
                          ->  Seq Scan on article p3
(cost=0.00..1098.43 rows=35 width=4) (actual time=13.63..763.73 rows=411
loops=1)
              ->  Index Scan using idx0_article on article p1
(cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80 rows=1
loops=30)
Total runtime: 1456.00 msec

This is the time, fti needs to beat in order to be of any use to us.

------------------------------------------------------------------------
--------------------

Case 2:
Searching the FTI-Table:
EXPLAIN ANALYZE select distinct (p.article_id)
from article p, article_fti f1, article_fti f2, article_fti f3
where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;


Unique  (cost=39200652.79..39647427.57 rows=17870991 width=20) (actual
time=52506.81..52506.81 rows=0 loops=1)
  ->  Sort  (cost=39200652.79..39200652.79 rows=178709913 width=20)
(actual time=52506.81..52506.81 rows=0 loops=1)
        ->  Merge Join  (cost=222181.40..313718.83 rows=178709913
width=20) (actual time=52505.69..52505.69 rows=0 loops=1)
              ->  Merge Join  (cost=147175.62..170464.70 rows=1855421
width=8) (actual time=37084.36..37084.36 rows=1 loops=1)
                    ->  Sort  (cost=73587.81..73587.81 rows=19264
width=4) (actual time=22423.27..22423.27 rows=1 loops=1)
                          ->  Seq Scan on article_fti f2
(cost=0.00..72216.86 rows=19264 width=4) (actual time=18297.22..22422.20
rows=97 loops=1)
                    ->  Sort  (cost=73587.81..73587.81 rows=19264
width=4) (actual time=14661.04..14661.04 rows=1 loops=1)
                          ->  Seq Scan on article_fti f3
(cost=0.00..72216.86 rows=19264 width=4) (actual time=1344.87..14659.59
rows=159 loops=1)
              ->  Sort  (cost=75005.78..75005.78 rows=19264 width=12)
(actual time=15421.32..15421.32 rows=0 loops=1)
                    ->  Hash Join  (cost=1080.86..73634.84 rows=19264
width=12) (actual time=15421.10..15421.10 rows=0 loops=1)
                          ->  Seq Scan on article_fti f1
(cost=0.00..72216.86 rows=19264 width=4) (actual time=4879.59..14306.97
rows=350 loops=1)
                          ->  Hash  (cost=1063.29..1063.29 rows=7029
width=8) (actual time=1109.79..1109.79 rows=0 loops=1)
                                ->  Seq Scan on article p
(cost=0.00..1063.29 rows=7029 width=8) (actual time=12.20..1076.46
rows=7029 loops=1)
Total runtime: 52507.92 msec

This is way too long. I don't know why it doesn't use the indexes I gave
to it... If someone has got some idea as to what can be done to optimize
query-planner decisions, I'd be happy to comply. But anyway, I don't
worry about this too much and just force it to use them, once I know its
decision for seqential scan is wrong:

------------------------------------------------------------------------
--------------------

Case 3:
Searching the FTI-Table forcing use of indexes:
SET ENABLE_SEQSCAN=OFF;
EXPLAIN ANALYZE select distinct (p.article_id)
from article p, article_fti f1, article_fti f2, article_fti f3
where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
SET ENABLE_SEQSCAN=ON;

NOTICE:  QUERY PLAN:

Unique  (cost=5341.68..5342.61 rows=37 width=20) (actual
time=40.52..41.83 rows=65 loops=1)
  ->  Sort  (cost=5341.68..5341.68 rows=372 width=20) (actual
time=40.52..40.96rows=376 loops=1)
        ->  Nested Loop  (cost=3067.05..5325.79 rows=372 width=20)
(actual time=14.02..38.38 rows=376 loops=1)
              ->  Merge Join  (cost=3067.05..3084.32 rows=372 width=12)
(actualtime=13.95..20.45 rows=376 loops=1)
                    ->  Merge Join  (cost=2044.70..2054.37 rows=502
width=8) (actual time=5.75..7.81 rows=176 loops=1)
                          ->  Sort  (cost=1022.35..1022.35 rows=678
width=4) (actual time=2.31..2.44 rows=97 loops=1)
                                ->  Index Scan using
article_fti_string_idx on article_fti f2  (cost=0.00..990.47 rows=678
width=4) (actual time=0.16..1.74 rows=97 loops=1)
                          ->  Sort  (cost=1022.35..1022.35 rows=678
width=4) (actual time=3.42..3.66 rows=194 loops=1)
                                ->  Index Scan using
article_fti_string_idx on article_fti f3  (cost=0.00..990.47 rows=678
width=4) (actual time=0.13..2.53 rows=159 loops=1)
                    ->  Sort  (cost=1022.35..1022.35 rows=678 width=4)
(actual time=8.18..8.89 rows=532 loops=1)
                          ->  Index Scan using article_fti_string_idx on
article_fti f1  (cost=0.00..990.47 rows=678 width=4) (actual
time=0.13..5.98 rows=350 loops=1)
              ->  Index Scan using article_oid_idx on article p
(cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 rows=1
loops=376)
Total runtime: 42.41 msec

Not that's much more like it :) This query returns exactly 65 numbers.
But really we want not only the article-IDs, but also some info on the
article as well:

------------------------------------------------------------------------
--------------------

Case 3b:
Retrieving additional article-info:

SET ENABLE_SEQSCAN=OFF;
EXPLAIN ANALYZE
SELECT article_id, site_id, article_type, topstory, headline, published
from article where article_id in (
select distinct (p.article_id)
from article p, article_fti f1, article_fti f2, article_fti f3
where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id);
SET ENABLE_SEQSCAN=ON;

NOTICE:  QUERY PLAN:

Seq Scan on article  (cost=100000000.00..137645255.53 rows=3523
width=77) (actual time=1756.30..3720.44 rows=65 loops=1)
  SubPlan
    ->  Materialize  (cost=5342.61..5342.61 rows=37 width=20) (actual
time=0.17..0.28 rows=65 loops=7046)
          ->  Unique  (cost=5341.68..5342.61 rows=37 width=20) (actual
time=1182.07..1183.29 rows=65 loops=1)
                ->  Sort  (cost=5341.68..5341.68 rows=372 width=20)
(actual time=1182.06..1182.52 rows=376 loops=1)
                      ->  Nested Loop  (cost=3067.05..5325.79 rows=372
width=20) (actual time=318.55..1179.04 rows=376 loops=1)
                            ->  Merge Join  (cost=3067.05..3084.32
rows=372 width=12) (actual time=300.97..314.39 rows=376 loops=1)
                                  ->  Merge Join  (cost=2044.70..2054.37
rows=502 width=8) (actual time=159.24..163.45 rows=176 loops=1)
                                        ->  Sort  (cost=1022.35..1022.35
rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1)
                                              ->  Index Scan using
article_fti_string_idx on article_fti f2  (cost=0.00..990.47 rows=678
width=4) (actual time=1.46..69.08 rows=97 loops=1)
                                        ->  Sort  (cost=1022.35..1022.35
rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1)
                                              ->  Index Scan using
article_fti_string_idx on article_fti f3  (cost=0.00..990.47 rows=678
width=4) (actual time=0.19..85.06 rows=159 loops=1)
                                  ->  Sort  (cost=1022.35..1022.35
rows=678 width=4) (actual time=141.70..143.03 rows=532 loops=1)
                                        ->  Index Scan using
article_fti_string_idx on article_fti f1  (cost=0.00..990.47 rows=678
width=4) (actual time=0.17..138.95 rows=350 loops=1)
                            ->  Index Scan using article_oid_idx on
article p  (cost=0.00..6.01 rows=1 width=8) (actual time=1.65..2.27
rows=1 loops=376)
Total runtime: 3722.74 msec

Whoa! Why is that? How come this takes so extremely much longer?
Because...

------------------------------------------------------------------------
--------------------

Case 3c:
If I just select all the ids, I get a total runtime of  42.41 msec. And
if I do a select ... where article_id in (all the results from case3):

explain analyze SELECT article_id, site_id, article_type, topstory,
headline, published from article where article_id in
(8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,12784,13324,14
049,30982,31697,31924,31941,31944,32470,36810,37950,38913,39736,40470,41
330,45651,47050,47997,49692,51253,51730,51817,52111,52233,52653,52770,53
090,54410,54812,54890,55310,55974,56051,56590,56610,56630,56950,57210,57
690,59670,60951,62630,62650,63070,63130,63551,63570,65691,65710,65771,66
050,66490,66531,67030,67110);

NOTICE:  QUERY PLAN:

Index Scan using idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
idx0_article on article  (cost=0.00..378.61 rows=67 width=77) (actual
time=0.08..5.17 rows=65 loops=1)
Total runtime: 6.15 msec

I get just another 6.15msec.

Now I do think that the total runtime of 3b should really be 3+3c, which
amounts to roughly 50msec, but not nearly 4 seconds. What am I doing
wrong?

Thanks for your advice!

Regards,

    Markus

Re: Using FTI-Search (likely a more general runtime-puzzle)

От

Stephan Szabo

Дата:

24 июля 2002 г., 13:07:50

> This is way too long. I don't know why it doesn't use the indexes I gave
> to it... If someone has got some idea as to what can be done to optimize
> query-planner decisions, I'd be happy to comply. But anyway, I don't
> worry about this too much and just force it to use them, once I know its
> decision for seqential scan is wrong:

Have you vaccum analyzed the fti table?

> Case 3b:
> Retrieving additional article-info:
>
> SET ENABLE_SEQSCAN=OFF;
> EXPLAIN ANALYZE
> SELECT article_id, site_id, article_type, topstory, headline, published
> from article where article_id in (
> select distinct (p.article_id)
> from article p, article_fti f1, article_fti f2, article_fti f3
> where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
> and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id);
> SET ENABLE_SEQSCAN=ON;

I was going to suggest a subselect in from rather than the IN,
but perhaps you could just use distinct on and get the data from p.
Admittedly this uses a postgresql extension, and assumes that p.article_id
is unique, maybe like:

select distinct on (p.article_id) p.* from
 article p, article_fti f1, article_fti f2, article_fti f3
 where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
 and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;

Re: Using FTI-Search (likely a more general runtime-puzzle)

От

Oleg Bartunov

Дата:

24 июля 2002 г., 13:39:48

Markus,

have you tried our contrib/tsearch module ?

    Oleg
On Wed, 24 Jul 2002, Markus Wollny wrote:

> Hello!
>
> I have implemented the full text index feature from /contrib and so far
> the results look quite promising. We use it to index several columns of
> an article table. The main column is called 'text'. An example for a
> search would be to retrieve all the articles with "grand theft auto" in
> them somewhere (not necessarily one string but individual words).
>
> ------------------------------------------------------------------------
> --------------------
> Case 1: Our Good Old Fashioned Search (TM) over the article-table:
>
> ANALYZE select distinct (p1.article_id)
> from article p1, article p2, article p3
> where lower(p1.text) LIKE '%grand%'
> AND lower(p2.text) LIKE '%theft%'
> AND lower(p3.text) LIKE '%auto%'
> AND p1.article_id=p2.article_id
> and p1.article_id=p3.article_id
>
> Unique  (cost=3067.03..3067.04 rows=1 width=12) (actual
> time=1455.25..1455.44 rows=30 loops=1)
>   ->  Sort  (cost=3067.03..3067.03 rows=1 width=12) (actual
> time=1455.24..1455.29 rows=30 loops=1)
>         ->  Nested Loop  (cost=1098.52..3067.02 rows=1 width=12) (actual
> time=1211.54..1453.82 rows=30 loops=1)
>               ->  Hash Join  (cost=1098.52..2197.47 rows=6 width=8)
> (actual time=1121.32..1309.23 rows=30 loops=1)
>                     ->  Seq Scan on article p2  (cost=0.00..1098.43
> rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1)
>                     ->  Hash  (cost=1098.43..1098.43 rows=35 width=4)
> (actual time=766.21..766.21 rows=0 loops=1)
>                           ->  Seq Scan on article p3
> (cost=0.00..1098.43 rows=35 width=4) (actual time=13.63..763.73 rows=411
> loops=1)
>               ->  Index Scan using idx0_article on article p1
> (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80 rows=1
> loops=30)
> Total runtime: 1456.00 msec
>
> This is the time, fti needs to beat in order to be of any use to us.
>
> ------------------------------------------------------------------------
> --------------------
>
> Case 2:
> Searching the FTI-Table:
> EXPLAIN ANALYZE select distinct (p.article_id)
> from article p, article_fti f1, article_fti f2, article_fti f3
> where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
> and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
>
>
> Unique  (cost=39200652.79..39647427.57 rows=17870991 width=20) (actual
> time=52506.81..52506.81 rows=0 loops=1)
>   ->  Sort  (cost=39200652.79..39200652.79 rows=178709913 width=20)
> (actual time=52506.81..52506.81 rows=0 loops=1)
>         ->  Merge Join  (cost=222181.40..313718.83 rows=178709913
> width=20) (actual time=52505.69..52505.69 rows=0 loops=1)
>               ->  Merge Join  (cost=147175.62..170464.70 rows=1855421
> width=8) (actual time=37084.36..37084.36 rows=1 loops=1)
>                     ->  Sort  (cost=73587.81..73587.81 rows=19264
> width=4) (actual time=22423.27..22423.27 rows=1 loops=1)
>                           ->  Seq Scan on article_fti f2
> (cost=0.00..72216.86 rows=19264 width=4) (actual time=18297.22..22422.20
> rows=97 loops=1)
>                     ->  Sort  (cost=73587.81..73587.81 rows=19264
> width=4) (actual time=14661.04..14661.04 rows=1 loops=1)
>                           ->  Seq Scan on article_fti f3
> (cost=0.00..72216.86 rows=19264 width=4) (actual time=1344.87..14659.59
> rows=159 loops=1)
>               ->  Sort  (cost=75005.78..75005.78 rows=19264 width=12)
> (actual time=15421.32..15421.32 rows=0 loops=1)
>                     ->  Hash Join  (cost=1080.86..73634.84 rows=19264
> width=12) (actual time=15421.10..15421.10 rows=0 loops=1)
>                           ->  Seq Scan on article_fti f1
> (cost=0.00..72216.86 rows=19264 width=4) (actual time=4879.59..14306.97
> rows=350 loops=1)
>                           ->  Hash  (cost=1063.29..1063.29 rows=7029
> width=8) (actual time=1109.79..1109.79 rows=0 loops=1)
>                                 ->  Seq Scan on article p
> (cost=0.00..1063.29 rows=7029 width=8) (actual time=12.20..1076.46
> rows=7029 loops=1)
> Total runtime: 52507.92 msec
>
> This is way too long. I don't know why it doesn't use the indexes I gave
> to it... If someone has got some idea as to what can be done to optimize
> query-planner decisions, I'd be happy to comply. But anyway, I don't
> worry about this too much and just force it to use them, once I know its
> decision for seqential scan is wrong:
>
> ------------------------------------------------------------------------
> --------------------
>
> Case 3:
> Searching the FTI-Table forcing use of indexes:
> SET ENABLE_SEQSCAN=OFF;
> EXPLAIN ANALYZE select distinct (p.article_id)
> from article p, article_fti f1, article_fti f2, article_fti f3
> where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
> and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
> SET ENABLE_SEQSCAN=ON;
>
> NOTICE:  QUERY PLAN:
>
> Unique  (cost=5341.68..5342.61 rows=37 width=20) (actual
> time=40.52..41.83 rows=65 loops=1)
>   ->  Sort  (cost=5341.68..5341.68 rows=372 width=20) (actual
> time=40.52..40.96rows=376 loops=1)
>         ->  Nested Loop  (cost=3067.05..5325.79 rows=372 width=20)
> (actual time=14.02..38.38 rows=376 loops=1)
>               ->  Merge Join  (cost=3067.05..3084.32 rows=372 width=12)
> (actualtime=13.95..20.45 rows=376 loops=1)
>                     ->  Merge Join  (cost=2044.70..2054.37 rows=502
> width=8) (actual time=5.75..7.81 rows=176 loops=1)
>                           ->  Sort  (cost=1022.35..1022.35 rows=678
> width=4) (actual time=2.31..2.44 rows=97 loops=1)
>                                 ->  Index Scan using
> article_fti_string_idx on article_fti f2  (cost=0.00..990.47 rows=678
> width=4) (actual time=0.16..1.74 rows=97 loops=1)
>                           ->  Sort  (cost=1022.35..1022.35 rows=678
> width=4) (actual time=3.42..3.66 rows=194 loops=1)
>                                 ->  Index Scan using
> article_fti_string_idx on article_fti f3  (cost=0.00..990.47 rows=678
> width=4) (actual time=0.13..2.53 rows=159 loops=1)
>                     ->  Sort  (cost=1022.35..1022.35 rows=678 width=4)
> (actual time=8.18..8.89 rows=532 loops=1)
>                           ->  Index Scan using article_fti_string_idx on
> article_fti f1  (cost=0.00..990.47 rows=678 width=4) (actual
> time=0.13..5.98 rows=350 loops=1)
>               ->  Index Scan using article_oid_idx on article p
> (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 rows=1
> loops=376)
> Total runtime: 42.41 msec
>
> Not that's much more like it :) This query returns exactly 65 numbers.
> But really we want not only the article-IDs, but also some info on the
> article as well:
>
> ------------------------------------------------------------------------
> --------------------
>
> Case 3b:
> Retrieving additional article-info:
>
> SET ENABLE_SEQSCAN=OFF;
> EXPLAIN ANALYZE
> SELECT article_id, site_id, article_type, topstory, headline, published
> from article where article_id in (
> select distinct (p.article_id)
> from article p, article_fti f1, article_fti f2, article_fti f3
> where f1.string ='grand' and f2.string ='theft' and f3.string ='auto'
> and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id);
> SET ENABLE_SEQSCAN=ON;
>
> NOTICE:  QUERY PLAN:
>
> Seq Scan on article  (cost=100000000.00..137645255.53 rows=3523
> width=77) (actual time=1756.30..3720.44 rows=65 loops=1)
>   SubPlan
>     ->  Materialize  (cost=5342.61..5342.61 rows=37 width=20) (actual
> time=0.17..0.28 rows=65 loops=7046)
>           ->  Unique  (cost=5341.68..5342.61 rows=37 width=20) (actual
> time=1182.07..1183.29 rows=65 loops=1)
>                 ->  Sort  (cost=5341.68..5341.68 rows=372 width=20)
> (actual time=1182.06..1182.52 rows=376 loops=1)
>                       ->  Nested Loop  (cost=3067.05..5325.79 rows=372
> width=20) (actual time=318.55..1179.04 rows=376 loops=1)
>                             ->  Merge Join  (cost=3067.05..3084.32
> rows=372 width=12) (actual time=300.97..314.39 rows=376 loops=1)
>                                   ->  Merge Join  (cost=2044.70..2054.37
> rows=502 width=8) (actual time=159.24..163.45 rows=176 loops=1)
>                                         ->  Sort  (cost=1022.35..1022.35
> rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1)
>                                               ->  Index Scan using
> article_fti_string_idx on article_fti f2  (cost=0.00..990.47 rows=678
> width=4) (actual time=1.46..69.08 rows=97 loops=1)
>                                         ->  Sort  (cost=1022.35..1022.35
> rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1)
>                                               ->  Index Scan using
> article_fti_string_idx on article_fti f3  (cost=0.00..990.47 rows=678
> width=4) (actual time=0.19..85.06 rows=159 loops=1)
>                                   ->  Sort  (cost=1022.35..1022.35
> rows=678 width=4) (actual time=141.70..143.03 rows=532 loops=1)
>                                         ->  Index Scan using
> article_fti_string_idx on article_fti f1  (cost=0.00..990.47 rows=678
> width=4) (actual time=0.17..138.95 rows=350 loops=1)
>                             ->  Index Scan using article_oid_idx on
> article p  (cost=0.00..6.01 rows=1 width=8) (actual time=1.65..2.27
> rows=1 loops=376)
> Total runtime: 3722.74 msec
>
> Whoa! Why is that? How come this takes so extremely much longer?
> Because...
>
> ------------------------------------------------------------------------
> --------------------
>
> Case 3c:
> If I just select all the ids, I get a total runtime of  42.41 msec. And
> if I do a select ... where article_id in (all the results from case3):
>
> explain analyze SELECT article_id, site_id, article_type, topstory,
> headline, published from article where article_id in
> (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,12784,13324,14
> 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,39736,40470,41
> 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,52653,52770,53
> 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,56950,57210,57
> 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,65710,65771,66
> 050,66490,66531,67030,67110);
>
> NOTICE:  QUERY PLAN:
>
> Index Scan using idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article, idx0_article, idx0_article, idx0_article, idx0_article,
> idx0_article on article  (cost=0.00..378.61 rows=67 width=77) (actual
> time=0.08..5.17 rows=65 loops=1)
> Total runtime: 6.15 msec
>
> I get just another 6.15msec.
>
> Now I do think that the total runtime of 3b should really be 3+3c, which
> amounts to roughly 50msec, but not nearly 4 seconds. What am I doing
> wrong?
>
> Thanks for your advice!
>
> Regards,
>
>     Markus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Using FTI-Search (likely a more general runtime-puzzle)

От

"Markus Wollny"

Дата:

24 июля 2002 г., 15:17:48

Hello!
 
You seem to have hit a bullseye here :) p.article_id is of course unique
and your suggestion gives us everything we could ask for - in just under
60ms - that's not only wonderful but lightning fast! As we are not
planning on switching the database at all (we are just finishing
migration from Oracle to PostgreSQL), we are more than happy with any
extension that does the job so much, much faster.
 
Thank you very much!
 
Regards,
 
   Markus Wollny

    -----Ursprüngliche Nachricht----- 
    Von: Stephan Szabo 
    Gesendet: Mi 24.07.2002 19:07 
    An: Markus Wollny 
    Cc: pgsql-general@postgresql.org 
    Betreff: Re: [GENERAL] Using FTI-Search (likely a more general
runtime-puzzle)
    
    


    > This is way too long. I don't know why it doesn't use the
indexes I gave
    > to it... If someone has got some idea as to what can be done
to optimize
    > query-planner decisions, I'd be happy to comply. But anyway, I
don't
    > worry about this too much and just force it to use them, once
I know its
    > decision for seqential scan is wrong:
    
    Have you vaccum analyzed the fti table?
    
    > Case 3b:
    > Retrieving additional article-info:
    >
    > SET ENABLE_SEQSCAN=OFF;
    > EXPLAIN ANALYZE
    > SELECT article_id, site_id, article_type, topstory, headline,
published
    > from article where article_id in (
    > select distinct (p.article_id)
    > from article p, article_fti f1, article_fti f2, article_fti f3
    > where f1.string ='grand' and f2.string ='theft' and f3.string
='auto'
    > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id);
    > SET ENABLE_SEQSCAN=ON;
    
    I was going to suggest a subselect in from rather than the IN,
    but perhaps you could just use distinct on and get the data from
p.
    Admittedly this uses a postgresql extension, and assumes that
p.article_id
    is unique, maybe like:
    
    select distinct on (p.article_id) p.* from
     article p, article_fti f1, article_fti f2, article_fti f3
     where f1.string ='grand' and f2.string ='theft' and f3.string
='auto'
     and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;

Re: Using FTI-Search (likely a more general runtime-puzzle)

От

"Markus Wollny"

Дата:

29 июля 2002 г., 07:49:21

Hi!

I am about to evaluate the tsearch module from contrib, but as yet I
haven't succeeded in getting it to run.

I followed instructions on installation of tsearch; the new datatype and
the corresponding functions seem all to be there. So I want to try it
out on some existing data. In table article I want to set up a full-text
index on column text. After adding the column textindex of type txtidx,
I try to fill that column with data, but this just crashes the backend
immediately:

pcgames=# update article set textindex=txt2txtidx(text);
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The logfile doesn't really give much information on what exactly
happened:

2002-07-29 13:37:20 [22127]  DEBUG:  database system is ready
2002-07-29 13:37:46 [22124]  DEBUG:  server process (pid 22137) was
terminated by signal 11
2002-07-29 13:37:46 [22124]  DEBUG:  terminating any other active server
processes
2002-07-29 13:37:46 [22124]  DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
2002-07-29 13:37:46 [22139]  FATAL 1:  The database system is starting
up
2002-07-29 13:37:46 [22138]  DEBUG:  database system was interrupted at
2002-07-29 13:37:20 CEST
2002-07-29 13:37:46 [22138]  DEBUG:  checkpoint record is at E/3B785168
2002-07-29 13:37:46 [22138]  DEBUG:  redo record is at E/3B785168; undo
record is at 0/0; shutdown TRUE
2002-07-29 13:37:46 [22138]  DEBUG:  next transaction id: 179915804;
next oid: 95552489
2002-07-29 13:37:46 [22138]  DEBUG:  database system was not properly
shut down; automatic recovery in progress
2002-07-29 13:37:46 [22138]  DEBUG:  ReadRecord: record with zero length
at E/3B7851A8
2002-07-29 13:37:46 [22138]  DEBUG:  redo is not required
2002-07-29 13:37:48 [22138]  DEBUG:  database system is ready

Any ideas?

Regards,

    Markus

> -----Ursprungliche Nachricht-----
> Von: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Gesendet: Mittwoch, 24. Juli 2002 21:20
> An: Markus Wollny
> Betreff: Re: AW: [GENERAL] Using FTI-Search (likely a more
> general runtime-puzzle)
>
>
> On Wed, 24 Jul 2002, Markus Wollny wrote:
>
> > Hi!
> >
> > No I haven't as yet - does it provide other/better/more
> features than
> > the fti-module?
> >
>
> Sure, it's designed mostly for full text search and comes from
> Information Retrieval. There is a README file with some intro.
>
> > Regards,
> >
> >    Markus
> >
> >     -----UrsprУМngliche Nachricht-----
> >     Von: Oleg Bartunov
> >     Gesendet: Mi 24.07.2002 19:37
> >     An: Markus Wollny
> >     Cc: pgsql-general@postgresql.org
> >     Betreff: Re: [GENERAL] Using FTI-Search (likely a more general
> > runtime-puzzle)
> >
> >
> >
> >     Markus,
> >
> >     have you tried our contrib/tsearch module ?
> >
> >             Oleg
> >     On Wed, 24 Jul 2002, Markus Wollny wrote:
> >
> >     > Hello!
> >     >
> >     > I have implemented the full text index feature from /contrib
> > and so far
> >     > the results look quite promising. We use it to index several
> > columns of
> >     > an article table. The main column is called 'text'. An example
> > for a
> >     > search would be to retrieve all the articles with "grand theft
> > auto" in
> >     > them somewhere (not necessarily one string but individual
> > words).
> >     >
> >     >
> >
> --------------------------------------------------------------
> ----------
> >     > --------------------
> >     > Case 1: Our Good Old Fashioned Search (TM) over the
> > article-table:
> >     >
> >     > ANALYZE select distinct (p1.article_id)
> >     > from article p1, article p2, article p3
> >     > where lower(p1.text) LIKE '%grand%'
> >     > AND lower(p2.text) LIKE '%theft%'
> >     > AND lower(p3.text) LIKE '%auto%'
> >     > AND p1.article_id=p2.article_id
> >     > and p1.article_id=p3.article_id
> >     >
> >     > Unique  (cost=3067.03..3067.04 rows=1 width=12) (actual
> >     > time=1455.25..1455.44 rows=30 loops=1)
> >     >   ->  Sort  (cost=3067.03..3067.03 rows=1 width=12) (actual
> >     > time=1455.24..1455.29 rows=30 loops=1)
> >     >         ->  Nested Loop  (cost=1098.52..3067.02 rows=1
> > width=12) (actual
> >     > time=1211.54..1453.82 rows=30 loops=1)
> >     >               ->  Hash Join  (cost=1098.52..2197.47 rows=6
> > width=8)
> >     > (actual time=1121.32..1309.23 rows=30 loops=1)
> >     >                     ->  Seq Scan on article p2
> > (cost=0.00..1098.43
> >     > rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1)
> >     >                     ->  Hash  (cost=1098.43..1098.43 rows=35
> > width=4)
> >     > (actual time=766.21..766.21 rows=0 loops=1)
> >     >                           ->  Seq Scan on article p3
> >     > (cost=0.00..1098.43 rows=35 width=4) (actual
> > time=13.63..763.73 rows=411
> >     > loops=1)
> >     >               ->  Index Scan using idx0_article on article p1
> >     > (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80
> > rows=1
> >     > loops=30)
> >     > Total runtime: 1456.00 msec
> >     >
> >     > This is the time, fti needs to beat in order to be of any use
> > to us.
> >     >
> >     >
> >
> --------------------------------------------------------------
> ----------
> >     > --------------------
> >     >
> >     > Case 2:
> >     > Searching the FTI-Table:
> >     > EXPLAIN ANALYZE select distinct (p.article_id)
> >     > from article p, article_fti f1, article_fti f2, article_fti f3
> >     > where f1.string ='grand' and f2.string ='theft' and f3.string
> > ='auto'
> >     > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
> >     >
> >     >
> >     > Unique  (cost=39200652.79..39647427.57 rows=17870991 width=20)
> > (actual
> >     > time=52506.81..52506.81 rows=0 loops=1)
> >     >   ->  Sort  (cost=39200652.79..39200652.79 rows=178709913
> > width=20)
> >     > (actual time=52506.81..52506.81 rows=0 loops=1)
> >     >         ->  Merge Join  (cost=222181.40..313718.83
> > rows=178709913
> >     > width=20) (actual time=52505.69..52505.69 rows=0 loops=1)
> >     >               ->  Merge Join  (cost=147175.62..170464.70
> > rows=1855421
> >     > width=8) (actual time=37084.36..37084.36 rows=1 loops=1)
> >     >                     ->  Sort  (cost=73587.81..73587.81
> > rows=19264
> >     > width=4) (actual time=22423.27..22423.27 rows=1 loops=1)
> >     >                           ->  Seq Scan on article_fti f2
> >     > (cost=0.00..72216.86 rows=19264 width=4) (actual
> > time=18297.22..22422.20
> >     > rows=97 loops=1)
> >     >                     ->  Sort  (cost=73587.81..73587.81
> > rows=19264
> >     > width=4) (actual time=14661.04..14661.04 rows=1 loops=1)
> >     >                           ->  Seq Scan on article_fti f3
> >     > (cost=0.00..72216.86 rows=19264 width=4) (actual
> > time=1344.87..14659.59
> >     > rows=159 loops=1)
> >     >               ->  Sort  (cost=75005.78..75005.78 rows=19264
> > width=12)
> >     > (actual time=15421.32..15421.32 rows=0 loops=1)
> >     >                     ->  Hash Join  (cost=1080.86..73634.84
> > rows=19264
> >     > width=12) (actual time=15421.10..15421.10 rows=0 loops=1)
> >     >                           ->  Seq Scan on article_fti f1
> >     > (cost=0.00..72216.86 rows=19264 width=4) (actual
> > time=4879.59..14306.97
> >     > rows=350 loops=1)
> >     >                           ->  Hash  (cost=1063.29..1063.29
> > rows=7029
> >     > width=8) (actual time=1109.79..1109.79 rows=0 loops=1)
> >     >                                 ->  Seq Scan on article p
> >     > (cost=0.00..1063.29 rows=7029 width=8) (actual
> > time=12.20..1076.46
> >     > rows=7029 loops=1)
> >     > Total runtime: 52507.92 msec
> >     >
> >     > This is way too long. I don't know why it doesn't use the
> > indexes I gave
> >     > to it... If someone has got some idea as to what can be done
> > to optimize
> >     > query-planner decisions, I'd be happy to comply. But anyway, I
> > don't
> >     > worry about this too much and just force it to use them, once
> > I know its
> >     > decision for seqential scan is wrong:
> >     >
> >     >
> >
> --------------------------------------------------------------
> ----------
> >     > --------------------
> >     >
> >     > Case 3:
> >     > Searching the FTI-Table forcing use of indexes:
> >     > SET ENABLE_SEQSCAN=OFF;
> >     > EXPLAIN ANALYZE select distinct (p.article_id)
> >     > from article p, article_fti f1, article_fti f2, article_fti f3
> >     > where f1.string ='grand' and f2.string ='theft' and f3.string
> > ='auto'
> >     > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
> >     > SET ENABLE_SEQSCAN=ON;
> >     >
> >     > NOTICE:  QUERY PLAN:
> >     >
> >     > Unique  (cost=5341.68..5342.61 rows=37 width=20) (actual
> >     > time=40.52..41.83 rows=65 loops=1)
> >     >   ->  Sort  (cost=5341.68..5341.68 rows=372 width=20) (actual
> >     > time=40.52..40.96rows=376 loops=1)
> >     >         ->  Nested Loop  (cost=3067.05..5325.79 rows=372
> > width=20)
> >     > (actual time=14.02..38.38 rows=376 loops=1)
> >     >               ->  Merge Join  (cost=3067.05..3084.32 rows=372
> > width=12)
> >     > (actualtime=13.95..20.45 rows=376 loops=1)
> >     >                     ->  Merge Join  (cost=2044.70..2054.37
> > rows=502
> >     > width=8) (actual time=5.75..7.81 rows=176 loops=1)
> >     >                           ->  Sort  (cost=1022.35..1022.35
> > rows=678
> >     > width=4) (actual time=2.31..2.44 rows=97 loops=1)
> >     >                                 ->  Index Scan using
> >     > article_fti_string_idx on article_fti f2  (cost=0.00..990.47
> > rows=678
> >     > width=4) (actual time=0.16..1.74 rows=97 loops=1)
> >     >                           ->  Sort  (cost=1022.35..1022.35
> > rows=678
> >     > width=4) (actual time=3.42..3.66 rows=194 loops=1)
> >     >                                 ->  Index Scan using
> >     > article_fti_string_idx on article_fti f3  (cost=0.00..990.47
> > rows=678
> >     > width=4) (actual time=0.13..2.53 rows=159 loops=1)
> >     >                     ->  Sort  (cost=1022.35..1022.35 rows=678
> > width=4)
> >     > (actual time=8.18..8.89 rows=532 loops=1)
> >     >                           ->  Index Scan using
> > article_fti_string_idx on
> >     > article_fti f1  (cost=0.00..990.47 rows=678 width=4) (actual
> >     > time=0.13..5.98 rows=350 loops=1)
> >     >               ->  Index Scan using article_oid_idx on article
> > p
> >     > (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03
> > rows=1
> >     > loops=376)
> >     > Total runtime: 42.41 msec
> >     >
> >     > Not that's much more like it :) This query returns exactly 65
> > numbers.
> >     > But really we want not only the article-IDs, but also some
> > info on the
> >     > article as well:
> >     >
> >     >
> >
> --------------------------------------------------------------
> ----------
> >     > --------------------
> >     >
> >     > Case 3b:
> >     > Retrieving additional article-info:
> >     >
> >     > SET ENABLE_SEQSCAN=OFF;
> >     > EXPLAIN ANALYZE
> >     > SELECT article_id, site_id, article_type, topstory, headline,
> > published
> >     > from article where article_id in (
> >     > select distinct (p.article_id)
> >     > from article p, article_fti f1, article_fti f2, article_fti f3
> >     > where f1.string ='grand' and f2.string ='theft' and f3.string
> > ='auto'
> >     > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id);
> >     > SET ENABLE_SEQSCAN=ON;
> >     >
> >     > NOTICE:  QUERY PLAN:
> >     >
> >     > Seq Scan on article  (cost=100000000.00..137645255.53
> > rows=3523
> >     > width=77) (actual time=1756.30..3720.44 rows=65 loops=1)
> >     >   SubPlan
> >     >     ->  Materialize  (cost=5342.61..5342.61 rows=37 width=20)
> > (actual
> >     > time=0.17..0.28 rows=65 loops=7046)
> >     >           ->  Unique  (cost=5341.68..5342.61 rows=37 width=20)
> > (actual
> >     > time=1182.07..1183.29 rows=65 loops=1)
> >     >                 ->  Sort  (cost=5341.68..5341.68 rows=372
> > width=20)
> >     > (actual time=1182.06..1182.52 rows=376 loops=1)
> >     >                       ->  Nested Loop  (cost=3067.05..5325.79
> > rows=372
> >     > width=20) (actual time=318.55..1179.04 rows=376 loops=1)
> >     >                             ->  Merge Join
> > (cost=3067.05..3084.32
> >     > rows=372 width=12) (actual time=300.97..314.39 rows=376
> > loops=1)
> >     >                                   ->  Merge Join
> > (cost=2044.70..2054.37
> >     > rows=502 width=8) (actual time=159.24..163.45 rows=176
> > loops=1)
> >     >                                         ->  Sort
> > (cost=1022.35..1022.35
> >     > rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1)
> >     >                                               ->  Index Scan
> > using
> >     > article_fti_string_idx on article_fti f2  (cost=0.00..990.47
> > rows=678
> >     > width=4) (actual time=1.46..69.08 rows=97 loops=1)
> >     >                                         ->  Sort
> > (cost=1022.35..1022.35
> >     > rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1)
> >     >                                               ->  Index Scan
> > using
> >     > article_fti_string_idx on article_fti f3  (cost=0.00..990.47
> > rows=678
> >     > width=4) (actual time=0.19..85.06 rows=159 loops=1)
> >     >                                   ->  Sort
> > (cost=1022.35..1022.35
> >     > rows=678 width=4) (actual time=141.70..143.03 rows=532
> > loops=1)
> >     >                                         ->  Index Scan using
> >     > article_fti_string_idx on article_fti f1  (cost=0.00..990.47
> > rows=678
> >     > width=4) (actual time=0.17..138.95 rows=350 loops=1)
> >     >                             ->  Index Scan using
> > article_oid_idx on
> >     > article p  (cost=0.00..6.01 rows=1 width=8) (actual
> > time=1.65..2.27
> >     > rows=1 loops=376)
> >     > Total runtime: 3722.74 msec
> >     >
> >     > Whoa! Why is that? How come this takes so extremely much
> > longer?
> >     > Because...
> >     >
> >     >
> >
> --------------------------------------------------------------
> ----------
> >     > --------------------
> >     >
> >     > Case 3c:
> >     > If I just select all the ids, I get a total runtime of  42.41
> > msec. And
> >     > if I do a select ... where article_id in (all the results from
> > case3):
> >     >
> >     > explain analyze SELECT article_id, site_id, article_type,
> > topstory,
> >     > headline, published from article where article_id in
> >     >
> >
> (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,1278
> 4,13324,14
> >     >
> >
> 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,3973
> 6,40470,41
> >     >
> >
> 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,5265
> 3,52770,53
> >     >
> >
> 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,5695
> 0,57210,57
> >     >
> >
> 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,6571
> 0,65771,66
> >     > 050,66490,66531,67030,67110);
> >     >
> >     > NOTICE:  QUERY PLAN:
> >     >
> >     > Index Scan using idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > idx0_article,
> >     > idx0_article on article  (cost=0.00..378.61 rows=67 width=77)
> > (actual
> >     > time=0.08..5.17 rows=65 loops=1)
> >     > Total runtime: 6.15 msec
> >     >
> >     > I get just another 6.15msec.
> >     >
> >     > Now I do think that the total runtime of 3b should really be
> > 3+3c, which
> >     > amounts to roughly 50msec, but not nearly 4 seconds. What am I
> > doing
> >     > wrong?
> >     >
> >     > Thanks for your advice!
> >     >
> >     > Regards,
> >     >
> >     >       Markus
> >     >
> >     > ---------------------------(end of
> > broadcast)---------------------------
> >     > TIP 1: subscribe and unsubscribe commands go to
> > majordomo@postgresql.org
> >     >
> >
> >             Regards,
> >                     Oleg
> >     _____________________________________________________________
> >     Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> >     Sternberg Astronomical Institute, Moscow University (Russia)
> >     Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> >     phone: +007(095)939-16-83, +007(095)939-23-83
> >
> >
> >
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>

Re: Using FTI-Search (likely a more general

От

Oleg Bartunov

Дата:

29 июля 2002 г., 11:50:45

Markus,

version of PostgreSQL ? Also, could you try just

select txt2txtidx(text); where 'text' is a sample entry from your table.

    Oleg

On Mon, 29 Jul 2002, Markus Wollny wrote:

> Hi!
>
> I am about to evaluate the tsearch module from contrib, but as yet I
> haven't succeeded in getting it to run.
>
> I followed instructions on installation of tsearch; the new datatype and
> the corresponding functions seem all to be there. So I want to try it
> out on some existing data. In table article I want to set up a full-text
> index on column text. After adding the column textindex of type txtidx,
> I try to fill that column with data, but this just crashes the backend
> immediately:
>
> pcgames=# update article set textindex=txt2txtidx(text);
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> The logfile doesn't really give much information on what exactly
> happened:
>
> 2002-07-29 13:37:20 [22127]  DEBUG:  database system is ready
> 2002-07-29 13:37:46 [22124]  DEBUG:  server process (pid 22137) was
> terminated by signal 11
> 2002-07-29 13:37:46 [22124]  DEBUG:  terminating any other active server
> processes
> 2002-07-29 13:37:46 [22124]  DEBUG:  all server processes terminated;
> reinitializing shared memory and semaphores
> 2002-07-29 13:37:46 [22139]  FATAL 1:  The database system is starting
> up
> 2002-07-29 13:37:46 [22138]  DEBUG:  database system was interrupted at
> 2002-07-29 13:37:20 CEST
> 2002-07-29 13:37:46 [22138]  DEBUG:  checkpoint record is at E/3B785168
> 2002-07-29 13:37:46 [22138]  DEBUG:  redo record is at E/3B785168; undo
> record is at 0/0; shutdown TRUE
> 2002-07-29 13:37:46 [22138]  DEBUG:  next transaction id: 179915804;
> next oid: 95552489
> 2002-07-29 13:37:46 [22138]  DEBUG:  database system was not properly
> shut down; automatic recovery in progress
> 2002-07-29 13:37:46 [22138]  DEBUG:  ReadRecord: record with zero length
> at E/3B7851A8
> 2002-07-29 13:37:46 [22138]  DEBUG:  redo is not required
> 2002-07-29 13:37:48 [22138]  DEBUG:  database system is ready
>
> Any ideas?
>
> Regards,
>
>     Markus
>
> > -----Ursprungliche Nachricht-----
> > Von: Oleg Bartunov [mailto:oleg@sai.msu.su]
> > Gesendet: Mittwoch, 24. Juli 2002 21:20
> > An: Markus Wollny
> > Betreff: Re: AW: [GENERAL] Using FTI-Search (likely a more
> > general runtime-puzzle)
> >
> >
> > On Wed, 24 Jul 2002, Markus Wollny wrote:
> >
> > > Hi!
> > >
> > > No I haven't as yet - does it provide other/better/more
> > features than
> > > the fti-module?
> > >
> >
> > Sure, it's designed mostly for full text search and comes from
> > Information Retrieval. There is a README file with some intro.
> >
> > > Regards,
> > >
> > >    Markus
> > >
> > >     -----UrsprУМngliche Nachricht-----
> > >     Von: Oleg Bartunov
> > >     Gesendet: Mi 24.07.2002 19:37
> > >     An: Markus Wollny
> > >     Cc: pgsql-general@postgresql.org
> > >     Betreff: Re: [GENERAL] Using FTI-Search (likely a more general
> > > runtime-puzzle)
> > >
> > >
> > >
> > >     Markus,
> > >
> > >     have you tried our contrib/tsearch module ?
> > >
> > >             Oleg
> > >     On Wed, 24 Jul 2002, Markus Wollny wrote:
> > >
> > >     > Hello!
> > >     >
> > >     > I have implemented the full text index feature from /contrib
> > > and so far
> > >     > the results look quite promising. We use it to index several
> > > columns of
> > >     > an article table. The main column is called 'text'. An example
> > > for a
> > >     > search would be to retrieve all the articles with "grand theft
> > > auto" in
> > >     > them somewhere (not necessarily one string but individual
> > > words).
> > >     >
> > >     >
> > >
> > --------------------------------------------------------------
> > ----------
> > >     > --------------------
> > >     > Case 1: Our Good Old Fashioned Search (TM) over the
> > > article-table:
> > >     >
> > >     > ANALYZE select distinct (p1.article_id)
> > >     > from article p1, article p2, article p3
> > >     > where lower(p1.text) LIKE '%grand%'
> > >     > AND lower(p2.text) LIKE '%theft%'
> > >     > AND lower(p3.text) LIKE '%auto%'
> > >     > AND p1.article_id=p2.article_id
> > >     > and p1.article_id=p3.article_id
> > >     >
> > >     > Unique  (cost=3067.03..3067.04 rows=1 width=12) (actual
> > >     > time=1455.25..1455.44 rows=30 loops=1)
> > >     >   ->  Sort  (cost=3067.03..3067.03 rows=1 width=12) (actual
> > >     > time=1455.24..1455.29 rows=30 loops=1)
> > >     >         ->  Nested Loop  (cost=1098.52..3067.02 rows=1
> > > width=12) (actual
> > >     > time=1211.54..1453.82 rows=30 loops=1)
> > >     >               ->  Hash Join  (cost=1098.52..2197.47 rows=6
> > > width=8)
> > >     > (actual time=1121.32..1309.23 rows=30 loops=1)
> > >     >                     ->  Seq Scan on article p2
> > > (cost=0.00..1098.43
> > >     > rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1)
> > >     >                     ->  Hash  (cost=1098.43..1098.43 rows=35
> > > width=4)
> > >     > (actual time=766.21..766.21 rows=0 loops=1)
> > >     >                           ->  Seq Scan on article p3
> > >     > (cost=0.00..1098.43 rows=35 width=4) (actual
> > > time=13.63..763.73 rows=411
> > >     > loops=1)
> > >     >               ->  Index Scan using idx0_article on article p1
> > >     > (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80
> > > rows=1
> > >     > loops=30)
> > >     > Total runtime: 1456.00 msec
> > >     >
> > >     > This is the time, fti needs to beat in order to be of any use
> > > to us.
> > >     >
> > >     >
> > >
> > --------------------------------------------------------------
> > ----------
> > >     > --------------------
> > >     >
> > >     > Case 2:
> > >     > Searching the FTI-Table:
> > >     > EXPLAIN ANALYZE select distinct (p.article_id)
> > >     > from article p, article_fti f1, article_fti f2, article_fti f3
> > >     > where f1.string ='grand' and f2.string ='theft' and f3.string
> > > ='auto'
> > >     > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
> > >     >
> > >     >
> > >     > Unique  (cost=39200652.79..39647427.57 rows=17870991 width=20)
> > > (actual
> > >     > time=52506.81..52506.81 rows=0 loops=1)
> > >     >   ->  Sort  (cost=39200652.79..39200652.79 rows=178709913
> > > width=20)
> > >     > (actual time=52506.81..52506.81 rows=0 loops=1)
> > >     >         ->  Merge Join  (cost=222181.40..313718.83
> > > rows=178709913
> > >     > width=20) (actual time=52505.69..52505.69 rows=0 loops=1)
> > >     >               ->  Merge Join  (cost=147175.62..170464.70
> > > rows=1855421
> > >     > width=8) (actual time=37084.36..37084.36 rows=1 loops=1)
> > >     >                     ->  Sort  (cost=73587.81..73587.81
> > > rows=19264
> > >     > width=4) (actual time=22423.27..22423.27 rows=1 loops=1)
> > >     >                           ->  Seq Scan on article_fti f2
> > >     > (cost=0.00..72216.86 rows=19264 width=4) (actual
> > > time=18297.22..22422.20
> > >     > rows=97 loops=1)
> > >     >                     ->  Sort  (cost=73587.81..73587.81
> > > rows=19264
> > >     > width=4) (actual time=14661.04..14661.04 rows=1 loops=1)
> > >     >                           ->  Seq Scan on article_fti f3
> > >     > (cost=0.00..72216.86 rows=19264 width=4) (actual
> > > time=1344.87..14659.59
> > >     > rows=159 loops=1)
> > >     >               ->  Sort  (cost=75005.78..75005.78 rows=19264
> > > width=12)
> > >     > (actual time=15421.32..15421.32 rows=0 loops=1)
> > >     >                     ->  Hash Join  (cost=1080.86..73634.84
> > > rows=19264
> > >     > width=12) (actual time=15421.10..15421.10 rows=0 loops=1)
> > >     >                           ->  Seq Scan on article_fti f1
> > >     > (cost=0.00..72216.86 rows=19264 width=4) (actual
> > > time=4879.59..14306.97
> > >     > rows=350 loops=1)
> > >     >                           ->  Hash  (cost=1063.29..1063.29
> > > rows=7029
> > >     > width=8) (actual time=1109.79..1109.79 rows=0 loops=1)
> > >     >                                 ->  Seq Scan on article p
> > >     > (cost=0.00..1063.29 rows=7029 width=8) (actual
> > > time=12.20..1076.46
> > >     > rows=7029 loops=1)
> > >     > Total runtime: 52507.92 msec
> > >     >
> > >     > This is way too long. I don't know why it doesn't use the
> > > indexes I gave
> > >     > to it... If someone has got some idea as to what can be done
> > > to optimize
> > >     > query-planner decisions, I'd be happy to comply. But anyway, I
> > > don't
> > >     > worry about this too much and just force it to use them, once
> > > I know its
> > >     > decision for seqential scan is wrong:
> > >     >
> > >     >
> > >
> > --------------------------------------------------------------
> > ----------
> > >     > --------------------
> > >     >
> > >     > Case 3:
> > >     > Searching the FTI-Table forcing use of indexes:
> > >     > SET ENABLE_SEQSCAN=OFF;
> > >     > EXPLAIN ANALYZE select distinct (p.article_id)
> > >     > from article p, article_fti f1, article_fti f2, article_fti f3
> > >     > where f1.string ='grand' and f2.string ='theft' and f3.string
> > > ='auto'
> > >     > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
> > >     > SET ENABLE_SEQSCAN=ON;
> > >     >
> > >     > NOTICE:  QUERY PLAN:
> > >     >
> > >     > Unique  (cost=5341.68..5342.61 rows=37 width=20) (actual
> > >     > time=40.52..41.83 rows=65 loops=1)
> > >     >   ->  Sort  (cost=5341.68..5341.68 rows=372 width=20) (actual
> > >     > time=40.52..40.96rows=376 loops=1)
> > >     >         ->  Nested Loop  (cost=3067.05..5325.79 rows=372
> > > width=20)
> > >     > (actual time=14.02..38.38 rows=376 loops=1)
> > >     >               ->  Merge Join  (cost=3067.05..3084.32 rows=372
> > > width=12)
> > >     > (actualtime=13.95..20.45 rows=376 loops=1)
> > >     >                     ->  Merge Join  (cost=2044.70..2054.37
> > > rows=502
> > >     > width=8) (actual time=5.75..7.81 rows=176 loops=1)
> > >     >                           ->  Sort  (cost=1022.35..1022.35
> > > rows=678
> > >     > width=4) (actual time=2.31..2.44 rows=97 loops=1)
> > >     >                                 ->  Index Scan using
> > >     > article_fti_string_idx on article_fti f2  (cost=0.00..990.47
> > > rows=678
> > >     > width=4) (actual time=0.16..1.74 rows=97 loops=1)
> > >     >                           ->  Sort  (cost=1022.35..1022.35
> > > rows=678
> > >     > width=4) (actual time=3.42..3.66 rows=194 loops=1)
> > >     >                                 ->  Index Scan using
> > >     > article_fti_string_idx on article_fti f3  (cost=0.00..990.47
> > > rows=678
> > >     > width=4) (actual time=0.13..2.53 rows=159 loops=1)
> > >     >                     ->  Sort  (cost=1022.35..1022.35 rows=678
> > > width=4)
> > >     > (actual time=8.18..8.89 rows=532 loops=1)
> > >     >                           ->  Index Scan using
> > > article_fti_string_idx on
> > >     > article_fti f1  (cost=0.00..990.47 rows=678 width=4) (actual
> > >     > time=0.13..5.98 rows=350 loops=1)
> > >     >               ->  Index Scan using article_oid_idx on article
> > > p
> > >     > (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03
> > > rows=1
> > >     > loops=376)
> > >     > Total runtime: 42.41 msec
> > >     >
> > >     > Not that's much more like it :) This query returns exactly 65
> > > numbers.
> > >     > But really we want not only the article-IDs, but also some
> > > info on the
> > >     > article as well:
> > >     >
> > >     >
> > >
> > --------------------------------------------------------------
> > ----------
> > >     > --------------------
> > >     >
> > >     > Case 3b:
> > >     > Retrieving additional article-info:
> > >     >
> > >     > SET ENABLE_SEQSCAN=OFF;
> > >     > EXPLAIN ANALYZE
> > >     > SELECT article_id, site_id, article_type, topstory, headline,
> > > published
> > >     > from article where article_id in (
> > >     > select distinct (p.article_id)
> > >     > from article p, article_fti f1, article_fti f2, article_fti f3
> > >     > where f1.string ='grand' and f2.string ='theft' and f3.string
> > > ='auto'
> > >     > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id);
> > >     > SET ENABLE_SEQSCAN=ON;
> > >     >
> > >     > NOTICE:  QUERY PLAN:
> > >     >
> > >     > Seq Scan on article  (cost=100000000.00..137645255.53
> > > rows=3523
> > >     > width=77) (actual time=1756.30..3720.44 rows=65 loops=1)
> > >     >   SubPlan
> > >     >     ->  Materialize  (cost=5342.61..5342.61 rows=37 width=20)
> > > (actual
> > >     > time=0.17..0.28 rows=65 loops=7046)
> > >     >           ->  Unique  (cost=5341.68..5342.61 rows=37 width=20)
> > > (actual
> > >     > time=1182.07..1183.29 rows=65 loops=1)
> > >     >                 ->  Sort  (cost=5341.68..5341.68 rows=372
> > > width=20)
> > >     > (actual time=1182.06..1182.52 rows=376 loops=1)
> > >     >                       ->  Nested Loop  (cost=3067.05..5325.79
> > > rows=372
> > >     > width=20) (actual time=318.55..1179.04 rows=376 loops=1)
> > >     >                             ->  Merge Join
> > > (cost=3067.05..3084.32
> > >     > rows=372 width=12) (actual time=300.97..314.39 rows=376
> > > loops=1)
> > >     >                                   ->  Merge Join
> > > (cost=2044.70..2054.37
> > >     > rows=502 width=8) (actual time=159.24..163.45 rows=176
> > > loops=1)
> > >     >                                         ->  Sort
> > > (cost=1022.35..1022.35
> > >     > rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1)
> > >     >                                               ->  Index Scan
> > > using
> > >     > article_fti_string_idx on article_fti f2  (cost=0.00..990.47
> > > rows=678
> > >     > width=4) (actual time=1.46..69.08 rows=97 loops=1)
> > >     >                                         ->  Sort
> > > (cost=1022.35..1022.35
> > >     > rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1)
> > >     >                                               ->  Index Scan
> > > using
> > >     > article_fti_string_idx on article_fti f3  (cost=0.00..990.47
> > > rows=678
> > >     > width=4) (actual time=0.19..85.06 rows=159 loops=1)
> > >     >                                   ->  Sort
> > > (cost=1022.35..1022.35
> > >     > rows=678 width=4) (actual time=141.70..143.03 rows=532
> > > loops=1)
> > >     >                                         ->  Index Scan using
> > >     > article_fti_string_idx on article_fti f1  (cost=0.00..990.47
> > > rows=678
> > >     > width=4) (actual time=0.17..138.95 rows=350 loops=1)
> > >     >                             ->  Index Scan using
> > > article_oid_idx on
> > >     > article p  (cost=0.00..6.01 rows=1 width=8) (actual
> > > time=1.65..2.27
> > >     > rows=1 loops=376)
> > >     > Total runtime: 3722.74 msec
> > >     >
> > >     > Whoa! Why is that? How come this takes so extremely much
> > > longer?
> > >     > Because...
> > >     >
> > >     >
> > >
> > --------------------------------------------------------------
> > ----------
> > >     > --------------------
> > >     >
> > >     > Case 3c:
> > >     > If I just select all the ids, I get a total runtime of  42.41
> > > msec. And
> > >     > if I do a select ... where article_id in (all the results from
> > > case3):
> > >     >
> > >     > explain analyze SELECT article_id, site_id, article_type,
> > > topstory,
> > >     > headline, published from article where article_id in
> > >     >
> > >
> > (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,1278
> > 4,13324,14
> > >     >
> > >
> > 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,3973
> > 6,40470,41
> > >     >
> > >
> > 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,5265
> > 3,52770,53
> > >     >
> > >
> > 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,5695
> > 0,57210,57
> > >     >
> > >
> > 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,6571
> > 0,65771,66
> > >     > 050,66490,66531,67030,67110);
> > >     >
> > >     > NOTICE:  QUERY PLAN:
> > >     >
> > >     > Index Scan using idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article, idx0_article, idx0_article, idx0_article,
> > > idx0_article,
> > >     > idx0_article on article  (cost=0.00..378.61 rows=67 width=77)
> > > (actual
> > >     > time=0.08..5.17 rows=65 loops=1)
> > >     > Total runtime: 6.15 msec
> > >     >
> > >     > I get just another 6.15msec.
> > >     >
> > >     > Now I do think that the total runtime of 3b should really be
> > > 3+3c, which
> > >     > amounts to roughly 50msec, but not nearly 4 seconds. What am I
> > > doing
> > >     > wrong?
> > >     >
> > >     > Thanks for your advice!
> > >     >
> > >     > Regards,
> > >     >
> > >     >       Markus
> > >     >
> > >     > ---------------------------(end of
> > > broadcast)---------------------------
> > >     > TIP 1: subscribe and unsubscribe commands go to
> > > majordomo@postgresql.org
> > >     >
> > >
> > >             Regards,
> > >                     Oleg
> > >     _____________________________________________________________
> > >     Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> > >     Sternberg Astronomical Institute, Moscow University (Russia)
> > >     Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> > >     phone: +007(095)939-16-83, +007(095)939-23-83
> > >
> > >
> > >
> >
> >     Regards,
> >         Oleg
> > _____________________________________________________________
> > Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> > Sternberg Astronomical Institute, Moscow University (Russia)
> > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> > phone: +007(095)939-16-83, +007(095)939-23-83
> >
> >
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Using FTI-Search (likely a more general runtime-puzzle)

Using FTI-Search (likely a more general runtime-puzzle)

Re: Using FTI-Search (likely a more general runtime-puzzle)

Re: Using FTI-Search (likely a more general runtime-puzzle)

Re: Using FTI-Search (likely a more general runtime-puzzle)

Re: Using FTI-Search (likely a more general runtime-puzzle)

Re: Using FTI-Search (likely a more general