Обсуждение: Using FTI-Search (likely a more general runtime-puzzle)
Hello! I have implemented the full text index feature from /contrib and so far the results look quite promising. We use it to index several columns of an article table. The main column is called 'text'. An example for a search would be to retrieve all the articles with "grand theft auto" in them somewhere (not necessarily one string but individual words). ------------------------------------------------------------------------ -------------------- Case 1: Our Good Old Fashioned Search (TM) over the article-table: ANALYZE select distinct (p1.article_id) from article p1, article p2, article p3 where lower(p1.text) LIKE '%grand%' AND lower(p2.text) LIKE '%theft%' AND lower(p3.text) LIKE '%auto%' AND p1.article_id=p2.article_id and p1.article_id=p3.article_id Unique (cost=3067.03..3067.04 rows=1 width=12) (actual time=1455.25..1455.44 rows=30 loops=1) -> Sort (cost=3067.03..3067.03 rows=1 width=12) (actual time=1455.24..1455.29 rows=30 loops=1) -> Nested Loop (cost=1098.52..3067.02 rows=1 width=12) (actual time=1211.54..1453.82 rows=30 loops=1) -> Hash Join (cost=1098.52..2197.47 rows=6 width=8) (actual time=1121.32..1309.23 rows=30 loops=1) -> Seq Scan on article p2 (cost=0.00..1098.43 rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1) -> Hash (cost=1098.43..1098.43 rows=35 width=4) (actual time=766.21..766.21 rows=0 loops=1) -> Seq Scan on article p3 (cost=0.00..1098.43 rows=35 width=4) (actual time=13.63..763.73 rows=411 loops=1) -> Index Scan using idx0_article on article p1 (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80 rows=1 loops=30) Total runtime: 1456.00 msec This is the time, fti needs to beat in order to be of any use to us. ------------------------------------------------------------------------ -------------------- Case 2: Searching the FTI-Table: EXPLAIN ANALYZE select distinct (p.article_id) from article p, article_fti f1, article_fti f2, article_fti f3 where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; Unique (cost=39200652.79..39647427.57 rows=17870991 width=20) (actual time=52506.81..52506.81 rows=0 loops=1) -> Sort (cost=39200652.79..39200652.79 rows=178709913 width=20) (actual time=52506.81..52506.81 rows=0 loops=1) -> Merge Join (cost=222181.40..313718.83 rows=178709913 width=20) (actual time=52505.69..52505.69 rows=0 loops=1) -> Merge Join (cost=147175.62..170464.70 rows=1855421 width=8) (actual time=37084.36..37084.36 rows=1 loops=1) -> Sort (cost=73587.81..73587.81 rows=19264 width=4) (actual time=22423.27..22423.27 rows=1 loops=1) -> Seq Scan on article_fti f2 (cost=0.00..72216.86 rows=19264 width=4) (actual time=18297.22..22422.20 rows=97 loops=1) -> Sort (cost=73587.81..73587.81 rows=19264 width=4) (actual time=14661.04..14661.04 rows=1 loops=1) -> Seq Scan on article_fti f3 (cost=0.00..72216.86 rows=19264 width=4) (actual time=1344.87..14659.59 rows=159 loops=1) -> Sort (cost=75005.78..75005.78 rows=19264 width=12) (actual time=15421.32..15421.32 rows=0 loops=1) -> Hash Join (cost=1080.86..73634.84 rows=19264 width=12) (actual time=15421.10..15421.10 rows=0 loops=1) -> Seq Scan on article_fti f1 (cost=0.00..72216.86 rows=19264 width=4) (actual time=4879.59..14306.97 rows=350 loops=1) -> Hash (cost=1063.29..1063.29 rows=7029 width=8) (actual time=1109.79..1109.79 rows=0 loops=1) -> Seq Scan on article p (cost=0.00..1063.29 rows=7029 width=8) (actual time=12.20..1076.46 rows=7029 loops=1) Total runtime: 52507.92 msec This is way too long. I don't know why it doesn't use the indexes I gave to it... If someone has got some idea as to what can be done to optimize query-planner decisions, I'd be happy to comply. But anyway, I don't worry about this too much and just force it to use them, once I know its decision for seqential scan is wrong: ------------------------------------------------------------------------ -------------------- Case 3: Searching the FTI-Table forcing use of indexes: SET ENABLE_SEQSCAN=OFF; EXPLAIN ANALYZE select distinct (p.article_id) from article p, article_fti f1, article_fti f2, article_fti f3 where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; SET ENABLE_SEQSCAN=ON; NOTICE: QUERY PLAN: Unique (cost=5341.68..5342.61 rows=37 width=20) (actual time=40.52..41.83 rows=65 loops=1) -> Sort (cost=5341.68..5341.68 rows=372 width=20) (actual time=40.52..40.96rows=376 loops=1) -> Nested Loop (cost=3067.05..5325.79 rows=372 width=20) (actual time=14.02..38.38 rows=376 loops=1) -> Merge Join (cost=3067.05..3084.32 rows=372 width=12) (actualtime=13.95..20.45 rows=376 loops=1) -> Merge Join (cost=2044.70..2054.37 rows=502 width=8) (actual time=5.75..7.81 rows=176 loops=1) -> Sort (cost=1022.35..1022.35 rows=678 width=4) (actual time=2.31..2.44 rows=97 loops=1) -> Index Scan using article_fti_string_idx on article_fti f2 (cost=0.00..990.47 rows=678 width=4) (actual time=0.16..1.74 rows=97 loops=1) -> Sort (cost=1022.35..1022.35 rows=678 width=4) (actual time=3.42..3.66 rows=194 loops=1) -> Index Scan using article_fti_string_idx on article_fti f3 (cost=0.00..990.47 rows=678 width=4) (actual time=0.13..2.53 rows=159 loops=1) -> Sort (cost=1022.35..1022.35 rows=678 width=4) (actual time=8.18..8.89 rows=532 loops=1) -> Index Scan using article_fti_string_idx on article_fti f1 (cost=0.00..990.47 rows=678 width=4) (actual time=0.13..5.98 rows=350 loops=1) -> Index Scan using article_oid_idx on article p (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 rows=1 loops=376) Total runtime: 42.41 msec Not that's much more like it :) This query returns exactly 65 numbers. But really we want not only the article-IDs, but also some info on the article as well: ------------------------------------------------------------------------ -------------------- Case 3b: Retrieving additional article-info: SET ENABLE_SEQSCAN=OFF; EXPLAIN ANALYZE SELECT article_id, site_id, article_type, topstory, headline, published from article where article_id in ( select distinct (p.article_id) from article p, article_fti f1, article_fti f2, article_fti f3 where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id); SET ENABLE_SEQSCAN=ON; NOTICE: QUERY PLAN: Seq Scan on article (cost=100000000.00..137645255.53 rows=3523 width=77) (actual time=1756.30..3720.44 rows=65 loops=1) SubPlan -> Materialize (cost=5342.61..5342.61 rows=37 width=20) (actual time=0.17..0.28 rows=65 loops=7046) -> Unique (cost=5341.68..5342.61 rows=37 width=20) (actual time=1182.07..1183.29 rows=65 loops=1) -> Sort (cost=5341.68..5341.68 rows=372 width=20) (actual time=1182.06..1182.52 rows=376 loops=1) -> Nested Loop (cost=3067.05..5325.79 rows=372 width=20) (actual time=318.55..1179.04 rows=376 loops=1) -> Merge Join (cost=3067.05..3084.32 rows=372 width=12) (actual time=300.97..314.39 rows=376 loops=1) -> Merge Join (cost=2044.70..2054.37 rows=502 width=8) (actual time=159.24..163.45 rows=176 loops=1) -> Sort (cost=1022.35..1022.35 rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1) -> Index Scan using article_fti_string_idx on article_fti f2 (cost=0.00..990.47 rows=678 width=4) (actual time=1.46..69.08 rows=97 loops=1) -> Sort (cost=1022.35..1022.35 rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1) -> Index Scan using article_fti_string_idx on article_fti f3 (cost=0.00..990.47 rows=678 width=4) (actual time=0.19..85.06 rows=159 loops=1) -> Sort (cost=1022.35..1022.35 rows=678 width=4) (actual time=141.70..143.03 rows=532 loops=1) -> Index Scan using article_fti_string_idx on article_fti f1 (cost=0.00..990.47 rows=678 width=4) (actual time=0.17..138.95 rows=350 loops=1) -> Index Scan using article_oid_idx on article p (cost=0.00..6.01 rows=1 width=8) (actual time=1.65..2.27 rows=1 loops=376) Total runtime: 3722.74 msec Whoa! Why is that? How come this takes so extremely much longer? Because... ------------------------------------------------------------------------ -------------------- Case 3c: If I just select all the ids, I get a total runtime of 42.41 msec. And if I do a select ... where article_id in (all the results from case3): explain analyze SELECT article_id, site_id, article_type, topstory, headline, published from article where article_id in (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,12784,13324,14 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,39736,40470,41 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,52653,52770,53 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,56950,57210,57 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,65710,65771,66 050,66490,66531,67030,67110); NOTICE: QUERY PLAN: Index Scan using idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, idx0_article on article (cost=0.00..378.61 rows=67 width=77) (actual time=0.08..5.17 rows=65 loops=1) Total runtime: 6.15 msec I get just another 6.15msec. Now I do think that the total runtime of 3b should really be 3+3c, which amounts to roughly 50msec, but not nearly 4 seconds. What am I doing wrong? Thanks for your advice! Regards, Markus
> This is way too long. I don't know why it doesn't use the indexes I gave > to it... If someone has got some idea as to what can be done to optimize > query-planner decisions, I'd be happy to comply. But anyway, I don't > worry about this too much and just force it to use them, once I know its > decision for seqential scan is wrong: Have you vaccum analyzed the fti table? > Case 3b: > Retrieving additional article-info: > > SET ENABLE_SEQSCAN=OFF; > EXPLAIN ANALYZE > SELECT article_id, site_id, article_type, topstory, headline, published > from article where article_id in ( > select distinct (p.article_id) > from article p, article_fti f1, article_fti f2, article_fti f3 > where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id); > SET ENABLE_SEQSCAN=ON; I was going to suggest a subselect in from rather than the IN, but perhaps you could just use distinct on and get the data from p. Admittedly this uses a postgresql extension, and assumes that p.article_id is unique, maybe like: select distinct on (p.article_id) p.* from article p, article_fti f1, article_fti f2, article_fti f3 where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
Markus, have you tried our contrib/tsearch module ? Oleg On Wed, 24 Jul 2002, Markus Wollny wrote: > Hello! > > I have implemented the full text index feature from /contrib and so far > the results look quite promising. We use it to index several columns of > an article table. The main column is called 'text'. An example for a > search would be to retrieve all the articles with "grand theft auto" in > them somewhere (not necessarily one string but individual words). > > ------------------------------------------------------------------------ > -------------------- > Case 1: Our Good Old Fashioned Search (TM) over the article-table: > > ANALYZE select distinct (p1.article_id) > from article p1, article p2, article p3 > where lower(p1.text) LIKE '%grand%' > AND lower(p2.text) LIKE '%theft%' > AND lower(p3.text) LIKE '%auto%' > AND p1.article_id=p2.article_id > and p1.article_id=p3.article_id > > Unique (cost=3067.03..3067.04 rows=1 width=12) (actual > time=1455.25..1455.44 rows=30 loops=1) > -> Sort (cost=3067.03..3067.03 rows=1 width=12) (actual > time=1455.24..1455.29 rows=30 loops=1) > -> Nested Loop (cost=1098.52..3067.02 rows=1 width=12) (actual > time=1211.54..1453.82 rows=30 loops=1) > -> Hash Join (cost=1098.52..2197.47 rows=6 width=8) > (actual time=1121.32..1309.23 rows=30 loops=1) > -> Seq Scan on article p2 (cost=0.00..1098.43 > rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1) > -> Hash (cost=1098.43..1098.43 rows=35 width=4) > (actual time=766.21..766.21 rows=0 loops=1) > -> Seq Scan on article p3 > (cost=0.00..1098.43 rows=35 width=4) (actual time=13.63..763.73 rows=411 > loops=1) > -> Index Scan using idx0_article on article p1 > (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80 rows=1 > loops=30) > Total runtime: 1456.00 msec > > This is the time, fti needs to beat in order to be of any use to us. > > ------------------------------------------------------------------------ > -------------------- > > Case 2: > Searching the FTI-Table: > EXPLAIN ANALYZE select distinct (p.article_id) > from article p, article_fti f1, article_fti f2, article_fti f3 > where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > > > Unique (cost=39200652.79..39647427.57 rows=17870991 width=20) (actual > time=52506.81..52506.81 rows=0 loops=1) > -> Sort (cost=39200652.79..39200652.79 rows=178709913 width=20) > (actual time=52506.81..52506.81 rows=0 loops=1) > -> Merge Join (cost=222181.40..313718.83 rows=178709913 > width=20) (actual time=52505.69..52505.69 rows=0 loops=1) > -> Merge Join (cost=147175.62..170464.70 rows=1855421 > width=8) (actual time=37084.36..37084.36 rows=1 loops=1) > -> Sort (cost=73587.81..73587.81 rows=19264 > width=4) (actual time=22423.27..22423.27 rows=1 loops=1) > -> Seq Scan on article_fti f2 > (cost=0.00..72216.86 rows=19264 width=4) (actual time=18297.22..22422.20 > rows=97 loops=1) > -> Sort (cost=73587.81..73587.81 rows=19264 > width=4) (actual time=14661.04..14661.04 rows=1 loops=1) > -> Seq Scan on article_fti f3 > (cost=0.00..72216.86 rows=19264 width=4) (actual time=1344.87..14659.59 > rows=159 loops=1) > -> Sort (cost=75005.78..75005.78 rows=19264 width=12) > (actual time=15421.32..15421.32 rows=0 loops=1) > -> Hash Join (cost=1080.86..73634.84 rows=19264 > width=12) (actual time=15421.10..15421.10 rows=0 loops=1) > -> Seq Scan on article_fti f1 > (cost=0.00..72216.86 rows=19264 width=4) (actual time=4879.59..14306.97 > rows=350 loops=1) > -> Hash (cost=1063.29..1063.29 rows=7029 > width=8) (actual time=1109.79..1109.79 rows=0 loops=1) > -> Seq Scan on article p > (cost=0.00..1063.29 rows=7029 width=8) (actual time=12.20..1076.46 > rows=7029 loops=1) > Total runtime: 52507.92 msec > > This is way too long. I don't know why it doesn't use the indexes I gave > to it... If someone has got some idea as to what can be done to optimize > query-planner decisions, I'd be happy to comply. But anyway, I don't > worry about this too much and just force it to use them, once I know its > decision for seqential scan is wrong: > > ------------------------------------------------------------------------ > -------------------- > > Case 3: > Searching the FTI-Table forcing use of indexes: > SET ENABLE_SEQSCAN=OFF; > EXPLAIN ANALYZE select distinct (p.article_id) > from article p, article_fti f1, article_fti f2, article_fti f3 > where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > SET ENABLE_SEQSCAN=ON; > > NOTICE: QUERY PLAN: > > Unique (cost=5341.68..5342.61 rows=37 width=20) (actual > time=40.52..41.83 rows=65 loops=1) > -> Sort (cost=5341.68..5341.68 rows=372 width=20) (actual > time=40.52..40.96rows=376 loops=1) > -> Nested Loop (cost=3067.05..5325.79 rows=372 width=20) > (actual time=14.02..38.38 rows=376 loops=1) > -> Merge Join (cost=3067.05..3084.32 rows=372 width=12) > (actualtime=13.95..20.45 rows=376 loops=1) > -> Merge Join (cost=2044.70..2054.37 rows=502 > width=8) (actual time=5.75..7.81 rows=176 loops=1) > -> Sort (cost=1022.35..1022.35 rows=678 > width=4) (actual time=2.31..2.44 rows=97 loops=1) > -> Index Scan using > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 rows=678 > width=4) (actual time=0.16..1.74 rows=97 loops=1) > -> Sort (cost=1022.35..1022.35 rows=678 > width=4) (actual time=3.42..3.66 rows=194 loops=1) > -> Index Scan using > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 rows=678 > width=4) (actual time=0.13..2.53 rows=159 loops=1) > -> Sort (cost=1022.35..1022.35 rows=678 width=4) > (actual time=8.18..8.89 rows=532 loops=1) > -> Index Scan using article_fti_string_idx on > article_fti f1 (cost=0.00..990.47 rows=678 width=4) (actual > time=0.13..5.98 rows=350 loops=1) > -> Index Scan using article_oid_idx on article p > (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 rows=1 > loops=376) > Total runtime: 42.41 msec > > Not that's much more like it :) This query returns exactly 65 numbers. > But really we want not only the article-IDs, but also some info on the > article as well: > > ------------------------------------------------------------------------ > -------------------- > > Case 3b: > Retrieving additional article-info: > > SET ENABLE_SEQSCAN=OFF; > EXPLAIN ANALYZE > SELECT article_id, site_id, article_type, topstory, headline, published > from article where article_id in ( > select distinct (p.article_id) > from article p, article_fti f1, article_fti f2, article_fti f3 > where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id); > SET ENABLE_SEQSCAN=ON; > > NOTICE: QUERY PLAN: > > Seq Scan on article (cost=100000000.00..137645255.53 rows=3523 > width=77) (actual time=1756.30..3720.44 rows=65 loops=1) > SubPlan > -> Materialize (cost=5342.61..5342.61 rows=37 width=20) (actual > time=0.17..0.28 rows=65 loops=7046) > -> Unique (cost=5341.68..5342.61 rows=37 width=20) (actual > time=1182.07..1183.29 rows=65 loops=1) > -> Sort (cost=5341.68..5341.68 rows=372 width=20) > (actual time=1182.06..1182.52 rows=376 loops=1) > -> Nested Loop (cost=3067.05..5325.79 rows=372 > width=20) (actual time=318.55..1179.04 rows=376 loops=1) > -> Merge Join (cost=3067.05..3084.32 > rows=372 width=12) (actual time=300.97..314.39 rows=376 loops=1) > -> Merge Join (cost=2044.70..2054.37 > rows=502 width=8) (actual time=159.24..163.45 rows=176 loops=1) > -> Sort (cost=1022.35..1022.35 > rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1) > -> Index Scan using > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 rows=678 > width=4) (actual time=1.46..69.08 rows=97 loops=1) > -> Sort (cost=1022.35..1022.35 > rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1) > -> Index Scan using > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 rows=678 > width=4) (actual time=0.19..85.06 rows=159 loops=1) > -> Sort (cost=1022.35..1022.35 > rows=678 width=4) (actual time=141.70..143.03 rows=532 loops=1) > -> Index Scan using > article_fti_string_idx on article_fti f1 (cost=0.00..990.47 rows=678 > width=4) (actual time=0.17..138.95 rows=350 loops=1) > -> Index Scan using article_oid_idx on > article p (cost=0.00..6.01 rows=1 width=8) (actual time=1.65..2.27 > rows=1 loops=376) > Total runtime: 3722.74 msec > > Whoa! Why is that? How come this takes so extremely much longer? > Because... > > ------------------------------------------------------------------------ > -------------------- > > Case 3c: > If I just select all the ids, I get a total runtime of 42.41 msec. And > if I do a select ... where article_id in (all the results from case3): > > explain analyze SELECT article_id, site_id, article_type, topstory, > headline, published from article where article_id in > (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,12784,13324,14 > 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,39736,40470,41 > 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,52653,52770,53 > 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,56950,57210,57 > 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,65710,65771,66 > 050,66490,66531,67030,67110); > > NOTICE: QUERY PLAN: > > Index Scan using idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article, idx0_article, idx0_article, idx0_article, idx0_article, > idx0_article on article (cost=0.00..378.61 rows=67 width=77) (actual > time=0.08..5.17 rows=65 loops=1) > Total runtime: 6.15 msec > > I get just another 6.15msec. > > Now I do think that the total runtime of 3b should really be 3+3c, which > amounts to roughly 50msec, but not nearly 4 seconds. What am I doing > wrong? > > Thanks for your advice! > > Regards, > > Markus > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Hello! You seem to have hit a bullseye here :) p.article_id is of course unique and your suggestion gives us everything we could ask for - in just under 60ms - that's not only wonderful but lightning fast! As we are not planning on switching the database at all (we are just finishing migration from Oracle to PostgreSQL), we are more than happy with any extension that does the job so much, much faster. Thank you very much! Regards, Markus Wollny -----Ursprüngliche Nachricht----- Von: Stephan Szabo Gesendet: Mi 24.07.2002 19:07 An: Markus Wollny Cc: pgsql-general@postgresql.org Betreff: Re: [GENERAL] Using FTI-Search (likely a more general runtime-puzzle) > This is way too long. I don't know why it doesn't use the indexes I gave > to it... If someone has got some idea as to what can be done to optimize > query-planner decisions, I'd be happy to comply. But anyway, I don't > worry about this too much and just force it to use them, once I know its > decision for seqential scan is wrong: Have you vaccum analyzed the fti table? > Case 3b: > Retrieving additional article-info: > > SET ENABLE_SEQSCAN=OFF; > EXPLAIN ANALYZE > SELECT article_id, site_id, article_type, topstory, headline, published > from article where article_id in ( > select distinct (p.article_id) > from article p, article_fti f1, article_fti f2, article_fti f3 > where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id); > SET ENABLE_SEQSCAN=ON; I was going to suggest a subselect in from rather than the IN, but perhaps you could just use distinct on and get the data from p. Admittedly this uses a postgresql extension, and assumes that p.article_id is unique, maybe like: select distinct on (p.article_id) p.* from article p, article_fti f1, article_fti f2, article_fti f3 where f1.string ='grand' and f2.string ='theft' and f3.string ='auto' and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id;
Hi! I am about to evaluate the tsearch module from contrib, but as yet I haven't succeeded in getting it to run. I followed instructions on installation of tsearch; the new datatype and the corresponding functions seem all to be there. So I want to try it out on some existing data. In table article I want to set up a full-text index on column text. After adding the column textindex of type txtidx, I try to fill that column with data, but this just crashes the backend immediately: pcgames=# update article set textindex=txt2txtidx(text); pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The logfile doesn't really give much information on what exactly happened: 2002-07-29 13:37:20 [22127] DEBUG: database system is ready 2002-07-29 13:37:46 [22124] DEBUG: server process (pid 22137) was terminated by signal 11 2002-07-29 13:37:46 [22124] DEBUG: terminating any other active server processes 2002-07-29 13:37:46 [22124] DEBUG: all server processes terminated; reinitializing shared memory and semaphores 2002-07-29 13:37:46 [22139] FATAL 1: The database system is starting up 2002-07-29 13:37:46 [22138] DEBUG: database system was interrupted at 2002-07-29 13:37:20 CEST 2002-07-29 13:37:46 [22138] DEBUG: checkpoint record is at E/3B785168 2002-07-29 13:37:46 [22138] DEBUG: redo record is at E/3B785168; undo record is at 0/0; shutdown TRUE 2002-07-29 13:37:46 [22138] DEBUG: next transaction id: 179915804; next oid: 95552489 2002-07-29 13:37:46 [22138] DEBUG: database system was not properly shut down; automatic recovery in progress 2002-07-29 13:37:46 [22138] DEBUG: ReadRecord: record with zero length at E/3B7851A8 2002-07-29 13:37:46 [22138] DEBUG: redo is not required 2002-07-29 13:37:48 [22138] DEBUG: database system is ready Any ideas? Regards, Markus > -----Ursprungliche Nachricht----- > Von: Oleg Bartunov [mailto:oleg@sai.msu.su] > Gesendet: Mittwoch, 24. Juli 2002 21:20 > An: Markus Wollny > Betreff: Re: AW: [GENERAL] Using FTI-Search (likely a more > general runtime-puzzle) > > > On Wed, 24 Jul 2002, Markus Wollny wrote: > > > Hi! > > > > No I haven't as yet - does it provide other/better/more > features than > > the fti-module? > > > > Sure, it's designed mostly for full text search and comes from > Information Retrieval. There is a README file with some intro. > > > Regards, > > > > Markus > > > > -----UrsprУМngliche Nachricht----- > > Von: Oleg Bartunov > > Gesendet: Mi 24.07.2002 19:37 > > An: Markus Wollny > > Cc: pgsql-general@postgresql.org > > Betreff: Re: [GENERAL] Using FTI-Search (likely a more general > > runtime-puzzle) > > > > > > > > Markus, > > > > have you tried our contrib/tsearch module ? > > > > Oleg > > On Wed, 24 Jul 2002, Markus Wollny wrote: > > > > > Hello! > > > > > > I have implemented the full text index feature from /contrib > > and so far > > > the results look quite promising. We use it to index several > > columns of > > > an article table. The main column is called 'text'. An example > > for a > > > search would be to retrieve all the articles with "grand theft > > auto" in > > > them somewhere (not necessarily one string but individual > > words). > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > Case 1: Our Good Old Fashioned Search (TM) over the > > article-table: > > > > > > ANALYZE select distinct (p1.article_id) > > > from article p1, article p2, article p3 > > > where lower(p1.text) LIKE '%grand%' > > > AND lower(p2.text) LIKE '%theft%' > > > AND lower(p3.text) LIKE '%auto%' > > > AND p1.article_id=p2.article_id > > > and p1.article_id=p3.article_id > > > > > > Unique (cost=3067.03..3067.04 rows=1 width=12) (actual > > > time=1455.25..1455.44 rows=30 loops=1) > > > -> Sort (cost=3067.03..3067.03 rows=1 width=12) (actual > > > time=1455.24..1455.29 rows=30 loops=1) > > > -> Nested Loop (cost=1098.52..3067.02 rows=1 > > width=12) (actual > > > time=1211.54..1453.82 rows=30 loops=1) > > > -> Hash Join (cost=1098.52..2197.47 rows=6 > > width=8) > > > (actual time=1121.32..1309.23 rows=30 loops=1) > > > -> Seq Scan on article p2 > > (cost=0.00..1098.43 > > > rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1) > > > -> Hash (cost=1098.43..1098.43 rows=35 > > width=4) > > > (actual time=766.21..766.21 rows=0 loops=1) > > > -> Seq Scan on article p3 > > > (cost=0.00..1098.43 rows=35 width=4) (actual > > time=13.63..763.73 rows=411 > > > loops=1) > > > -> Index Scan using idx0_article on article p1 > > > (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80 > > rows=1 > > > loops=30) > > > Total runtime: 1456.00 msec > > > > > > This is the time, fti needs to beat in order to be of any use > > to us. > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 2: > > > Searching the FTI-Table: > > > EXPLAIN ANALYZE select distinct (p.article_id) > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > ='auto' > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > > > > > > > > > Unique (cost=39200652.79..39647427.57 rows=17870991 width=20) > > (actual > > > time=52506.81..52506.81 rows=0 loops=1) > > > -> Sort (cost=39200652.79..39200652.79 rows=178709913 > > width=20) > > > (actual time=52506.81..52506.81 rows=0 loops=1) > > > -> Merge Join (cost=222181.40..313718.83 > > rows=178709913 > > > width=20) (actual time=52505.69..52505.69 rows=0 loops=1) > > > -> Merge Join (cost=147175.62..170464.70 > > rows=1855421 > > > width=8) (actual time=37084.36..37084.36 rows=1 loops=1) > > > -> Sort (cost=73587.81..73587.81 > > rows=19264 > > > width=4) (actual time=22423.27..22423.27 rows=1 loops=1) > > > -> Seq Scan on article_fti f2 > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > time=18297.22..22422.20 > > > rows=97 loops=1) > > > -> Sort (cost=73587.81..73587.81 > > rows=19264 > > > width=4) (actual time=14661.04..14661.04 rows=1 loops=1) > > > -> Seq Scan on article_fti f3 > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > time=1344.87..14659.59 > > > rows=159 loops=1) > > > -> Sort (cost=75005.78..75005.78 rows=19264 > > width=12) > > > (actual time=15421.32..15421.32 rows=0 loops=1) > > > -> Hash Join (cost=1080.86..73634.84 > > rows=19264 > > > width=12) (actual time=15421.10..15421.10 rows=0 loops=1) > > > -> Seq Scan on article_fti f1 > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > time=4879.59..14306.97 > > > rows=350 loops=1) > > > -> Hash (cost=1063.29..1063.29 > > rows=7029 > > > width=8) (actual time=1109.79..1109.79 rows=0 loops=1) > > > -> Seq Scan on article p > > > (cost=0.00..1063.29 rows=7029 width=8) (actual > > time=12.20..1076.46 > > > rows=7029 loops=1) > > > Total runtime: 52507.92 msec > > > > > > This is way too long. I don't know why it doesn't use the > > indexes I gave > > > to it... If someone has got some idea as to what can be done > > to optimize > > > query-planner decisions, I'd be happy to comply. But anyway, I > > don't > > > worry about this too much and just force it to use them, once > > I know its > > > decision for seqential scan is wrong: > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 3: > > > Searching the FTI-Table forcing use of indexes: > > > SET ENABLE_SEQSCAN=OFF; > > > EXPLAIN ANALYZE select distinct (p.article_id) > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > ='auto' > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > > > SET ENABLE_SEQSCAN=ON; > > > > > > NOTICE: QUERY PLAN: > > > > > > Unique (cost=5341.68..5342.61 rows=37 width=20) (actual > > > time=40.52..41.83 rows=65 loops=1) > > > -> Sort (cost=5341.68..5341.68 rows=372 width=20) (actual > > > time=40.52..40.96rows=376 loops=1) > > > -> Nested Loop (cost=3067.05..5325.79 rows=372 > > width=20) > > > (actual time=14.02..38.38 rows=376 loops=1) > > > -> Merge Join (cost=3067.05..3084.32 rows=372 > > width=12) > > > (actualtime=13.95..20.45 rows=376 loops=1) > > > -> Merge Join (cost=2044.70..2054.37 > > rows=502 > > > width=8) (actual time=5.75..7.81 rows=176 loops=1) > > > -> Sort (cost=1022.35..1022.35 > > rows=678 > > > width=4) (actual time=2.31..2.44 rows=97 loops=1) > > > -> Index Scan using > > > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.16..1.74 rows=97 loops=1) > > > -> Sort (cost=1022.35..1022.35 > > rows=678 > > > width=4) (actual time=3.42..3.66 rows=194 loops=1) > > > -> Index Scan using > > > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.13..2.53 rows=159 loops=1) > > > -> Sort (cost=1022.35..1022.35 rows=678 > > width=4) > > > (actual time=8.18..8.89 rows=532 loops=1) > > > -> Index Scan using > > article_fti_string_idx on > > > article_fti f1 (cost=0.00..990.47 rows=678 width=4) (actual > > > time=0.13..5.98 rows=350 loops=1) > > > -> Index Scan using article_oid_idx on article > > p > > > (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 > > rows=1 > > > loops=376) > > > Total runtime: 42.41 msec > > > > > > Not that's much more like it :) This query returns exactly 65 > > numbers. > > > But really we want not only the article-IDs, but also some > > info on the > > > article as well: > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 3b: > > > Retrieving additional article-info: > > > > > > SET ENABLE_SEQSCAN=OFF; > > > EXPLAIN ANALYZE > > > SELECT article_id, site_id, article_type, topstory, headline, > > published > > > from article where article_id in ( > > > select distinct (p.article_id) > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > ='auto' > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id); > > > SET ENABLE_SEQSCAN=ON; > > > > > > NOTICE: QUERY PLAN: > > > > > > Seq Scan on article (cost=100000000.00..137645255.53 > > rows=3523 > > > width=77) (actual time=1756.30..3720.44 rows=65 loops=1) > > > SubPlan > > > -> Materialize (cost=5342.61..5342.61 rows=37 width=20) > > (actual > > > time=0.17..0.28 rows=65 loops=7046) > > > -> Unique (cost=5341.68..5342.61 rows=37 width=20) > > (actual > > > time=1182.07..1183.29 rows=65 loops=1) > > > -> Sort (cost=5341.68..5341.68 rows=372 > > width=20) > > > (actual time=1182.06..1182.52 rows=376 loops=1) > > > -> Nested Loop (cost=3067.05..5325.79 > > rows=372 > > > width=20) (actual time=318.55..1179.04 rows=376 loops=1) > > > -> Merge Join > > (cost=3067.05..3084.32 > > > rows=372 width=12) (actual time=300.97..314.39 rows=376 > > loops=1) > > > -> Merge Join > > (cost=2044.70..2054.37 > > > rows=502 width=8) (actual time=159.24..163.45 rows=176 > > loops=1) > > > -> Sort > > (cost=1022.35..1022.35 > > > rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1) > > > -> Index Scan > > using > > > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=1.46..69.08 rows=97 loops=1) > > > -> Sort > > (cost=1022.35..1022.35 > > > rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1) > > > -> Index Scan > > using > > > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.19..85.06 rows=159 loops=1) > > > -> Sort > > (cost=1022.35..1022.35 > > > rows=678 width=4) (actual time=141.70..143.03 rows=532 > > loops=1) > > > -> Index Scan using > > > article_fti_string_idx on article_fti f1 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.17..138.95 rows=350 loops=1) > > > -> Index Scan using > > article_oid_idx on > > > article p (cost=0.00..6.01 rows=1 width=8) (actual > > time=1.65..2.27 > > > rows=1 loops=376) > > > Total runtime: 3722.74 msec > > > > > > Whoa! Why is that? How come this takes so extremely much > > longer? > > > Because... > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 3c: > > > If I just select all the ids, I get a total runtime of 42.41 > > msec. And > > > if I do a select ... where article_id in (all the results from > > case3): > > > > > > explain analyze SELECT article_id, site_id, article_type, > > topstory, > > > headline, published from article where article_id in > > > > > > (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,1278 > 4,13324,14 > > > > > > 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,3973 > 6,40470,41 > > > > > > 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,5265 > 3,52770,53 > > > > > > 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,5695 > 0,57210,57 > > > > > > 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,6571 > 0,65771,66 > > > 050,66490,66531,67030,67110); > > > > > > NOTICE: QUERY PLAN: > > > > > > Index Scan using idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article on article (cost=0.00..378.61 rows=67 width=77) > > (actual > > > time=0.08..5.17 rows=65 loops=1) > > > Total runtime: 6.15 msec > > > > > > I get just another 6.15msec. > > > > > > Now I do think that the total runtime of 3b should really be > > 3+3c, which > > > amounts to roughly 50msec, but not nearly 4 seconds. What am I > > doing > > > wrong? > > > > > > Thanks for your advice! > > > > > > Regards, > > > > > > Markus > > > > > > ---------------------------(end of > > broadcast)--------------------------- > > > TIP 1: subscribe and unsubscribe commands go to > > majordomo@postgresql.org > > > > > > > Regards, > > Oleg > > _____________________________________________________________ > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > > Sternberg Astronomical Institute, Moscow University (Russia) > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > > phone: +007(095)939-16-83, +007(095)939-23-83 > > > > > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > >
Markus, version of PostgreSQL ? Also, could you try just select txt2txtidx(text); where 'text' is a sample entry from your table. Oleg On Mon, 29 Jul 2002, Markus Wollny wrote: > Hi! > > I am about to evaluate the tsearch module from contrib, but as yet I > haven't succeeded in getting it to run. > > I followed instructions on installation of tsearch; the new datatype and > the corresponding functions seem all to be there. So I want to try it > out on some existing data. In table article I want to set up a full-text > index on column text. After adding the column textindex of type txtidx, > I try to fill that column with data, but this just crashes the backend > immediately: > > pcgames=# update article set textindex=txt2txtidx(text); > pqReadData() -- backend closed the channel unexpectedly. > This probably means the backend terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > The logfile doesn't really give much information on what exactly > happened: > > 2002-07-29 13:37:20 [22127] DEBUG: database system is ready > 2002-07-29 13:37:46 [22124] DEBUG: server process (pid 22137) was > terminated by signal 11 > 2002-07-29 13:37:46 [22124] DEBUG: terminating any other active server > processes > 2002-07-29 13:37:46 [22124] DEBUG: all server processes terminated; > reinitializing shared memory and semaphores > 2002-07-29 13:37:46 [22139] FATAL 1: The database system is starting > up > 2002-07-29 13:37:46 [22138] DEBUG: database system was interrupted at > 2002-07-29 13:37:20 CEST > 2002-07-29 13:37:46 [22138] DEBUG: checkpoint record is at E/3B785168 > 2002-07-29 13:37:46 [22138] DEBUG: redo record is at E/3B785168; undo > record is at 0/0; shutdown TRUE > 2002-07-29 13:37:46 [22138] DEBUG: next transaction id: 179915804; > next oid: 95552489 > 2002-07-29 13:37:46 [22138] DEBUG: database system was not properly > shut down; automatic recovery in progress > 2002-07-29 13:37:46 [22138] DEBUG: ReadRecord: record with zero length > at E/3B7851A8 > 2002-07-29 13:37:46 [22138] DEBUG: redo is not required > 2002-07-29 13:37:48 [22138] DEBUG: database system is ready > > Any ideas? > > Regards, > > Markus > > > -----Ursprungliche Nachricht----- > > Von: Oleg Bartunov [mailto:oleg@sai.msu.su] > > Gesendet: Mittwoch, 24. Juli 2002 21:20 > > An: Markus Wollny > > Betreff: Re: AW: [GENERAL] Using FTI-Search (likely a more > > general runtime-puzzle) > > > > > > On Wed, 24 Jul 2002, Markus Wollny wrote: > > > > > Hi! > > > > > > No I haven't as yet - does it provide other/better/more > > features than > > > the fti-module? > > > > > > > Sure, it's designed mostly for full text search and comes from > > Information Retrieval. There is a README file with some intro. > > > > > Regards, > > > > > > Markus > > > > > > -----UrsprУМngliche Nachricht----- > > > Von: Oleg Bartunov > > > Gesendet: Mi 24.07.2002 19:37 > > > An: Markus Wollny > > > Cc: pgsql-general@postgresql.org > > > Betreff: Re: [GENERAL] Using FTI-Search (likely a more general > > > runtime-puzzle) > > > > > > > > > > > > Markus, > > > > > > have you tried our contrib/tsearch module ? > > > > > > Oleg > > > On Wed, 24 Jul 2002, Markus Wollny wrote: > > > > > > > Hello! > > > > > > > > I have implemented the full text index feature from /contrib > > > and so far > > > > the results look quite promising. We use it to index several > > > columns of > > > > an article table. The main column is called 'text'. An example > > > for a > > > > search would be to retrieve all the articles with "grand theft > > > auto" in > > > > them somewhere (not necessarily one string but individual > > > words). > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------- > > > > -------------------- > > > > Case 1: Our Good Old Fashioned Search (TM) over the > > > article-table: > > > > > > > > ANALYZE select distinct (p1.article_id) > > > > from article p1, article p2, article p3 > > > > where lower(p1.text) LIKE '%grand%' > > > > AND lower(p2.text) LIKE '%theft%' > > > > AND lower(p3.text) LIKE '%auto%' > > > > AND p1.article_id=p2.article_id > > > > and p1.article_id=p3.article_id > > > > > > > > Unique (cost=3067.03..3067.04 rows=1 width=12) (actual > > > > time=1455.25..1455.44 rows=30 loops=1) > > > > -> Sort (cost=3067.03..3067.03 rows=1 width=12) (actual > > > > time=1455.24..1455.29 rows=30 loops=1) > > > > -> Nested Loop (cost=1098.52..3067.02 rows=1 > > > width=12) (actual > > > > time=1211.54..1453.82 rows=30 loops=1) > > > > -> Hash Join (cost=1098.52..2197.47 rows=6 > > > width=8) > > > > (actual time=1121.32..1309.23 rows=30 loops=1) > > > > -> Seq Scan on article p2 > > > (cost=0.00..1098.43 > > > > rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1) > > > > -> Hash (cost=1098.43..1098.43 rows=35 > > > width=4) > > > > (actual time=766.21..766.21 rows=0 loops=1) > > > > -> Seq Scan on article p3 > > > > (cost=0.00..1098.43 rows=35 width=4) (actual > > > time=13.63..763.73 rows=411 > > > > loops=1) > > > > -> Index Scan using idx0_article on article p1 > > > > (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80 > > > rows=1 > > > > loops=30) > > > > Total runtime: 1456.00 msec > > > > > > > > This is the time, fti needs to beat in order to be of any use > > > to us. > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------- > > > > -------------------- > > > > > > > > Case 2: > > > > Searching the FTI-Table: > > > > EXPLAIN ANALYZE select distinct (p.article_id) > > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > > ='auto' > > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > > > > > > > > > > > > Unique (cost=39200652.79..39647427.57 rows=17870991 width=20) > > > (actual > > > > time=52506.81..52506.81 rows=0 loops=1) > > > > -> Sort (cost=39200652.79..39200652.79 rows=178709913 > > > width=20) > > > > (actual time=52506.81..52506.81 rows=0 loops=1) > > > > -> Merge Join (cost=222181.40..313718.83 > > > rows=178709913 > > > > width=20) (actual time=52505.69..52505.69 rows=0 loops=1) > > > > -> Merge Join (cost=147175.62..170464.70 > > > rows=1855421 > > > > width=8) (actual time=37084.36..37084.36 rows=1 loops=1) > > > > -> Sort (cost=73587.81..73587.81 > > > rows=19264 > > > > width=4) (actual time=22423.27..22423.27 rows=1 loops=1) > > > > -> Seq Scan on article_fti f2 > > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > > time=18297.22..22422.20 > > > > rows=97 loops=1) > > > > -> Sort (cost=73587.81..73587.81 > > > rows=19264 > > > > width=4) (actual time=14661.04..14661.04 rows=1 loops=1) > > > > -> Seq Scan on article_fti f3 > > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > > time=1344.87..14659.59 > > > > rows=159 loops=1) > > > > -> Sort (cost=75005.78..75005.78 rows=19264 > > > width=12) > > > > (actual time=15421.32..15421.32 rows=0 loops=1) > > > > -> Hash Join (cost=1080.86..73634.84 > > > rows=19264 > > > > width=12) (actual time=15421.10..15421.10 rows=0 loops=1) > > > > -> Seq Scan on article_fti f1 > > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > > time=4879.59..14306.97 > > > > rows=350 loops=1) > > > > -> Hash (cost=1063.29..1063.29 > > > rows=7029 > > > > width=8) (actual time=1109.79..1109.79 rows=0 loops=1) > > > > -> Seq Scan on article p > > > > (cost=0.00..1063.29 rows=7029 width=8) (actual > > > time=12.20..1076.46 > > > > rows=7029 loops=1) > > > > Total runtime: 52507.92 msec > > > > > > > > This is way too long. I don't know why it doesn't use the > > > indexes I gave > > > > to it... If someone has got some idea as to what can be done > > > to optimize > > > > query-planner decisions, I'd be happy to comply. But anyway, I > > > don't > > > > worry about this too much and just force it to use them, once > > > I know its > > > > decision for seqential scan is wrong: > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------- > > > > -------------------- > > > > > > > > Case 3: > > > > Searching the FTI-Table forcing use of indexes: > > > > SET ENABLE_SEQSCAN=OFF; > > > > EXPLAIN ANALYZE select distinct (p.article_id) > > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > > ='auto' > > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > > > > SET ENABLE_SEQSCAN=ON; > > > > > > > > NOTICE: QUERY PLAN: > > > > > > > > Unique (cost=5341.68..5342.61 rows=37 width=20) (actual > > > > time=40.52..41.83 rows=65 loops=1) > > > > -> Sort (cost=5341.68..5341.68 rows=372 width=20) (actual > > > > time=40.52..40.96rows=376 loops=1) > > > > -> Nested Loop (cost=3067.05..5325.79 rows=372 > > > width=20) > > > > (actual time=14.02..38.38 rows=376 loops=1) > > > > -> Merge Join (cost=3067.05..3084.32 rows=372 > > > width=12) > > > > (actualtime=13.95..20.45 rows=376 loops=1) > > > > -> Merge Join (cost=2044.70..2054.37 > > > rows=502 > > > > width=8) (actual time=5.75..7.81 rows=176 loops=1) > > > > -> Sort (cost=1022.35..1022.35 > > > rows=678 > > > > width=4) (actual time=2.31..2.44 rows=97 loops=1) > > > > -> Index Scan using > > > > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 > > > rows=678 > > > > width=4) (actual time=0.16..1.74 rows=97 loops=1) > > > > -> Sort (cost=1022.35..1022.35 > > > rows=678 > > > > width=4) (actual time=3.42..3.66 rows=194 loops=1) > > > > -> Index Scan using > > > > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 > > > rows=678 > > > > width=4) (actual time=0.13..2.53 rows=159 loops=1) > > > > -> Sort (cost=1022.35..1022.35 rows=678 > > > width=4) > > > > (actual time=8.18..8.89 rows=532 loops=1) > > > > -> Index Scan using > > > article_fti_string_idx on > > > > article_fti f1 (cost=0.00..990.47 rows=678 width=4) (actual > > > > time=0.13..5.98 rows=350 loops=1) > > > > -> Index Scan using article_oid_idx on article > > > p > > > > (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 > > > rows=1 > > > > loops=376) > > > > Total runtime: 42.41 msec > > > > > > > > Not that's much more like it :) This query returns exactly 65 > > > numbers. > > > > But really we want not only the article-IDs, but also some > > > info on the > > > > article as well: > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------- > > > > -------------------- > > > > > > > > Case 3b: > > > > Retrieving additional article-info: > > > > > > > > SET ENABLE_SEQSCAN=OFF; > > > > EXPLAIN ANALYZE > > > > SELECT article_id, site_id, article_type, topstory, headline, > > > published > > > > from article where article_id in ( > > > > select distinct (p.article_id) > > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > > ='auto' > > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id); > > > > SET ENABLE_SEQSCAN=ON; > > > > > > > > NOTICE: QUERY PLAN: > > > > > > > > Seq Scan on article (cost=100000000.00..137645255.53 > > > rows=3523 > > > > width=77) (actual time=1756.30..3720.44 rows=65 loops=1) > > > > SubPlan > > > > -> Materialize (cost=5342.61..5342.61 rows=37 width=20) > > > (actual > > > > time=0.17..0.28 rows=65 loops=7046) > > > > -> Unique (cost=5341.68..5342.61 rows=37 width=20) > > > (actual > > > > time=1182.07..1183.29 rows=65 loops=1) > > > > -> Sort (cost=5341.68..5341.68 rows=372 > > > width=20) > > > > (actual time=1182.06..1182.52 rows=376 loops=1) > > > > -> Nested Loop (cost=3067.05..5325.79 > > > rows=372 > > > > width=20) (actual time=318.55..1179.04 rows=376 loops=1) > > > > -> Merge Join > > > (cost=3067.05..3084.32 > > > > rows=372 width=12) (actual time=300.97..314.39 rows=376 > > > loops=1) > > > > -> Merge Join > > > (cost=2044.70..2054.37 > > > > rows=502 width=8) (actual time=159.24..163.45 rows=176 > > > loops=1) > > > > -> Sort > > > (cost=1022.35..1022.35 > > > > rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1) > > > > -> Index Scan > > > using > > > > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 > > > rows=678 > > > > width=4) (actual time=1.46..69.08 rows=97 loops=1) > > > > -> Sort > > > (cost=1022.35..1022.35 > > > > rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1) > > > > -> Index Scan > > > using > > > > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 > > > rows=678 > > > > width=4) (actual time=0.19..85.06 rows=159 loops=1) > > > > -> Sort > > > (cost=1022.35..1022.35 > > > > rows=678 width=4) (actual time=141.70..143.03 rows=532 > > > loops=1) > > > > -> Index Scan using > > > > article_fti_string_idx on article_fti f1 (cost=0.00..990.47 > > > rows=678 > > > > width=4) (actual time=0.17..138.95 rows=350 loops=1) > > > > -> Index Scan using > > > article_oid_idx on > > > > article p (cost=0.00..6.01 rows=1 width=8) (actual > > > time=1.65..2.27 > > > > rows=1 loops=376) > > > > Total runtime: 3722.74 msec > > > > > > > > Whoa! Why is that? How come this takes so extremely much > > > longer? > > > > Because... > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------- > > > > -------------------- > > > > > > > > Case 3c: > > > > If I just select all the ids, I get a total runtime of 42.41 > > > msec. And > > > > if I do a select ... where article_id in (all the results from > > > case3): > > > > > > > > explain analyze SELECT article_id, site_id, article_type, > > > topstory, > > > > headline, published from article where article_id in > > > > > > > > > (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,1278 > > 4,13324,14 > > > > > > > > > 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,3973 > > 6,40470,41 > > > > > > > > > 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,5265 > > 3,52770,53 > > > > > > > > > 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,5695 > > 0,57210,57 > > > > > > > > > 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,6571 > > 0,65771,66 > > > > 050,66490,66531,67030,67110); > > > > > > > > NOTICE: QUERY PLAN: > > > > > > > > Index Scan using idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article, idx0_article, idx0_article, idx0_article, > > > idx0_article, > > > > idx0_article on article (cost=0.00..378.61 rows=67 width=77) > > > (actual > > > > time=0.08..5.17 rows=65 loops=1) > > > > Total runtime: 6.15 msec > > > > > > > > I get just another 6.15msec. > > > > > > > > Now I do think that the total runtime of 3b should really be > > > 3+3c, which > > > > amounts to roughly 50msec, but not nearly 4 seconds. What am I > > > doing > > > > wrong? > > > > > > > > Thanks for your advice! > > > > > > > > Regards, > > > > > > > > Markus > > > > > > > > ---------------------------(end of > > > broadcast)--------------------------- > > > > TIP 1: subscribe and unsubscribe commands go to > > > majordomo@postgresql.org > > > > > > > > > > Regards, > > > Oleg > > > _____________________________________________________________ > > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > > > Sternberg Astronomical Institute, Moscow University (Russia) > > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > > > phone: +007(095)939-16-83, +007(095)939-23-83 > > > > > > > > > > > > > Regards, > > Oleg > > _____________________________________________________________ > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > > Sternberg Astronomical Institute, Moscow University (Russia) > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > > phone: +007(095)939-16-83, +007(095)939-23-83 > > > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83