Re: Using FTI-Search (likely a more general runtime-puzzle)
От | Markus Wollny |
---|---|
Тема | Re: Using FTI-Search (likely a more general runtime-puzzle) |
Дата | |
Msg-id | 2266D0630E43BB4290742247C891057501B131B6@dozer.computec.de обсуждение исходный текст |
Ответ на | Using FTI-Search (likely a more general runtime-puzzle) ("Markus Wollny" <Markus.Wollny@computec.de>) |
Ответы |
Re: Using FTI-Search (likely a more general
(Oleg Bartunov <oleg@sai.msu.su>)
|
Список | pgsql-general |
Hi! I am about to evaluate the tsearch module from contrib, but as yet I haven't succeeded in getting it to run. I followed instructions on installation of tsearch; the new datatype and the corresponding functions seem all to be there. So I want to try it out on some existing data. In table article I want to set up a full-text index on column text. After adding the column textindex of type txtidx, I try to fill that column with data, but this just crashes the backend immediately: pcgames=# update article set textindex=txt2txtidx(text); pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The logfile doesn't really give much information on what exactly happened: 2002-07-29 13:37:20 [22127] DEBUG: database system is ready 2002-07-29 13:37:46 [22124] DEBUG: server process (pid 22137) was terminated by signal 11 2002-07-29 13:37:46 [22124] DEBUG: terminating any other active server processes 2002-07-29 13:37:46 [22124] DEBUG: all server processes terminated; reinitializing shared memory and semaphores 2002-07-29 13:37:46 [22139] FATAL 1: The database system is starting up 2002-07-29 13:37:46 [22138] DEBUG: database system was interrupted at 2002-07-29 13:37:20 CEST 2002-07-29 13:37:46 [22138] DEBUG: checkpoint record is at E/3B785168 2002-07-29 13:37:46 [22138] DEBUG: redo record is at E/3B785168; undo record is at 0/0; shutdown TRUE 2002-07-29 13:37:46 [22138] DEBUG: next transaction id: 179915804; next oid: 95552489 2002-07-29 13:37:46 [22138] DEBUG: database system was not properly shut down; automatic recovery in progress 2002-07-29 13:37:46 [22138] DEBUG: ReadRecord: record with zero length at E/3B7851A8 2002-07-29 13:37:46 [22138] DEBUG: redo is not required 2002-07-29 13:37:48 [22138] DEBUG: database system is ready Any ideas? Regards, Markus > -----Ursprungliche Nachricht----- > Von: Oleg Bartunov [mailto:oleg@sai.msu.su] > Gesendet: Mittwoch, 24. Juli 2002 21:20 > An: Markus Wollny > Betreff: Re: AW: [GENERAL] Using FTI-Search (likely a more > general runtime-puzzle) > > > On Wed, 24 Jul 2002, Markus Wollny wrote: > > > Hi! > > > > No I haven't as yet - does it provide other/better/more > features than > > the fti-module? > > > > Sure, it's designed mostly for full text search and comes from > Information Retrieval. There is a README file with some intro. > > > Regards, > > > > Markus > > > > -----UrsprУМngliche Nachricht----- > > Von: Oleg Bartunov > > Gesendet: Mi 24.07.2002 19:37 > > An: Markus Wollny > > Cc: pgsql-general@postgresql.org > > Betreff: Re: [GENERAL] Using FTI-Search (likely a more general > > runtime-puzzle) > > > > > > > > Markus, > > > > have you tried our contrib/tsearch module ? > > > > Oleg > > On Wed, 24 Jul 2002, Markus Wollny wrote: > > > > > Hello! > > > > > > I have implemented the full text index feature from /contrib > > and so far > > > the results look quite promising. We use it to index several > > columns of > > > an article table. The main column is called 'text'. An example > > for a > > > search would be to retrieve all the articles with "grand theft > > auto" in > > > them somewhere (not necessarily one string but individual > > words). > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > Case 1: Our Good Old Fashioned Search (TM) over the > > article-table: > > > > > > ANALYZE select distinct (p1.article_id) > > > from article p1, article p2, article p3 > > > where lower(p1.text) LIKE '%grand%' > > > AND lower(p2.text) LIKE '%theft%' > > > AND lower(p3.text) LIKE '%auto%' > > > AND p1.article_id=p2.article_id > > > and p1.article_id=p3.article_id > > > > > > Unique (cost=3067.03..3067.04 rows=1 width=12) (actual > > > time=1455.25..1455.44 rows=30 loops=1) > > > -> Sort (cost=3067.03..3067.03 rows=1 width=12) (actual > > > time=1455.24..1455.29 rows=30 loops=1) > > > -> Nested Loop (cost=1098.52..3067.02 rows=1 > > width=12) (actual > > > time=1211.54..1453.82 rows=30 loops=1) > > > -> Hash Join (cost=1098.52..2197.47 rows=6 > > width=8) > > > (actual time=1121.32..1309.23 rows=30 loops=1) > > > -> Seq Scan on article p2 > > (cost=0.00..1098.43 > > > rows=35 width=4) (actual time=352.03..539.28 rows=30 loops=1) > > > -> Hash (cost=1098.43..1098.43 rows=35 > > width=4) > > > (actual time=766.21..766.21 rows=0 loops=1) > > > -> Seq Scan on article p3 > > > (cost=0.00..1098.43 rows=35 width=4) (actual > > time=13.63..763.73 rows=411 > > > loops=1) > > > -> Index Scan using idx0_article on article p1 > > > (cost=0.00..140.79 rows=1 width=4) (actual time=4.79..4.80 > > rows=1 > > > loops=30) > > > Total runtime: 1456.00 msec > > > > > > This is the time, fti needs to beat in order to be of any use > > to us. > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 2: > > > Searching the FTI-Table: > > > EXPLAIN ANALYZE select distinct (p.article_id) > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > ='auto' > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > > > > > > > > > Unique (cost=39200652.79..39647427.57 rows=17870991 width=20) > > (actual > > > time=52506.81..52506.81 rows=0 loops=1) > > > -> Sort (cost=39200652.79..39200652.79 rows=178709913 > > width=20) > > > (actual time=52506.81..52506.81 rows=0 loops=1) > > > -> Merge Join (cost=222181.40..313718.83 > > rows=178709913 > > > width=20) (actual time=52505.69..52505.69 rows=0 loops=1) > > > -> Merge Join (cost=147175.62..170464.70 > > rows=1855421 > > > width=8) (actual time=37084.36..37084.36 rows=1 loops=1) > > > -> Sort (cost=73587.81..73587.81 > > rows=19264 > > > width=4) (actual time=22423.27..22423.27 rows=1 loops=1) > > > -> Seq Scan on article_fti f2 > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > time=18297.22..22422.20 > > > rows=97 loops=1) > > > -> Sort (cost=73587.81..73587.81 > > rows=19264 > > > width=4) (actual time=14661.04..14661.04 rows=1 loops=1) > > > -> Seq Scan on article_fti f3 > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > time=1344.87..14659.59 > > > rows=159 loops=1) > > > -> Sort (cost=75005.78..75005.78 rows=19264 > > width=12) > > > (actual time=15421.32..15421.32 rows=0 loops=1) > > > -> Hash Join (cost=1080.86..73634.84 > > rows=19264 > > > width=12) (actual time=15421.10..15421.10 rows=0 loops=1) > > > -> Seq Scan on article_fti f1 > > > (cost=0.00..72216.86 rows=19264 width=4) (actual > > time=4879.59..14306.97 > > > rows=350 loops=1) > > > -> Hash (cost=1063.29..1063.29 > > rows=7029 > > > width=8) (actual time=1109.79..1109.79 rows=0 loops=1) > > > -> Seq Scan on article p > > > (cost=0.00..1063.29 rows=7029 width=8) (actual > > time=12.20..1076.46 > > > rows=7029 loops=1) > > > Total runtime: 52507.92 msec > > > > > > This is way too long. I don't know why it doesn't use the > > indexes I gave > > > to it... If someone has got some idea as to what can be done > > to optimize > > > query-planner decisions, I'd be happy to comply. But anyway, I > > don't > > > worry about this too much and just force it to use them, once > > I know its > > > decision for seqential scan is wrong: > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 3: > > > Searching the FTI-Table forcing use of indexes: > > > SET ENABLE_SEQSCAN=OFF; > > > EXPLAIN ANALYZE select distinct (p.article_id) > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > ='auto' > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id; > > > SET ENABLE_SEQSCAN=ON; > > > > > > NOTICE: QUERY PLAN: > > > > > > Unique (cost=5341.68..5342.61 rows=37 width=20) (actual > > > time=40.52..41.83 rows=65 loops=1) > > > -> Sort (cost=5341.68..5341.68 rows=372 width=20) (actual > > > time=40.52..40.96rows=376 loops=1) > > > -> Nested Loop (cost=3067.05..5325.79 rows=372 > > width=20) > > > (actual time=14.02..38.38 rows=376 loops=1) > > > -> Merge Join (cost=3067.05..3084.32 rows=372 > > width=12) > > > (actualtime=13.95..20.45 rows=376 loops=1) > > > -> Merge Join (cost=2044.70..2054.37 > > rows=502 > > > width=8) (actual time=5.75..7.81 rows=176 loops=1) > > > -> Sort (cost=1022.35..1022.35 > > rows=678 > > > width=4) (actual time=2.31..2.44 rows=97 loops=1) > > > -> Index Scan using > > > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.16..1.74 rows=97 loops=1) > > > -> Sort (cost=1022.35..1022.35 > > rows=678 > > > width=4) (actual time=3.42..3.66 rows=194 loops=1) > > > -> Index Scan using > > > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.13..2.53 rows=159 loops=1) > > > -> Sort (cost=1022.35..1022.35 rows=678 > > width=4) > > > (actual time=8.18..8.89 rows=532 loops=1) > > > -> Index Scan using > > article_fti_string_idx on > > > article_fti f1 (cost=0.00..990.47 rows=678 width=4) (actual > > > time=0.13..5.98 rows=350 loops=1) > > > -> Index Scan using article_oid_idx on article > > p > > > (cost=0.00..6.01 rows=1 width=8) (actual time=0.03..0.03 > > rows=1 > > > loops=376) > > > Total runtime: 42.41 msec > > > > > > Not that's much more like it :) This query returns exactly 65 > > numbers. > > > But really we want not only the article-IDs, but also some > > info on the > > > article as well: > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 3b: > > > Retrieving additional article-info: > > > > > > SET ENABLE_SEQSCAN=OFF; > > > EXPLAIN ANALYZE > > > SELECT article_id, site_id, article_type, topstory, headline, > > published > > > from article where article_id in ( > > > select distinct (p.article_id) > > > from article p, article_fti f1, article_fti f2, article_fti f3 > > > where f1.string ='grand' and f2.string ='theft' and f3.string > > ='auto' > > > and p.oid=f1.id and p.oid=f2.id and p.oid=f3.id); > > > SET ENABLE_SEQSCAN=ON; > > > > > > NOTICE: QUERY PLAN: > > > > > > Seq Scan on article (cost=100000000.00..137645255.53 > > rows=3523 > > > width=77) (actual time=1756.30..3720.44 rows=65 loops=1) > > > SubPlan > > > -> Materialize (cost=5342.61..5342.61 rows=37 width=20) > > (actual > > > time=0.17..0.28 rows=65 loops=7046) > > > -> Unique (cost=5341.68..5342.61 rows=37 width=20) > > (actual > > > time=1182.07..1183.29 rows=65 loops=1) > > > -> Sort (cost=5341.68..5341.68 rows=372 > > width=20) > > > (actual time=1182.06..1182.52 rows=376 loops=1) > > > -> Nested Loop (cost=3067.05..5325.79 > > rows=372 > > > width=20) (actual time=318.55..1179.04 rows=376 loops=1) > > > -> Merge Join > > (cost=3067.05..3084.32 > > > rows=372 width=12) (actual time=300.97..314.39 rows=376 > > loops=1) > > > -> Merge Join > > (cost=2044.70..2054.37 > > > rows=502 width=8) (actual time=159.24..163.45 rows=176 > > loops=1) > > > -> Sort > > (cost=1022.35..1022.35 > > > rows=678 width=4) (actual time=71.23..71.45 rows=97 loops=1) > > > -> Index Scan > > using > > > article_fti_string_idx on article_fti f2 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=1.46..69.08 rows=97 loops=1) > > > -> Sort > > (cost=1022.35..1022.35 > > > rows=678 width=4) (actual time=87.97..88.39 rows=194 loops=1) > > > -> Index Scan > > using > > > article_fti_string_idx on article_fti f3 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.19..85.06 rows=159 loops=1) > > > -> Sort > > (cost=1022.35..1022.35 > > > rows=678 width=4) (actual time=141.70..143.03 rows=532 > > loops=1) > > > -> Index Scan using > > > article_fti_string_idx on article_fti f1 (cost=0.00..990.47 > > rows=678 > > > width=4) (actual time=0.17..138.95 rows=350 loops=1) > > > -> Index Scan using > > article_oid_idx on > > > article p (cost=0.00..6.01 rows=1 width=8) (actual > > time=1.65..2.27 > > > rows=1 loops=376) > > > Total runtime: 3722.74 msec > > > > > > Whoa! Why is that? How come this takes so extremely much > > longer? > > > Because... > > > > > > > > > -------------------------------------------------------------- > ---------- > > > -------------------- > > > > > > Case 3c: > > > If I just select all the ids, I get a total runtime of 42.41 > > msec. And > > > if I do a select ... where article_id in (all the results from > > case3): > > > > > > explain analyze SELECT article_id, site_id, article_type, > > topstory, > > > headline, published from article where article_id in > > > > > > (8965,8966,9701,10204,11114,11788,11941,12339,12685,12696,1278 > 4,13324,14 > > > > > > 049,30982,31697,31924,31941,31944,32470,36810,37950,38913,3973 > 6,40470,41 > > > > > > 330,45651,47050,47997,49692,51253,51730,51817,52111,52233,5265 > 3,52770,53 > > > > > > 090,54410,54812,54890,55310,55974,56051,56590,56610,56630,5695 > 0,57210,57 > > > > > > 690,59670,60951,62630,62650,63070,63130,63551,63570,65691,6571 > 0,65771,66 > > > 050,66490,66531,67030,67110); > > > > > > NOTICE: QUERY PLAN: > > > > > > Index Scan using idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article, idx0_article, idx0_article, idx0_article, > > idx0_article, > > > idx0_article on article (cost=0.00..378.61 rows=67 width=77) > > (actual > > > time=0.08..5.17 rows=65 loops=1) > > > Total runtime: 6.15 msec > > > > > > I get just another 6.15msec. > > > > > > Now I do think that the total runtime of 3b should really be > > 3+3c, which > > > amounts to roughly 50msec, but not nearly 4 seconds. What am I > > doing > > > wrong? > > > > > > Thanks for your advice! > > > > > > Regards, > > > > > > Markus > > > > > > ---------------------------(end of > > broadcast)--------------------------- > > > TIP 1: subscribe and unsubscribe commands go to > > majordomo@postgresql.org > > > > > > > Regards, > > Oleg > > _____________________________________________________________ > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > > Sternberg Astronomical Institute, Moscow University (Russia) > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > > phone: +007(095)939-16-83, +007(095)939-23-83 > > > > > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > >
В списке pgsql-general по дате отправления: