Re: bitmap heap scan recheck on full text search with gin index
От | Oleg Bartunov |
---|---|
Тема | Re: bitmap heap scan recheck on full text search with gin index |
Дата | |
Msg-id | Pine.LNX.4.64.0902060720070.4158@sn.sai.msu.ru обсуждение исходный текст |
Ответ на | bitmap heap scan recheck on full text search with gin index (Hal Roberts <hroberts@cyber.law.harvard.edu>) |
Список | pgsql-performance |
Hal, just create separate column with tsvector and create index on it. Oleg On Thu, 5 Feb 2009, Hal Roberts wrote: > Hi All, > > I'm getting poor performance on full text searches that return lots of > entries from a table with about 7 million rows. I think the cause is > rechecking the text match on all of the returned rows, even though I'm using > a @@ query on a gin index, which the docs say should not require a recheck. > > Here's the table: > > **** > mediacloud=> \d download_texts; > Table "public.download_texts" > Column | Type | Modifiers > -------------------+---------+---------------------------------------------------------------------------- > download_texts_id | integer | not null default > nextval('download_texts_download_texts_id_seq'::regclass) > downloads_id | integer | not null > download_text | text | not null > Indexes: > "download_texts_pkey" PRIMARY KEY, btree (download_texts_id) > "download_texts_downloads_id_index" UNIQUE, btree (downloads_id) > "download_texts_textsearch_idx" gin (to_tsvector('english'::regconfig, > download_text)), tablespace "large_table_space" > Foreign-key constraints: > "download_texts_downloads_id_fkey" FOREIGN KEY (downloads_id) REFERENCES > downloads(downloads_id) > Tablespace: "large_table_space" > **** > > And here's the query: > > **** > mediacloud=> explain analyze select count(dt.download_texts_id) from > download_texts dt where to_tsvector('english', download_text) @@ > to_tsquery('english', 'stimulus'); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------ > Aggregate (cost=26161.16..26161.17 rows=1 width=4) (actual > time=153640.083..153640.083 rows=1 loops=1) > -> Bitmap Heap Scan on download_texts dt (cost=3937.41..26146.11 rows=6018 > width=4) (actual time=1957.074..153529.351 rows=72225 loops=1) > Recheck Cond: (to_tsvector('english'::regconfig, download_text) @@ > '''stimulus'''::tsquery) > -> Bitmap Index Scan on download_texts_textsearch_idx > (cost=0.00..3935.90 rows=6018 width=0) (actual time=1048.556..1048.556 > rows=72225 loops=1) > Index Cond: (to_tsvector('english'::regconfig, download_text) @@ > '''stimulus'''::tsquery) > Total runtime: 153642.249 ms > **** > > Nearly all of the time is being spent in the bitmap heap scan, I suspect > because of the work of rereading and rechecking the text of all the matched > entries. Is this indeed what's going on here? Is there any way to make > postgres not do that recheck? > > > Thanks! > > -hal > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
В списке pgsql-performance по дате отправления:
Предыдущее
От: Hal RobertsДата:
Сообщение: bitmap heap scan recheck on full text search with gin index