Обсуждение: How can I get the query planner to use a bitmap index scap instead of an index scan ?
How can I get the query planner to use a bitmap index scap instead of an index scan ?
От
Mohan Krishnan
Дата:
Hello folks, I have a table of about 700k rows in Postgres 9.3.3, which has the following structure: Columns: content_body - text publish_date - timestamp without time zone published - boolean Indexes: "articles_pkey" PRIMARY KEY, btree (id) "article_text_gin" gin (article_text) "articles_publish_date_id_index" btree (publish_date DESC NULLS LAST, id DESC) The query that I am making has a full text search query and a limit, as follows: When I search for a string which is in my index with a limit and order in the query it is fast: explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index') order by id limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.42..1293.88 rows=10 width=1298) (actual time=2.073..9.837 rows=10 loops=1) -> Index Scan using articles_pkey on articles (cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711 rows=10 loops=1) Filter: (article_text @@ '''in_index'''::tsquery) Rows Removed by Filter: 611 Total runtime: 9.952 ms However if the string is not in the index it takes much longer: explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id limit 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.42..1293.88 rows=10 width=1298) (actual time=5633.684..5633.684 rows=0 loops=1) -> Index Scan using articles_pkey on articles (cost=0.42..462150.49 rows=3573 width=1298) (actual time=5633.672..5633.672 rows=0 loops=1) Filter: (article_text @@ '''not_in_index'''::tsquery) Rows Removed by Filter: 796146 Total runtime: 5633.745 ms However if I remove the order clause it is fast for either case: explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index') limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=55.69..90.22 rows=10 width=1298) (actual time=7.748..7.853 rows=10 loops=1) -> Bitmap Heap Scan on articles (cost=55.69..12390.60 rows=3573 width=1298) (actual time=7.735..7.781 rows=10 loops=1) Recheck Cond: (article_text @@ '''in_index'''::tsquery) -> Bitmap Index Scan on article_text_gin (cost=0.00..54.80 rows=3573 width=0) (actual time=5.977..5.977 rows=8910 loops=1) Index Cond: (article_text @@ '''in_index'''::tsquery) Total runtime: 7.952 ms explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index') limit 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Limit (cost=55.69..90.22 rows=10 width=1298) (actual time=0.083..0.083 rows=0 loops=1) -> Bitmap Heap Scan on articles (cost=55.69..12390.60 rows=3573 width=1298) (actual time=0.065..0.065 rows=0 loops=1) Recheck Cond: (article_text @@ '''not_in_index'''::tsquery) -> Bitmap Index Scan on article_text_gin (cost=0.00..54.80 rows=3573 width=0) (actual time=0.047..0.047 rows=0 loops=1) Index Cond: (article_text @@ '''not_in_index'''::tsquery) Total runtime: 0.163 ms Removing the limit clause has the same effect, although the in index query is noticably slower: explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index') order by id; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=12601.46..12610.40 rows=3573 width=1298) (actual time=106.347..140.481 rows=8910 loops=1) Sort Key: id Sort Method: external merge Disk: 12288kB -> Bitmap Heap Scan on articles (cost=55.69..12390.60 rows=3573 width=1298) (actual time=5.618..50.329 rows=8910 loops=1) Recheck Cond: (article_text @@ '''in_index'''::tsquery) -> Bitmap Index Scan on article_text_gin (cost=0.00..54.80 rows=3573 width=0) (actual time=4.243..4.243 rows=8910 loops=1) Index Cond: (article_text @@ '''in_index'''::tsquery) Total runtime: 170.987 ms explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Sort (cost=12601.46..12610.40 rows=3573 width=1298) (actual time=0.067..0.067 rows=0 loops=1) Sort Key: id Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on articles (cost=55.69..12390.60 rows=3573 width=1298) (actual time=0.044..0.044 rows=0 loops=1) Recheck Cond: (article_text @@ '''not_in_index'''::tsquery) -> Bitmap Index Scan on article_text_gin (cost=0.00..54.80 rows=3573 width=0) (actual time=0.026..0.026 rows=0 loops=1) Index Cond: (article_text @@ '''not_in_index'''::tsquery) Total runtime: 0.148 ms The little I can deduce is that overall, a bitmap index scan+bitmap heap scan is overall better for my queries then an index scan. How can I tell the query planner to do that though? -- Mohan
On Fri, Mar 7, 2014 at 6:46 PM, Mohan Krishnan <mohangk@gmail.com> wrote:
Hello folks,
I have a table of about 700k rows in Postgres 9.3.3, which has the
following structure:
Columns:
content_body - text
publish_date - timestamp without time zone
published - boolean
Indexes:
"articles_pkey" PRIMARY KEY, btree (id)
"article_text_gin" gin (article_text)
"articles_publish_date_id_index" btree (publish_date DESC NULLS
LAST, id DESC)
Your indexes are on columns that are not in the list of columns you gave. Can you show us the actual table and index definitions?
-> Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711
rows=10 loops=1)
Filter: (article_text @@ '''in_index'''::tsquery)
...
-> Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual
time=5633.672..5633.672 rows=0 loops=1)
Filter: (article_text @@ '''not_in_index'''::tsquery)
Those estimates are way off, and it is not clear why they would be. Have you analyzed your table recently?
Cheers,
Jeff
Re: How can I get the query planner to use a bitmap index scap instead of an index scan ?
От
Mohan Krishnan
Дата:
On Mon, Mar 10, 2014 at 4:46 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > > On Fri, Mar 7, 2014 at 6:46 PM, Mohan Krishnan <mohangk@gmail.com> wrote: >> >> Hello folks, >> >> I have a table of about 700k rows in Postgres 9.3.3, which has the >> following structure: >> >> Columns: >> content_body - text >> publish_date - timestamp without time zone >> published - boolean >> >> Indexes: >> "articles_pkey" PRIMARY KEY, btree (id) >> "article_text_gin" gin (article_text) >> "articles_publish_date_id_index" btree (publish_date DESC NULLS >> LAST, id DESC) > > > Your indexes are on columns that are not in the list of columns you gave. > Can you show us the actual table and index definitions? Sorry about that, here is the table and the index definitions Table "public.articles" Column | Type | Modifiers ----------------------+-----------------------------+------------------------------------------------------- id | integer | not null default nextval('articles_id_seq'::regclass) title | text | not null content_body | text | publish_date | timestamp without time zone | created_at | timestamp without time zone | not null published | boolean | updated_at | timestamp without time zone | not null category_id | integer | not null article_text | tsvector | Indexes: "articles_pkey" PRIMARY KEY, btree (id) "article_text_gin" gin (article_text) "articles_category_id_index" btree (category_id) "articles_created_at" btree (created_at) "articles_publish_date_id_index" btree (publish_date DESC NULLS LAST, id DESC) "articles_published_index" btree (published) > >> -> Index Scan using articles_pkey on articles >> (cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711 >> rows=10 loops=1) >> Filter: (article_text @@ '''in_index'''::tsquery) > > ... > >> >> -> Index Scan using articles_pkey on articles >> (cost=0.42..462150.49 rows=3573 width=1298) (actual >> time=5633.672..5633.672 rows=0 loops=1) >> Filter: (article_text @@ '''not_in_index'''::tsquery) > > > Those estimates are way off, and it is not clear why they would be. Have > you analyzed your table recently? Yes I have analyzed them and rerun the queries - there is no difference. What more debugging information can should I look at to determine why the estimates are way off ? > Cheers, > > Jeff -- Mohan