Обсуждение: How can I get the query planner to use a bitmap index scap instead of an index scan ?

Поиск
Список
Период
Сортировка

How can I get the query planner to use a bitmap index scap instead of an index scan ?

От
Mohan Krishnan
Дата:
Hello folks,

I have a table of about 700k rows in Postgres 9.3.3, which has the
following structure:

Columns:
 content_body  - text
 publish_date  - timestamp without time zone
 published     - boolean

Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)
    "article_text_gin" gin (article_text)
    "articles_publish_date_id_index" btree (publish_date DESC NULLS
LAST, id DESC)

The query that I am making has a full text search query and a limit, as follows:

When I search for a string which is in my index with a limit and order
in the query it is fast:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'in_index') order by id limit 10;
                                                                QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual
time=2.073..9.837 rows=10 loops=1)
   ->  Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711
rows=10 loops=1)
         Filter: (article_text @@ '''in_index'''::tsquery)
         Rows Removed by Filter: 611
 Total runtime: 9.952 ms

However if the string is not in the index it takes much longer:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id limit
10;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual
time=5633.684..5633.684 rows=0 loops=1)
   ->  Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual
time=5633.672..5633.672 rows=0 loops=1)
         Filter: (article_text @@ '''not_in_index'''::tsquery)
         Rows Removed by Filter: 796146
 Total runtime: 5633.745 ms

However if I remove the order clause it is fast for either case:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'in_index')  limit 10;
                                                              QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=55.69..90.22 rows=10 width=1298) (actual
time=7.748..7.853 rows=10 loops=1)
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=7.735..7.781 rows=10 loops=1)
         Recheck Cond: (article_text @@ '''in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=5.977..5.977 rows=8910 loops=1)
               Index Cond: (article_text @@ '''in_index'''::tsquery)
 Total runtime: 7.952 ms


explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'not_in_index')  limit 10;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=55.69..90.22 rows=10 width=1298) (actual
time=0.083..0.083 rows=0 loops=1)
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=0.065..0.065 rows=0 loops=1)
         Recheck Cond: (article_text @@ '''not_in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=0.047..0.047 rows=0 loops=1)
               Index Cond: (article_text @@ '''not_in_index'''::tsquery)
 Total runtime: 0.163 ms

Removing the limit clause has the same effect, although the in index
query is noticably slower:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'in_index') order by id;
                                                              QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12601.46..12610.40 rows=3573 width=1298) (actual
time=106.347..140.481 rows=8910 loops=1)
   Sort Key: id
   Sort Method: external merge  Disk: 12288kB
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=5.618..50.329 rows=8910 loops=1)
         Recheck Cond: (article_text @@ '''in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=4.243..4.243 rows=8910 loops=1)
               Index Cond: (article_text @@ '''in_index'''::tsquery)
 Total runtime: 170.987 ms

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12601.46..12610.40 rows=3573 width=1298) (actual
time=0.067..0.067 rows=0 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=0.044..0.044 rows=0 loops=1)
         Recheck Cond: (article_text @@ '''not_in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=0.026..0.026 rows=0 loops=1)
               Index Cond: (article_text @@ '''not_in_index'''::tsquery)
 Total runtime: 0.148 ms

The little I can deduce is that overall, a bitmap index scan+bitmap
heap scan is overall better for my queries then an index scan. How can
I tell the query planner to do that though?


--
Mohan


Re: How can I get the query planner to use a bitmap index scap instead of an index scan ?

От
Jeff Janes
Дата:

On Fri, Mar 7, 2014 at 6:46 PM, Mohan Krishnan <mohangk@gmail.com> wrote:
Hello folks,

I have a table of about 700k rows in Postgres 9.3.3, which has the
following structure:

Columns:
 content_body  - text
 publish_date  - timestamp without time zone
 published     - boolean

Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)
    "article_text_gin" gin (article_text)
    "articles_publish_date_id_index" btree (publish_date DESC NULLS
LAST, id DESC)

Your indexes are on columns that are not in the list of columns you gave.  Can you show us the actual table and index definitions?


   ->  Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711
rows=10 loops=1)
         Filter: (article_text @@ '''in_index'''::tsquery)
...
 
   ->  Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual
time=5633.672..5633.672 rows=0 loops=1)
         Filter: (article_text @@ '''not_in_index'''::tsquery)

Those estimates are way off, and it is not clear why they would be.  Have you analyzed your table recently?

Cheers,

Jeff

Re: How can I get the query planner to use a bitmap index scap instead of an index scan ?

От
Mohan Krishnan
Дата:
On Mon, Mar 10, 2014 at 4:46 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On Fri, Mar 7, 2014 at 6:46 PM, Mohan Krishnan <mohangk@gmail.com> wrote:
>>
>> Hello folks,
>>
>> I have a table of about 700k rows in Postgres 9.3.3, which has the
>> following structure:
>>
>> Columns:
>>  content_body  - text
>>  publish_date  - timestamp without time zone
>>  published     - boolean
>>
>> Indexes:
>>     "articles_pkey" PRIMARY KEY, btree (id)
>>     "article_text_gin" gin (article_text)
>>     "articles_publish_date_id_index" btree (publish_date DESC NULLS
>> LAST, id DESC)
>
>
> Your indexes are on columns that are not in the list of columns you gave.
> Can you show us the actual table and index definitions?


Sorry about that, here is the table and the index definitions

                                          Table "public.articles"
        Column        |            Type             |
     Modifiers
----------------------+-----------------------------+-------------------------------------------------------
 id                   | integer                     | not null default
nextval('articles_id_seq'::regclass)
 title                | text                        | not null
 content_body         | text                        |
 publish_date         | timestamp without time zone |
 created_at           | timestamp without time zone | not null
 published            | boolean                     |
 updated_at           | timestamp without time zone | not null
 category_id          | integer                     | not null
 article_text         | tsvector                    |

Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)
    "article_text_gin" gin (article_text)
    "articles_category_id_index" btree (category_id)
    "articles_created_at" btree (created_at)
    "articles_publish_date_id_index" btree (publish_date DESC NULLS
LAST, id DESC)
    "articles_published_index" btree (published)

>
>>    ->  Index Scan using articles_pkey on articles
>> (cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711
>> rows=10 loops=1)
>>          Filter: (article_text @@ '''in_index'''::tsquery)
>
> ...
>
>>
>>    ->  Index Scan using articles_pkey on articles
>> (cost=0.42..462150.49 rows=3573 width=1298) (actual
>> time=5633.672..5633.672 rows=0 loops=1)
>>          Filter: (article_text @@ '''not_in_index'''::tsquery)
>
>
> Those estimates are way off, and it is not clear why they would be.  Have
> you analyzed your table recently?

Yes I have analyzed them and rerun the queries - there is no
difference. What more debugging information can should I look at to
determine why the estimates are way off ?

> Cheers,
>
> Jeff



--
Mohan