Slight change in query leads to unexpected change in query plan

Поиск
Список
Период
Сортировка
От Jack Orenstein
Тема Slight change in query leads to unexpected change in query plan
Дата
Msg-id 4A3FFDD0.8020809@hds.com
обсуждение исходный текст
Ответы Re: Slight change in query leads to unexpected change in query plan  (Sam Mason <sam@samason.me.uk>)
Список pgsql-general
I have a table that looks like this:

   create table T(pk int not null, value bytea, ..., primary key(pk))

I want to scan the table in batches of 100. I'll do this by issuing a sequence
of queries like this:

     select *
     from T
     where pk > ?
     and value = ?
     order by pk
     limit 100

After each query, I'll record the last value seen and use that to
drive the next query. The obvious (to me) execution plan is to use the
index, do an index scan, and then filter using the restriction on
value. I have some realistic data (loaded into postgres 8.3.7) and
I've run analyze. I'm not getting a very good execution plan:

ris=# explain
ris-# select *
ris-# from T
ris-# where pk > 1000000000
ris-# and value = 'asdf'::bytea
ris-# order by pk
ris-# limit 100;
                                          QUERY PLAN

---------------------------------------------------------------------------------------------
  Limit  (cost=78352.20..78352.24 rows=16 width=451)
    ->  Sort  (cost=78352.20..78352.24 rows=16 width=451)
          Sort Key: pk
          ->  Bitmap Heap Scan on t  (cost=2091.60..78351.88 rows=16 width=451)
                Recheck Cond: (pk > 1000000000)
                Filter: (value = 'asdf'::bytea)
                ->  Bitmap Index Scan on t_pkey  (cost=0.00..2091.60 rows=91088
width=0)
                      Index Cond: (pk > 1000000000)

But if I remove the value restriction, I get the plan I was hoping for:

ris=# explain
ris-# select *
ris-# from T
ris-# where pk > 1000000000
ris-# order by pk
ris-# limit 100;
                                             QUERY PLAN

--------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..324.99 rows=100 width=451)
    ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)
          Index Cond: (pk > 1000000000)
(3 rows)

Why is this?

This is an obvious rewrite, e.g.

     select *
     from (select * from T where pk > ? order by pk limit 100) x
     where value = ?

and this produces a good query plan.  But this means that fewer than
100 rows are returned. For reasons too boring to go into, that would
be very inconvenient for my application.

Why does adding the value restriction so radically change the execution plan?

Jack Orenstein

В списке pgsql-general по дате отправления:

Предыдущее
От: Gerry Reno
Дата:
Сообщение: Replication
Следующее
От: Mike Toews
Дата:
Сообщение: Re: Select ranges based on sequential breaks