Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

Поиск
Список
Период
Сортировка
От johno
Тема Slow query with indexed ORDER BY and LIMIT when using OR'd conditions
Дата
Msg-id CACuOPqC9h6vK5rniezSOm2fu44PNp0NrB8kdtapT=xqNzQgidA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions
Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions
Список pgsql-performance
Hi there,

I am trying to optimize a simple query that returns first 100 rows that have been updated since a given timestamp (ordered by timestamp and id desc).  If there are several rows with the same timestamp I need to a second condition, that states that I want to return rows having the given timestamp and id > given id.

The obvious query is 

SELECT * FROM register_uz_accounting_entities
    WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND id > 1459)
    ORDER BY effective_on, id
    LIMIT 100

With a composite index on (effective_on, id)

Query plan

"Limit  (cost=4613.70..4613.95 rows=100 width=1250) (actual time=0.122..0.130 rows=22 loops=1)"
"  Buffers: shared hit=28"
"  ->  Sort  (cost=4613.70..4617.33 rows=1453 width=1250) (actual time=0.120..0.122 rows=22 loops=1)"
"        Sort Key: effective_on, id"
"        Sort Method: quicksort  Memory: 30kB"
"        Buffers: shared hit=28"
"        ->  Bitmap Heap Scan on register_uz_accounting_entities  (cost=35.42..4558.17 rows=1453 width=1250) (actual time=0.036..0.083 rows=22 loops=1)"
"              Recheck Cond: ((effective_on > '2014-07-11'::date) OR ((effective_on = '2014-07-11'::date) AND (id > 1459)))"
"              Buffers: shared hit=28"
"              ->  BitmapOr  (cost=35.42..35.42 rows=1453 width=0) (actual time=0.026..0.026 rows=0 loops=1)"
"                    Buffers: shared hit=6"
"                    ->  Bitmap Index Scan on idx2  (cost=0.00..6.49 rows=275 width=0) (actual time=0.017..0.017 rows=15 loops=1)"
"                          Index Cond: (effective_on > '2014-07-11'::date)"
"                          Buffers: shared hit=3"
"                    ->  Bitmap Index Scan on idx2  (cost=0.00..28.21 rows=1178 width=0) (actual time=0.007..0.007 rows=7 loops=1)"
"                          Index Cond: ((effective_on = '2014-07-11'::date) AND (id > 1459))"
"                          Buffers: shared hit=3"
"Total runtime: 0.204 ms"


Everything works as expected. However if I change the constraint to a timestamp/date more in the past (resulting in far more matching rows) the query slows down drastically.

SELECT * FROM register_uz_accounting_entities
WHERE effective_on > '2014-06-11' OR (effective_on = '2014-06-11' AND id > 1459)
ORDER BY effective_on, id
LIMIT 100
 
"Limit  (cost=0.42..649.46 rows=100 width=1250) (actual time=516.125..516.242 rows=100 loops=1)"
"  Buffers: shared hit=576201"
"  ->  Index Scan using idx2 on register_uz_accounting_entities  (cost=0.42..106006.95 rows=16333 width=1250) (actual time=516.122..516.229 rows=100 loops=1)"
"        Filter: ((effective_on > '2014-06-11'::date) OR ((effective_on = '2014-06-11'::date) AND (id > 1459)))"
"        Rows Removed by Filter: 797708"
"        Buffers: shared hit=576201"
"Total runtime: 516.304 ms"


I've tried to optimize this query by pushing down the limit and order by's into explicit subselects.

SELECT * FROM (
   SELECT * FROM register_uz_accounting_entities 
   WHERE effective_on > '2014-06-11'
   ORDER BY effective_on, id LIMIT 100
   ) t1
UNION
  SELECT * FROM (
    SELECT * FROM register_uz_accounting_entities
    WHERE effective_on = '2014-06-11' AND id > 1459
    ORDER BY effective_on, id LIMIT 100
) t2
ORDER BY effective_on, id
LIMIT 100
 
-- query plan
"Limit  (cost=684.29..684.54 rows=100 width=1250) (actual time=2.648..2.708 rows=100 loops=1)"
"  Buffers: shared hit=203"
"  ->  Sort  (cost=684.29..684.79 rows=200 width=1250) (actual time=2.646..2.672 rows=100 loops=1)"
"        Sort Key: register_uz_accounting_entities.effective_on, register_uz_accounting_entities.id"
"        Sort Method: quicksort  Memory: 79kB"
"        Buffers: shared hit=203"
"        ->  HashAggregate  (cost=674.65..676.65 rows=200 width=1250) (actual time=1.738..1.971 rows=200 loops=1)"
"              Buffers: shared hit=203"
"              ->  Append  (cost=0.42..661.15 rows=200 width=1250) (actual time=0.054..0.601 rows=200 loops=1)"
"                    Buffers: shared hit=203"
"                    ->  Limit  (cost=0.42..338.62 rows=100 width=1250) (actual time=0.053..0.293 rows=100 loops=1)"
"                          Buffers: shared hit=101"
"                          ->  Index Scan using idx2 on register_uz_accounting_entities  (cost=0.42..22669.36 rows=6703 width=1250) (actual time=0.052..0.260 rows=100 loops=1)"
"                                Index Cond: (effective_on > '2014-06-11'::date)"
"                                Buffers: shared hit=101"
"                    ->  Limit  (cost=0.42..318.53 rows=100 width=1250) (actual time=0.037..0.228 rows=100 loops=1)"
"                          Buffers: shared hit=102"
"                          ->  Index Scan using idx2 on register_uz_accounting_entities register_uz_accounting_entities_1  (cost=0.42..30888.88 rows=9710 width=1250) (actual time=0.036..0.187 rows=100 loops=1)"
"                                Index Cond: ((effective_on = '2014-06-11'::date) AND (id > 1459))"
"                                Buffers: shared hit=102"
"Total runtime: 3.011 ms"

=> Very fast.

The question is... why is the query planner unable to make this optimization for the slow query? What am I missing?


Thanks in advance.


В списке pgsql-performance по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: 60 core performance with 9.3
Следующее
От: David G Johnston
Дата:
Сообщение: Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions