Why it is using/not using index scan?

Поиск
Список
Период
Сортировка
От Laszlo Nagy
Тема Why it is using/not using index scan?
Дата
Msg-id 4D94B932.3050305@shopzeus.com
обсуждение исходный текст
Ответы Re: Why it is using/not using index scan?  (Merlin Moncure <mmoncure@gmail.com>)
Re: Why it is using/not using index scan?  (Tomas Vondra <tv@fuzzy.cz>)
Список pgsql-performance
For this query:

select pp.id,pp.product_id,pp.selling_site_id,pp.asin
from product_price pp
where
(pp.asin is not null and pp.asin<>'')
and (pp.upload_status_id<>1)
and pp.selling_site_id in (8,7,35,6,9)
and (pp.last_od < 'now'::timestamp - '1 week'::interval )
limit 5000

Query plan is:

"Limit  (cost=9182.41..77384.80 rows=3290 width=35)"
"  ->  Bitmap Heap Scan on product_price pp  (cost=9182.41..77384.80
rows=3290 width=35)"
"        Recheck Cond: ((last_od < '2011-03-24
13:05:09.540025'::timestamp without time zone) AND (selling_site_id =
ANY ('{8,7,35,6,9}'::bigint[])))"
"        Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND
(upload_status_id <> 1))"
"        ->  Bitmap Index Scan on idx_product_price_last_od_ss
(cost=0.00..9181.59 rows=24666 width=0)"
"              Index Cond: ((last_od < '2011-03-24
13:05:09.540025'::timestamp without time zone) AND (selling_site_id =
ANY ('{8,7,35,6,9}'::bigint[])))"

For this query:

select pp.id,pp.product_id,pp.selling_site_id,pp.asin
from product_price pp
where
(pp.asin is not null and pp.asin<>'')
and (pp.upload_status_id<>1)
and pp.selling_site_id in (8,7,35,6,9)
and (pp.last_od + '1 week'::interval < 'now'::timestamp )
limit 5000

Query plan is:

"Limit  (cost=0.00..13890.67 rows=5000 width=35)"
"  ->  Seq Scan on product_price pp  (cost=0.00..485889.97 rows=174898
width=35)"
"        Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND
(upload_status_id <> 1) AND ((last_od + '7 days'::interval) <
'2011-03-31 13:06:17.460013'::timestamp without time zone) AND
(selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))"


The only difference is this: instead of (pp.last_od < 'now'::timestamp -
'1 week'::interval ) I have used (pp.last_od + '1 week'::interval <
'now'::timestamp )

First query with index scan opens in 440msec. The second query with seq
scan opens in about 22 seconds. So the first one is about 50x faster.

My concern is that we are working on a huge set of applications that use
thousands of different queries on a database. There are programs that we
wrote years ago. The database structure continuously changing. We are
adding new indexes and columns, and of course we are upgrading
PostgreSQL when a new stable version comes out. There are cases when a
change in a table affects 500+ queries in 50+ programs. I really did not
think that I have to be THAT CAREFUL with writing conditions in SQL. Do
I really have to manually analyze all those queries and "correct"
conditions like this?

If so, then at least I would like to know if there is a documentation or
wiki page where I can learn about "how not to write conditions". I just
figured out that I need to put constant expressions on one side of any
comparison, if possible. But probably there are other rules I wouldn't
think of.

Might it be possible to change the optimizer so that it tries to rally
constant expressions in the first place? That cannot be bad, right?

Thanks,

    Laszlo


В списке pgsql-performance по дате отправления:

Предыдущее
От: Maciek Sakrejda
Дата:
Сообщение: Re: COPY with high # of clients, partitioned table locking issues?
Следующее
От: Landreville
Дата:
Сообщение: Re: Calculating 95th percentiles