Multicolumn index - WHERE ... ORDER BY

Поиск
Список
Период
Сортировка
От Lucas Maystre
Тема Multicolumn index - WHERE ... ORDER BY
Дата
Msg-id 4B333A58.5090408@open.ch
обсуждение исходный текст
Ответы Re: Multicolumn index - WHERE ... ORDER BY  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Hi there,

I've got a small question about multicolumn indexes.

I have a table with ~5M rows (43 bytes per column -  is that relevant?)
(but eventually it could grow up to 50M rows), used to store e-mail
logs. I am trying to build a web frontend to search mails in this table.

I usually want only the last mails processed by my mail system, so
typically all my queries end with:
... ORDER BY time LIMIT 50;

Before that, I have usually have a WHERE clause on a indexed column.
Example of a query I might have:

SELECT id FROM mail WHERE from_address LIKE 'bill%'
ORDER BY time DESC LIMIT 50;

I observed that the ordering adds a significant overhead to my queries -
this seems quite logical, because of the ORDER BY which has to inspect
every row matching the WHERE clause.

The approach taken by the query planner is one of the following:

1) if it thinks there are "not so much" rows containg 'bill' as prefix
of the 'from_address' column, it performs an index scan (or a bitmap
index scan) using my index on 'from_address', then sorts all results
according to the 'time' column.

2) if it thinks there are "many" rows containing 'bill' as prefix of the
'from_address' column, it performs an reverse index scan using my index
on 'time', and looks "sequentially" if the 'from_address' column
contains 'bill' as prefix.

The problem is that "not so much" is in my case approx 10K rows
sometimes. It seems to be pretty costly to perform an (bitmap) index
scan over all these rows. As I only want the first few rows anyway
(LIMIT 50), I thought that there had to be some better solution.

The solution I had in mind was to create a multicolumn index over
'from_address' and 'time':

CREATE INDEX idx_from_time ON mail (from_address, time DESC);

so that it could directly use the 'time' ordering and lookup only the
first 50 rows using the index.

but... it doesn't work :-) i.e. my multicolumn index is never used. So:
- do you guys have any ideas why it doesn't work?
- do you see an alternative solution?

Infos:
- I use PostgreSQL 8.4.2
- I regularly VACUUM and ANALYZE my db. Statistics look OK.
- I'm relatively new to PostgreSQL, so maybe this question is trivial?

Thanks in advance, and happy holidays!

--
lucas maystre
trainee

open systems ag
raeffelstrasse 29
ch-8045 zurich
t: +41 44 455 74 00
f: +41 44 455 74 01
lum@open.ch

http://www.open.ch

В списке pgsql-performance по дате отправления:

Предыдущее
От: Michael Ruf
Дата:
Сообщение: Optimizer use of index slows down query by factor
Следующее
От: Ognjen Blagojevic
Дата:
Сообщение: SATA drives performance