indexed regex select optimisation missing?

Поиск
Список
Период
Сортировка
От Stuart Woolford
Тема indexed regex select optimisation missing?
Дата
Msg-id 99110410093200.00683@test.macmillan.co.nz
обсуждение исходный текст
Ответ на Problem: pq_recvbuf: unexpected EOF of client connection  ("Natalya S. Makushina" <mak@rtsoft.msk.ru>)
Ответы Re: [GENERAL] indexed regex select optimisation missing?  ("Gene Selkov, Jr." <selkovjr@mcs.anl.gov>)
Список pgsql-general
Hello people.

Please bear with me, as I think I may have found either a bug or 'missing
optimisation feature' in postgresql 6.5.2..

 I'm trying to use postgresql 6.5.2 to implement (amongst other things) a
searchable word index, ie: I have a table called 'inv_word_i' which contains
the fields

create table inv_word_i ( key char(10), word char(16), count int ) ;

and index

create index i3 on inv_word_i ( word );

and currently has 1,654,662 entries (1.6 million), now, I currently search
this with:

select key from inv_word_i where word='whatever' order by count desc ;

and this is fast, however, if I use:

select key from inv_word_i where word~'^whatever.*' order by count desc ;

it is very slow.

the explains for both are:
query 1:
Sort  (cost=67.82 rows=1057 width=16)
  ->  Index Scan using i3 on inv_word_i  (cost=67.82 rows=1057 width=16)
query 2:
Sort  (cost=35148.70 rows=353 width=16)
  ->  Index Scan using i3 on inv_word_i  (cost=35148.70 rows=353 width=16)

now, the explain on a query for ~'.*whatever.*' gives a cost of 70000.

now, the documentation says that the index will be used for a regex query that
has the left side tied (by using the ^ start of line match), and it seems to be
doing this, but it then scans the whole rest of the file, not using the obvious
optimisation of stoppping as soon as the non-variable left part of the regex is
no longer matched. I have verified this, as the above ^ based regex search
takes a long time for a word starting with z, and a massive time for one
starting this a, and a linear difference as I progress through the alphabet.

Is there any reason why this optimisation is not used? it seems like a rather
important one to myself, and would speed up queries of this type by a massive
amount (on average, the same amount as using the search to locate the start of
the search scan)

I've downloaded the source, and will start looking into this, but having not
touched the postgresql source before, I'm not holding my breath fort a quick
resolution from myself,

any ideas?

--
------------------------------------------------------------
Stuart Woolford, stuartw@newmail.net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------

В списке pgsql-general по дате отправления:

Предыдущее
От: Scott Perkins
Дата:
Сообщение: PHP4 Review "Brand New" Its Hot!
Следующее
От: Andrew Perrin - Demography
Дата:
Сообщение: Re: [GENERAL] Functions