more] indexed regex select optimisations?

Поиск
Список
Период
Сортировка
От Stuart Woolford
Тема more] indexed regex select optimisations?
Дата
Msg-id 99110813015600.00960@test.macmillan.co.nz
обсуждение исходный текст
Ответ на Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?  (Charles Tassell <ctassell@isn.net>)
Список pgsql-general
Well, I've improved my regex text searches to actually use the indexes properly
now for the basic case, but I have found another 'problem' (or feature, call it
what you will ;) - to demonstrate:
with locale turned on (the default RPMS are like this):

the following takes a LONG time to run on 1.6 million records:
-------------------------------------
explain select isbn, count from inv_word_i where
word~'^foo'
order by count

Sort  (cost=35148.70 rows=353 width=16)
  ->  Index Scan using i3 on inv_word_i  (cost=35148.70 rows=353 width=16)
-------------------------------------
the following runs instantly, and does (nearly) the same thing:
-------------------------------------
explain select isbn, count from inv_word_i where
word>='foo' and word<'fop'
order by count

Sort  (cost=11716.57 rows=183852 width=16)
  ->  Index Scan using i3 on inv_word_i  (cost=11716.57 rows=183852 width=16)
-------------------------------------
but what about the following? :
-------------------------------------
explain select isbn , sum(count) from inv_word_i where
(word>='window' and word<'windox')
or
(word>='idiot' and word<'idiou')
group by isbn
order by sum(count) desc

Sort  (cost=70068.84 rows=605525 width=16)
  ->  Aggregate  (cost=70068.84 rows=605525 width=16)
        ->  Group  (cost=70068.84 rows=605525 width=16)
              ->  Sort  (cost=70068.84 rows=605525 width=16)
                    ->  Seq Scan on inv_word_i  (cost=70068.84 rows=605525 width=16)
-------------------------------------

this is the fastest way I've found so far to do a multi-word search (window and
idiot as the root words in this case), you note it does NOT use the indexes,
but falls back to a linear scan?!? it takes well over 30 seconds (much much too
long)

I've tried a LOT of different combinations, and have yet to find a way of
getting the system to use the indexes correctly to do what I want, the closest
I've ffound is using a select intersect select method to find all docs
containing both word (what I really want, although the query above is a ranked
or query), but it gets slow as soon as I select more than one field for the
results (I need to line isbn in this case to another database in the final
application)

I assume there is some reason the system falls back to a linear scan in this
case? it seems two index lookups would be much much more efficient..

am I missing something again?

--
------------------------------------------------------------
Stuart Woolford, stuartw@newmail.net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------

В списке pgsql-general по дате отправления:

Предыдущее
От: Stuart Woolford
Дата:
Сообщение: Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?
Следующее
От: Matthias Teege
Дата:
Сообщение: Compiling problems