Re: text_position worst case runtime

From: Mark Dilger
Subject: Re: text_position worst case runtime
Msg-id: 446E2683.7090700@markdilger.com
In response to: Re: text_position worst case runtime (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> 
>>Tom Lane <tgl@sss.pgh.pa.us> writes:
>>
>>>And how much code would those take?  The bottom line here is that we
>>>don't have a pile of complaints about the performance of text_position,
>>>so it's difficult to justify making it much more complicated than it
>>>is now.
> 
> 
>>It seems somewhat contrary to the Postgres design philosophy to assume that
>>all strings are small.
> 
> 
> That is a straw-man argument.  If we try to optimize every single
> function in the system to the Nth degree, we'll end up with a system
> that is unmaintainable (and likely unusably buggy as well).  We've got
> to set limits on the amount of complexity we're willing to accept in
> the core code.
> 
> Note that I have not said "you can't put Boyer-Moore into core".
> What I've said is that the case to justify doing that hasn't been made.
> And handwaving about "design philosophy" isn't the kind of case I'm
> looking for --- common applications in which it makes a real performance
> difference are what I'm looking for.
> 
> At this point we haven't even been shown any evidence that text_position
> itself is what to optimize if you need to do searches in large text
> strings.  It seems entirely likely to me that the TOAST mechanisms would
> be the bottleneck, instead.  And one should also consider other approaches
> entirely, like indexes (tsearch2 anyone?).

In case anyone is following this thread specifically for the biological sequence
data aspect of it, I should mention that I wrote a GiST index for the DNA and
protein sequence datatypes.  The performance of the index was inconsistent.  For
certain data, I could get about a two-orders-of-magnitude speedup on selects,
where the select was based on a limited regular-expression approximate match
against the data.  But if you change the regular expression (or, to a degree,
if you change the data), performance can drop to roughly that of a sequential
scan.  And of course, inserts are far more expensive because the index has to
be kept up to date.

If anyone wants specifics, send me an email and I'll put something together.

mark

