Re: TODO item: Implement Boyer-Moore searching in LIKE queries

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Дата
Msg-id 20160802175657.GA555359@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: TODO item: Implement Boyer-Moore searching in LIKE queries  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: TODO item: Implement Boyer-Moore searching in LIKE queries  (Karan Sikka <karanssikka@gmail.com>)
Re: TODO item: Implement Boyer-Moore searching in LIKE queries  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Robert Haas wrote:
> On Mon, Aug 1, 2016 at 1:19 PM, Karan Sikka <karanssikka@gmail.com> wrote:
> > Following the patch to implement strpos with Boyer-Moore-Horspool,
> > it was proposed we bring BMH to LIKE as well.
> >
> > Original thread:
> > https://www.postgresql.org/message-id/flat/27645.1220635769%40sss.pgh.pa.us#27645.1220635769@sss.pgh.pa.us
> >
> > I'm a first time hacker and I found this task on the TODO list. It seemed
> > interesting and isolated enough. But after looking at the code in
> > like_match.c, it's clearly a non-trivial task.
> >
> > How desirable is this feature? To begin answering that question,
> > I did some initial benchmarking with an English text corpus to find how much
> > faster BMH (Boyer-Moore-Horspool) would be compared to LIKE, and the results
> > were as follows: BMH can be up to 20% faster than LIKE, but for short
> > substrings, it's roughly comparable or slower.
> >
> > Here are the results visualized:
> >
> > http://ctrl-c.club/~ksikka/pg/like-strpos-short-1469975400.png
> > http://ctrl-c.club/~ksikka/pg/like-strpos-long-1469975400.png
> 
> Based on these results, this looks to me like a pretty unexciting
> thing upon which to spend time.

Uh, a 20% different is "unexciting" for you?  I think it's interesting.
Now, really, users shouldn't be running LIKE on constant strings all the
time but rather use some sort of indexed search, but once in a while
there is a need to run some custom query and you need to LIKE-scan a
large portion of a table.  For those cases an algorithm that performs
20% better is surely welcome.

I wouldn't be so quick to dismiss this.

Of course, it needs to work in all cases, or failing that, be able to
fall back to the original code if it cannot support some corner case.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Harvey
Дата:
Сообщение: Re: MSVC pl-perl error message is not verbose enough
Следующее
От: Shay Rojansky
Дата:
Сообщение: Re: Slowness of extended protocol