Re: strpos() && KMP

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: strpos() && KMP
Дата
Msg-id 12142.1186626260@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: strpos() && KMP  (Pavel Ajtkulov <ajtkulov@acm.org>)
Ответы Re: strpos() && KMP  (Pavel Ajtkulov <ajtkulov@acm.org>)
Список pgsql-patches
Pavel Ajtkulov <ajtkulov@acm.org> writes:
> Tom Lane writes:
>> The difficulty with B-M is the need for a table indexed by character
>> code, which at first glance looks impractical for wchars.  But it seems
>> to me that we could use "wchar % 256" as the table index, meaning that
>> wchars with the same trailing byte share the same table entry.

> hash table?

I'd think the cost of hashing would render it impractical.  Most of the
papers I've seen on this topic worry about getting single instructions
out of the search loop --- a hash lookup will cost lots more than that.
Moreover, you'd lose the guarantee of not-worse-than-linear time,
because hash lookup can be pathologically bad if you get a lot of hash
collisions.

> The main difficulty with BM is verification and understanding "good
> suffix shift" (the second part of BM) (I don't understand it entirely).

Yeah, there seem to be a bunch of variants of BM (many of them not
guaranteed linear, which I'm sure we don't want) and the earliest
papers had bugs.  But a good implementation would be a lot easier
sell because it would show benefits for a much wider set of use-cases
than KMP.

            regards, tom lane

В списке pgsql-patches по дате отправления:

Предыдущее
От: "Brendan Jurd"
Дата:
Сообщение: Re: Default location for docbook stylesheets in Gentoo
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Default location for docbook stylesheets in Gentoo