Re: TODO item: Implement Boyer-Moore searching in LIKE queries

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Дата
Msg-id 8747.1470165235@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: TODO item: Implement Boyer-Moore searching in LIKE queries  (Karan Sikka <karanssikka@gmail.com>)
Ответы Re: TODO item: Implement Boyer-Moore searching in LIKE queries  (Karan Sikka <karanssikka@gmail.com>)
Список pgsql-hackers
Karan Sikka <karanssikka@gmail.com> writes:
> Just for the record, I'm leaning to the side of "not worth it". My
> thoughts are, if you are at a scale where you care about this 20%
> speedup, you would want to go all the way to an indexed structure,
> because the gains you would realize would exceed 20%, and 20% may not be
> a sufficient speedup anyway.

If I'm reading your test case correctly, 20% is actually a rather
impressive number, because it's 20% *overall* gain on queries that will
also involve TOAST fetch and decompress on the source data.  (Decompress
definitely, and I'm guessing those 5K strings don't compress well enough
to avoid getting pushed out-of-line; though it might be worth repeating
the test with chunks of 10K or 20K to be sure.)  So the percentage
improvement in the LIKE test proper must have been a lot more than that.

However, I'm dubious that LIKE patterns with long fixed substrings are a
common use-case, so I'm afraid that this might be quite a lot of work for
something that won't much benefit most users.  I'm also worried that the
setup costs might be enough to make it a net loss in many cases.

There are probably ways to amortize the setup costs, since typical
scenarios involve the same LIKE pattern across many rows, but implementing
that would add even more work.  (Having said that, I've had a bee in my
bonnet for a long time about removing per-row setup cost for repetitive
regex matches, and whatever infrastructure that needs would work for this
too.  And for strpos' B-M-H setup, looks like.  So this might be something
to look into with a suitably wide view of what the problem is.)

Not sure what advice to give you here.  I think this is in the grey zone
where it's hard to be sure whether it's worth putting work into.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Vladimir Sitnikov
Дата:
Сообщение: Re: Slowness of extended protocol
Следующее
От: Chapman Flack
Дата:
Сообщение: Re: AdvanceXLInsertBuffer vs. WAL segment compressibility