Re: UTF8MatchText

From	Andrew Dunstan
Subject	Re: UTF8MatchText
Date	May 20, 2007, 14:11:34
Msg-id	46505704.8040203@dunslane.net
In reply to	Re: UTF8MatchText (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: UTF8MatchText (Andrew Dunstan <andrew@dunslane.net>)
List	pgsql-patches


I wrote:
>
>
>>
>> It is only when you have a pattern like '%_' when this is a problem
>> and we could detect this and do byte by byte when it's not. Now we
>> check (*p == '\\') || (*p == '_') in each iteration when we scan over
>> characters for '%', and we could do it once and have different loops
>> for the two cases.
>>
>> Other than this part that I think can be optimized I don't see
>> anything wrong with the idea behind the patch. To make the '%' case
>> fast might be an important optimization for a lot of use cases. It's
>> not uncommon that '%' matches a bigger part of the string than the
>> rest of the pattern.
>>
>
>
> Are you sure? The big remaining char-matching bottleneck will surely
> be in the code that scans for a place to start matching a %. But
> that's exactly where we can't use byte matching for cases where the
> charset might include AB and BA as characters - the pattern might
> contain %BA and the string AB. However, this isn't a danger for UTF8,
> which leads me to think that we do indeed need a special case for
> UTF8, but for a different improvement from that proposed in the
> original patch. I'll post an updated patch shortly.
>
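
To make the AB/BA hazard in the quoted paragraph concrete, here is a minimal sketch in C. It is illustrative only and not taken from the patch or from PostgreSQL: the function names are made up, and the "fixed two-byte" encoding is a hypothetical stand-in for any charset in which a trailing byte of one character can also be a leading byte of another. The UTF-8 check relies on the fact that UTF-8 continuation bytes always have the form 10xxxxxx, so a valid character can never begin in the middle of another one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-wise search: offset of the first occurrence of needle in hay, or -1. */
static long find_bytes(const unsigned char *hay, size_t hlen,
                       const unsigned char *needle, size_t nlen)
{
    assert(nlen > 0);
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long) i;
    return -1;
}

/* UTF-8: continuation bytes are 10xxxxxx; any other byte starts a character. */
static int utf8_is_char_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Hypothetical encoding in which every character is exactly two bytes wide. */
static int fixed2_is_char_start(size_t offset)
{
    return offset % 2 == 0;
}
```

With the four bytes A B A B (two AB characters) and the two-byte needle B A, find_bytes reports a hit at offset 1, which fixed2_is_char_start rejects as the middle of a character: a false match. For valid UTF-8 text and a valid UTF-8 needle, the needle starts with a byte that is not a continuation byte, so any hit find_bytes reports satisfies utf8_is_char_start.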

Here is a patch that implements this. Please analyse for possible breakage.
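
The patch itself is not included in this text. Purely as a rough sketch of the kind of scan discussed above, and not the patch's actual code, the search for a place to start matching after a '%' could look something like this for UTF-8. The function name next_candidate_utf8 and its signature are hypothetical; the sketch assumes both text and pattern are valid UTF-8 and that the caller has already handled a trailing '%'. It also decides once, outside the loop, whether the next pattern element is a wildcard or escape, as suggested in the quoted text.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL code.  The pattern has just consumed
 * a '%'; p points at the rest of the pattern (plen > 0).  Return the offset
 * in t, at or after `from`, of the next position where matching the rest of
 * the pattern could begin, or -1 if there is none.
 */
static long next_candidate_utf8(const unsigned char *t, size_t tlen, size_t from,
                                const unsigned char *p, size_t plen)
{
    size_t i = from;

    assert(plen > 0);

    /* Choose the loop once instead of testing for wildcards per iteration. */
    if (p[0] == '_' || p[0] == '\\')
    {
        /* Wildcard or escape next: only character boundaries are candidates,
         * so skip any continuation bytes (10xxxxxx). */
        while (i < tlen && (t[i] & 0xC0) == 0x80)
            i++;
        return i < tlen ? (long) i : -1;
    }

    /* Literal next: its first byte is never a continuation byte, so a plain
     * byte comparison can only succeed at the start of a character. */
    for (; i < tlen; i++)
        if (t[i] == p[0])
            return (long) i;
    return -1;
}
```

The caller would still have to attempt a full match at each candidate offset and call the helper again on failure; the sketch only covers the scan itself.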

cheers

andrew

