Re: Patch: add conversion from pg_wchar to multibyte

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Patch: add conversion from pg_wchar to multibyte
Дата
Msg-id CAPpHfdtFUmViWynMqO5Et4OXmxX-HREhOGLqDoedezXqh6EJMA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Patch: add conversion from pg_wchar to multibyte  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, May 2, 2012 at 5:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
 > Imagine we've two queries:
> 1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
> 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';
>
> The first query require reading posting lists of trigrams "abc" and "bcd".
> The second query require reading posting lists of trigrams "abc", "bcd",
> "cde", "def", "efg", "fgh", "ghi", "hij" and "ijk".
> We could decide to use index scan for first query and sequential scan for
> second query because number of posting list to read is high. But it is
> unreasonable because actually second query is narrower than the first one.
> We can use same index scan for it, recheck will remove all false positives.
> When number of trigrams is high we can just exclude some of them from index
> scan. It would be better than just decide to do sequential scan. But the
> question is what trigrams to exclude? Ideally we would leave most rare
> trigrams to make index scan cheaper.

True.  I guess I was thinking more of the case where you've got
abc|def|ghi|jkl|mno|pqr|stu|vwx|yza|....  There's probably some point
at which it becomes silly to think about using the index.

Yes, such situations are also possible.

>> Well, I'm not an expert on encodings, but it seems like a logical
>> extension of what we're doing right now, so I don't really see why
>> not.  I'm confused by the diff hunks in pg_mule2wchar_with_len,
>> though.  Presumably either the old code is right (in which case, don't
>> change it) or the new code is right (in which case, there's a bug fix
>> needed here that ought to be discussed and committed separately from
>> the rest of the patch).  Maybe I am missing something.
>
> Unfortunately I didn't understand original logic of pg_mule2wchar_with_len.
> I just did proposal about how it could be. I hope somebody more familiar
> with this code would clarify this situation.

Well, do you think the current code is buggy, or not?

Probably, but I'm not sure. The conversion seems lossy to me unless I'm missing something about mule encoding.

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Patch: add conversion from pg_wchar to multibyte
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: proposal: additional error fields