Re: [v9.2] make_greater_string() does not return a string in some cases

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [v9.2] make_greater_string() does not return a string in some cases
Дата
Msg-id 29562.1319920566@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [v9.2] make_greater_string() does not return a string in some cases  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [v9.2] make_greater_string() does not return a string in some cases
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Oct 29, 2011 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ummm ... why do the incrementer functions think they need to restore the
>> previous value on failure?  AFAICS that's a waste of code and cycles,
>> since there is only one caller and it doesn't care in the least.

> Well, it might not be strictly necessary for pg_utf8_increment() and
> pg_eucjp_increment(), but it's clearly necessary for the generic
> incrementer function for exactly the same reason it was needed in the
> old coding.  I suppose we could weaken the rule to "you must leave a
> valid character behind rather than a bunch of bytes that doesn't
> encode to a character", but the cycle savings are negligible and the
> current rule seems both simpler and more bullet-proof.

No, it's *not* necessary any more, AFAICS.  make_greater_string discards
the data and overwrites it with a null instantly upon getting a failure
return from the incrementer.  The reason we used to need it was that we
did pg_mbcliplen after failing to increment, but now we do that before
we ever increment anything, so we already know the length of the last
character.  It doesn't matter whether those bytes are still valid or
contain garbage.

>> I'm also quite distressed that you ignored my advice to limit the number
>> of combinations tried. �This patch could be horribly slow when dealing
>> with wide characters, eg think what will happen when starting from
>> U+10000.

> Uh, I think it will try at most one character in that position and
> then truncate away that character entirely, per my last email on this
> topic (to which you never responded):
> http://archives.postgresql.org/pgsql-hackers/2011-09/msg01195.php

Oh!  You are right, I was expecting it to try multiple characters at the
same position before truncating the string.  This change seems to have
lobotomized things rather thoroughly.  What is the rationale for that?
As an example, when dealing with a single-character string, it will fail
altogether if the next code value sorts out-of-order, so this seems to
me to be a rather large step backwards.

I think we ought to go back to the previous design of incrementing till
failure and then truncating, which puts the onus on the incrementer to
make a reasonable tradeoff of how many combinations to try per character
position.  There's a simple tweak we could make to the patch to limit
that: once we've maxed out a lower-order byte of a multibyte char,
*don't reset it to minimum*, just move on to incrementing the next
higher byte.  This preserves the old property that the maximum number of
combinations tried is bounded by 256 * string's length in bytes.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Mr. Aaron W. Swenson"
Дата:
Сообщение: Re: Add socket dir to pg_config..?
Следующее
От: Hitoshi Harada
Дата:
Сообщение: Re: pgsql_fdw, FDW for PostgreSQL server