Re: [v9.2] make_greater_string() does not return a string in some cases
От | Tom Lane |
---|---|
Тема | Re: [v9.2] make_greater_string() does not return a string in some cases |
Дата | |
Msg-id | 29562.1319920566@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: [v9.2] make_greater_string() does not return a string in some cases (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: [v9.2] make_greater_string() does not return a string
in some cases
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Oct 29, 2011 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Ummm ... why do the incrementer functions think they need to restore the >> previous value on failure? AFAICS that's a waste of code and cycles, >> since there is only one caller and it doesn't care in the least. > Well, it might not be strictly necessary for pg_utf8_increment() and > pg_eucjp_increment(), but it's clearly necessary for the generic > incrementer function for exactly the same reason it was needed in the > old coding. I suppose we could weaken the rule to "you must leave a > valid character behind rather than a bunch of bytes that doesn't > encode to a character", but the cycle savings are negligible and the > current rule seems both simpler and more bullet-proof. No, it's *not* necessary any more, AFAICS. make_greater_string discards the data and overwrites it with a null instantly upon getting a failure return from the incrementer. The reason we used to need it was that we did pg_mbcliplen after failing to increment, but now we do that before we ever increment anything, so we already know the length of the last character. It doesn't matter whether those bytes are still valid or contain garbage. >> I'm also quite distressed that you ignored my advice to limit the number >> of combinations tried. �This patch could be horribly slow when dealing >> with wide characters, eg think what will happen when starting from >> U+10000. > Uh, I think it will try at most one character in that position and > then truncate away that character entirely, per my last email on this > topic (to which you never responded): > http://archives.postgresql.org/pgsql-hackers/2011-09/msg01195.php Oh! You are right, I was expecting it to try multiple characters at the same position before truncating the string. This change seems to have lobotomized things rather thoroughly. What is the rationale for that? As an example, when dealing with a single-character string, it will fail altogether if the next code value sorts out-of-order, so this seems to me to be a rather large step backwards. I think we ought to go back to the previous design of incrementing till failure and then truncating, which puts the onus on the incrementer to make a reasonable tradeoff of how many combinations to try per character position. There's a simple tweak we could make to the patch to limit that: once we've maxed out a lower-order byte of a multibyte char, *don't reset it to minimum*, just move on to incrementing the next higher byte. This preserves the old property that the maximum number of combinations tried is bounded by 256 * string's length in bytes. regards, tom lane
В списке pgsql-hackers по дате отправления: