On Fri, Sep 23, 2011 at 8:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Anyway, I won't stand in the way of the patch as long as it's modified
>>> to limit the number of values considered for any one character position
>>> to something reasonably small.
>
>> I think that limit in both the old and new code is 1, except that the
>> new code does it more efficiently.
>
>> Am I confused?
>
> Yes, or else I am. Consider a 4-byte UTF8 character at the end of the
> string. The existing code increments the last byte up to 255 (rejecting
> everything past 0xBF), then gives up and truncates that character away.
> So the maximum number of tries for that character position is between 0
> and 127 depending on what the original character was (with at most 63 of
> the incremented values getting past the verifymbstr test).
>
> The proposed patch is going to iterate through all Unicode code points
> up to U+7FFFFF before giving up. Since it's possible that we need to
> increment something further left to succeed at all, this doesn't seem
> like a good plan.
I think you're misreading the code. It does this:
while (len > 0)
{ boring stuff; if (charincfunc(lastchar, charlen)) { more boring stuff; if (we made a greater
string) return it; cleanup; } truncate away last character;
}
I don't see how that's ever going to try more than one character in
the same position.
What may be confusing you is that the old code has two loops: an outer
loop that tests whether we've made a greater string, and an inner loop
that tests whether we've made a validly encoded string at all. In the
new code, at least in the UTF-8 case, the inner loop is GONE
altogether. Instead of iterating until we construct a valid
character, we just use our mad UTF-8 skillz to assemble one, and
return it.
Or else I need to go drink a few cups of tea and look at this again.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company