Re: [v9.2] make_greater_string() does not return a string in some cases
From | horiguchi.kyotaro@oss.ntt.co.jp |
---|---|
Subject | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date | |
Msg-id | 20111030.021603.01379645.horiguchi.kyotaro@horiguti.oss.ntt.co.jp |
In response to | Re: [v9.2] make_greater_string() does not return a string in some cases (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [v9.2] make_greater_string() does not return a string in some cases |
List | pgsql-hackers |
Hello, I am at a loss as to what to do...

> I thought that code was looking for 0xED/0xF4 in the second position,
> but it's actually looking for them in the first position, which makes
> vastly more sense. Whee!

Anyway, I will try to describe another aspect of this code as it stands at present.

The switch block in pg_utf8_increment is a folded form of five individual manipulations, one for each byte length of the sequence. The separation presupposes that the input bytes and length form a valid UTF-8 sequence.

For a character 5 bytes or longer, it returns false.

For 4 bytes, the sequence ranges between U+10000 and U+10ffff.
If charptr[3] is less than 0xbf, increment it and return true.
Else assign 0x80 to charptr[3] and then, if charptr[2] is less than 0xbf, increment it and return true.
Else assign 0x80 to charptr[2] and then, if (charptr[1] is less than 0x8f when charptr[0] == 0xf4) or (charptr[1] is less than 0xbf when charptr[0] != 0xf4), increment it and return true.
Else assign 0x80 to charptr[1] and then, if charptr[0] is not 0xf4, increment it and return true.
Else the input sequence must be 0xf4 0x8f 0xbf 0xbf, which represents U+10ffff, and this is the upper limit of the UTF-8 representation. Restore the sequence and return false.

For 3 bytes, the sequence ranges between U+800 and U+ffff.
If charptr[2] is less than 0xbf, increment it and return true.
Else assign 0x80 to charptr[2] and then, if (charptr[1] is less than 0x9f when charptr[0] == 0xed) or (charptr[1] is less than 0xbf when charptr[0] != 0xed), increment it and return true.
Else assign 0x80 to charptr[1] and then, if charptr[0] is not 0xef, increment it and return true. By this step the sequence 0xed 0x9f 0xbf, which represents U+d7ff, is incremented to 0xee 0x80 0x80 (U+e000), skipping the surrogate range.
Else the input sequence must be 0xef 0xbf 0xbf, which represents U+ffff, and the next UTF-8 sequence has a length of 4. Restore the sequence and return false.

For 2 bytes, the sequence ranges between U+80 and U+7ff.
If charptr[1] is less than 0xbf, increment it and return true.
Else assign 0x80 to charptr[1] and then, if charptr[0] is not 0xdf, increment it and return true.
Else the input sequence must be 0xdf 0xbf, which represents U+7ff, and the next UTF-8 sequence has a length of 3. Restore the sequence and return false.

For 1 byte, the byte ranges between U+0 and U+7f.
If charptr[0] is less than 0x7f, increment it and return true.
Else the input sequence must be 0x7f, which represents U+7f, and the next UTF-8 sequence has a length of 2. Restore the sequence and return false.

--
Kyotaro Horiguchi
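
The per-length procedure described in this message can be sketched as a single stand-alone C function. This is only an illustration under stated assumptions, not the code from the patch: the name utf8_increment_sketch is hypothetical, the input is assumed to be a valid UTF-8 sequence of the given length, and the "restore" step is done here by copying back a saved copy of the bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch, not the PostgreSQL source.
 * Increment one valid UTF-8 sequence in place to the next sequence of the
 * same byte length.  Returns false, leaving the bytes unchanged, when no
 * such sequence exists or the length is not 1..4.
 */
static bool
utf8_increment_sketch(unsigned char *charptr, int length)
{
    unsigned char saved[4];
    unsigned char limit;
    int           i;

    assert(charptr != NULL);

    if (length < 1 || length > 4)
        return false;               /* 5 bytes or longer: rejected */

    if (length == 1)
    {
        if (charptr[0] < 0x7f)
        {
            charptr[0]++;
            return true;
        }
        return false;               /* 0x7f: the next character needs 2 bytes */
    }

    memcpy(saved, charptr, (size_t) length);    /* kept for the restore step */

    /* Third and fourth bytes (if present) simply range over 0x80..0xbf. */
    for (i = length - 1; i >= 2; i--)
    {
        if (charptr[i] < 0xbf)
        {
            charptr[i]++;
            return true;
        }
        charptr[i] = 0x80;
    }

    /* The second byte has a lower ceiling after lead bytes 0xed and 0xf4. */
    if (charptr[0] == 0xed)
        limit = 0x9f;               /* stay below the surrogates (U+d800..) */
    else if (charptr[0] == 0xf4)
        limit = 0x8f;               /* stay at or below U+10ffff */
    else
        limit = 0xbf;

    if (charptr[1] < limit)
    {
        charptr[1]++;
        return true;
    }
    charptr[1] = 0x80;

    /* Lead byte: 0xdf, 0xef and 0xf4 are the last lead bytes of each length. */
    if (charptr[0] != 0xdf && charptr[0] != 0xef && charptr[0] != 0xf4)
    {
        charptr[0]++;
        return true;
    }

    memcpy(charptr, saved, (size_t) length);    /* restore and give up */
    return false;
}
```

Calling it on the bytes 0xed 0x9f 0xbf with length 3 yields 0xee 0x80 0x80 and returns true, while 0xf4 0x8f 0xbf 0xbf with length 4 is left unchanged and the function returns false.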