Discussion: make_greater_string() does not return a string in some cases
Hi!

make_greater_string() does not return a string when certain UTF8 strings are passed as str_const.
# Especially UTF8 strings whose last byte is 0xBF.

This is because make_greater_string() only tries incrementing the last byte of the string, and does not try the same for the upper bytes.

Therefore, queries containing "LIKE '<string whose last byte is 0xBF>%'" cannot use a (btree) index scan.
# Or they may end up with a nearly full index scan.
# See the following example.

===============================================================================
'西' (Japanese letter) : 0xE8A5BF

[client : UTF8 ⇔ server : EUC_JP]

=# EXPLAIN ANALYZE SELECT * FROM test2 WHERE name LIKE '西%';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using test2_name on test2  (cost=0.00..8.28 rows=1 width=3) (actual time=0.077..0.078 rows=1 loops=1)
   Index Cond: ((name >= '西'::text) AND (name < '誠'::text))    <-- Index-scan is chosen
   Filter: (name ~~ '西%'::text)
 Total runtime: 0.110 ms
(4 rows)

[client : UTF8 ⇔ server : UTF8]

=# EXPLAIN ANALYZE SELECT * FROM test2 WHERE name LIKE '西%';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1693.01 rows=1 width=4) (actual time=22.598..22.599 rows=1 loops=1)
   Filter: (name ~~ '西%'::text)    <-- Seq-scan is chosen !
 Total runtime: 22.626 ms
(3 rows)
===============================================================================

The attached patch solves the above problem.

Best regards,

--
NTT OSS Center
Tatsuhito Kasahara

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index fc3c5b0..fdf58cf 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** make_greater_string(const Const *str_con
*** 5542,5552 ****
          *lastchar = savelastchar;

          /*
!          * Truncate off the last character, which might be more than 1 byte,
!          * depending on the character encoding.
           */
          if (datatype != BYTEAOID && pg_database_encoding_max_length() > 1)
!             len = pg_mbcliplen(workstr, len, len - 1);
          else
              len -= 1;

--- 5542,5567 ----
          *lastchar = savelastchar;

          /*
!          * Increment the previous character, or truncate off the last character,
!          * which might be more than 1 byte, depending on the character encoding.
           */
          if (datatype != BYTEAOID && pg_database_encoding_max_length() > 1)
!         {
!             int i;
!             int cliplen = pg_mbcliplen(workstr, len, len - 1);
!
!             for (i = len - 1; i > cliplen; i--)
!             {
!                 if ((unsigned char) workstr[i] < (unsigned char) 255)
!                 {
!                     workstr[i]++;
!                     memset(workstr + i + 1, 1 /* or 0? */, len - i);
!                     break;
!                 }
!             }
!             if (i <= cliplen)
!                 len = cliplen;
!         }
          else
              len -= 1;

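The failure described above can be reproduced outside the backend. The following is a minimal sketch, not the PostgreSQL code: `is_single_utf8_char` and `bump_last_byte_only` are hypothetical helpers, and the validity check is deliberately simplified (it ignores overlong and surrogate forms that a real validator such as `pg_utf8_islegal()` rejects). It shows that once the last byte is 0xBF, every larger value of that byte (0xC0 to 0xFF) fails the continuation-byte test, so the "last byte only" strategy has nothing to return.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified check (an assumption for illustration): is buf[0..len) exactly
 * one UTF-8 character with a plausible lead byte and continuation bytes?
 */
static bool
is_single_utf8_char(const unsigned char *buf, size_t len)
{
    size_t expected;
    size_t i;

    if (len == 0)
        return false;
    if (buf[0] < 0x80)
        expected = 1;
    else if (buf[0] >= 0xC2 && buf[0] <= 0xDF)
        expected = 2;
    else if (buf[0] >= 0xE0 && buf[0] <= 0xEF)
        expected = 3;
    else if (buf[0] >= 0xF0 && buf[0] <= 0xF4)
        expected = 4;
    else
        return false;
    if (len != expected)
        return false;
    for (i = 1; i < len; i++)
        if ((buf[i] & 0xC0) != 0x80)    /* continuation bytes are 10xxxxxx */
            return false;
    return true;
}

/*
 * Mimic the "increment only the last byte" strategy (len must be >= 1).
 * On success buf holds the larger character; on failure buf is restored.
 */
static bool
bump_last_byte_only(unsigned char *buf, size_t len)
{
    unsigned char saved = buf[len - 1];

    while (buf[len - 1] < 0xFF)
    {
        buf[len - 1]++;
        if (is_single_utf8_char(buf, len))
            return true;
    }
    buf[len - 1] = saved;
    return false;
}
```

Calling `bump_last_byte_only` on the bytes E8 A5 BF fails and leaves them unchanged, while E8 A5 BE succeeds and becomes E8 A5 BF.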
Tatsuhito Kasahara <kasahara.tatsuhito@oss.ntt.co.jp> writes:
> make_greater_string() does not return a string when some UTF8 strings
> set to str_const.
> # Especially UTF8 strings which contains 'BF' in last byte.

The patch you propose for this is really untenable: it will re-introduce many corner cases that we got rid of years ago, for example cases wherein pg_verifymbstr and pg_mbcliplen index off the end of the string because they think the last character occupies more bytes than are there. It's intentional that the existing code doesn't mess with the first byte of a multibyte character (which is the one that determines the character length, in all encodings of interest).

Another problem is that if the last character is several bytes long, this coding would cause us to iterate through potentially many millions of character values before giving up and truncating off the last character. In a large number of cases that's just wasted time because there is no chance of getting a larger string without incrementing some character further to the left. So there's a tradeoff that limits how many values we should consider for each character position --- choosing to consider at most 255 is a bit arbitrary, but "all of them" isn't going to work.

I don't think that the set of cases that could be improved this way is large enough to justify trying to find solutions to these problems.

			regards, tom lane
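The remark that the first byte determines the character length can be made concrete for UTF-8. This is a small sketch with a hypothetical helper name (`utf8_claimed_length`), not the backend's `pg_utf_mblen()`: it returns the length a lead byte claims, so one can see that bumping a lead byte such as 0xDF to 0xE0 silently turns a 2-byte character into one that claims 3 bytes, which is how a scanner could run off the end of the buffer.

```c
#include <stddef.h>

/*
 * Length in bytes claimed by a UTF-8 lead byte, or 0 if the byte cannot
 * start a character (continuation bytes, 0xC0/0xC1, and 0xF5 and above).
 */
static size_t
utf8_claimed_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;
}
```

For instance, `utf8_claimed_length(0xDF)` is 2 while `utf8_claimed_length(0xE0)` is 3, so incrementing the lead byte of a 2-byte character leaves a buffer one byte shorter than the character now claims to be.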
Tom Lane wrote:
> The patch you propose for this is really untenable: it will re-introduce
> many corner cases that we got rid of years ago, for example cases
> wherein pg_verifymbstr and pg_mbcliplen index off the end of the string
> because they think the last character occupies more bytes than are
> there.

> Another problem is that if the last character is several bytes long,
> this coding would cause us to iterate through potentially many millions
> of character values before giving up and truncating off the last
> character.

Hmm... OK, I see your points.

I have another idea.

1. We prepare new operators ( <, <=, >, >=, = ) for text and bytea.
2. In make_greater_string(), if
   a multibyte string was given, and
   the locale is C, and
   no greater string could be found,
   it returns a bytea whose last character has the next greater byte code.

The user will get the following result.

=======================================================================================================
-- 西 : 0xe8a5bf

=# EXPLAIN ANALYZE SELECT * FROM test WHERE name LIKE '西%';
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Scan using test_name on test  (cost=0.00..8.28 rows=1 width=4) (actual time=0.022..0.024 rows=1 loops=1)
   Index Cond: ((name >= '西'::text) AND (name < '\\xe8a5c0'::bytea))
   Filter: (name ~~ '西%'::text)
 Total runtime: 0.053 ms
(4 rows)
=======================================================================================================

Is the idea reasonable?

Best regards,

--
NTT OSS Center
Tatsuhito Kasahara
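The `\xe8a5c0` bound in the plan above is simply the prefix treated as raw bytes with its last byte bumped. A minimal sketch of that computation, assuming plain bytewise (memcmp-style) ordering and using a hypothetical function name, might look like this; it is not taken from the proposed patch.

```c
#include <stddef.h>

/*
 * Turn buf[0..len) into the smallest byte string that sorts after every
 * string having the original bytes as a prefix, under bytewise comparison.
 * Trailing 0xFF bytes cannot be incremented, so they are dropped first.
 * Returns the new length, or 0 if no upper bound exists (all bytes 0xFF
 * or empty input).
 */
static size_t
bytewise_upper_bound(unsigned char *buf, size_t len)
{
    while (len > 0 && buf[len - 1] == 0xFF)
        len--;
    if (len == 0)
        return 0;
    buf[len - 1]++;
    return len;
}
```

Applied to E8 A5 BF this yields E8 A5 C0, which is not valid UTF-8 text. That is why the idea needs the bound to be a bytea rather than a text constant.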
Tatsuhito Kasahara <kasahara.tatsuhito@oss.ntt.co.jp> writes:
> I have another idea.
> 1. We prepare new operators ( <,<=,>,>=,= ) for text and bytea.
> 2. In make_greater_string(), if
>    multi-byte-string was set and
>    using locale-C and
>    could not find greater string,
>    returns bytea which has greater byte-code of last-character.

> Is the idea reasonable ?

Maybe, but it only works for text_pattern_ops indexes not normal ones. Not sure if people will be happy with maintaining a special index just to cover this corner case.

I'm not convinced that there's enough of a problem here to be worth sweating over. If we're not able to generate a "greater" string with the current rules, the odds are that the pattern is so close to the end of the index range that a one-sided test is not going to make much difference compared to a two-sided one.

			regards, tom lane
Hello,

Could you let me go on with this topic?

It is hard for those of us who use CJK (Chinese, Japanese, and Korean) characters in databases to ignore this glitch. For Japanese in ordinary usage, about a hundred characters out of roughly seven thousand make make_greater_string() fail. This does not happen very often, but it is not rare enough to ignore either.

I think this glitch arises because the way to derive the `next character' is fundamentally private to each encoding, yet make_greater_string() currently does it for all encodings at once, using a method extended from the one for the 1-byte ASCII charset.

So I think it is reasonable for the encoding info table (struct pg_wchar_tbl) to hold the function that does this.

How about this idea?

The points needed to realize it are as follows:

- pg_wchar_tbl in pg_wchar.c gets a new element `charinc' that holds a function to increment a character of this encoding.

- By default, the value of charinc is a `generic' increment function that does what make_greater_string() does in the current implementation.

- make_greater_string() now uses the charinc of the database encoding to increment characters, instead of the code written directly in it.

- UTF-8 is given a special increment function.

As a consequence of this modification, make_greater_string() becomes somewhat simpler, because the sequence that handles bare bytes in the string disappears. Incrementing a character with knowledge of the encoding can be straightforward, light, and backtrack-free, and has fewer glitches than the generic method.

# But the processing for BYTEAOID disappointingly remains there.

Some glitches still remain, but I think it would be overdoing it to perform a conversion that changes the length of the character. Only 5 code points remain out of the roughly 17 thousand that fail with the current method (counted over roughly all BMP characters), and none of them are Japanese characters :-)

The attached patch is a sample implementation of this idea.

What do you think about this patch?
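The design in the list above, a per-encoding table entry holding an increment function, can be sketched in isolation as follows. All names here (`char_incrementer`, `encoding_entry`, `encoding_table`, `increment_last_char`) are hypothetical stand-ins chosen for illustration; the real patch adds a `charinc` member to `pg_wchar_tbl` instead, and the two-entry table is only a toy.

```c
#include <stdbool.h>

/* Same shape as the incrementer interface described above. */
typedef bool (*char_incrementer) (unsigned char *charptr, int len);

typedef struct
{
    const char *name;           /* label for illustration only */
    char_incrementer charinc;   /* how to step to the "next" character */
} encoding_entry;

/* Single-byte encodings: bump the byte unless it is already the maximum. */
static bool
single_byte_increment(unsigned char *charptr, int len)
{
    if (len != 1 || *charptr == 0xFF)
        return false;
    (*charptr)++;
    return true;
}

/* Placeholder for an encoding with no known increment rule. */
static bool
no_increment(unsigned char *charptr, int len)
{
    (void) charptr;
    (void) len;
    return false;
}

static const encoding_entry encoding_table[] = {
    {"SINGLE_BYTE", single_byte_increment},
    {"UNSUPPORTED", no_increment}
};

/* The caller only dispatches; it needs no knowledge of the byte layout. */
static bool
increment_last_char(int encoding_id, unsigned char *charptr, int len)
{
    return encoding_table[encoding_id].charinc(charptr, len);
}
```

With this layout, supporting a new encoding means adding one function and one table entry, while the caller's loop stays unchanged.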
--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 10b73fb..4151ce2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5502,6 +5502,16 @@ pattern_selectivity(Const *patt, Pattern_Type ptype)
 /*
+ * This function is "character increment" function for bytea used in
+ * make_greater_string() that has same interface with pg_wchar_tbl.charinc.
+ */
+static bool byte_increment(unsigned char *ptr, int len)
+{
+    (*ptr)--;
+    return true;
+}
+
+/*
  * Try to generate a string greater than the given string or any
  * string it is a prefix of.  If successful, return a palloc'd string
  * in the form of a Const node; else return NULL.
@@ -5540,6 +5550,7 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
     int         len;
     Datum       cmpstr;
     text       *cmptxt = NULL;
+    character_incrementer charincfunc;

     /*
      * Get a modifiable copy of the prefix string in C-string format, and set
@@ -5601,27 +5612,38 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
         }
     }

+    if (datatype != BYTEAOID)
+        charincfunc = pg_database_encoding_character_incrementer();
+    else
+        charincfunc = &byte_increment;
+
     while (len > 0)
     {
-        unsigned char *lastchar = (unsigned char *) (workstr + len - 1);
-        unsigned char savelastchar = *lastchar;
+        int         charlen;
+        unsigned char *lastchar;
+        unsigned char savelastbyte;
+        Const      *workstr_const;
+
+        if (datatype == BYTEAOID)
+            charlen = 1;
+        else
+            charlen = len - pg_mbcliplen(workstr, len, len - 1);
+
+        lastchar = (unsigned char *) (workstr + len - charlen);

         /*
-         * Try to generate a larger string by incrementing the last byte.
+         * savelastbyte has meaning only for datatype == BYTEAOID
          */
-        while (*lastchar < (unsigned char) 255)
-        {
-            Const      *workstr_const;
+        savelastbyte = *lastchar;

-            (*lastchar)++;
+        /*
+         * Try to generate a larger string by incrementing the last byte or
+         * character.
+         */

+        if (charincfunc(lastchar, charlen)) {
             if (datatype != BYTEAOID)
-            {
-                /* do not generate invalid encoding sequences */
-                if (!pg_verifymbstr(workstr, len, true))
-                    continue;
                 workstr_const = string_to_const(workstr, datatype);
-            }
             else
                 workstr_const = string_to_bytea_const(workstr, len);

@@ -5636,26 +5658,17 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
                 pfree(workstr);
                 return workstr_const;
             }
-
+
             /* No good, release unusable value and try again */
             pfree(DatumGetPointer(workstr_const->constvalue));
             pfree(workstr_const);
         }

-        /* restore last byte so we don't confuse pg_mbcliplen */
-        *lastchar = savelastchar;
-
         /*
-         * Truncate off the last character, which might be more than 1 byte,
-         * depending on the character encoding.
+         * Truncate off the last character or restore last byte for BYTEA.
          */
-        if (datatype != BYTEAOID && pg_database_encoding_max_length() > 1)
-            len = pg_mbcliplen(workstr, len, len - 1);
-        else
-            len -= 1;
-
-        if (datatype != BYTEAOID)
-            workstr[len] = '\0';
+        len -= charlen;
+        workstr[len] = (datatype != BYTEAOID ? '\0' : savelastbyte);
     }

     /* Failed... */
diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c
index 5b0cf62..1d6aee0 100644
--- a/src/backend/utils/mb/wchar.c
+++ b/src/backend/utils/mb/wchar.c
@@ -935,6 +935,85 @@ pg_gb18030_dsplen(const unsigned char *s)
 /*
  *-------------------------------------------------------------------
+ * multibyte character incrementer
+ *
+ * These functions accept "charptr", a pointer to the first byte of a
+ * maybe-multibyte character. Try `increment' the character and return true if
+ * successed. If these functions returns false, the character is not modified.
+ * -------------------------------------------------------------------
+ */
+
+static bool pg_generic_charinc(unsigned char *charptr, int len)
+{
+    unsigned char *lastchar = (unsigned char *) (charptr + len - 1);
+    unsigned char savelastchar = *lastchar;
+    const char *const_charptr = (const char *)charptr;
+
+    while (*lastchar < (unsigned char) 255)
+    {
+        (*lastchar)++;
+        if (!pg_verifymbstr(const_charptr, len, true))
+            continue;
+        return true;
+    }
+
+    *lastchar = savelastchar;
+    return false;
+}
+
+static bool pg_utf8_increment(unsigned char *charptr, int length)
+{
+    unsigned char a;
+    unsigned char bak[4];
+
+    memcpy(bak, charptr, length);
+    switch (length)
+    {
+        default:
+            /* reject lengths 5 and 6 for now */
+            return false;
+        case 4:
+            a = charptr[3];
+            if (a < 0xBF)
+            {
+                charptr[3]++;
+                break;
+            }
+            charptr[3] = 0x80;
+            /* FALL THRU */
+        case 3:
+            a = charptr[2];
+            if (a < 0xBF)
+            {
+                charptr[2]++;
+                break;
+            }
+            charptr[2] = 0x80;
+            /* FALL THRU */
+        case 2:
+            a = charptr[1];
+            if ((*charptr == 0xed && a < 0x9F) || a < 0xBF)
+            {
+                charptr[1]++;
+                break;
+            }
+            charptr[1] = 0x80;
+            /* FALL THRU */
+        case 1:
+            a = *charptr;
+            if (a == 0x7F || a == 0xDF || a == 0xEF || a == 0xF7) {
+                memcpy(charptr, bak, length);
+                return false;
+            }
+            charptr[0]++;
+            break;
+    }
+
+    return pg_utf8_islegal(charptr, length);
+}
+
+/*
+ *-------------------------------------------------------------------
  * multibyte sequence validators
  *
  * These functions accept "s", a pointer to the first byte of a string,
@@ -1341,48 +1420,48 @@ pg_utf8_islegal(const unsigned char *source, int length)
  *-------------------------------------------------------------------
  */
 pg_wchar_tbl pg_wchar_table[] = {
-    {pg_ascii2wchar_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1},    /* PG_SQL_ASCII */
-    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JP */
-    {pg_euccn2wchar_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2},    /* PG_EUC_CN */
-    {pg_euckr2wchar_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3},    /* PG_EUC_KR */
-    {pg_euctw2wchar_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4},    /* PG_EUC_TW */
-    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JIS_2004 */
-    {pg_utf2wchar_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4},    /* PG_UTF8 */
-    {pg_mule2wchar_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4},    /* PG_MULE_INTERNAL */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN1 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN2 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN3 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN4 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN5 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN6 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN7 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN8 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN9 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_LATIN10 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1256 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1258 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN866 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN874 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_KOI8R */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1251 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1252 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* ISO-8859-5 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* ISO-8859-6 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* ISO-8859-7 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* ISO-8859-8 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1250 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1253 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1254 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1255 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_WIN1257 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},    /* PG_KOI8U */
-    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2},    /* PG_SJIS */
-    {0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2},    /* PG_BIG5 */
-    {0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2},    /* PG_GBK */
-    {0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2},    /* PG_UHC */
-    {0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4},    /* PG_GB18030 */
-    {0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3},    /* PG_JOHAB */
-    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}    /* PG_SHIFT_JIS_2004 */
+    {pg_ascii2wchar_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_generic_charinc, pg_ascii_verifier, 1},    /* PG_SQL_ASCII */
+    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_generic_charinc, pg_eucjp_verifier, 3},    /* PG_EUC_JP */
+    {pg_euccn2wchar_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_generic_charinc, pg_euccn_verifier, 2},    /* PG_EUC_CN */
+    {pg_euckr2wchar_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_generic_charinc, pg_euckr_verifier, 3},    /* PG_EUC_KR */
+    {pg_euctw2wchar_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_generic_charinc, pg_euctw_verifier, 4},    /* PG_EUC_TW */
+    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_generic_charinc, pg_eucjp_verifier, 3},    /* PG_EUC_JIS_2004 */
+    {pg_utf2wchar_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_increment, pg_utf8_verifier, 4},    /* PG_UTF8 */
+    {pg_mule2wchar_with_len, pg_mule_mblen, pg_mule_dsplen, pg_generic_charinc, pg_mule_verifier, 4},    /* PG_MULE_INTERNAL */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN1 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN2 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN3 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN4 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN5 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN6 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN7 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN8 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN9 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_LATIN10 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1256 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1258 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN866 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN874 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_KOI8R */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1251 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1252 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* ISO-8859-5 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* ISO-8859-6 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* ISO-8859-7 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* ISO-8859-8 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1250 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1253 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1254 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1255 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_WIN1257 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},    /* PG_KOI8U */
+    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_generic_charinc, pg_sjis_verifier, 2},    /* PG_SJIS */
+    {0, pg_big5_mblen, pg_big5_dsplen, pg_generic_charinc, pg_big5_verifier, 2},    /* PG_BIG5 */
+    {0, pg_gbk_mblen, pg_gbk_dsplen, pg_generic_charinc, pg_gbk_verifier, 2},    /* PG_GBK */
+    {0, pg_uhc_mblen, pg_uhc_dsplen, pg_generic_charinc, pg_uhc_verifier, 2},    /* PG_UHC */
+    {0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_generic_charinc, pg_gb18030_verifier, 4},    /* PG_GB18030 */
+    {0, pg_johab_mblen, pg_johab_dsplen, pg_generic_charinc, pg_johab_verifier, 3},    /* PG_JOHAB */
+    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_generic_charinc, pg_sjis_verifier, 2}    /* PG_SHIFT_JIS_2004 */
 };

 /* returns the byte length of a word for mule internal code */
@@ -1459,6 +1538,15 @@ pg_database_encoding_max_length(void)
 }

 /*
+ * fetch maximum length of the encoding for the current database
+ */
+character_incrementer
+pg_database_encoding_character_incrementer(void)
+{
+    return pg_wchar_table[GetDatabaseEncoding()].charinc;
+}
+
+/*
  * Verify mbstr to make sure that it is validly encoded in the current
  * database encoding.  Otherwise same as pg_verify_mbstr().
  */
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 826c7af..356703a 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -284,6 +284,8 @@ typedef int (*mblen_converter) (const unsigned char *mbstr);

 typedef int (*mbdisplaylen_converter) (const unsigned char *mbstr);

+typedef bool (*character_incrementer) (unsigned char *mbstr, int len);
+
 typedef int (*mbverifier) (const unsigned char *mbstr, int len);

 typedef struct
@@ -292,6 +294,7 @@ typedef struct
                                  * string to a wchar */
     mblen_converter mblen;      /* get byte length of a char */
     mbdisplaylen_converter dsplen;      /* get display width of a char */
+    character_incrementer charinc;      /* Character code incrementer if not null */
     mbverifier  mbverify;       /* verify multibyte sequence */
     int         maxmblen;       /* max bytes for a char in this encoding */
 } pg_wchar_tbl;
@@ -389,6 +392,7 @@ extern int pg_encoding_mbcliplen(int encoding, const char *mbstr,
 extern int pg_mbcharcliplen(const char *mbstr, int len, int imit);
 extern int pg_encoding_max_length(int encoding);
 extern int pg_database_encoding_max_length(void);
+extern character_incrementer pg_database_encoding_character_incrementer(void);

 extern int PrepareClientEncoding(int encoding);
 extern int SetClientEncoding(int encoding);
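To experiment with the carry-style increment outside the backend, here is a standalone sketch written for this purpose. It is not the posted `pg_utf8_increment()`: the function names are hypothetical, it assumes the input is already one valid UTF-8 character of 1 to 4 bytes, it caps 4-byte lead bytes at 0xF4, and it simply gives up (restoring the input) when the result would fall into the surrogate range or above U+10FFFF. The second helper shows one way a bytea incrementer could both increment and refuse to wrap; note that `byte_increment()` as posted uses `--`, which may be a slip for `++`.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Step a valid UTF-8 character to a greater one of the same byte length.
 * Returns false and leaves c untouched if no such character is produced.
 */
static bool
utf8_increment_sketch(unsigned char *c, int len)
{
    unsigned char saved[4];
    bool bumped = false;
    int i;

    if (len < 1 || len > 4)
        return false;
    memcpy(saved, c, (size_t) len);

    /* Walk the continuation bytes right to left, carrying on wraparound. */
    for (i = len - 1; i > 0 && !bumped; i--)
    {
        if (c[i] < 0xBF)
        {
            c[i]++;
            bumped = true;
        }
        else
            c[i] = 0x80;        /* wrapped; carry into the byte to the left */
    }

    if (!bumped)
    {
        /* Lead byte is already the largest for this length class. */
        if (c[0] == 0x7F || c[0] == 0xDF || c[0] == 0xEF || c[0] == 0xF4)
        {
            memcpy(c, saved, (size_t) len);
            return false;
        }
        c[0]++;
    }

    /* Reject surrogates (ED A0..BF xx) and values above U+10FFFF. */
    if ((len == 3 && c[0] == 0xED && c[1] >= 0xA0) ||
        (len == 4 && c[0] == 0xF4 && c[1] >= 0x90))
    {
        memcpy(c, saved, (size_t) len);
        return false;
    }
    return true;
}

/* Bytea counterpart: increment one byte, failing instead of wrapping. */
static bool
byte_increment_sketch(unsigned char *ptr, int len)
{
    (void) len;
    if (*ptr == 0xFF)
        return false;
    (*ptr)++;
    return true;
}
```

For the character discussed in this thread, E8 A5 BF becomes E8 A6 80: the last byte wraps to 0x80 and the carry lands on the middle byte, so the lead byte, and with it the character length, is never touched.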
On Fri, Jul 8, 2011 at 5:21 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
> Points to realize this follows,

Please add your patch to the next CommitFest.

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] make_greater_string() does not return a string in some cases
From
Kyotaro HORIGUCHI
Date:
Thanks for your suggestion, I'll do so.

At Fri, 8 Jul 2011 23:28:32 -0400, Robert Haas <robertmhaas@gmail.com> wrote:
> Please add your patch to the next CommitFest.
>
> https://commitfest.postgresql.org/action/commitfest_view/open

--
Kyotaro Horiguchi
NTT Open Source Software Center