Обсуждение: pg_do_encoding_conversion glitch
I have a question about the result contract of pg_do_encoding_conversion().
It can receive non null-terminated string because its arguments are
a char array and a byte length.
And it only returns a string, so the string should be null-terminated.
However, if conversions are not required, the function returns
the input string itself even though it might be not null-terminated.
I checked usages of pg_do_encoding_conversion() and xml_parse()
could cause troubles. Is it a bug? needed to be fixed?
---- [utils/mb/mbutils.c]
unsigned char *
pg_do_encoding_conversion(unsigned char *src, int len, int src_encoding, int dest_encoding)
{ ... if (src_encoding == dest_encoding) return src;
----
---- [utils/adt/xml.c]
static xmlDocPtr
xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, xmlChar * encoding)
{ ... len = VARSIZE(data) - VARHDRSZ; /* will be useful later */ string = xml_text2xmlChar(data);
utf8string = pg_do_encoding_conversion(string, len,
encoding ? xmlChar_to_encoding(encoding) : [It could
beUTF8 to UTF8] --> GetDatabaseEncoding(), PG_UTF8);
----
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> I have a question about the result contract of pg_do_encoding_conversion().
It's poorly designed :-(
As near as I can tell, all uses of the function either pass a
null-terminated string or special-case the result == src case in such a
way that it doesn't matter. However it's certainly not obvious that you
have to do that.
The calls in contrib/sslinfo might be broken --- not sure how much
that module has been tested.
Do you have a proposal for a different API, or do you just want to
improve the comment for the function? Bear in mind that a lot of the
callers do know the string length, and so we shouldn't impose an
unnecessary strlen() operation on those cases.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Do you have a proposal for a different API, or do you just want to > improve the comment for the function? Bear in mind that a lot of the > callers do know the string length, and so we shouldn't impose an > unnecessary strlen() operation on those cases. We already have the following comment, so I think a new comment is not needed. | In the case of no conversion, src is returned. Since Assert() is not available in the case, developers should use the function carefully after all. My patch might be fixed, too... | "Solve a problem of LC_TIME of windows" | http://archives.postgresql.org/message-id/20081104094301.7EE8.52131E4D@oss.ntt.co.jp Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do you have a proposal for a different API, or do you just want to
>> improve the comment for the function? Bear in mind that a lot of the
>> callers do know the string length, and so we shouldn't impose an
>> unnecessary strlen() operation on those cases.
> We already have the following comment, so I think a new comment is
> not needed.
> | In the case of no conversion, src is returned.
I added a few more lines just to help people out --- in digging around
earlier today, there seemed to be a few places that hadn't been very
careful about this.
regards, tom lane