Correct Allocation of UNICODE string in C
| From | Steffen Macke |
|---|---|
| Subject | Correct Allocation of UNICODE string in C |
| Date | |
| Msg-id | 3F29017A.1070400@web.de |
| List | pgsql-general |
Hello All,

I'm struggling with the correct allocation of a UNICODE text in a C function for PostgreSQL. The strings are sometimes truncated, and sometimes garbage bytes are added at the end.

Is there a code example that takes a UNICODE (UTF-8) text of unknown length, allocates the PostgreSQL structure and copies the data correctly?

You will find the function in question below; the full sources are available from http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/dcmms/arabic/

The problem is that the arabic_reshape() function returns texts that are longer or shorter than the original text. In the PostgreSQL sources I only found examples where texts are copied, but no example of how to allocate a "fresh" UTF-8 string.

Best Regards,
Steffen Macke

```c
text *
shape_arabic(text *t)
{
    glong items_read;
    glong items_written;
    long len;
    long i;
    text *new_t;
    text *utf8_t;

    len = g_utf8_strlen(VARDATA(t), -1);
    new_t = (text *) palloc(VARHDRSZ+(len*4)+4);
    VARATT_SIZEP(new_t) = VARHDRSZ+(len*4)+4;
    utf8_t = (text *) palloc(VARSIZE(t)+4);
    VARATT_SIZEP(utf8_t) = VARSIZE(t)+4;
    memset(VARDATA(new_t), 0, (len*4)+4);
    memset(VARDATA(utf8_t), 0, VARSIZE(utf8_t)-VARHDRSZ);
    len = len*2;
    arabic_reshape(&len, VARDATA(t), VARDATA(new_t), ar_unifont);
    g_ucs4_to_utf8(VARDATA(new_t), VARDATA(utf8_t), -1, &items_read, &items_written);
    len = g_utf8_strlen(VARDATA(utf8_t), -1);
    return utf8_t;
}
```
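Two things in the posted function look likely to cause these symptoms. First, a PostgreSQL `text` payload is not NUL-terminated, so `g_utf8_strlen(VARDATA(t), -1)` may scan past the end of the datum. Second, both buffers are sized, and their headers set, before `arabic_reshape()` has produced its output, so the header can claim more or fewer bytes than were actually written. The usual pattern is to read the input by its stored length, write the output into a scratch buffer, and only then allocate the result as header plus the exact number of bytes produced. The sketch below illustrates that pattern. To keep it compilable without PostgreSQL headers, `mock_text`, `MOCK_VARHDRSZ` and every function in it are hypothetical stand-ins, `malloc` replaces `palloc`, and `reshape_stub()` is an invented length-changing transform, not the real `arabic_reshape()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for PostgreSQL's varlena "text" (plain 4-byte
 * header): vl_len holds the TOTAL size, header included, and the payload
 * is NOT NUL-terminated. Real extension code would include postgres.h. */
typedef struct {
    int32_t vl_len;   /* total size in bytes, header included */
    char    vl_dat[]; /* payload bytes, no terminator */
} mock_text;

#define MOCK_VARHDRSZ offsetof(mock_text, vl_dat)

/* Payload length in bytes: total size minus header (like VARSIZE - VARHDRSZ). */
static size_t mock_text_len(const mock_text *t)
{
    return (size_t) t->vl_len - MOCK_VARHDRSZ;
}

/* Copy the payload into a NUL-terminated C string, so that functions which
 * scan for a terminator (strlen, g_utf8_strlen(s, -1)) stay in bounds. */
static char *mock_text_to_cstring(const mock_text *t)
{
    size_t n = mock_text_len(t);
    char *s = malloc(n + 1);

    if (s == NULL)
        return NULL;
    memcpy(s, t->vl_dat, n);
    s[n] = '\0';
    return s;
}

/* Allocate a fresh text sized from the bytes actually produced, store
 * header + payload in the length word, and copy the bytes in.
 * malloc stands in for palloc here. */
static mock_text *mock_text_from_bytes(const char *buf, size_t nbytes)
{
    mock_text *t = malloc(MOCK_VARHDRSZ + nbytes);

    if (t == NULL)
        return NULL;
    t->vl_len = (int32_t) (MOCK_VARHDRSZ + nbytes);
    memcpy(t->vl_dat, buf, nbytes);
    return t;
}

/* HYPOTHETICAL length-changing transform standing in for arabic_reshape():
 * each ASCII 'a' becomes U+0627 ARABIC LETTER ALEF (2 bytes in UTF-8),
 * other bytes pass through. Returns a malloc'd buffer; *out_len gets its size. */
static char *reshape_stub(const char *in, size_t n, size_t *out_len)
{
    static const char alef[] = "\xd8\xa7";
    char *out = malloc(2 * n + 1); /* worst case: every byte doubles */
    size_t j = 0;

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 'a') {
            memcpy(out + j, alef, 2);
            j += 2;
        } else {
            out[j++] = in[i];
        }
    }
    *out_len = j;
    return out;
}

/* Read the input by its stored length, transform into a scratch buffer,
 * then size the result from the length the transform reported. */
static mock_text *shape_text(const mock_text *t)
{
    size_t out_len = 0;
    char *out = reshape_stub(t->vl_dat, mock_text_len(t), &out_len);
    mock_text *result;

    if (out == NULL)
        return NULL;
    result = mock_text_from_bytes(out, out_len);
    free(out);
    return result;
}
```

In a real extension the same steps would use `palloc()` and the header macro of the PostgreSQL release in use (`VARATT_SIZEP` in the code above; later releases use `SET_VARSIZE()` and also provide `text_to_cstring()` and `cstring_to_text_with_len()` for these two steps). If the glib conversion is kept, note that its actual signature is `g_ucs4_to_utf8(str, len, items_read, items_written, error)`: it returns a newly allocated UTF-8 string (released with `g_free()`) and reports the number of bytes written through `items_written`. That byte count, not the size of the input, is what the new text should be sized from.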