Re: PATCH: Allow empty targets in unaccent dictionary
From | David Fetter |
---|---|
Subject | Re: PATCH: Allow empty targets in unaccent dictionary |
Date | |
Msg-id | 20140421042104.GI24095@fetter.org |
In reply to | PATCH: Allow empty targets in unaccent dictionary (Mohammad Alhashash <alhashash@alhashash.net>) |
List | pgsql-hackers |
Please add this to the next commitfest.

https://commitfest.postgresql.org/action/commitfest_view?id=22

Cheers,
David.

On Sun, Apr 20, 2014 at 01:06:43AM +0200, Mohammad Alhashash wrote:
> Hi,
>
> Currently, unaccent extension only allows replacing one source
> character with one or more target characters. In Arabic, Hebrew and
> possibly other languages, diacritics are standalone characters that
> are being added to normal letters. To use unaccent dictionary for
> these languages, we need to allow empty targets to remove diacritics
> instead of replacing them.
>
> The attached patch modifies unaccent.c so that dictionary parser uses
> zero-length target when the line has no target.
>
> Best Regards,
>
> Mohammad Alhashash
>
> diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
> old mode 100644
> new mode 100755
> index a337df6..4e72829
> --- a/contrib/unaccent/unaccent.c
> +++ b/contrib/unaccent/unaccent.c
> @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, char *replaceTo, int r
>              {
>                  curnode->replacelen = replacelen;
>                  curnode->replaceTo = palloc(replacelen);
> -                memcpy(curnode->replaceTo, replaceTo, replacelen);
> +                /* palloc(0) returns a valid address, not NULL */
> +                if (replaceTo) /* memcpy() is undefined for NULL pointers*/
> +                    memcpy(curnode->replaceTo, replaceTo, replacelen);
>              }
>          }
>          else
> @@ -105,10 +107,10 @@ initTrie(char *filename)
>                  while ((line = tsearch_readline(&trst)) != NULL)
>                  {
>                      /*
> -                     * The format of each line must be "src trg" where src and trg
> +                     * The format of each line must be "src [trg]" where src and trg
>                       * are sequences of one or more non-whitespace characters,
>                       * separated by whitespace. Whitespace at start or end of
> -                     * line is ignored.
> +                     * line is ignored. If no trg added, a zero-length string is used.
>                       */
>                      int state;
>                      char *ptr;
> @@ -160,6 +162,13 @@ initTrie(char *filename)
>                          }
>                      }
>
> +                    /* if no trg (loop stops at state 1 or 2), use zero-length target */
> +                    if (state == 1 || state == 2)
> +                    {
> +                        trglen = 0;
> +                        state = 5;
> +                    }
> +
>                      if (state >= 3)
>                          rootTrie = placeChar(rootTrie,
>                                               (unsigned char *) src, srclen,
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
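For readers following the thread, here is a rough standalone sketch of the "src [trg]" line format the patch describes, where a missing target becomes a zero-length replacement. It is not the code in unaccent.c: the `RuleLine` struct and `parse_rule_line()` are hypothetical names made up for illustration, and the sketch uses a plain token scan rather than the state machine in `initTrie()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical type for illustration only; not part of unaccent.c. */
typedef struct
{
    const char *src;        /* start of the source token inside the line */
    size_t      srclen;     /* length of the source token in bytes */
    const char *trg;        /* start of the target token, NULL if absent */
    size_t      trglen;     /* 0 when the line has no target */
} RuleLine;

/*
 * Parse one rule line of the form "src [trg]".
 * Returns 1 if the line holds a rule, 0 if it is blank or has extra
 * tokens after the target (in which case *out may be partially filled).
 */
static int
parse_rule_line(const char *line, RuleLine *out)
{
    const char *p;

    assert(line != NULL && out != NULL);
    p = line;

    /* whitespace at the start of the line is ignored */
    while (*p && isspace((unsigned char) *p))
        p++;
    if (*p == '\0')
        return 0;           /* blank line, no rule */

    /* source token: a run of non-whitespace bytes */
    out->src = p;
    while (*p && !isspace((unsigned char) *p))
        p++;
    out->srclen = (size_t) (p - out->src);

    while (*p && isspace((unsigned char) *p))
        p++;

    if (*p == '\0')
    {
        /* no target on this line: use a zero-length replacement */
        out->trg = NULL;
        out->trglen = 0;
        return 1;
    }

    /* optional target token */
    out->trg = p;
    while (*p && !isspace((unsigned char) *p))
        p++;
    out->trglen = (size_t) (p - out->trg);

    /* whitespace at the end of the line is ignored; anything else is an error */
    while (*p && isspace((unsigned char) *p))
        p++;
    return *p == '\0';
}
```

With this sketch, a line holding only a combining mark (for example the two UTF-8 bytes of Arabic fatha) parses with `trglen == 0` and `trg == NULL`. A caller copying the target should skip `memcpy()` in that case, which is the same concern the `if (replaceTo)` guard in the patch addresses.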