Re: PATCH: Allow empty targets in unaccent dictionary

From David Fetter
Subject Re: PATCH: Allow empty targets in unaccent dictionary
Msg-id 20140421042104.GI24095@fetter.org
In response to PATCH: Allow empty targets in unaccent dictionary  (Mohammad Alhashash <alhashash@alhashash.net>)
List pgsql-hackers
Please add this to the next commitfest.

https://commitfest.postgresql.org/action/commitfest_view?id=22

Cheers,
David.
On Sun, Apr 20, 2014 at 01:06:43AM +0200, Mohammad Alhashash wrote:
> Hi,
> 
> Currently, the unaccent extension only allows replacing one source
> character with one or more target characters. In Arabic, Hebrew and
> possibly other languages, diacritics are standalone characters that
> are added to normal letters. To use the unaccent dictionary for
> these languages, we need to allow empty targets so that diacritics
> are removed instead of replaced.
> 
> The attached patch modifies unaccent.c so that the dictionary parser uses
> a zero-length target when the line has no target.
> 
> Best Regards,
> 
> Mohammad Alhashash
> 

> diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
> old mode 100644
> new mode 100755
> index a337df6..4e72829
> --- a/contrib/unaccent/unaccent.c
> +++ b/contrib/unaccent/unaccent.c
> @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, char *replaceTo, int r
>          {
>              curnode->replacelen = replacelen;
>              curnode->replaceTo = palloc(replacelen);
> -            memcpy(curnode->replaceTo, replaceTo, replacelen);
> +            /* palloc(0) returns a valid address, not NULL */
> +            if (replaceTo) /* memcpy() is undefined for NULL pointers*/
> +                memcpy(curnode->replaceTo, replaceTo, replacelen);
>          }
>      }
>      else
> @@ -105,10 +107,10 @@ initTrie(char *filename)
>              while ((line = tsearch_readline(&trst)) != NULL)
>              {
>                  /*
> -                 * The format of each line must be "src trg" where src and trg
> +                 * The format of each line must be "src [trg]" where src and trg
>                   * are sequences of one or more non-whitespace characters,
>                   * separated by whitespace.  Whitespace at start or end of
> -                 * line is ignored.
> +                 * line is ignored. If no trg added, a zero-length string is used.
>                   */
>                  int            state;
>                  char       *ptr;
> @@ -160,6 +162,13 @@ initTrie(char *filename)
>                      }
>                  }
>  
> +                /* if no trg (loop stops at state 1 or 2), use zero-length target */
> +                if (state == 1 || state == 2)
> +                {
> +                    trglen = 0;
> +                    state = 5;
> +                }
> +                
>                  if (state >= 3)
>                      rootTrie = placeChar(rootTrie,
>                                           (unsigned char *) src, srclen,

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
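
For readers following the patch, here is a minimal standalone sketch of the "src [trg]" rule-line format it proposes. It is not the code in unaccent.c: the RuleLine struct and the parse_rule_line() and is_blank() helpers are hypothetical names made up for illustration, and the real parser uses a numbered state machine over multibyte characters rather than this byte-wise scan. The sketch only shows the intended behaviour: a line with a source token and no target yields a zero-length target instead of being rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for illustration; not part of unaccent.c. */
typedef struct RuleLine
{
    const char *src;     /* start of the source token inside the line */
    size_t      srclen;  /* length of the source token in bytes */
    const char *trg;     /* start of the target token, NULL when absent */
    size_t      trglen;  /* 0 when the target is absent */
} RuleLine;

/* Byte-wise whitespace test; the NUL terminator is not whitespace. */
static int
is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Parse one rule line of the form "src [trg]".
 * Returns 1 on success, 0 for a blank line or a line with extra tokens.
 * A missing target is accepted and reported as trglen == 0.
 */
static int
parse_rule_line(const char *line, RuleLine *out)
{
    const char *p = line;

    assert(line != NULL && out != NULL);

    out->src = NULL;
    out->trg = NULL;
    out->srclen = 0;
    out->trglen = 0;

    /* Skip leading whitespace; a blank line carries no rule. */
    while (is_blank(*p))
        p++;
    if (*p == '\0')
        return 0;

    /* Source token: a run of non-whitespace bytes. */
    out->src = p;
    while (*p != '\0' && !is_blank(*p))
        p++;
    out->srclen = (size_t) (p - out->src);

    /* No target on the line: keep the zero-length target. */
    while (is_blank(*p))
        p++;
    if (*p == '\0')
        return 1;

    /* Target token, when present. */
    out->trg = p;
    while (*p != '\0' && !is_blank(*p))
        p++;
    out->trglen = (size_t) (p - out->trg);

    /* Anything after the target other than whitespace is an error. */
    while (is_blank(*p))
        p++;
    return *p == '\0';
}
```

Because trg stays NULL when no target is given, a caller copying the replacement would need to skip the copy in that case, which is the same concern the patch's guard around memcpy() in placeChar() addresses.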


-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


