Thread: Memory bug in dsnowball_lexize

Memory bug in dsnowball_lexize

From
Mark Dilger
Date:
Hackers,

In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
malloc (not palloc) to allocate memory, and on memory exhaustion
returns NULL rather than throwing an exception.  In this same
file, 'replace_s' calls 'create_s' and if it gets back NULL, returns
the error code -1.  Otherwise, it sets z->p to the allocated
memory.

In src/backend/snowball/libstemmer/api.c, 'SN_set_current' calls
'replace_s' and returns whatever 'replace_s' returned, which in
the case of memory exhaustion will be -1.

In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
calls 'SN_set_current' and ignores the return value, thereby
failing to notice the error, if any.

I checked one of the stemmers, stem_ISO_8859_1_english.c,
and it treats z->p as an array without checking whether it is
NULL.  This will crash the backend in the above error case.
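
The propagation chain described above can be sketched in isolation. This is a minimal mock, not the libstemmer source: `mock_env`, `mock_create_s`, `mock_replace_s`, `mock_set_current`, `checked_lexize`, and the `simulate_oom` switch are all hypothetical names introduced for illustration, and the sketch makes no claim about which allocator the real code ends up calling.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the stemmer environment; only the buffer
 * pointer discussed in the thread (p) and its length are modeled. */
struct mock_env
{
    unsigned char *p;
    size_t         len;
};

/* When nonzero, mock_create_s pretends the allocator failed. */
static int simulate_oom = 0;

/* Returns NULL on (simulated) memory exhaustion instead of raising. */
static unsigned char *mock_create_s(size_t size)
{
    if (simulate_oom)
        return NULL;
    return malloc(size);
}

/* Returns -1 if allocation failed; otherwise stores the buffer in z->p. */
static int mock_replace_s(struct mock_env *z, const unsigned char *s, size_t len)
{
    unsigned char *buf = mock_create_s(len + 1);

    if (buf == NULL)
        return -1;
    memcpy(buf, s, len);
    buf[len] = '\0';
    free(z->p);
    z->p = buf;
    z->len = len;
    return 0;
}

/* Passes the result of mock_replace_s straight through to the caller. */
static int mock_set_current(struct mock_env *z, const char *word)
{
    return mock_replace_s(z, (const unsigned char *) word, strlen(word));
}

/* A caller that does look at the return value: on failure it returns -1
 * and never dereferences z->p. */
static int checked_lexize(struct mock_env *z, const char *word)
{
    if (mock_set_current(z, word) != 0)
        return -1;
    return 0;
}
```

With `simulate_oom` set, `checked_lexize` returns -1 and leaves `z->p` untouched. A caller that discards the result would instead go on to index a buffer that was never set.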

There is something else weird here, though.  The call to
'SN_set_current' is wrapped in a memory context switch, along
with a call to the stemmer, as if the caller expects any allocated
memory to be palloc'd, which it is not, given the underlying code's
use of malloc and calloc.

There is a comment higher up in dict_snowball.c that seems to
use some handwaving about all this, or perhaps it is documenting
something else entirely.  In any event, I find the documentation
about dictCtx insufficient to explain why this memory handling
is correct.

mark



Re: Memory bug in dsnowball_lexize

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
> malloc (not palloc) to allocate memory, and on memory exhaustion
> returns NULL rather than throwing an exception.

Actually not, see macros in src/include/snowball/header.h.
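
For readers unfamiliar with the technique, here is a rough sketch of how preprocessor macros can redirect a library's `malloc` calls to a different allocator. It does not reproduce the contents of header.h, which are not quoted in this thread; `demo_alloc`, `demo_alloc_calls`, and `lib_copy_word` are hypothetical names, and `demo_alloc` merely imitates an allocator that reports failure itself rather than returning NULL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the redirect is observable from outside. */
static int demo_alloc_calls = 0;

/* Hypothetical stand-in for a palloc-style allocator: it never returns
 * NULL, but reports the failure and exits instead.  It is defined before
 * the macro below so that it still reaches the real malloc. */
static void *demo_alloc(size_t size)
{
    void *ptr = malloc(size);

    demo_alloc_calls++;
    if (ptr == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ptr;
}

/* From here on, every malloc(...) call in this file expands to
 * demo_alloc(...). */
#define malloc(size) demo_alloc(size)

/* "Library" code written against plain malloc, unaware of the redirect. */
static char *lib_copy_word(const char *word)
{
    size_t len = strlen(word) + 1;
    char  *copy = malloc(len);      /* expands to demo_alloc(len) */

    if (copy == NULL)               /* unreachable under the redirect */
        return NULL;
    memcpy(copy, word, len);
    return copy;
}
```

The library source is left unmodified; only the header it is compiled against changes which allocator it reaches. That is why reading utilities.c alone suggests plain malloc is in use.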

> In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
> calls 'SN_set_current' and ignores the return value, thereby
> failing to notice the error, if any.

Hm.  This seems like possibly a bug, in that even if we cover the
malloc issue, there's no API guarantee that OOM is the only possible
reason for reporting failure.

> There is a comment higher up in dict_snowball.c that seems to
> use some handwaving about all this, or perhaps it is documenting
> something else entirely.  In any event, I find the documentation
> about dictCtx insufficient to explain why this memory handling
> is correct.

Fair complaint --- do you want to propose some new wording that
references what header.h does?

            regards, tom lane



Re: Memory bug in dsnowball_lexize

From
Mark Dilger
Date:
On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <hornschnorter@gmail.com> writes:
> > In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
> > malloc (not palloc) to allocate memory, and on memory exhaustion
> > returns NULL rather than throwing an exception.
>
> Actually not, see macros in src/include/snowball/header.h.

You are correct.  Thanks for the pointer.

> > In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
> > calls 'SN_set_current' and ignores the return value, thereby
> > failing to notice the error, if any.
>
> Hm.  This seems like possibly a bug, in that even if we cover the
> malloc issue, there's no API guarantee that OOM is the only possible
> reason for reporting failure.

Ok, that sounds fair.  Since the memory is being palloc'd, I suppose
it would be safe to just ereport when the return value is -1?

> > There is a comment higher up in dict_snowball.c that seems to
> > use some handwaving about all this, or perhaps it is documenting
> > something else entirely.  In any event, I find the documentation
> > about dictCtx insufficient to explain why this memory handling
> > is correct.
>
> Fair complaint --- do you want to propose some new wording that
> references what header.h does?

Perhaps something along these lines?

        /*
-        * snowball saves alloced memory between calls, so we should run it in our
-        * private memory context. Note, init function is executed in long lived
-        * context, so we just remember CurrentMemoryContext
+        * snowball saves alloced memory between calls, which we force to be
+        * allocated using palloc and friends via preprocessing macros (see
+        * snowball/header.h), so we should run snowball in our private memory
+        * context.  Note, init function is executed in long lived context, so we
+        * just remember CurrentMemoryContext.
         */



Re: Memory bug in dsnowball_lexize

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Mark Dilger <hornschnorter@gmail.com> writes:
>>> In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
>>> calls 'SN_set_current' and ignores the return value, thereby
>>> failing to notice the error, if any.

>> Hm.  This seems like possibly a bug, in that even if we cover the
>> malloc issue, there's no API guarantee that OOM is the only possible
>> reason for reporting failure.

> Ok, that sounds fair.  Since the memory is being palloc'd, I suppose
> it would be safe to just ereport when the return value is -1?

Yeah ... I'd just make it an elog really, since whatever it is
would presumably not be a user-facing error.
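
The shape of that fix can be sketched outside the backend. This is an illustration under stated assumptions, not the eventual patch: `demo_raise` is a hypothetical stand-in for an `elog(ERROR, ...)` call that does not return (imitated here with `longjmp`), and `demo_set_current`, `demo_lexize`, and `demo_run` are likewise invented names.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical error-handling state for the demo. */
static jmp_buf demo_error_jump;
static char    demo_last_error[128];

/* Hypothetical stand-in for an internal-error report: record the
 * message, then unwind to the handler.  Does not return. */
static void demo_raise(const char *msg)
{
    snprintf(demo_last_error, sizeof(demo_last_error), "%s", msg);
    longjmp(demo_error_jump, 1);
}

/* Hypothetical stemmer entry point: returns -1 when asked to fail. */
static int demo_set_current(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* The caller no longer ignores the result: any nonzero return is
 * treated as an internal error, whatever its underlying cause. */
static void demo_lexize(int should_fail)
{
    if (demo_set_current(should_fail) != 0)
        demo_raise("stemmer failed to set current word");
}

/* Runs demo_lexize under a handler; returns 0 on success, 1 on error. */
static int demo_run(int should_fail)
{
    if (setjmp(demo_error_jump) != 0)
        return 1;
    demo_lexize(should_fail);
    return 0;
}
```

Testing for any nonzero result rather than for one specific cause matches the point made earlier in the thread: the API does not promise that memory exhaustion is the only reason for failure.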

>> Fair complaint --- do you want to propose some new wording that
>> references what header.h does?

> Perhaps something along these lines?

Seems reasonable, please include in patch covering the other thing.

            regards, tom lane