Обсуждение: Memory bug in dsnowball_lexize
Hackers, In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses malloc (not palloc) to allocate memory, and on memory exhaustion returns NULL rather than throwing an exception. In this same file, 'replace_s' calls 'create_s' and if it gets back NULL, returns the error code -1. Otherwise, it sets z->p to the allocated memory. In src/backend/snowball/libstemmer/api.c, 'SN_set_current' calls 'replace_s' and returns whatever 'replace_s' returned, which in the case of memory exhaustion will be -1. In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize' calls 'SN_set_current' and ignores the return value, thereby failing to notice the error, if any. I checked one of the stemmers, stem_ISO_8859_1_english.c, and it treats z->p as an array without checking whether it is NULL. This will crash the backend in the above error case. There is something else weird here, though. The call to 'SN_set_current' is wrapped in a memory context switch, along with a call to the stemmer, as if the caller expects any allocated memory to be palloc'd, which it is not, given the underlying code's use of malloc and calloc. There is a comment higher up in dict_snowball.c that seems to use some handwaving about all this, or perhaps it is documenting something else entirely. In any event, I find the documentation about dictCtx insufficient to explain why this memory handling is correct. mark
Mark Dilger <hornschnorter@gmail.com> writes:
> In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
> malloc (not palloc) to allocate memory, and on memory exhaustion
> returns NULL rather than throwing an exception.
Actually not, see macros in src/include/snowball/header.h.
> In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
> calls 'SN_set_current' and ignores the return value, thereby
> failing to notice the error, if any.
Hm. This seems like possibly a bug, in that even if we cover the
malloc issue, there's no API guarantee that OOM is the only possible
reason for reporting failure.
> There is a comment higher up in dict_snowball.c that seems to
> use some handwaving about all this, or perhaps it is documenting
> something else entirely. In any event, I find the documentation
> about dictCtx insufficient to explain why this memory handling
> is correct.
Fair complaint --- do you want to propose some new wording that
references what header.h does?
regards, tom lane
On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <hornschnorter@gmail.com> writes:
> > In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
> > malloc (not palloc) to allocate memory, and on memory exhaustion
> > returns NULL rather than throwing an exception.
>
> Actually not, see macros in src/include/snowball/header.h.
You are correct. Thanks for the pointer.
> > In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
> > calls 'SN_set_current' and ignores the return value, thereby
> > failing to notice the error, if any.
>
> Hm. This seems like possibly a bug, in that even if we cover the
> malloc issue, there's no API guarantee that OOM is the only possible
> reason for reporting failure.
Ok, that sounds fair. Since the memory is being palloc'd, I suppose
it would be safe to just ereport when the return value is -1?
> > There is a comment higher up in dict_snowball.c that seems to
> > use some handwaving about all this, or perhaps it is documenting
> > something else entirely. In any event, I find the documentation
> > about dictCtx insufficient to explain why this memory handling
> > is correct.
>
> Fair complaint --- do you want to propose some new wording that
> references what header.h does?
Perhaps something along these lines?
/*
- * snowball saves alloced memory between calls, so we should
run it in our
- * private memory context. Note, init function is executed in long lived
- * context, so we just remember CurrentMemoryContext
+ * snowball saves alloced memory between calls, which we force to be
+ * allocated using palloc and friends via preprocessing macros (see
+ * snowball/header.h), so we should run snowball in our private memory
+ * context. Note, init function is executed in long lived
context, so we
+ * just remember CurrentMemoryContext.
*/
Mark Dilger <hornschnorter@gmail.com> writes:
> On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Mark Dilger <hornschnorter@gmail.com> writes:
>>> In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
>>> calls 'SN_set_current' and ignores the return value, thereby
>>> failing to notice the error, if any.
>> Hm. This seems like possibly a bug, in that even if we cover the
>> malloc issue, there's no API guarantee that OOM is the only possible
>> reason for reporting failure.
> Ok, that sounds fair. Since the memory is being palloc'd, I suppose
> it would be safe to just ereport when the return value is -1?
Yeah ... I'd just make it an elog really, since whatever it is
would presumably not be a user-facing error.
>> Fair complaint --- do you want to propose some new wording that
>> references what header.h does?
> Perhaps something along these lines?
Seems reasonable, please include in patch covering the other thing.
regards, tom lane