Re: Stack overflow issue

From Richard Guo
Subject Re: Stack overflow issue
Msg-id CAMbWs49H7=jV2oHdx_uzGyGUL_Lg4tS799KaLcHCUWa1VwggXw@mail.gmail.com
In reply to Re: Stack overflow issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Wed, Aug 31, 2022 at 6:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > The upstream recommendation, which seems pretty sane to me, is to
> > simply reject any string exceeding some threshold length as not
> > possibly being a word.  Apparently it's common to use thresholds
> > as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word".  While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising.  We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold.  This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.
 
This patch looks good to me. It keeps overlength words (> 1000 bytes)
from going through the stemmer, so the stack overflow in the Turkish
stemmer should no longer occur.
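
For readers following the thread, the second approach discussed above (case-fold overlength words, but skip stemming them) can be sketched roughly as follows. This is only an illustration, not the attached patch: MAX_STEM_INPUT_LEN, stem_fn, fold_case, normalize_word and toy_stem are hypothetical names, the case folding is ASCII-only, and toy_stem is a stand-in for a real Snowball stemmer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit, mirroring the 1000 bytes mentioned in the thread. */
#define MAX_STEM_INPUT_LEN 1000

/* Hypothetical stemmer callback: returns a newly malloc'd stem of word. */
typedef char *(*stem_fn)(const char *word, size_t len);

/* ASCII-only case folding; a real dictionary would be encoding-aware. */
char *
fold_case(const char *word, size_t len)
{
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) word[i]);
    out[len] = '\0';
    return out;
}

/*
 * Case-fold the word, then stem it only if it is within the limit.
 * Overlength input is returned case-folded but otherwise unmodified,
 * so it never reaches the stemmer.
 */
char *
normalize_word(const char *word, size_t len, stem_fn stem)
{
    char *folded = fold_case(word, len);
    char *stemmed;

    if (folded == NULL)
        return NULL;
    if (len > MAX_STEM_INPUT_LEN)
        return folded;          /* too long to be a real word: skip stemming */

    stemmed = stem(folded, len);
    free(folded);
    return stemmed;
}

/* Toy stand-in for a stemmer: drops a single trailing 's'. */
char *
toy_stem(const char *word, size_t len)
{
    char *out;

    if (len > 1 && word[len - 1] == 's')
        len--;
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, word, len);
    out[len] = '\0';
    return out;
}
```

With this sketch, normalize_word("Cats", 4, toy_stem) goes through the stemmer, while a 1501-byte input comes back case-folded at its full length. The caller frees the result in both cases.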

Thanks
Richard
