Regex with > 32k different chars causes a backend crash

From: Heikki Linnakangas
Subject: Regex with > 32k different chars causes a backend crash
Msg-id: 515C46A0.3090002@vmware.com
Responses: Re: Regex with > 32k different chars causes a backend crash  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

While playing with Alexander's pg_trgm regexp patch, I noticed that the 
regexp library trips an assertion (if assertions are enabled) or crashes 
when passed an input string that contains more than 32k different 
characters:

select 'foo' ~ (select string_agg(chr(x), '')
                from generate_series(100, 35000) x) as nastyregex;

This is because it uses 'short' as the datatype to identify colors. When 
it overflows, -32768 is used as index to the colordesc array, and you 
get a crash. AFAICS this can't reliably be used for anything more 
sinister than crashing the backend.
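
To illustrate the overflow described above, here is a minimal sketch in C. 
It is not the regex library's code: the helper next_color_unchecked() is 
hypothetical, and only the use of 'short' for colors comes from this 
message. It assumes a 16-bit short and a compiler that wraps on the 
out-of-range conversion, as gcc and clang do.

```c
#include <limits.h>

/* Per the message, colors are identified by a 'short'. */
typedef short color;

/*
 * Hypothetical helper: compute the color number that follows 'last'
 * without any range check.
 */
static color
next_color_unchecked(color last)
{
    /*
     * 'last + 1' is evaluated as int, so the addition itself does not
     * overflow.  Converting the result back to short is
     * implementation-defined when it does not fit; on common
     * two's-complement platforms SHRT_MAX + 1 becomes SHRT_MIN.
     */
    return (color) (last + 1);
}
```

With a 16-bit short, next_color_unchecked(SHRT_MAX) yields -32768 on such 
platforms, and using that value as an array index reads memory in front 
of the array.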

A regex with that many different colors is an extreme case, so I think 
it's enough to turn the assertion in newcolor() into a run-time check 
and throw a "too many colors in regexp" error. Alternatively, we could 
widen 'color' from short to int, but that would double the memory usage 
of sane regexps with fewer distinct characters.
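
A rough sketch of the run-time check idea, in C. Everything here except 
the 'short' color type is an assumption made for illustration: the struct 
colormap_sketch, the function newcolor_checked(), and the COLOR_ERR 
sentinel are hypothetical and do not reflect the actual layout or error 
reporting of the regex library.

```c
#include <limits.h>
#include <stddef.h>

typedef short color;

#define MAX_COLOR  SHRT_MAX   /* largest color number a short can hold */
#define COLOR_ERR  (-1)       /* hypothetical "allocation failed" sentinel */

/* Hypothetical stand-in for the colormap: only tracks the next number. */
struct colormap_sketch
{
    size_t      ncolors;      /* number of colors handed out so far */
};

/*
 * Hand out the next color number, or COLOR_ERR if it would not fit in
 * a short.  The caller is expected to turn COLOR_ERR into a
 * "too many colors in regexp" error.
 */
static color
newcolor_checked(struct colormap_sketch *cm)
{
    if (cm->ncolors > (size_t) MAX_COLOR)
        return COLOR_ERR;     /* checked in all builds, not only with asserts */

    return (color) cm->ncolors++;
}
```

The point of the check is that it runs in every build, whereas an 
assertion only protects assert-enabled builds. The color type stays a 
short, so memory usage for ordinary regexps is unchanged.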

Thoughts?

- Heikki


