Discussion: BUG #4697: to_tsvector hangs on input

BUG #4697: to_tsvector hangs on input

From
"Peter Guarino"
Date:
The following bug has been logged online:

Bug reference:      4697
Logged by:          Peter Guarino
Email address:      peterguarino@earthlink.net
PostgreSQL version: 8.3.3
Operating system:   Suse 10.x
Description:        to_tsvector hangs on input
Details:

Certain strings involving the @ character cause the to_tsvector function to
hang and the postgres server process handling the connection to spin the cpu
at 100%. Attempts to gracefully kill the server process are unsuccessful and
a 'kill -9' becomes necessary. Here is an example of such a string:
4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4
@D4@D4@D4@D4@D4@D4@D2?C

Re: BUG #4697: to_tsvector hangs on input

From
Heikki Linnakangas
Date:
Peter Guarino wrote:
> The following bug has been logged online:
>
> Bug reference:      4697
> Logged by:          Peter Guarino
> Email address:      peterguarino@earthlink.net
> PostgreSQL version: 8.3.3
> Operating system:   Suse 10.x
> Description:        to_tsvector hangs on input
> Details:
>
> Certain strings involving the @ character cause the to_tsvector function to
> hang and the postgres server process handling the connection to spin the cpu
> at 100%. Attempts to gracefully kill the server process are unsuccessful and
> a 'kill -9' becomes necessary. Here is an example of such a string:
> 4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4@D4
> @D4@D4@D4@D4@D4@D4@D2?C

Hmm, it looks like it's not actually completely hung, but there's a
nasty O(n^2) recursion in the parser, making the parsing time
exponentially longer as the string gets longer. If you make the string
shorter, it will finish before you get tired of waiting, but will still
take a long time.

When the parser sees the "@", it goes into "email state". In email
state, it recurses, trying to find out if the string after the "@" is a
valid hostname. That in turn goes into email state, recurses again and
so forth, until you reach the end of the string. Then, the recursion
unwinds back to the first @, moving on to the next character. At the
next "@" the cycle repeats.
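
To make the shape of that recursion concrete, here is a toy model in C. It is
not the real state machine from wparser_def.c: count_scans is a hypothetical
helper, and it assumes that every nested scan also keeps going past its own
"@" once the inner check returns. Under that assumption the number of scans
doubles with each additional "@"; the real parser's growth rate may differ.

```c
/* Toy model only: this is NOT the real wparser_def.c state machine. */

/*
 * Count how many scans are started in total for the input s.
 * Every '@' starts a nested scan of the rest of the string (standing in
 * for the "is this a valid hostname?" check). When the nested scan
 * returns, the current scan moves on to the next character.
 */
static unsigned long
count_scans(const char *s)
{
    unsigned long scans = 1;    /* this scan itself */

    for (; *s != '\0'; s++)
    {
        if (*s == '@')
            scans += count_scans(s + 1);    /* recurse on the remainder */
    }
    return scans;
}
```

In this model count_scans("4@D4@D2?C") is 4 and count_scans("4@D4@D4@D") is 8,
so a string with around thirty "@" characters, like the one in the report,
would already need on the order of a billion scans.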

Since the recursion only wants to know if the string after "@" is a
valid hostname, we can stop the recursion as soon as we find out that
it's not. The attached patch does that.

Teodor, does this look OK to you?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index ad98fa5..74a8a61 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -620,6 +620,8 @@ p_ishost(TParser *prs)
     TParser    *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
     int            res = 0;

+    tmpprs->wanthost = true;
+
     if (TParserGet(tmpprs) && tmpprs->type == HOST)
     {
         prs->state->posbyte += tmpprs->lenbytetoken;
@@ -1070,6 +1072,7 @@ static const TParserStateActionItem actionTPS_InHost[] = {
 };

 static const TParserStateActionItem actionTPS_InEmail[] = {
+    {p_isstophost, 0, A_POP, TPS_Null, 0, NULL},
     {p_ishost, 0, A_BINGO | A_CLRALL, TPS_Base, EMAIL, NULL},
     {NULL, 0, A_POP, TPS_Null, 0, NULL}
 };
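
A rough way to picture the effect of stopping early is the toy model below. It
is an illustrative assumption, not code from the patch: count_scans_patched is
a hypothetical helper, and only the flag name wanthost is borrowed from the
diff. A nested scan that is merely checking for a hostname gives up at the
first "@" instead of recursing again.

```c
#include <stdbool.h>

/* Toy model only: this is NOT the real wparser_def.c state machine. */

/*
 * Count how many scans are started in total for the input s.
 * wanthost is true for a nested scan that only needs to know whether a
 * hostname starts here; such a scan stops at the first '@', because
 * '@' cannot be part of a hostname.
 */
static unsigned long
count_scans_patched(const char *s, bool wanthost)
{
    unsigned long scans = 1;    /* this scan itself */

    for (; *s != '\0'; s++)
    {
        if (*s != '@')
            continue;
        if (wanthost)
            break;              /* not a hostname: stop without recursing */
        scans += count_scans_patched(s + 1, true);
    }
    return scans;
}
```

Called as count_scans_patched(s, false) on the whole input, this model starts
one nested scan per "@", so the total grows linearly with the number of "@"
characters: 3 for "4@D4@D2?C" and 4 for "4@D4@D4@D".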

Re: BUG #4697: to_tsvector hangs on input

From
Teodor Sigaev
Date:
> When the parser sees the "@", it goes into "email state". In email
> state, it recurses, trying to find out if the string after the "@" is a
> valid hostname. That in turn goes into email state, recurses again and
> so forth, until you reach the end of the string. Then, the recursion
> unwinds back to the first @, moving on to the next character. At the
> next "@" the cycle repeats.
True

> Since the recursion only wants to know if the string after "@" is a
> valid hostname, we can stop the recursion as soon as we find out that
> it's not. The attached patch does that.
Committed to HEAD, 8.3 and 8.2, thank you. Previous releases are not affected.


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/