On Tue, Jul 12, 2016 at 12:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> david@gravitext.com writes:
>> I am doing some (fuzz) testing of full text queries and managed to
>> generate the following case which causes a SEGFAULT on PostgreSQL
>> 9.6 beta1 and beta2:
>> select to_tsquery('!(a & !b) & c') as tsquery
>> This weird query outputs the following on 9.5.2, instead of crashing:
>> "!( !'b' ) & 'c'"
>
> Note that while crashing is certainly not good, the pre-9.6 behavior
> can hardly be called correct either. What happened to 'a'?
'a' is a stop word, so to_tsquery() drops it, as described here:
https://www.postgresql.org/docs/9.6/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
> The difference is that while basic tsquery input takes the tokens at
> face value, to_tsquery normalizes each token into a lexeme using the
> specified or default configuration, and discards any tokens that are
> stop words according to the configuration.
...and I believe I want this behavior. Otherwise, a query with a stop
word in an '&' condition would never match anything, since stop words
are also dropped from the documents being searched. In truth, I have no
reason to support this kind of weird double negative on any version,
and I will also look at filtering it out in my code before calling
to_tsquery().
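For reference, a minimal sketch of how to check the stop word handling
directly. It assumes the built-in 'english' configuration and its
'english_stem' dictionary; other configurations may treat 'a'
differently:

```sql
-- ts_lexize returns an empty array when the dictionary
-- recognizes the token as a stop word.
select ts_lexize('english_stem', 'a');    -- {}

-- to_tsquery drops the stop word and keeps the other lexeme.
select to_tsquery('english', 'a & ball'); -- 'ball'
```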
It might be worth noting that these other slightly different cases are
fine on 9.6:
select to_tsquery('!(apple & !b) & c'); ---> !( 'appl' & !'b' ) & 'c'
select to_tsquery('!(apple & !a) & c'); ---> !'appl' & 'c'
Clearly a pretty obscure case, but a crash nonetheless.
> Also, it looks like this is specific to to_tsquery; if you just feed
> the same thing to tsqueryin, it seems fine with it:
>
> # select '!(a & !b) & c'::tsquery;
> tsquery
> -----------------------
> !( 'a' & !'b' ) & 'c'
> (1 row)
Against another test table, using the English search configuration, I
confirmed that 'a & ball'::tsquery doesn't match anything, but
to_tsquery('a & ball') does.
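The same comparison can be sketched without a table. The document text
'a ball' and the explicit 'english' configuration are assumptions made
for illustration, not the contents of the test table:

```sql
-- to_tsvector removes the stop word 'a', so the vector holds only 'ball'.
-- The raw cast keeps 'a' as a required lexeme, which can never be found.
select to_tsvector('english', 'a ball') @@ 'a & ball'::tsquery;               -- f

-- to_tsquery drops 'a' as well, leaving just 'ball', which matches.
select to_tsvector('english', 'a ball') @@ to_tsquery('english', 'a & ball'); -- t
```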
Thanks,
David