Thread: BUG #14245: Segfault on weird to_tsquery

BUG #14245: Segfault on weird to_tsquery

From
david@gravitext.com
Date:
The following bug has been logged on the website:

Bug reference:      14245
Logged by:          David Kellum
Email address:      david@gravitext.com
PostgreSQL version: 9.6beta2
Operating system:   Linux
Description:

I am doing some (fuzz) testing of full text queries and managed to
generate the following case which causes a SEGFAULT on PostgreSQL 9.6
beta1 and beta2:

select to_tsquery('!(a & !b) & c') as tsquery

This weird query outputs the following on 9.5.2, instead of crashing:

"!( !'b' ) & 'c'"

Below is my log output, which includes a stack trace:

Jul 12 10:04:01 klein kernel: postgres[22191]: segfault at 10 ip 00000000007754cd sp 00007ffc64b4a950 error 4 in postgres[400000+5f8000]
Jul 12 10:04:01 klein systemd[1]: Started Process Core Dump (PID 22192/UID 0).
Jul 12 10:04:01 klein postgres[482]: LOG:  server process (PID 22191) was terminated by signal 11: Segmentation fault
Jul 12 10:04:01 klein postgres[482]: DETAIL:  Failed process was running: select to_tsquery('!(a & !b) & c') as tsquery
Jul 12 10:04:01 klein postgres[482]: LOG:  terminating any other active server processes
Jul 12 10:04:01 klein postgres[482]: WARNING:  terminating connection because of crash of another server process
Jul 12 10:04:01 klein postgres[482]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Jul 12 10:04:01 klein postgres[482]: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Jul 12 10:04:01 klein postgres[482]: LOG:  all server processes terminated; reinitializing
Jul 12 10:04:01 klein postgres[482]: LOG:  database system was interrupted; last known up at 2016-07-12 10:03:47 PDT
Jul 12 10:04:01 klein systemd-coredump[22193]: Process 22191 (postgres) of user 88 dumped core.
    Stack trace of thread 22191:
    #0  0x00000000007754cd normalize_phrase_tree (postgres)
    #1  0x00000000007756e1 normalize_phrase_tree (postgres)
    #2  0x00000000007756d5 normalize_phrase_tree (postgres)
    #3  0x00000000007759bb cleanup_fakeval_and_phrase (postgres)
    #4  0x0000000000774613 parse_tsquery (postgres)
    #5  0x00000000006ca21a to_tsquery_byid (postgres)
    #6  0x00000000007ad5a7 DirectFunctionCall2Coll (postgres)
    #7  0x00000000005b79c1 ExecMakeFunctionResultNoSets (postgres)
    #8  0x00000000005bd285 ExecProject (postgres)
    #9  0x00000000005d1722 ExecResult (postgres)
    #10 0x00000000005b6a58 ExecProcNode (postgres)
    #11 0x00000000005b2fef standard_ExecutorRun (postgres)
    #12 0x00000000006bbaf8 PortalRunSelect (postgres)
    #13 0x00000000006bcf1e PortalRun (postgres)
    #14 0x00000000006ba979 PostgresMain (postgres)
    #15 0x000000000046f35f ServerLoop (postgres)
    #16 0x000000000066124c PostmasterMain (postgres)
    #17 0x00000000004703ff main (postgres)
    #18 0x00007fe114812741 __libc_start_main (libc.so.6)
    #19 0x0000000000470499 _start (postgres)
Jul 12 10:04:02 klein postgres[482]: LOG:  database system was not properly shut down; automatic recovery in progress
Jul 12 10:04:02 klein postgres[482]: LOG:  invalid record length at 1/2FA3E1C8: wanted 24, got 0
Jul 12 10:04:02 klein postgres[482]: LOG:  redo is not required
Jul 12 10:04:02 klein postgres[482]: LOG:  MultiXact member wraparound protections are now enabled
Jul 12 10:04:02 klein postgres[482]: LOG:  database system is ready to accept connections
Jul 12 10:04:02 klein postgres[482]: LOG:  autovacuum launcher started

Re: BUG #14245: Segfault on weird to_tsquery

From
Peter Geoghegan
Date:
On Tue, Jul 12, 2016 at 10:58 AM,  <david@gravitext.com> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14245
> Logged by:          David Kellum
> Email address:      david@gravitext.com
> PostgreSQL version: 9.6beta2
> Operating system:   Linux
> Description:
>
> I am doing some (fuzz) testing of full text queries and managed to
> generate the following case which causes a SEGFAULT on PostgreSQL 9.6
> beta1 and beta2:
>
> select to_tsquery('!(a & !b) & c') as tsquery

Interesting discovery. How did you fuzz test?


--
Peter Geoghegan

Re: BUG #14245: Segfault on weird to_tsquery

From
Peter Geoghegan
Date:
On Tue, Jul 12, 2016 at 11:40 AM, Peter Geoghegan <pg@heroku.com> wrote:
> Interesting discovery. How did you fuzz test?

This appears to be a NULL pointer dereference. Here is a backtrace
with proper debug info:

#0  0x0000000000e45ada in normalize_phrase_tree (node=0x0) at tsquery_cleanup.c:397
#1  0x0000000000e468f3 in normalize_phrase_tree (node=<optimized out>) at tsquery_cleanup.c:416
#2  0x0000000000e4687f in normalize_phrase_tree (node=0x0) at tsquery_cleanup.c:543
#3  0x0000000000e44ce9 in cleanup_fakeval_and_phrase (in=<optimized out>) at tsquery_cleanup.c:603
#4  0x0000000000e3f528 in parse_tsquery (buf=<optimized out>, pushval=0x6250002e9490, opaque=<optimized out>, isplain=<optimized out>) at tsquery.c:695
#5  0x0000000000c8abcf in to_tsquery_byid (fcinfo=<optimized out>) at to_tsany.c:372
#6  0x0000000000ee0cc6 in DirectFunctionCall2Coll (func=0xc8aac0 <to_tsquery_byid>, collation=1342381084, arg1=12126, arg2=108095739809240) at fmgr.c:1049
#7  0x000000000093d2a9 in ExecMakeFunctionResultNoSets (fcache=<optimized out>, econtext=0x6250002ee368, isNull=<optimized out>, isDone=<optimized out>) at execQual.c:2041
#8  0x000000000093a89c in ExecTargetList (targetlist=0x6250002ef0e0, tupdesc=<optimized out>, econtext=<optimized out>, values=0x6250002eefb8, isnull=0x6250002eefd8 "\276~\276\276\276"..., itemIsDone=0x6250002ef118, isDone=<optimized out>) at execQual.c:5376
#9  0x000000000093a5ab in ExecProject (projInfo=<optimized out>, isDone=<optimized out>) at execQual.c:5600
***SNIP ***
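
Frame #0 above shows normalize_phrase_tree being entered with node=0x0, which suggests a recursive tree walk that reached a missing operand and dereferenced it. The following is only a minimal Python sketch of that failure mode and the usual guard, not the PostgreSQL C code; the Node class and both depth functions are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical query-tree node: an operator ('&', '|', '!') or a word."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def depth_unsafe(node):
    # Assumes every operator still has its operands. If an operand was
    # pruned earlier (None), the attribute access below raises.
    if node.value == "!":
        return 1 + depth_unsafe(node.left)
    if node.value in ("&", "|"):
        return 1 + max(depth_unsafe(node.left), depth_unsafe(node.right))
    return 1


def depth(node: Optional[Node]) -> int:
    # Guard first: a pruned subtree is represented by None.
    if node is None:
        return 0
    if node.value == "!":
        return 1 + depth(node.left)
    if node.value in ("&", "|"):
        return 1 + max(depth(node.left), depth(node.right))
    return 1


if __name__ == "__main__":
    # A NOT whose operand has been removed, next to an ordinary word.
    broken = Node("&", Node("!"), Node("c"))
    print(depth(broken))
```

Calling depth_unsafe(broken) raises AttributeError in Python, which is the closest analogue here to the NULL pointer dereference in the backtrace.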

--
Peter Geoghegan

Re: BUG #14245: Segfault on weird to_tsquery

From
David Kellum
Date:
On Tue, Jul 12, 2016 at 11:40 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Jul 12, 2016 at 10:58 AM,  <david@gravitext.com> wrote:
>>  The following bug has been logged on the website:
>>
>>  Bug reference:      14245
>>
>>  I am doing some (fuzz) testing of full text queries and managed to
>>  generate the following case which causes a SEGFAULT on PostgreSQL 9.6
>>  beta1 and beta2:
>>
>>  select to_tsquery('!(a & !b) & c') as tsquery
>
> Interesting discovery. How did you fuzz test?

Motivated by the new phrase search support in 9.6, I'm working on a
query language which is lenient to any user input when parsed and can
be transformed and output to PG tsquery syntax.  The fuzz testing works by
randomly permuting fragments of the custom query language.  Using this,
I found and fixed a bunch of issues in my own parser, and identified
lots of characters to treat as whitespace and filter before output to
tsquery, before stumbling on this Postgres crash.
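
The approach described above, randomly permuting query fragments and feeding the result to the parser, can be sketched roughly as follows. This is not the tool used for the report: the fragment pool and the names random_query and fuzz_cases are assumptions for illustration, and the sketch emits tsquery syntax directly rather than going through a custom query language.

```python
import random

# Hypothetical fragment pool: a few words (including the stopword 'a')
# plus the operators and parentheses used in tsquery syntax.
FRAGMENTS = ["a", "b", "c", "apple", "!", "&", "|", "(", ")"]


def random_query(rng: random.Random, length: int = 7) -> str:
    """Join randomly chosen fragments into a candidate query string."""
    return " ".join(rng.choice(FRAGMENTS) for _ in range(length))


def fuzz_cases(seed: int, count: int):
    """Yield a reproducible stream of candidate queries for a given seed."""
    rng = random.Random(seed)
    for _ in range(count):
        yield random_query(rng)


if __name__ == "__main__":
    for query in fuzz_cases(seed=14245, count=5):
        # Each string would be sent to the server as the argument of
        # to_tsquery(). Syntax errors are expected and uninteresting;
        # a dropped connection is the outcome worth recording.
        print(query)
```

Seeding the generator keeps any crashing input reproducible, which matters when the interesting case is as small as the one in this report.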

Re: BUG #14245: Segfault on weird to_tsquery

From
Tom Lane
Date:
david@gravitext.com writes:
> I am doing some (fuzz) testing of full text queries and managed to
> generate the following case which causes a SEGFAULT on PostgreSQL 9.6
> beta1 and beta2:
> select to_tsquery('!(a & !b) & c') as tsquery
> This weird query outputs the following on 9.5.2, instead of crashing:
> "!( !'b' ) & 'c'"

Note that while crashing is certainly not good, the pre-9.6 behavior
can hardly be called correct either.  What happened to 'a'?

Also, it looks like this is specific to to_tsquery; if you just feed
the same thing to tsqueryin, it seems fine with it:

# select '!(a & !b) & c'::tsquery;
        tsquery
-----------------------
 !( 'a' & !'b' ) & 'c'
(1 row)

            regards, tom lane

Re: BUG #14245: Segfault on weird to_tsquery

From
David Kellum
Date:
On Tue, Jul 12, 2016 at 12:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> david@gravitext.com writes:
>>  I am doing some (fuzz) testing of full text queries and managed to
>>  generate the following case which causes a SEGFAULT on PostgreSQL 9.6
>>  beta1 and beta2:
>>  select to_tsquery('!(a & !b) & c') as tsquery
>>  This weird query outputs the following on 9.5.2, instead of crashing:
>>  "!( !'b' ) & 'c'"
>
> Note that while crashing is certainly not good, the pre-9.6 behavior
> can hardly be called correct either.  What happened to 'a'?

'a' is a stopword, dropped by to_tsquery() as described here:

https://www.postgresql.org/docs/9.6/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
> The difference is that while basic tsquery input takes the tokens at
> face value, to_tsquery normalizes each token into a lexeme using the
> specified or default configuration, and discards any tokens that are
> stop words according to the configuration.

...and I believe I want this behavior.  Otherwise queries with a stopword
in an '&' condition will not match anything.  In truth I have no reason to
want to support this kind of weird double negative, on any version, and
will also look at filtering it out in my code before calling
to_tsquery().

It might be worth noting that these other slightly different cases are
fine on 9.6:

select to_tsquery('!(apple & !b) & c'); ---> !( 'appl' & !'b' ) & 'c'
select to_tsquery('!(apple & !a) & c'); ---> !'appl' & 'c'

Clearly a pretty obscure case, but a crash nonetheless.

> Also, it looks like this is specific to to_tsquery; if you just feed
> the same thing to tsqueryin, it seems fine with it:
>
> # select '!(a & !b) & c'::tsquery;
>         tsquery
> -----------------------
>  !( 'a' & !'b' ) & 'c'
> (1 row)

Against another test table, English search config, I confirmed that 'a
& ball'::tsquery doesn't match anything, but to_tsquery('a & ball')
does.
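
The stopword behaviour discussed in this message can be modelled with a toy tree cleanup. This is a sketch under stated assumptions, not the PostgreSQL implementation: queries are nested tuples, STOPWORDS is a tiny stand-in for the English stopword list, stemming ('apple' to 'appl') and phrase operators are ignored, and drop_stopwords is a hypothetical name.

```python
# Assumption: a tiny stand-in for the English configuration's stopword list.
STOPWORDS = {"a", "the", "of"}


def drop_stopwords(node):
    """Return the tree with stopword leaves removed, or None if nothing is left.

    A node is either a word (str), ("!", operand), or (op, left, right)
    with op being "&" or "|".
    """
    if isinstance(node, str):
        return None if node in STOPWORDS else node
    op = node[0]
    if op == "!":
        child = drop_stopwords(node[1])
        # A negation with no operand left disappears as well.
        return None if child is None else ("!", child)
    left = drop_stopwords(node[1])
    right = drop_stopwords(node[2])
    # A binary operator that lost one side collapses to the other side.
    if left is None:
        return right
    if right is None:
        return left
    return (op, left, right)


if __name__ == "__main__":
    reported = ("&", ("!", ("&", "a", ("!", "b"))), "c")
    print(drop_stopwords(reported))
    print(drop_stopwords(("&", "a", "ball")))
```

Under these rules the reported query reduces to the same shape as the 9.5.2 output quoted earlier, a negation wrapped around !'b' and ANDed with 'c', and 'a & ball' reduces to the single word 'ball', which is consistent with to_tsquery('a & ball') matching where the plain tsquery cast does not.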

Thanks,
David

Re: BUG #14245: Segfault on weird to_tsquery

From
Tom Lane
Date:
David Kellum <david@gravitext.com> writes:
> On Tue, Jul 12, 2016 at 12:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that while crashing is certainly not good, the pre-9.6 behavior
>> can hardly be called correct either.  What happened to 'a'?

> 'a' is a stopword, dropped by to_tsquery() as described here:

Ah!  OK, so it's probably necessary to have a stopword there in order
to break it.

BTW, all these variants also crash:

select to_tsquery('!(a | !b) & c') as tsquery;
select to_tsquery('!( !b & a) & c') as tsquery;
select to_tsquery('!( !b | a) & c') as tsquery;
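
Together with the original report, the variants above cover both operand orders and both binary operators inside the negated group. A small sketch that enumerates the same four shapes, for example as input to a regression script, might look like this; crash_variants is a hypothetical helper, and the whitespace of its output differs slightly from the queries as typed above.

```python
from itertools import product


def crash_variants(stopword: str = "a", word: str = "b", tail: str = "c"):
    """Build the four query shapes discussed in this thread."""
    queries = []
    for op, stopword_first in product("&|", (True, False)):
        negated = "!" + word
        first, second = (stopword, negated) if stopword_first else (negated, stopword)
        queries.append(f"!({first} {op} {second}) & {tail}")
    return queries


if __name__ == "__main__":
    for q in crash_variants():
        print(f"select to_tsquery('{q}') as tsquery;")
```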

            regards, tom lane

Re: BUG #14245: Segfault on weird to_tsquery

From
Noah Misch
Date:
On Tue, Jul 12, 2016 at 05:11:32PM -0400, Tom Lane wrote:
> David Kellum <david@gravitext.com> writes:
> > On Tue, Jul 12, 2016 at 12:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Note that while crashing is certainly not good, the pre-9.6 behavior
> >> can hardly be called correct either.  What happened to 'a'?
> 
> > 'a' is a stopword, dropped by to_tsquery() as described here:
> 
> Ah!  OK, so it's probably necessary to have a stopword there in order
> to break it.
> 
> BTW, all these variants also crash:
> 
> select to_tsquery('!(a | !b) & c') as tsquery;
> select to_tsquery('!( !b & a) & c') as tsquery;
> select to_tsquery('!( !b | a) & c') as tsquery;

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com



Re: [HACKERS] BUG #14245: Segfault on weird to_tsquery

From
Teodor Sigaev
Date:
> The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
I'm working on it now and believe a fix will be published today.

> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
>
> [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
>
>

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: BUG #14245: Segfault on weird to_tsquery

From
Teodor Sigaev
Date:
> select to_tsquery('!(a & !b) & c') as tsquery

Thank you very much for your report, fixed.



--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/