Обсуждение: Use of backslash in tsearch2
Bruce Momjian wrote:
>
> I backed out the patch, attached, and it has fixed the regression
> problem. What has me confused is that is looks like it is checking for
> ', then putting \, which doesn't make a lot of sense, but the regression
> output is corrected, so I just don't get it. Here is an example:
>
> test=> SELECT E'''1 \\''2''';
> ?column?
> ----------
> '1 \'2'
>
> My only guess is that the output is somehow a single-quoted string
> itself, and in fact \' should become ''. Is that right? Basically they
> are doing \' in their output, and it should be doing '', but then the
> query above would be wrong and shouldn't be using \'.
As part of the move to support standard-conforming strings and treat
backslash literally, I reviewed the tsearch2 code and found two place
that seemed to use \' rather than '', and generated the attached patch.
('' is standards conforming.) However, when I fixed the code, the
regression tests failed.
Teodor, are the new attached regression results correct? If so, I will
apply the patch and update the expected file.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Index: contrib/tsearch2/query.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v
retrieving revision 1.25
diff -c -c -r1.25 query.c
*** contrib/tsearch2/query.c 19 May 2006 02:38:47 -0000 1.25
--- contrib/tsearch2/query.c 19 May 2006 04:37:35 -0000
***************
*** 748,754 ****
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\'';
in->cur++;
}
COPYCHAR(in->cur,op);
--- 748,754 ----
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\\';
in->cur++;
}
COPYCHAR(in->cur,op);
Index: contrib/tsearch2/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v
retrieving revision 1.18
diff -c -c -r1.18 tsvector.c
*** contrib/tsearch2/tsvector.c 19 May 2006 02:38:47 -0000 1.18
--- contrib/tsearch2/tsvector.c 19 May 2006 04:37:39 -0000
***************
*** 529,535 ****
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\'';
}
while(len--)
*curout++ = *curin++;
--- 529,535 ----
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\\';
}
while(len--)
*curout++ = *curin++;
*** ./expected/tsearch2.out Wed May 31 10:05:31 2006
--- ./results/tsearch2.out Mon Aug 21 20:01:12 2006
***************
*** 59,83 ****
SELECT E'''1 \\''2'''::tsvector;
tsvector
----------
! '1 \'2'
(1 row)
SELECT E'''1 \\''2''3'::tsvector;
tsvector
-------------
! '3' '1 \'2'
(1 row)
SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
! '3' '1 \'2'
(1 row)
SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector;
tsvector
------------------
! '4' ' 3' '1 \'2'
(1 row)
select '''w'':4A,3B,2C,1D,5 a:8';
--- 59,83 ----
SELECT E'''1 \\''2'''::tsvector;
tsvector
----------
! '1 ''2'
(1 row)
SELECT E'''1 \\''2''3'::tsvector;
tsvector
-------------
! '3' '1 ''2'
(1 row)
SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
! '3' '1 ''2'
(1 row)
SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector;
tsvector
------------------
! '4' ' 3' '1 ''2'
(1 row)
select '''w'':4A,3B,2C,1D,5 a:8';
***************
*** 138,144 ****
SELECT E'''1 \\''2'''::tsquery;
tsquery
---------
! '1 \'2'
(1 row)
SELECT '!1'::tsquery;
--- 138,144 ----
SELECT E'''1 \\''2'''::tsquery;
tsquery
---------
! '1 ''2'
(1 row)
SELECT '!1'::tsquery;
***************
*** 336,342 ****
SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery;
tsquery
------------------------------------------
! '1' & '2' & ' 4' & ( '|5' | '6 \' !|&' )
(1 row)
SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a';
--- 336,342 ----
SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery;
tsquery
------------------------------------------
! '1' & '2' & ' 4' & ( '|5' | '6 '' !|&' )
(1 row)
SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a';
======================================================================
Bruce Momjian <bruce@momjian.us> writes:
> As part of the move to support standard-conforming strings and treat
> backslash literally, I reviewed the tsearch2 code and found two place
> that seemed to use \' rather than '', and generated the attached patch.
I thought we had decided that that code should not be changed. It has
nothing to do with SQL literals, and changing it will create unnecessary
backwards-compatibility problems.
regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > As part of the move to support standard-conforming strings and treat > > backslash literally, I reviewed the tsearch2 code and found two place > > that seemed to use \' rather than '', and generated the attached patch. > > I thought we had decided that that code should not be changed. It has > nothing to do with SQL literals, and changing it will create unnecessary > backwards-compatibility problems. I don't remember any comment regarding that. I think it does relate to SQL literals because it is creating a literal inside a literal. Also, at the time this was a core-only discussion and I am hoping from a comment from the tsearch2 folks. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Bruce Momjian wrote:
> >
> > I backed out the patch, attached, and it has fixed the regression
> > problem. What has me confused is that is looks like it is checking for
> > ', then putting \, which doesn't make a lot of sense, but the regression
> > output is corrected, so I just don't get it. Here is an example:
> >
> > test=> SELECT E'''1 \\''2''';
> > ?column?
> > ----------
> > '1 \'2'
> >
> > My only guess is that the output is somehow a single-quoted string
> > itself, and in fact \' should become ''. Is that right? Basically they
> > are doing \' in their output, and it should be doing '', but then the
> > query above would be wrong and shouldn't be using \'.
>
> As part of the move to support standard-conforming strings and treat
> backslash literally, I reviewed the tsearch2 code and found two place
> that seemed to use \' rather than '', and generated the attached patch.
> ('' is standards conforming.) However, when I fixed the code, the
> regression tests failed.
>
> Teodor, are the new attached regression results correct? If so, I will
> apply the patch and update the expected file.
Updated patch attached. The previous one was reversed.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Index: contrib/tsearch2/query.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v
retrieving revision 1.26
diff -c -c -r1.26 query.c
*** contrib/tsearch2/query.c 19 May 2006 04:39:47 -0000 1.26
--- contrib/tsearch2/query.c 21 Aug 2006 23:58:20 -0000
***************
*** 748,754 ****
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\\';
in->cur++;
}
COPYCHAR(in->cur,op);
--- 748,754 ----
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\'';
in->cur++;
}
COPYCHAR(in->cur,op);
Index: contrib/tsearch2/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v
retrieving revision 1.20
diff -c -c -r1.20 tsvector.c
*** contrib/tsearch2/tsvector.c 11 Jul 2006 18:26:10 -0000 1.20
--- contrib/tsearch2/tsvector.c 21 Aug 2006 23:58:20 -0000
***************
*** 529,535 ****
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\\';
}
while(len--)
*curout++ = *curin++;
--- 529,535 ----
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\'';
}
while(len--)
*curout++ = *curin++;
>> Teodor, are the new attached regression results correct? If so, I will
>> apply the patch and update the expected file.
Patch isn't full, simple test (values are took from regression.diffs):
# create table tt (tv tsvector, tq tsquery);
# insert into tt values (E'''1 \\''2''', NULL);
# insert into tt values (E'''1 \\''2''3', NULL);
# insert into tt values ( E'''1 \\''2'' 3', NULL);
# insert into tt values ( E'''1 \\''2'' '' 3'' 4 ', NULL);
# insert into tt values ( NULL, E'''1 \\''2''');
# insert into tt values ( NULL, E'''1 \\''2''');
# insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))');
# insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))');
and try dump table and restore:
ERROR: syntax error
CONTEXT: COPY tt, line 5, column tq: "'1 ''2'"
PS I'm not subscribed to -patches, so I post to -hackers
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
> Patch isn't full, simple test (values are took from regression.diffs):
> and try dump table and restore:
> ERROR: syntax error
> CONTEXT: COPY tt, line 5, column tq: "'1 ''2'"
>
Attached cumulative patch fixes problem, but I have some doubts, is it really
needed?
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Вложения
Thanks. Yes, it is need for two reasons. In 8.2 you can set standard_conforming_strings to on, meaning \' is really treated as \ and ', and because some encodings now can't support \' for security reasons, though I don't think tsearch2 supports those multibyte encodings. Anyway, applied to 8.2 only, not backpatched. Thanks. --------------------------------------------------------------------------- Teodor Sigaev wrote: > > Patch isn't full, simple test (values are took from regression.diffs): > > and try dump table and restore: > > ERROR: syntax error > > CONTEXT: COPY tt, line 5, column tq: "'1 ''2'" > > > > Attached cumulative patch fixes problem, but I have some doubts, is it really > needed? > > > -- > Teodor Sigaev E-mail: teodor@sigaev.ru > WWW: http://www.sigaev.ru/ [ application/x-tar is not supported, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +