Обсуждение: TCL_ARRAYS code in libpgtcl is pretty seriously broken

Поиск
Список
Период
Сортировка

TCL_ARRAYS code in libpgtcl is pretty seriously broken

От
Tom Lane
Дата:
There is some code in libpgtcl that purports to convert Postgres array
data values into Tcl lists.  Is anyone prepared to argue that that code
does something useful in its present state?  I can name half a dozen
bugs in it without breathing hard:

1. Blithely assumes that any data value beginning with '{' and ending
with '}' must represent an array value.  Should have some more robust
way of discovering whether a column is array type.  (In fairness, this
might require a FE/BE protocol change, unless arrayness can be
determined from the tuple descriptors provided by the backend, ie,
field type OID, size, and attmod.  Anybody know a way to do that?)

2. Applies a translation that converts all backslash escape sequences
defined for C string constants into their equivalent single characters.
Since neither the backend nor Tcl generate anything close to C-string
escapes, the point of this is difficult to determine.  It does however
result in unexpected output, eg disappearing backslashes.

3. Applies said translation even when processing a non-array data value.

4. Doesn't actually manage to produce a valid Tcl list, if the data
contains anything Tcl considers a special character.  What it *should*
be doing is quoting, not de-quoting.

5. Fails to modify \\, \{, and \} (thus quite unintentionally doing
almost the right thing...) when these sequences appear inside an array
value, because "they will be unescaped by Tcl in Tcl_AppendElement".
But in fact Tcl_AppendElement is not invoked on the results of this
code.

6. Modifies the string returned by libpq *in place*.  This would be a
const-ness violation if we had been more careful about declaring things
const.  More importantly, it means that re-examining the same tuple of
the PGresult will yield a different result.  Not cool.

7. The TCL_ARRAYS code is only invoked in the "-assign" variant of
the pgtcl pg_result statement, not in any of the other paths that allow
tuple values to be examined.  This is presumably an oversight, not
the intended behavior.

8. Does not cope with MULTIBYTE strings.  (But I don't think Tcl does
either, so it's not clear that this can be called a bug.)


I am strongly inclined to rip this code out, because it is responsible
for several behaviors that were correctly called bugs when backslash-
handling was discussed on pgsql-interfaces back in August.  If we don't
rip it out, it needs a complete rewrite.

Unless there is a bulletproof solution to problem #1 (how to tell
whether a field's data type is array), I do not think it is appropriate
for the basic pg_result code to be applying any such transform.  Perhaps
it would be reasonable to invent a separate string-formatting function,
say "pg_arraytolist", that would perform the conversion.  It would then
be the application writer's responsibility to know which fields were
arrays and apply the conversion if he wanted it.

Comments?

            regards, tom lane

Re: [HACKERS] TCL_ARRAYS code in libpgtcl is pretty seriously broken

От
"Thomas G. Lockhart"
Дата:
> 1. Blithely assumes that any data value beginning with '{' and ending
> with '}' must represent an array value.  Should have some more robust
> way of discovering whether a column is array type.  (In fairness, this
> might require a FE/BE protocol change, unless arrayness can be
> determined from the tuple descriptors provided by the backend, ie,
> field type OID, size, and attmod.  Anybody know a way to do that?)
> Comments?

Postgres seems to use a convention that a type name which starts with an
underscore is the array type for the corresponding non-underscore,
non-array type. Also, the typelem field in pg_type is non-zero for array
types.

This isn't a definitive answer and there may be another way to discover
array-ness but it's where I would look. Not sure if you'd be happy
having to do a select on pg_type for every query unless you're doing it
already...

                    - Tom

Re: [HACKERS] TCL_ARRAYS code in libpgtcl is pretty seriously broken

От
Bruce Momjian
Дата:
> There is some code in libpgtcl that purports to convert Postgres array
> data values into Tcl lists.  Is anyone prepared to argue that that code
> does something useful in its present state?  I can name half a dozen
> bugs in it without breathing hard:
>
> 1. Blithely assumes that any data value beginning with '{' and ending
> with '}' must represent an array value.  Should have some more robust
> way of discovering whether a column is array type.  (In fairness, this
> might require a FE/BE protocol change, unless arrayness can be
> determined from the tuple descriptors provided by the backend, ie,
> field type OID, size, and attmod.  Anybody know a way to do that?)
>
> 2. Applies a translation that converts all backslash escape sequences
> defined for C string constants into their equivalent single characters.
> Since neither the backend nor Tcl generate anything close to C-string
> escapes, the point of this is difficult to determine.  It does however
> result in unexpected output, eg disappearing backslashes.
>
> 3. Applies said translation even when processing a non-array data value.
>
> 4. Doesn't actually manage to produce a valid Tcl list, if the data
> contains anything Tcl considers a special character.  What it *should*
> be doing is quoting, not de-quoting.
>
> 5. Fails to modify \\, \{, and \} (thus quite unintentionally doing
> almost the right thing...) when these sequences appear inside an array
> value, because "they will be unescaped by Tcl in Tcl_AppendElement".
> But in fact Tcl_AppendElement is not invoked on the results of this
> code.
>
> 6. Modifies the string returned by libpq *in place*.  This would be a
> const-ness violation if we had been more careful about declaring things
> const.  More importantly, it means that re-examining the same tuple of
> the PGresult will yield a different result.  Not cool.
>
> 7. The TCL_ARRAYS code is only invoked in the "-assign" variant of
> the pgtcl pg_result statement, not in any of the other paths that allow
> tuple values to be examined.  This is presumably an oversight, not
> the intended behavior.
>
> 8. Does not cope with MULTIBYTE strings.  (But I don't think Tcl does
> either, so it's not clear that this can be called a bug.)
>
>
> I am strongly inclined to rip this code out, because it is responsible
> for several behaviors that were correctly called bugs when backslash-
> handling was discussed on pgsql-interfaces back in August.  If we don't
> rip it out, it needs a complete rewrite.
>
> Unless there is a bulletproof solution to problem #1 (how to tell
> whether a field's data type is array), I do not think it is appropriate
> for the basic pg_result code to be applying any such transform.  Perhaps
> it would be reasonable to invent a separate string-formatting function,
> say "pg_arraytolist", that would perform the conversion.  It would then
> be the application writer's responsibility to know which fields were
> arrays and apply the conversion if he wanted it.

I have just started to learn TCL, and have Practical Tcl and TK by Brent
Welch on my desk.

Sounds like our interface needs fixing.  I am sure we currently do not
handle full protocol correctly.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] TCL_ARRAYS code in libpgtcl is pretty seriously broken

От
Tom Lane
Дата:
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
> This isn't a definitive answer and there may be another way to discover
> array-ness but it's where I would look. Not sure if you'd be happy
> having to do a select on pg_type for every query unless you're doing it
> already...

I think that won't fly for performance reasons.  I know I wouldn't want
my Tcl applications paying an additional frontend-to-backend round trip
for every SELECT result...

I was actually a tad surprised to realize that the column-type info sent
by the backend couldn't answer such a basic question as "is this an
array?".  Something to fix if we ever rev the FE/BE protocol again.
Don't think I'd propose a protocol rev just for this, though.

In the meantime, I'm inclined to take the fallback approach I suggested
yesterday: provide the array-to-list reformatting function as a separate
Tcl statement that the application programmer can decide to invoke.
The app writer is likely to know perfectly well where he needs that
feature anyway.

            regards, tom lane

Re: [HACKERS] TCL_ARRAYS code in libpgtcl is pretty seriously broken

От
Massimo Dal Zotto
Дата:
>
> There is some code in libpgtcl that purports to convert Postgres array
> data values into Tcl lists.  Is anyone prepared to argue that that code
> does something useful in its present state?  I can name half a dozen
> bugs in it without breathing hard:
>
> 1. Blithely assumes that any data value beginning with '{' and ending
> with '}' must represent an array value.  Should have some more robust
> way of discovering whether a column is array type.  (In fairness, this
> might require a FE/BE protocol change, unless arrayness can be
> determined from the tuple descriptors provided by the backend, ie,
> field type OID, size, and attmod.  Anybody know a way to do that?)
>
> 2. Applies a translation that converts all backslash escape sequences
> defined for C string constants into their equivalent single characters.
> Since neither the backend nor Tcl generate anything close to C-string
> escapes, the point of this is difficult to determine.  It does however
> result in unexpected output, eg disappearing backslashes.
>
> 3. Applies said translation even when processing a non-array data value.
>
> 4. Doesn't actually manage to produce a valid Tcl list, if the data
> contains anything Tcl considers a special character.  What it *should*
> be doing is quoting, not de-quoting.
>
> 5. Fails to modify \\, \{, and \} (thus quite unintentionally doing
> almost the right thing...) when these sequences appear inside an array
> value, because "they will be unescaped by Tcl in Tcl_AppendElement".
> But in fact Tcl_AppendElement is not invoked on the results of this
> code.
>
> 6. Modifies the string returned by libpq *in place*.  This would be a
> const-ness violation if we had been more careful about declaring things
> const.  More importantly, it means that re-examining the same tuple of
> the PGresult will yield a different result.  Not cool.
>
> 7. The TCL_ARRAYS code is only invoked in the "-assign" variant of
> the pgtcl pg_result statement, not in any of the other paths that allow
> tuple values to be examined.  This is presumably an oversight, not
> the intended behavior.
>
> 8. Does not cope with MULTIBYTE strings.  (But I don't think Tcl does
> either, so it's not clear that this can be called a bug.)
>
>
> I am strongly inclined to rip this code out, because it is responsible
> for several behaviors that were correctly called bugs when backslash-
> handling was discussed on pgsql-interfaces back in August.  If we don't
> rip it out, it needs a complete rewrite.
>
> Unless there is a bulletproof solution to problem #1 (how to tell
> whether a field's data type is array), I do not think it is appropriate
> for the basic pg_result code to be applying any such transform.  Perhaps
> it would be reasonable to invent a separate string-formatting function,
> say "pg_arraytolist", that would perform the conversion.  It would then
> be the application writer's responsibility to know which fields were
> arrays and apply the conversion if he wanted it.
>
> Comments?
>
>             regards, tom lane

I wrote this code and used it for two years without any problem. All
the bugs you mentioned disappear if you use the proper string output
functions which C-like escapes (code in contrib/string-io). This makes
also possible to distinguish between array and normal attibutes.
The code works fine in this case. I did a lot of testing at the time.
However it is ok to move it into a separate tcl command.

--
Massimo Dal Zotto

+----------------------------------------------------------------------+
|  Massimo Dal Zotto                email:  dz@cs.unitn.it             |
|  Via Marconi, 141                 phone:  ++39-461-534251            |
|  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
|  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+

Re: [HACKERS] TCL_ARRAYS code in libpgtcl is pretty seriously broken

От
Tom Lane
Дата:
Massimo Dal Zotto <dz@cs.unitn.it> writes:
>> [ my gripes about TCL_ARRAYS code snipped ]

> I wrote this code and used it for two years without any problem. All
> the bugs you mentioned disappear if you use the proper string output
> functions which C-like escapes (code in contrib/string-io).

Well, not the bug involving overwriting libpq's PGresult storage.
But still, this explains a good deal.

I guess my gripe here is that the TCL_ARRAYS option is enabled *by
default* in the Postgres distribution, but the code does not work
properly unless one plugs in a contrib function in the backend
(and if the connection between the two is documented anywhere, I sure
didn't see it).  This is not a reasonable default setup.

As a short-term fix I suggest that we just turn off TCL_ARRAYS in
the distributed config.h.in --- does that sound reasonable to you?

In the longer term, there needs to be some way of applying the
TCL_ARRAYS conversion code only when your contrib stuff is being used.
Backing the conversion code out of pg_result and making it a separate
function might do, but that is a low-tech solution that puts the
responsibility on the application programmer to combine the right bits
of frontend and backend functionality.  Perhaps someone can think of a
better way?

Ultimately I would like the text I/O functions to be completely 8-bit
clean and able to deal with arbitrary byte strings.  That is not
something we can hope to shoehorn into 6.4, though.

            regards, tom lane

Re: [HACKERS] TCL_ARRAYS code in libpgtcl is pretty seriously broken

От
Massimo Dal Zotto
Дата:
>
> Massimo Dal Zotto <dz@cs.unitn.it> writes:
> >> [ my gripes about TCL_ARRAYS code snipped ]
>
> > I wrote this code and used it for two years without any problem. All
> > the bugs you mentioned disappear if you use the proper string output
> > functions which C-like escapes (code in contrib/string-io).
>
> Well, not the bug involving overwriting libpq's PGresult storage.
> But still, this explains a good deal.

I didn't realize it could be called more than once for the same tuple.
We must allocate and free a new string for each value. I will write a patch.

> I guess my gripe here is that the TCL_ARRAYS option is enabled *by
> default* in the Postgres distribution, but the code does not work
> properly unless one plugs in a contrib function in the backend
> (and if the connection between the two is documented anywhere, I sure
> didn't see it).  This is not a reasonable default setup.

I submitted the two patches at the same time but one of them was put in the
contribs.

> As a short-term fix I suggest that we just turn off TCL_ARRAYS in
> the distributed config.h.in --- does that sound reasonable to you?

Given the situation I agree that TCL_ARRAYS should be disabled by default.

> In the longer term, there needs to be some way of applying the
> TCL_ARRAYS conversion code only when your contrib stuff is being used.
> Backing the conversion code out of pg_result and making it a separate
> function might do, but that is a low-tech solution that puts the
> responsibility on the application programmer to combine the right bits
> of frontend and backend functionality.  Perhaps someone can think of a
> better way?
>
> Ultimately I would like the text I/O functions to be completely 8-bit
> clean and able to deal with arbitrary byte strings.  That is not
> something we can hope to shoehorn into 6.4, though.

I completely agree on the last thing. It is something I have been asking
from a long time but apparently nobody cared bout it.

--
Massimo Dal Zotto

+----------------------------------------------------------------------+
|  Massimo Dal Zotto                email:  dz@cs.unitn.it             |
|  Via Marconi, 141                 phone:  ++39-461-534251            |
|  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
|  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+