Обсуждение: Array value syntax and escaping

Поиск
Список
Период
Сортировка

Array value syntax and escaping

От
Peter Bex
Дата:
Hello all,

I am currently adding array value handling to the PostgreSQL interface
for the Chicken Scheme compiler[*] and I was wondering if there's a more
detailed documentation for the exact syntax of arrays than the short
natural language explanation in the manual.

There doesn't appear to be a helper function in libpq to escape (and
unescape) string values for use inside array values and I'm concerned
that my homebrew procedures might not be foolproof.

If I understand correctly, the connection's character encoding is also
used to determine how strings sent by "PQsendQueryParams parameters"
(I don't know if there's an unambiguous name for those) are parsed, and
it is conceivable that either through bogus or malicous input a
multibyte array string could be wrongly escaped, which means one array
value could "break out" of its proper position in the array, resulting
in an array of different length or contents than intended.

IMHO, it would be a Good Thing to have helper procedures in libpq.
That would fix the problem once and for all in one place.

I briefly considered "abusing" the PQescapeIdentifier procedure for
escaping since the syntax for literals inside arrays seems to be exactly
like that of SQL identifiers, but I'm not 100% sure about that and I
also think the PQescapeIdentifier procedure shouldn't be overloaded for
this purpose.

Cheers,
Peter

[*] http://chicken.wiki.br/eggref/4/postgresql
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: Array value syntax and escaping

От
Tom Lane
Дата:
Peter Bex <Peter.Bex@xs4all.nl> writes:
> I am currently adding array value handling to the PostgreSQL interface
> for the Chicken Scheme compiler[*] and I was wondering if there's a more
> detailed documentation for the exact syntax of arrays than the short
> natural language explanation in the manual.

There is not, but you could always look at the source for array_out and
array_in if you want definitive answers ;-).  Personally what I'd do is
always double-quote each non-null array element; then the rules reduce
to "backslash any backslashes or double quotes".  If you're working in
a sane client encoding (not SJIS for instance) this is pretty trivial.

> IMHO, it would be a Good Thing to have helper procedures in libpq.

The problem with that is that the necessarily-arbitrary API would
probably add as much or more complexity as would be saved.  If C had
a simple and universally-followed convention for variable-size arrays,
it'd be easier to provide useful helpers ...

> I briefly considered "abusing" the PQescapeIdentifier procedure for
> escaping since the syntax for literals inside arrays seems to be exactly
> like that of SQL identifiers, but I'm not 100% sure about that and I
> also think the PQescapeIdentifier procedure shouldn't be overloaded for
> this purpose.

Well, that wouldn't work anyway, since backslashes aren't special in
identifiers.

            regards, tom lane

Re: Array value syntax and escaping

От
Peter Bex
Дата:
On Thu, Apr 01, 2010 at 04:37:23PM -0400, Tom Lane wrote:
> There is not, but you could always look at the source for array_out and
> array_in if you want definitive answers ;-).  Personally what I'd do is
> always double-quote each non-null array element; then the rules reduce
> to "backslash any backslashes or double quotes".

That's what I'm doing right now.

> If you're working in a sane client encoding (not SJIS for instance)
> this is pretty trivial.

That's a relief :)
OTOH, for a generic library, it's not a good idea to make such assumptions..

> The problem with that is that the necessarily-arbitrary API would
> probably add as much or more complexity as would be saved.  If C had
> a simple and universally-followed convention for variable-size arrays,
> it'd be easier to provide useful helpers ...

What I was proposing is a simple escaper for string values, nothing
more.  Putting array decoration around those is trivial, encoding the
strings is the tricky part.

However, this got me thinking: It could use the same system as
environment and argv values in C, or the new connection procedures of
libpq; a null-terminated list of string pointers.

But then the question is how to encode nested arrays.  I guess it's
feasible to pass the array depth as an extra argument to the escaping
procedure, since we know arrays cannot have variable depths between
elements.  This could really work and wouldn't need to be overly complex.

It wouldn't save a lot of complexity, but it would save some wheel
reinvention in a case where there's room for error, just like PQescape*.
You could argue that PQescape* don't save a lot of complexity either,
yet those are considered a good idea.  I don't see how this is any
different.

> > I briefly considered "abusing" the PQescapeIdentifier procedure for
> > escaping since the syntax for literals inside arrays seems to be exactly
> > like that of SQL identifiers, but I'm not 100% sure about that and I
> > also think the PQescapeIdentifier procedure shouldn't be overloaded for
> > this purpose.
>
> Well, that wouldn't work anyway, since backslashes aren't special in
> identifiers.

Good thing I didn't abuse it, then :)

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth