Обсуждение: Array value syntax and escaping
Hello all, I am currently adding array value handling to the PostgreSQL interface for the Chicken Scheme compiler[*] and I was wondering if there's a more detailed documentation for the exact syntax of arrays than the short natural language explanation in the manual. There doesn't appear to be a helper function in libpq to escape (and unescape) string values for use inside array values and I'm concerned that my homebrew procedures might not be foolproof. If I understand correctly, the connection's character encoding is also used to determine how strings sent by "PQsendQueryParams parameters" (I don't know if there's an unambiguous name for those) are parsed, and it is conceivable that either through bogus or malicous input a multibyte array string could be wrongly escaped, which means one array value could "break out" of its proper position in the array, resulting in an array of different length or contents than intended. IMHO, it would be a Good Thing to have helper procedures in libpq. That would fix the problem once and for all in one place. I briefly considered "abusing" the PQescapeIdentifier procedure for escaping since the syntax for literals inside arrays seems to be exactly like that of SQL identifiers, but I'm not 100% sure about that and I also think the PQescapeIdentifier procedure shouldn't be overloaded for this purpose. Cheers, Peter [*] http://chicken.wiki.br/eggref/4/postgresql -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
Peter Bex <Peter.Bex@xs4all.nl> writes:
> I am currently adding array value handling to the PostgreSQL interface
> for the Chicken Scheme compiler[*] and I was wondering if there's a more
> detailed documentation for the exact syntax of arrays than the short
> natural language explanation in the manual.
There is not, but you could always look at the source for array_out and
array_in if you want definitive answers ;-).  Personally what I'd do is
always double-quote each non-null array element; then the rules reduce
to "backslash any backslashes or double quotes".  If you're working in
a sane client encoding (not SJIS for instance) this is pretty trivial.
> IMHO, it would be a Good Thing to have helper procedures in libpq.
The problem with that is that the necessarily-arbitrary API would
probably add as much or more complexity as would be saved.  If C had
a simple and universally-followed convention for variable-size arrays,
it'd be easier to provide useful helpers ...
> I briefly considered "abusing" the PQescapeIdentifier procedure for
> escaping since the syntax for literals inside arrays seems to be exactly
> like that of SQL identifiers, but I'm not 100% sure about that and I
> also think the PQescapeIdentifier procedure shouldn't be overloaded for
> this purpose.
Well, that wouldn't work anyway, since backslashes aren't special in
identifiers.
            regards, tom lane
			
		On Thu, Apr 01, 2010 at 04:37:23PM -0400, Tom Lane wrote: > There is not, but you could always look at the source for array_out and > array_in if you want definitive answers ;-). Personally what I'd do is > always double-quote each non-null array element; then the rules reduce > to "backslash any backslashes or double quotes". That's what I'm doing right now. > If you're working in a sane client encoding (not SJIS for instance) > this is pretty trivial. That's a relief :) OTOH, for a generic library, it's not a good idea to make such assumptions.. > The problem with that is that the necessarily-arbitrary API would > probably add as much or more complexity as would be saved. If C had > a simple and universally-followed convention for variable-size arrays, > it'd be easier to provide useful helpers ... What I was proposing is a simple escaper for string values, nothing more. Putting array decoration around those is trivial, encoding the strings is the tricky part. However, this got me thinking: It could use the same system as environment and argv values in C, or the new connection procedures of libpq; a null-terminated list of string pointers. But then the question is how to encode nested arrays. I guess it's feasible to pass the array depth as an extra argument to the escaping procedure, since we know arrays cannot have variable depths between elements. This could really work and wouldn't need to be overly complex. It wouldn't save a lot of complexity, but it would save some wheel reinvention in a case where there's room for error, just like PQescape*. You could argue that PQescape* don't save a lot of complexity either, yet those are considered a good idea. I don't see how this is any different. > > I briefly considered "abusing" the PQescapeIdentifier procedure for > > escaping since the syntax for literals inside arrays seems to be exactly > > like that of SQL identifiers, but I'm not 100% sure about that and I > > also think the PQescapeIdentifier procedure shouldn't be overloaded for > > this purpose. > > Well, that wouldn't work anyway, since backslashes aren't special in > identifiers. Good thing I didn't abuse it, then :) Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth