Discussion: v3 protocol & string encoding
Couple of quick protocol questions:

1) What encoding is used for strings sent and received during the startup phase? I can set client_encoding to a known value as a parameter in the startup packet, but the protocol spec doesn't appear to say how the startup packet itself and the various strings sent/received during startup (e.g. authentication, error messages) are encoded.

2) At what point in the stream does a client_encoding change take effect -- immediately after the corresponding ParameterStatus message, or at some other point?

-O
Oliver Jowett <oliver@opencloud.com> writes:
> 1) What encoding is used for strings sent and received during the
> startup phase?

The startup packet itself will not get any encoding conversion AFAIR,
so one way to look at it is that the data therein must be in server
encoding. In practice, there are no strings therein that really need
conversion anyway. (If you use characters outside 7-bit-ASCII for user
or database names, you're going to have much worse problems than just
this one.)

Any client_encoding received from the client is not going to be applied
until after the authentication exchange is complete, so the rest of that
is going to be in server encoding as well. The only part of this that
seems like it might be an issue is that a failure ERROR message would be
in server encoding, but the client wouldn't have any good way to know
what that is ...

> 2) At what point in the stream does a client_encoding change take effect
> -- immediately after the corresponding ParameterStatus message, or at
> some other point?

ParameterStatus is sent when the change is made.

regards, tom lane
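As a rough illustration of why the startup packet sidesteps the encoding question: a v3 StartupMessage has no type byte, only an Int32 length (which counts itself), the Int32 protocol version (3.0 = 196608), and NUL-terminated name/value pairs closed by one extra NUL. The Java sketch below builds such a message; the class and method names are hypothetical, not the JDBC driver's. It simply refuses non-ASCII names and values, in line with Tom's remark that anything outside 7-bit ASCII there is asking for trouble.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper: builds a protocol 3.0 StartupMessage from ASCII-only parameters.
final class StartupPacket {
    // Major version in the high 16 bits, minor version in the low 16 bits.
    static final int PROTOCOL_3_0 = 3 << 16; // = 196608

    static byte[] build(Map<String, String> params) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        writeInt32(payload, PROTOCOL_3_0);
        for (Map.Entry<String, String> e : params.entrySet()) {
            writeAsciiCString(payload, e.getKey());
            writeAsciiCString(payload, e.getValue());
        }
        payload.write(0); // extra NUL terminates the name/value list

        byte[] body = payload.toByteArray();
        ByteArrayOutputStream msg = new ByteArrayOutputStream();
        writeInt32(msg, body.length + 4); // length includes the length field itself
        msg.write(body, 0, body.length);
        return msg.toByteArray();
    }

    // Big-endian (network order) Int32; write(int) keeps only the low 8 bits.
    private static void writeInt32(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    // Rejects NUL and anything above 0x7F, so the bytes are identical in every
    // ASCII-compatible encoding and no conversion question arises.
    private static void writeAsciiCString(ByteArrayOutputStream out, String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 || c > 0x7F) {
                throw new IllegalArgumentException("startup string is not 7-bit ASCII: " + s);
            }
        }
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
        out.write(0);
    }
}
```

Including client_encoding here only requests the conversion; per the reply above, it is not applied until authentication has completed.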
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
>
>>1) What encoding is used for strings sent and received during the
>>startup phase?
>
>
> The startup packet itself will not get any encoding conversion AFAIR,
> so one way to look at it is that the data therein must be in server
> encoding. In practice, there are no strings therein that really need
> conversion anyway. (If you use characters outside 7-bit-ASCII for user
> or database names, you're going to have much worse problems than just
> this one.)

The encoding of user & database names was my main concern. If they can
only be 7-bit ASCII in practice, that's easy..

>>2) At what point in the stream does a client_encoding change take effect
>>-- immediately after the corresponding ParameterStatus message, or at
>>some other point?
>
>
> ParameterStatus is sent when the change is made.

Are the strings in the ParameterStatus encoded with the old or new
client_encoding? I need to know the point in the stream to switch
encodings.

I suppose this is only an issue if there are pairs of encodings where
"client_encoding" or the encoding names encode differently in the two
encodings. Is it safe to assume that 7-bit ASCII is always encoded
unchanged regardless of the encoding in use?

-O
Oliver Jowett <oliver@opencloud.com> writes:
> The encoding of user & database names was my main concern. If they can
> only be 7-bit ASCII in practice, that's easy..

Well, you can *try* using other encodings, but there are enough known
problems that I don't think it will work pleasantly unless client and
server encodings are the same all the time.

>>> 2) At what point in the stream does a client_encoding change take effect
>>> -- immediately after the corresponding ParameterStatus message, or at
>>> some other point?
>>
>> ParameterStatus is sent when the change is made.

> Are the strings in the ParameterStatus encoded with the old or new
> client_encoding?

Okay, make that "sent just after the change is made". So it looks like
you should receive a string in the new encoding. I can't offhand think
of a way to test this though --- are any of the reported settings
interesting from an encoding standpoint?

> Is it safe to assume that 7-bit ASCII
> is always encoded unchanged regardless of the encoding in use?

Hm. This is true for all the "backend-safe" encodings but I believe
not for all the supported client encodings. Tatsuo might have more of
a clue than me about likely failure cases.

regards, tom lane
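One narrow way to probe the question on the client side is to check whether a given charset writes a 7-bit ASCII string such as "client_encoding" as exactly the same bytes as US-ASCII. The sketch below is a hypothetical helper that uses Java charset names, not PostgreSQL encoding names. It only checks the encoding of ASCII text, so it cannot detect the subtler client-encoding hazard: in multibyte encodings such as Shift_JIS, the trail byte of a two-byte character can fall in the ASCII range 0x40-0x7E, so a byte-level scan that assumes "ASCII-range byte means ASCII character" can misread the stream. That is one plausible reason such encodings are client-only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: does `cs` encode this ASCII text byte-for-byte like US-ASCII?
final class AsciiCompat {
    static boolean encodesLikeAscii(Charset cs, String asciiText) {
        byte[] expected = asciiText.getBytes(StandardCharsets.US_ASCII);
        byte[] actual = asciiText.getBytes(cs);
        return Arrays.equals(expected, actual);
    }
}
```

For instance, UTF-8 and ISO-8859-1 pass for "client_encoding", while Java's UTF-16 charset (a byte-order mark plus two bytes per character) does not.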
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
>
>>>>2) At what point in the stream does a client_encoding change take effect
>>>>-- immediately after the corresponding ParameterStatus message, or at
>>>>some other point?
>>>
>>>ParameterStatus is sent when the change is made.
>
>>Are the strings in the ParameterStatus encoded with the old or new
>>client_encoding?
>
> Okay, make that "sent just after the change is made". So it looks like
> you should receive a string in the new encoding. I can't offhand think
> of a way to test this though --- are any of the reported settings
> interesting from an encoding standpoint?

This timing makes it harder for a client to recognize a change in
client_encoding -- how is it supposed to know to change encoding before
interpreting the ParameterStatus message?

I'd like to add some robustness to the JDBC driver such that if the user
changes client_encoding, the driver throws an error rather than garbling
data (it is expecting client_encoding = 'UNICODE'). If the user can set
client_encoding such that the driver won't recognize the ParameterStatus
message (i.e. the string "client_encoding" does not encode as it would
in UNICODE), it's not so useful. I don't know if there is such an
encoding, however.

>>Is it safe to assume that 7-bit ASCII
>>is always encoded unchanged regardless of the encoding in use?
>
>
> Hm. This is true for all the "backend-safe" encodings but I believe
> not for all the supported client encodings. Tatsuo might have more of
> a clue than me about likely failure cases.

By "backend-safe" do you mean "can be used as a database encoding"? If
so, it solves my problem, which is handling the switchover from default
client_encoding (== database encoding) to UNICODE in the JDBC driver's
connection setup code.
I can initially use 7-bit ASCII regardless of the actual database
encoding, and switch to UNICODE when possible (this is what the current
driver does in most cases; I'm just verifying that the assumptions it
makes are correct).

-O
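A minimal sketch of the strategy Oliver describes, under his stated assumptions: decode in 7-bit ASCII until the server confirms client_encoding = 'UNICODE', then switch to UTF-8, and fail loudly on any other value. The ParameterStatus body (after the 'S' type byte and Int32 length) is two NUL-terminated strings, so the sketch splits on the NUL bytes before decoding anything. That way, recognising the name does not depend on the charset currently selected, provided "client_encoding" arrives as plain ASCII bytes, which is the property discussed in the thread. EncodingTracker and onParameterStatus are hypothetical names; also accepting "UTF8" is an assumption about later server versions, not something stated here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical tracker for the charset a driver should use to decode server strings.
final class EncodingTracker {
    // Start in 7-bit ASCII, assumed to be a common subset of every database encoding.
    private Charset current = StandardCharsets.US_ASCII;

    Charset current() { return current; }

    // `body` is the ParameterStatus payload without the 'S' byte and length: name\0value\0.
    void onParameterStatus(byte[] body) {
        int nameEnd = indexOfNul(body, 0);
        int valueEnd = indexOfNul(body, nameEnd + 1);
        // Both strings are decoded as ASCII, relying on the name and the
        // expected values being pure 7-bit ASCII.
        String name = new String(body, 0, nameEnd, StandardCharsets.US_ASCII);
        if (!name.equals("client_encoding")) {
            return; // some other setting; the decoding charset is unaffected
        }
        String value = new String(body, nameEnd + 1, valueEnd - nameEnd - 1, StandardCharsets.US_ASCII);
        // "UTF8" is an assumed later spelling of 'UNICODE'.
        if (value.equals("UNICODE") || value.equals("UTF8")) {
            current = StandardCharsets.UTF_8;
        } else {
            // Refuse to continue rather than silently garbling data.
            throw new IllegalStateException(
                "client_encoding changed to " + value + "; driver requires UNICODE");
        }
    }

    private static int indexOfNul(byte[] b, int from) {
        for (int i = from; i < b.length; i++) {
            if (b[i] == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("unterminated string in ParameterStatus");
    }
}
```

Throwing instead of switching to an arbitrary charset keeps the driver's single-encoding assumption explicit.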