Hi,

On 2022-07-15 14:01:53 -0700, Jacob Champion wrote:
> On 7/15/22 13:35, Andres Freund wrote:
> >> (And do we want to fix it now, regardless?)
> >
> > Yes.
>
> Cool. I can get on board with that.
>
> >> What guarantees are we supposed to be making for log encoding?
> >
> > I don't know, but I don't think not caring at all is a good
> > option. Particularly for unauthenticated data I'd say that escaping everything
> > but printable ascii chars is a sensible approach.
>
> It'll also be painful for anyone whose infrastructure isn't in a Latin
> character set... Maybe that's worth the tradeoff for a v1.

I don't think it's a huge issue, or really avoidable, pre-authentication.
Don't we require all server-side encodings to be supersets of ASCII?

We already have pg_clean_ascii() and use it for application_name, fwiw.

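To make "escaping everything but printable ASCII" concrete, here is a rough
standalone sketch. It is not the pg_clean_ascii() implementation; the function
name, the \xNN output format and the buffer handling are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL's pg_clean_ascii()): copy src into
 * dst, replacing every byte outside printable ASCII (0x20..0x7E) with a
 * four-character "\xNN" escape.  A literal backslash is passed through
 * unchanged, so the output is meant for humans reading logs, not for
 * round-tripping.  The output is always NUL-terminated and is cut short
 * (never mid-escape) if dst is too small.  Returns the number of bytes
 * written, excluding the terminating NUL.
 */
static size_t
escape_non_printable(const char *src, char *dst, size_t dstlen)
{
    size_t      out = 0;

    assert(src != NULL && dst != NULL && dstlen > 0);

    for (; *src != '\0'; src++)
    {
        unsigned char c = (unsigned char) *src;

        if (c >= 0x20 && c <= 0x7E)
        {
            /* need room for this byte plus the terminating NUL */
            if (out + 1 >= dstlen)
                break;
            dst[out++] = (char) c;
        }
        else
        {
            /* need room for "\xNN" plus the terminating NUL */
            if (out + 4 >= dstlen)
                break;
            snprintf(dst + out, 5, "\\x%02x", (unsigned int) c);
            out += 4;
        }
    }

    dst[out] = '\0';
    return out;
}
```

With something like this, an ESC byte (0x1b) that starts a terminal escape
sequence ends up in the log as the four visible characters \x1b, and each byte
of a multi-byte character is escaped individually, which is the cost for
non-Latin setups mentioned above.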
> Is there an acceptable approach that could centralize it, so we fix it
> once and are done? E.g. a log_encoding GUC and either conversion or
> escaping in send_message_to_server_log()?

Introducing escaping to ASCII for all log messages seems like it'd be
incredibly invasive, and would remove a lot of worthwhile information. Nor
does it really address the whole scope - consider e.g. the truncation in this
patch, which can't be done correctly by the time send_message_to_server_log()
is reached - just chopping in the middle of a multi-byte character would
leave the string invalidly encoded. And we can't perform encoding conversion
on client data until we've gone further into the authentication process, I
think.
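
To illustrate the truncation problem, here is a small standalone sketch. It
assumes the input is valid UTF-8, which is exactly what we can't assume for
pre-authentication client data; the helper name is made up. Inside the
backend, pg_mbcliplen() does the encoding-aware equivalent for strings that
are already in a known encoding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the largest length <= limit at which the
 * len-byte string s can be cut without splitting a multi-byte character.
 * Assumes s is valid UTF-8; with arbitrary bytes the result is still in
 * bounds but not meaningful.
 */
static size_t
utf8_clip_len(const char *s, size_t len, size_t limit)
{
    size_t      cut;

    assert(s != NULL);

    if (len <= limit)
        return len;             /* nothing to truncate */

    cut = limit;

    /*
     * If the first byte we would drop is a continuation byte (10xxxxxx),
     * the cut falls inside a character; back up to that character's start.
     */
    while (cut > 0 && (((unsigned char) s[cut]) & 0xC0) == 0x80)
        cut--;

    return cut;
}
```

A naive cut of "h\xc3\xa9llo" at two bytes would keep the lead byte 0xc3 and
drop its continuation byte; the sketch backs up and keeps only "h".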

Always escaping ANSI escape codes (or rather the non-printable ASCII range) is
more convincing. Then we'd just need to make sure that client-controlled data
is properly encoded before handing it over to other parts of the system.

I can see a point in a log_encoding GUC at some point, but it seems a bit
separate from the discussion here.

Greetings,
Andres Freund