Re: Decision by Monday: PQescapeString() vs. encoding violation
From | Andres Freund |
---|---|
Subject | Re: Decision by Monday: PQescapeString() vs. encoding violation |
Date | |
Msg-id | 5paqlw7soc66ud5ny3uq27qeeufo2pcm276rvv75hjhvns2sme@bvbqnxivxb2m |
In reply to | Re: Decision by Monday: PQescapeString() vs. encoding violation (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Decision by Monday: PQescapeString() vs. encoding violation; Re: Decision by Monday: PQescapeString() vs. encoding violation |
List | pgsql-hackers |
Hi,

On 2025-02-15 12:35:45 -0800, Jeff Davis wrote:
> I am not suggesting a change, but there's a minor point about the
> behavior of the replacement that I'd like to highlight:
>
> Unicode discusses a choice[1]: "An ill-formed subsequence consisting of
> more than one code unit could be treated as a single error or as
> multiple errors."
>
> The patch implements the latter. Escaping:
> <7A F0 80 80 41 7A>
> results in:
> <7A C0 20 C0 20 C0 20 41 7A>
>
> The Unicode standard suggests[2] that the former approach may provide
> more consistency in how it's done, but that doesn't seem important or
> relevant for our purposes. I'd favor whichever approach results in
> simpler code.

It seems completely infeasible to me to implement the "single error"
approach in a minor version. It'd afaict require non-trivial new
infrastructure. We can't just consume up to the next byte without the high
bit set, because in some encodings the trailing bytes of a multibyte
character are not guaranteed to have the high bit set.

Greetings,

Andres Freund
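To make the "multiple errors" behaviour in the quoted example concrete, here is a minimal Python sketch for UTF-8 input only. It is not the libpq patch: the function names `utf8_seq_len` and `escape_multiple_errors` are hypothetical, well-formedness follows the UTF-8 byte ranges in the Unicode standard (Table 3-7), the two-byte replacement `C0 20` is simply taken from the output quoted above, and only single-quote doubling is shown (backslash handling is omitted). Every byte that does not start a well-formed character is replaced on its own, so `F0 80 80` yields three replacements.

```python
# Hypothetical sketch of per-byte ("multiple errors") replacement of
# ill-formed UTF-8 while escaping; NOT the actual libpq implementation.

# Assumption: the replacement pair from the quoted example output is
# emitted once for every ill-formed byte.
REPLACEMENT = b"\xc0\x20"


def utf8_seq_len(buf: bytes, i: int) -> int:
    """Length of the well-formed UTF-8 character starting at buf[i], or 0."""
    b0 = buf[i]
    if b0 < 0x80:
        return 1  # ASCII
    # Per lead byte: total length and allowed range of the second byte.
    if 0xC2 <= b0 <= 0xDF:
        n, lo, hi = 2, 0x80, 0xBF
    elif b0 == 0xE0:
        n, lo, hi = 3, 0xA0, 0xBF  # excludes overlong forms
    elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        n, lo, hi = 3, 0x80, 0xBF
    elif b0 == 0xED:
        n, lo, hi = 3, 0x80, 0x9F  # excludes UTF-16 surrogates
    elif b0 == 0xF0:
        n, lo, hi = 4, 0x90, 0xBF  # excludes overlong forms
    elif 0xF1 <= b0 <= 0xF3:
        n, lo, hi = 4, 0x80, 0xBF
    elif b0 == 0xF4:
        n, lo, hi = 4, 0x80, 0x8F  # nothing above U+10FFFF
    else:
        return 0  # 0x80-0xC1 and 0xF5-0xFF never start a character
    if i + n > len(buf) or not lo <= buf[i + 1] <= hi:
        return 0  # truncated, or bad second byte
    if any(not 0x80 <= buf[k] <= 0xBF for k in range(i + 2, i + n)):
        return 0  # bad continuation byte
    return n


def escape_multiple_errors(src: bytes) -> bytes:
    """Double single quotes; replace each ill-formed byte separately."""
    out = bytearray()
    i = 0
    while i < len(src):
        n = utf8_seq_len(src, i)
        if n == 0:
            # Ill-formed: replace just this byte, resynchronize at the next.
            out += REPLACEMENT
            i += 1
        else:
            char = src[i:i + n]
            out += b"''" if char == b"'" else char
            i += n
    return bytes(out)


if __name__ == "__main__":
    escaped = escape_multiple_errors(bytes([0x7A, 0xF0, 0x80, 0x80, 0x41, 0x7A]))
    print(escaped.hex(" ").upper())  # 7A C0 20 C0 20 C0 20 41 7A
```

The per-byte loop never has to decide where an ill-formed subsequence ends, which is what keeps it simple. Grouping `F0 80 80` into one error is easy to picture for UTF-8, where continuation bytes always lie in 0x80 to 0xBF, but other encodings give no such guarantee: in Shift_JIS, for example, "ソ" is encoded as 0x83 0x5C, whose trailing byte is the ASCII backslash, so a rule like "consume high-bit bytes up to the next ASCII byte" would split a valid character.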