Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
From | Bertrand Drouvot |
---|---|
Subject | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails |
Date | |
Msg-id | Zz9FRrwJRlyGBFPN@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
|
List | pgsql-bugs |
Hi,

On Thu, Nov 21, 2024 at 09:21:16AM -0500, Bruce Momjian wrote:
> On Thu, Nov 21, 2024 at 07:27:22AM +0000, Bertrand Drouvot wrote:
> > +    /*
> > +     * If the original name is too long and we see two consecutive bytes
> > +     * with their high bits set at the truncation point, we might have
> > +     * truncated in the middle of a multibyte character. In multibyte
> > +     * encodings, every byte of a multibyte character has its high bit
> > +     * set. So if IS_HIGHBIT_SET is true for both NAMEDATALEN-1 and
> > +     * NAMEDATALEN-2, we know we're in the middle of a multibyte
> > +     * character. We need to try truncating one more byte back to find the
> > +     * start of the next character.
> > +     */
> ...
> > +        /*
> > +         * If we've hit a byte with high bit clear (an ASCII byte), we
> > +         * know we can't be in the middle of a multibyte character,
> > +         * because all bytes of a multibyte character must have their
> > +         * high bits set. Any following byte must therefore be the
> > +         * start of a new character, so we can stop looking for
> > +         * earlier truncation points.
> > +         */
>
> I don't understand this logic. Why are two bytes important? If we knew
> it was UTF8 we could check for non-first bytes always starting with
> bits 10, but we can't know that.

I think this is because this is a reliable way to detect if the truncation
happened in the middle of a character, without needing to know the specifics
of the encoding.

My understanding is that the key insight is that in any multibyte encoding,
all bytes within a multibyte character will have their high bits set.

That's just my understanding from the code and Tom's previous explanations:
I might be wrong as not an expert in this area.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
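
To illustrate why two bytes matter, here is a minimal sketch of the check the quoted comments describe. It is not the code from the patch: the helper names `cut_may_split_char` and `candidate_lengths` are hypothetical, `NAMEDATALEN` is assumed to be the default 64, and `IS_HIGHBIT_SET` is defined locally rather than taken from the PostgreSQL headers. A truncation point sits between two bytes (the last one kept and the first one dropped), so it can only fall inside a multibyte character if both of those bytes have their high bit set. This relies on the assumption stated above that every byte of a multibyte character has its high bit set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64                          /* assumed default value */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/*
 * Hypothetical helper: would keeping only the first "keep" bytes of "name"
 * possibly cut a multibyte character in two?  That can only happen if the
 * last kept byte and the first dropped byte both have their high bit set.
 */
static bool
cut_may_split_char(const char *name, size_t keep)
{
    if (keep == 0 || keep >= strlen(name))
        return false;           /* nothing kept, or nothing dropped */
    return IS_HIGHBIT_SET(name[keep - 1]) && IS_HIGHBIT_SET(name[keep]);
}

/*
 * Hypothetical helper: list the truncated lengths worth trying for an
 * overlength name, starting at NAMEDATALEN - 1 and stepping back one byte
 * at a time while the cut might still be inside a multibyte character.
 * Returns the number of candidates written to "out" (0 if the name fits).
 */
static size_t
candidate_lengths(const char *name, size_t *out, size_t max_out)
{
    size_t      n = 0;
    size_t      keep = NAMEDATALEN - 1;

    if (strlen(name) < NAMEDATALEN)
        return 0;               /* no truncation needed */

    while (n < max_out && keep > 0)
    {
        out[n++] = keep;
        if (!cut_may_split_char(name, keep))
            break;              /* an ASCII byte borders the cut: stop */
        keep--;
    }
    return n;
}
```

For an all-ASCII name only the length 63 is produced. If a two-byte character straddles bytes 62 and 63, the sketch produces 63 and then 62, stopping there because byte 61 is ASCII.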