Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
From | Nathan Bossart |
---|---|
Subject | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails |
Date | |
Msg-id | Zz9pOi3pGF-DnJTp@nathan |
In response to | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
|
List | pgsql-bugs |
On Thu, Nov 21, 2024 at 11:44:44AM -0500, Bruce Momjian wrote:
> On Thu, Nov 21, 2024 at 09:14:23AM -0600, Nathan Bossart wrote:
>> Tom provided a concise explanation upthread [0].  My understanding is the
>> same as Bertrand's, i.e., this is an easy way to rule out a bunch of cases
>> where we know that we couldn't possibly have truncated in the middle of a
>> multi-byte character.  This allows us to avoid doing multiple pg_database
>> lookups.
>
> Where does Tom mention anything about checking two bytes?

Here [0].  And he further elaborated on this idea here [1].

> He is
> basically saying remove all trailing high-bit characters until you get a
> match, because once you get a match, you are have found the point of
> valid truncation for the encoding.

Yes, we still need to do that if it's possible the truncation wiped out
part of a multi-byte character.  But it's not possible that we truncated
part of a multi-byte character if the NAMEDATALEN-1'th or NAMEDATALEN-2'th
byte is ASCII, in which case we can avoid doing extra lookups.

> This text:
>
>      * If the original name is too long and we see two consecutive bytes
>      * with their high bits set at the truncation point, we might have
>      * truncated in the middle of a multibyte character.  In multibyte
>      * encodings, every byte of a multibyte character has its high bit
>      * set.  So if IS_HIGHBIT_SET is true for both NAMEDATALEN-1 and
>      * NAMEDATALEN-2, we know we're in the middle of a multibyte
>      * character.  We need to try truncating one more byte back to find the
>      * start of the next character.
>
> needs to be fixed, at a minimum, specifically, "So if IS_HIGHBIT_SET is
> true for both NAMEDATALEN-1 and NAMEDATALEN-2, we know we're in the
> middle of a multibyte character."

Agreed, the second-to-last sentence should be adjusted to something like
"we might be in the middle of a multibyte character."  We don't know for
sure.

>> * Try to do multibyte-aware truncation (the patch at hand).
>
> Yes, I am fine with that, but we need to do more than the patch does to
> accomplish this, unless I am totally confused.

What more do you think is required?

[0] https://postgr.es/m/3976665.1732057784%40sss.pgh.pa.us
[1] https://postgr.es/m/158506.1732120196%40sss.pgh.pa.us

-- 
nathan
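
For readers following the thread, the two-byte check and the fallback of stripping trailing high-bit bytes can be sketched roughly as below. This is an illustration of the idea as described in this message, not the patch under review. The names find_truncated_name, name_exists_fn, demo_exists, and demo_catalog_name are hypothetical; the lookup callback merely stands in for a pg_database lookup, and NAMEDATALEN is assumed to be the default of 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64			/* assumed: PostgreSQL's default value */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Hypothetical stand-in for a pg_database lookup; not a PostgreSQL API. */
typedef bool (*name_exists_fn) (const char *name);

/* Hypothetical one-entry "catalog" so the sketch can be exercised. */
static const char *demo_catalog_name = NULL;

static bool
demo_exists(const char *name)
{
	return demo_catalog_name != NULL && strcmp(name, demo_catalog_name) == 0;
}

/*
 * Copy 'orig' into 'out' (at least NAMEDATALEN bytes), truncated to
 * NAMEDATALEN-1 bytes, and look for a matching name.  '*nlookups' receives
 * the number of lookups performed.  Returns true if a match was found.
 */
static bool
find_truncated_name(const char *orig, name_exists_fn exists,
					char *out, int *nlookups)
{
	size_t		len;

	assert(orig != NULL && exists != NULL && out != NULL && nlookups != NULL);
	len = strlen(orig);
	*nlookups = 0;

	/* Short enough already: a single lookup, no truncation. */
	if (len < NAMEDATALEN)
	{
		memcpy(out, orig, len + 1);
		++*nlookups;
		return exists(out);
	}

	/* Plain byte-wise truncation first. */
	memcpy(out, orig, NAMEDATALEN - 1);
	out[NAMEDATALEN - 1] = '\0';
	++*nlookups;
	if (exists(out))
		return true;

	/*
	 * Fast path: if either byte around the truncation point is ASCII, the
	 * cut cannot have landed inside a multibyte character, so further
	 * lookups would be pointless.
	 */
	if (!IS_HIGHBIT_SET(orig[NAMEDATALEN - 1]) ||
		!IS_HIGHBIT_SET(orig[NAMEDATALEN - 2]))
		return false;

	/*
	 * Both bytes have their high bits set, so we *might* have split a
	 * character.  Strip trailing high-bit bytes one at a time until a
	 * lookup succeeds or we reach an ASCII byte.
	 */
	for (int i = NAMEDATALEN - 2; i > 0 && IS_HIGHBIT_SET(out[i]); i--)
	{
		out[i] = '\0';
		++*nlookups;
		if (exists(out))
			return true;
	}
	return false;
}
```

With an all-ASCII overlength name this performs exactly one lookup. With a two-byte character straddling bytes 62 and 63 (0-based), it performs one failed lookup on the byte-wise truncation and then finds the 62-byte name on the second.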